Building an AI-Powered SQL Assistant: A Beginner's Guide to the Code

In this blog, we'll break down the Python code for a Streamlit-based, AI-powered SQL assistant. The assistant lets users interact with a database in plain English, generates optimized SQL queries, and retrieves the results, all powered by OpenAI's GPT-4-turbo model. By the end of this blog, you'll understand how the code works and how you can use or modify it for your own projects. The full source code appears at the end of the post.


Introduction

The code builds a user-friendly web app that:

  1. Connects to a PostgreSQL database.
  2. Uses OpenAI's GPT-4-turbo model to generate SQL queries.
  3. Executes the queries on the database.
  4. Displays the results, SQL query, and explanations.

The app is beginner-friendly and uses Streamlit for the interface, SQLAlchemy for database interaction, and LangChain for managing the AI functionality.


Breaking Down the Code

1. Setup and Dependencies

Before diving into the code, here are the libraries it relies on:

  • Streamlit: For creating the web-based user interface.
  • SQLAlchemy: For managing database connections.
  • LangChain: For managing interactions with the OpenAI language model.
  • dotenv: For loading environment variables securely.
  • Pandas: For handling query results as dataframes.
  • sqlparse and sqlglot: For validating and formatting SQL queries.

Install these dependencies, along with the psycopg2 driver for PostgreSQL and the LangChain OpenAI/community packages used by the imports:

pip install streamlit sqlalchemy langchain langchain-openai langchain-community pandas python-dotenv sqlparse sqlglot psycopg2-binary

2. Class Overview: SQLAgentApp

The code is organized into a class called SQLAgentApp. This class encapsulates the entire application logic, making the code modular and reusable.
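
The constructor simply chains the four setup helpers described in the next sections. A condensed view of the structure (taken from the full listing at the end of the post):

class SQLAgentApp:
    def __init__(self):
        self._init_environment()  # load and validate the OpenAI API key
        self._init_database()     # connect to PostgreSQL via SQLAlchemy
        self._init_llm()          # create the GPT-4-turbo chat model
        self._init_agent()        # assemble the LangChain SQL agent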


3. Environment Setup

The _init_environment method ensures that the OpenAI API key is loaded from environment variables using the dotenv library.

def _init_environment(self):
    self.openai_key = os.getenv("OPENAI_API_KEY")
    if not self.openai_key:
        st.error("OPENAI_API_KEY not found in environment variables")
        st.stop()
  • If the API key is missing, the app stops and shows an error message.
  • This keeps the sensitive API key out of the source code; it is read from a .env file like the example below.
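
The app reads OPENAI_API_KEY here and the PG_* settings used in the next section, so one .env file covers everything. A minimal example (every value is a placeholder to replace with your own):

OPENAI_API_KEY=your-openai-api-key
PG_USER=postgres
PG_PASSWORD=postgres_password
PG_HOST=localhost
PG_PORT=5432
PG_DATABASE=postgres_db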

4. Database Initialization

The _init_database method connects to a PostgreSQL database using SQLAlchemy. It retrieves database credentials from environment variables or uses default values.

def _init_database(self):
    db_config = {
        "user": os.getenv("PG_USER", "postgres"),
        "password": os.getenv("PG_PASSWORD", "postgres_password"),
        "host": os.getenv("PG_HOST", "localhost"),
        "port": os.getenv("PG_PORT", "5432"),
        "database": os.getenv("PG_DATABASE", "postgres_db")
    }

    self.DATABASE_URL = (
        f"postgresql+psycopg2://{db_config['user']}:{db_config['password']}"
        f"@{db_config['host']}:{db_config['port']}/{db_config['database']}"
    )
  • The code uses create_engine to establish a connection to the database.
  • If the connection fails, an error message is displayed using Streamlit, and the app stops.
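
If the connection fails, it helps to test the URL outside the app first. A minimal standalone sketch, assuming the same placeholder credentials as above:

from sqlalchemy import create_engine, text

# Hypothetical quick check; the URL mirrors the format built in _init_database
engine = create_engine("postgresql+psycopg2://postgres:postgres_password@localhost:5432/postgres_db")
with engine.connect() as conn:
    print(conn.execute(text("SELECT 1")).scalar())  # prints 1 if the connection works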

5. Initializing the LLM

The _init_llm method creates an instance of OpenAI's GPT-4-turbo model using LangChain.

def _init_llm(self):
    self.llm = ChatOpenAI(
        model_name="gpt-4-turbo-preview",
        temperature=0,
        streaming=True,
        api_key=self.openai_key
    )
  • temperature=0 makes the model's responses as deterministic as possible.
  • Streaming is enabled for real-time responses in Streamlit.
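
Before wiring the model into an agent, you can sanity-check it on its own. A minimal sketch (assumes OPENAI_API_KEY is already set in the environment):

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model_name="gpt-4-turbo-preview", temperature=0)
reply = llm.invoke("Reply with the single word OK.")  # one-off call, no tools or agent involved
print(reply.content)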

6. Prompt Engineering

The _get_prompt_prefix method constructs a detailed prompt for the AI model. It includes:

  • The database schema (tables and columns).
  • Clear rules for generating SQL queries.
  • Formatting guidelines for the response.
def _get_prompt_prefix(self) -> str:
    schema = self._get_schema_info()
    current_year = datetime.now().year
    return f"""
    You are a world-class SQL expert with 15+ years of experience. ...
    Database Schema:
    {schema}
    """
  • The schema is retrieved dynamically with SQLAlchemy's inspect() function (see the sketch after this list).
  • The prompt instructs the model to generate PostgreSQL-compliant SQL and to follow a fixed response format.
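
The schema string comes from the _get_schema_info helper (shown in full in the listing below); the core idea is to walk the tables and columns with SQLAlchemy's Inspector:

from sqlalchemy import inspect

inspector = inspect(engine)  # engine is the SQLAlchemy engine from _init_database
schema = []
for table in inspector.get_table_names():
    columns = [f"{col['name']} ({col['type']})" for col in inspector.get_columns(table)]
    schema.append(f"Table {table}:\n  " + "\n  ".join(columns))
print("\n\n".join(schema))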

7. Agent Initialization

The _init_agent method builds a SQLDatabaseToolkit and a ConversationBufferMemory, then creates a LangChain SQL agent from them.

self.toolkit = SQLDatabaseToolkit(db=self.db, llm=self.llm)
self.memory = ConversationBufferMemory()

self.agent = create_sql_agent(
    llm=self.llm,
    toolkit=self.toolkit,
    agent_type=AgentType.OPENAI_FUNCTIONS,
    verbose=True,
    memory=self.memory,
    prefix=self._get_prompt_prefix(),
    format_instructions=self._get_format_instructions(),
    handle_parsing_errors=True,
    max_iterations=5,
    early_stopping_method="generate"
)
  • LangChain SQL Agent: Combines the AI model with SQL tools to generate and execute queries.
  • ConversationBufferMemory: Stores the chat history for better context.
  • Format Instructions: Ensures the AI provides clear, structured responses.

8. User Interface

The run_ui method builds the entire web interface using Streamlit.

Sidebar

The sidebar displays:

  1. Database Schema: A list of tables and columns retrieved dynamically.
  2. Chat History: Previous user and AI messages.
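
The schema browser uses one expander per table; this excerpt from run_ui (in the full listing) shows the pattern:

with st.sidebar:
    st.header("📖 Database Schema")
    for table in self.inspector.get_table_names():
        with st.expander(f"📑 {table}"):
            cols = [
                f"{col['name']} ({col['type']})"
                for col in self.inspector.get_columns(table)
            ]
            st.code("\n".join(cols))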

Main Chat Interface

The main interface handles:

  1. User Input: Users can ask questions about the database.
  2. AI Responses: The AI responds with SQL queries, results, and explanations.
if prompt := st.chat_input("Ask a data question..."):
    st.session_state.messages.append({"role": "user", "content": prompt})
    
    # AI generates response
    response = self.agent.run(prompt)
  • User input is sent to the AI agent.
  • The response includes the SQL query, results, and analysis.

9. SQL Execution and Processing

The _process_response method extracts and executes the SQL query generated by the AI.

if "```sql" in response:
    sql_part = response.split("```sql")[1].split("```")[0].strip()
    result["sql"] = sqlparse.format(sql_part, reindent=True)
    result["data"] = pd.read_sql_query(result["sql"], self.engine)
  • SQL Parsing: Extracts the SQL code from the response.
  • Execution: Runs the SQL query on the database using Pandas.
  • Result Formatting: Processes the query result into a readable format.
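
To see the extraction and formatting step in isolation, here is a small self-contained sketch that mimics _process_response on a made-up AI reply (the query and table name are hypothetical):

import sqlparse

# Hypothetical AI response containing a fenced SQL block
response = "Here is the query:\n```sql\nselect id, name from products where is_active = true limit 10\n```\nDone."

if "```sql" in response:
    sql_part = response.split("```sql")[1].split("```")[0].strip()
    print(sqlparse.format(sql_part, reindent=True, keyword_case="upper"))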

10. Error Handling

The code includes robust error handling:

  • If the database connection fails, the app stops with an error message.
  • SQL execution errors are caught and displayed to the user.
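
The recurring pattern in the full listing is a try/except around each risky step, surfacing the problem through Streamlit instead of crashing the app; the SQL execution branch, for example, looks roughly like this:

try:
    result["data"] = pd.read_sql_query(result["sql"], self.engine)
except Exception as e:
    st.error(f"SQL Execution Error: {str(e)}")          # shown to the user in the UI
    result["error"] = f"SQL Execution Error: {str(e)}"  # kept on the result for later handling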

11. Running the App

Finally, the app is launched using:

if __name__ == "__main__":
    app = SQLAgentApp()
    app.run_ui()
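
Because this is a Streamlit app, it is started with the streamlit CLI rather than plain python (the filename is whatever you saved the script as):

streamlit run sql_agent_app.py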

How the App Works

Step-by-Step Workflow

  1. The user inputs a question, such as "Show all active products."
  2. The AI generates a PostgreSQL query based on the database schema.
  3. The query is executed, and the results are displayed in a table.
  4. The AI provides an explanation of the query and optimization tips.

Key Features

  • Dynamic Schema Retrieval: Automatically fetches database tables and columns.
  • AI-Generated SQL: Translates natural-language questions into PostgreSQL queries, guided by the schema and the prompt rules.
  • Real-Time Execution: Runs queries instantly and displays results.
  • Detailed Analysis: Explains the results and suggests optimizations.

Conclusion

This code demonstrates how to build an AI-powered SQL assistant that integrates database querying, AI, and a user-friendly interface. By following this guide, you can adapt the application to your own database and enhance it with additional features.

Full Source Code

import os
import re
import sqlparse
from dotenv import load_dotenv
from typing import List, Dict, Any
from sqlalchemy import create_engine, inspect
from sqlglot import transpile # FIXED IMPORT
import pandas as pd
import streamlit as st
from datetime import datetime

# LangChain imports
from langchain_openai import ChatOpenAI
from langchain.memory import ConversationBufferMemory
from langchain.agents import create_sql_agent, AgentType
from langchain.callbacks import StreamlitCallbackHandler
from langchain_community.utilities import SQLDatabase
from langchain_community.agent_toolkits import SQLDatabaseToolkit

# Load environment variables
load_dotenv()
# os.environ['OPENAI_API_KEY'] = "your-openai-key"  # Optionally hardcode the key here instead of using a .env file (not recommended)
class SQLAgentApp:
    def __init__(self):
        self._init_environment()
        self._init_database()
        self._init_llm()
        self._init_agent()

    def _init_environment(self):
        self.openai_key = os.getenv("OPENAI_API_KEY")
        if not self.openai_key:
            st.error("OPENAI_API_KEY not found in environment variables")
            st.stop()

    def _init_database(self):
        # Initialize database connection
        db_config = {
            "user": os.getenv("PG_USER", "postgres"),
            "password": os.getenv("PG_PASSWORD", "postgres_password"),
            "host": os.getenv("PG_HOST", "localhost"),
            "port": os.getenv("PG_PORT", "5432"),
            "database": os.getenv("PG_DATABASE", "postgres_db")
        }

        self.DATABASE_URL = (
            f"postgresql+psycopg2://{db_config['user']}:{db_config['password']}"
            f"@{db_config['host']}:{db_config['port']}/{db_config['database']}"
        )

        try:
            self.engine = create_engine(self.DATABASE_URL)
            self.db = SQLDatabase(self.engine)
            self.inspector = inspect(self.engine)
        except Exception as e:
            st.error(f"Database connection failed: {str(e)}")
            st.stop()

    def _init_llm(self):
        self.llm = ChatOpenAI(
            model_name="gpt-4-turbo-preview",
            temperature=0,
            streaming=True,
            api_key=self.openai_key
        )

    def _get_prompt_prefix(self) -> str:
        schema = self._get_schema_info()
        current_year = datetime.now().year
        return f"""
You are a world-class SQL expert with 15+ years of experience. Follow these rules:
1. Generate PostgreSQL-compliant SQL
2. Validate queries with SQLGlot before execution
3. Use CTEs for complex logic
4. Optimize for query performance
5. Add comments for complex operations
6. Prevent SQL injection attacks
7. Handle NULLs with COALESCE
8. Include LIMIT unless explicitly disabled
9. For date queries:
   - Use CURRENT_DATE for today's date
   - Use CURRENT_TIMESTAMP for current time
   - For relative dates (e.g., last month), use date arithmetic
   - Format dates consistently (YYYY-MM-DD)
   - Current year is {current_year}

Database Schema:
{schema}

Response Format:
1. Valid SQL Query
2. Execution Result Summary
3. Step-by-Step Explanation
4. Optimization Tips
5. Alternative Approaches
"""

    def _init_agent(self):
        self.toolkit = SQLDatabaseToolkit(db=self.db, llm=self.llm)
        self.memory = ConversationBufferMemory()

        self.agent = create_sql_agent(
            llm=self.llm,
            toolkit=self.toolkit,
            agent_type=AgentType.OPENAI_FUNCTIONS,
            verbose=True,
            memory=self.memory,
            prefix=self._get_prompt_prefix(),
            format_instructions=self._get_format_instructions(),
            handle_parsing_errors=True,
            max_iterations=5,
            early_stopping_method="generate"
        )

    def _get_format_instructions(self) -> str:
        return """
Always respond with markdown containing:
- **SQL Query** (syntax-validated)
- **Result** (summarized output)
- **Explanation** (detailed reasoning)
- **Optimizations** (index/partition suggestions)
- **Alternatives** (different methodologies)
"""

    def _get_schema_info(self) -> str:
        schema = []
        for table in self.inspector.get_table_names():
            columns = [f"{col['name']} ({col['type']})"
                       for col in self.inspector.get_columns(table)]
            schema.append(f"Table {table}:\n  " + "\n  ".join(columns))
        return "\n\n".join(schema)

    def run_ui(self):
        st.set_page_config(page_title="AI SQL Agent", layout="wide")
        st.title("🛠️ AI-Powered SQL Assistant")

        # Initialize chat history if not exists
        if "messages" not in st.session_state:
            st.session_state.messages = []

        # Sidebar with schema info and chat history
        with st.sidebar:
            st.header("📖 Database Schema")
            for table in self.inspector.get_table_names():
                with st.expander(f"📑 {table}"):
                    cols = [
                        f"{col['name']} ({col['type']})"
                        for col in self.inspector.get_columns(table)
                    ]
                    st.code("\n".join(cols))

            st.divider()
            st.header("📝 Chat History")
            # Display chat history in sidebar
            for msg in st.session_state.messages:
                role = "💬 User" if msg["role"] == "user" else "🤖 Assistant"
                with st.expander(f"{role}: {msg['content'][:50]}...", expanded=False):
                    st.markdown(msg["content"])

        # Main chat interface
        # Display all previous messages in the main area
        for msg in st.session_state.messages:
            with st.chat_message(msg["role"]):
                st.markdown(msg["content"])

        # Chat input
        if prompt := st.chat_input("Ask a data question (e.g., 'Show all active products'):"):
            # Add user message to chat history
            st.session_state.messages.append({"role": "user", "content": prompt})

            # Display user message immediately
            with st.chat_message("user"):
                st.markdown(prompt)

            # Create a placeholder for the assistant's response
            with st.chat_message("assistant"):
                try:
                    st_callback = StreamlitCallbackHandler(st.container())
                    response = self.agent.run(prompt, callbacks=[st_callback])
                    processed = self._process_response(response)

                    # Display SQL
                    if processed["sql"]:
                        with st.expander("🔍 Generated SQL"):
                            st.code(processed["sql"], language="sql")

                    # Display Data
                    if processed["data"] is not None:
                        st.subheader("📊 Results Preview")
                        st.dataframe(processed["data"].head(10), use_container_width=True)

                    # Display Analysis
                    if processed["explanation"]:
                        st.subheader("📝 Explanation")
                        st.markdown(processed["explanation"])

                    # Add assistant's response to chat history with the summary
                    st.session_state.messages.append({
                        "role": "assistant",
                        "content": processed["summary"]
                    })

                    # Force a rerun to update the display while maintaining history
                    st.rerun()

                except Exception as e:
                    error_message = f"Error: {str(e)}"
                    st.error(error_message)
                    st.session_state.messages.append({
                        "role": "assistant",
                        "content": error_message
                    })
                    # Force a rerun to update the display while maintaining history
                    st.rerun()

    def _process_response(self, response: str) -> dict:
        result = {
            "sql": "",
            "data": None,
            "explanation": "",
            "optimizations": "",
            "alternatives": "",
            "summary": ""  # Add summary field
        }

        try:
            # Handle both string and tuple responses
            if isinstance(response, tuple):
                response = response[0]

            # Extract SQL Query
            if "```sql" in response:
                sql_part = response.split("```sql")[1].split("```")[0].strip()
                result["sql"] = sqlparse.format(sql_part, reindent=True)

                # Execute SQL directly
                try:
                    result["data"] = pd.read_sql_query(result["sql"], self.engine)

                    # Format the data into a clean text explanation
                    if result["data"] is not None and not result["data"].empty:
                        explanation_parts = []
                        explanation_parts.append(f"Found {len(result['data'])} records:")

                        # Format each row in a clean text way
                        for _, row in result["data"].iterrows():
                            record_info = []
                            for col in result["data"].columns:
                                if col != 'id':  # Skip ID in the display
                                    value = row[col]
                                    if pd.isna(value):
                                        value = 'None'
                                    record_info.append(f"{value}")
                            explanation_parts.append(f"• {' - '.join(record_info)}")

                        result["explanation"] = "\n".join(explanation_parts)
                except Exception as e:
                    st.error(f"SQL Execution Error: {str(e)}")
                    result["error"] = f"SQL Execution Error: {str(e)}"

            # Extract other response parts
            sections = ["Result", "Explanation", "Optimizations", "Alternatives", "Execution Result Summary"]
            current_section = None

            for line in response.split("\n"):
                if line.startswith("**"):
                    current_section = line.strip("*: ").lower()
                elif current_section and current_section in result:
                    result[current_section] += line + "\n"
                elif "Execution Result Summary" in line:
                    current_section = "summary"
                elif current_section == "summary":
                    result["summary"] += line + "\n"

            # If no summary was found, use the explanation as summary
            if not result["summary"].strip():
                result["summary"] = result["explanation"]

            return result

        except Exception as e:
            return {"error": f"Response processing error: {str(e)}"}

if __name__ == "__main__":
    app = SQLAgentApp()  # Create an instance of the class
    app.run_ui()         # Call the method to start the UI

