Thursday, June 19, 2025

Building an AI-Powered SQL Assistant: A Beginner's Guide to the Code

 In this blog, we'll break down the provided Python code that creates a streamlit-based AI-powered SQL assistant. This assistant enables users to interact with a database, write optimized SQL queries, and retrieve results—all powered by OpenAI's GPT-4-turbo model. By the end of this blog, you'll understand how the code works and how you can use or modify it for your own projects.


Introduction

The code builds a user-friendly web app that:

  1. Connects to a PostgreSQL database.
  2. Uses OpenAI's GPT-4 model to generate SQL queries.
  3. Executes the queries on the database.
  4. Displays the results, SQL query, and explanations.

The app is beginner-friendly and uses Streamlit for the interface, SQLAlchemy for database interaction, and LangChain for managing the AI functionality.


Breaking Down the Code

1. Setup and Dependencies

Before diving into the code, the following libraries are used:

  • Streamlit: For creating the web-based user interface.
  • SQLAlchemy: For managing database connections.
  • LangChain: For managing interactions with the OpenAI language model.
  • dotenv: For loading environment variables securely.
  • Pandas: For handling query results as dataframes.
  • sqlparse and sqlglot: For validating and formatting SQL queries.

Install these dependencies using:

pip install streamlit sqlalchemy langchain pandas python-dotenv sqlparse sqlglot

2. Class Overview: SQLAgentApp

The code is organized into a class called SQLAgentApp. This class encapsulates the entire application logic, making the code modular and reusable.


3. Environment Setup

The _init_environment method ensures that the OpenAI API key is loaded from environment variables using the dotenv library.

def _init_environment(self):
    self.openai_key = os.getenv("OPENAI_API_KEY")
    if not self.openai_key:
        st.error("OPENAI_API_KEY not found in environment variables")
        st.stop()
  • If the API key is missing, the app stops and shows an error message.
  • This ensures secure handling of sensitive API keys.

4. Database Initialization

The _init_database method connects to a PostgreSQL database using SQLAlchemy. It retrieves database credentials from environment variables or uses default values.

def _init_database(self):
    db_config = {
        "user": os.getenv("PG_USER", "postgres"),
        "password": os.getenv("PG_PASSWORD", "postgres_password"),
        "host": os.getenv("PG_HOST", "localhost"),
        "port": os.getenv("PG_PORT", "5432"),
        "database": os.getenv("PG_DATABASE", "postgres_db")
    }

    self.DATABASE_URL = (
        f"postgresql+psycopg2://{db_config['user']}:{db_config['password']}"
        f"@{db_config['host']}:{db_config['port']}/{db_config['database']}"
    )
  • The code uses create_engine to establish a connection to the database.
  • If the connection fails, an error message is displayed using Streamlit, and the app stops.

5. Initializing the LLM

The _init_llm method creates an instance of OpenAI's GPT-4-turbo model using LangChain.

def _init_llm(self):
    self.llm = ChatOpenAI(
        model_name="gpt-4-turbo-preview",
        temperature=0,
        streaming=True,
        api_key=self.openai_key
    )
  • temperature=0 ensures deterministic responses.
  • Streaming is enabled for real-time responses in Streamlit.

6. Prompt Engineering

The _get_prompt_prefix method constructs a detailed prompt for the AI model. It includes:

  • The database schema (tables and columns).
  • Clear rules for generating SQL queries.
  • Formatting guidelines for the response.
def _get_prompt_prefix(self) -> str:
    schema = self._get_schema_info()
    current_year = datetime.now().year
    return f"""
    You are a world-class SQL expert with 15+ years of experience. ...
    Database Schema:
    {schema}
    """
  • The schema is dynamically retrieved using SQLAlchemy's inspect module.
  • The prompt ensures the AI generates PostgreSQL-compliant SQL.

7. Agent Initialization

The _init_agent method creates a LangChain SQL agent.

self.agent = create_sql_agent(
    llm=self.llm,
    toolkit=self.toolkit,
    agent_type=AgentType.OPENAI_FUNCTIONS,
    verbose=True,
    memory=self.memory,
    prefix=self._get_prompt_prefix(),
    format_instructions=self._get_format_instructions(),
    handle_parsing_errors=True,
    max_iterations=5,
    early_stopping_method="generate"
)
  • LangChain SQL Agent: Combines the AI model with SQL tools to generate and execute queries.
  • ConversationBufferMemory: Stores the chat history for better context.
  • Format Instructions: Ensures the AI provides clear, structured responses.

8. User Interface

The run_ui method builds the entire web interface using Streamlit.

Sidebar

The sidebar displays:

  1. Database Schema: A list of tables and columns retrieved dynamically.
  2. Chat History: Previous user and AI messages.

Main Chat Interface

The main interface handles:

  1. User Input: Users can ask questions about the database.
  2. AI Responses: The AI responds with SQL queries, results, and explanations.
if prompt := st.chat_input("Ask a data question..."):
    st.session_state.messages.append({"role": "user", "content": prompt})
    
    # AI generates response
    response = self.agent.run(prompt)
  • User input is sent to the AI agent.
  • The response includes the SQL query, results, and analysis.

9. SQL Execution and Processing

The _process_response method extracts and executes the SQL query generated by the AI.

if "```sql" in response:
    sql_part = response.split("```sql")[1].split("```")[0].strip()
    result["sql"] = sqlparse.format(sql_part, reindent=True)
    result["data"] = pd.read_sql_query(result["sql"], self.engine)
  • SQL Parsing: Extracts the SQL code from the response.
  • Execution: Runs the SQL query on the database using Pandas.
  • Result Formatting: Processes the query result into a readable format.

10. Error Handling

The code includes robust error handling:

  • If the database connection fails, the app stops with an error message.
  • SQL execution errors are caught and displayed to the user.

11. Running the App

Finally, the app is launched using:

if __name__ == "__main__":
    app = SQLAgentApp()
    app.run_ui()

How the App Works

Step-by-Step Workflow

  1. The user inputs a question, such as "Show all active products."
  2. The AI generates a PostgreSQL query based on the database schema.
  3. The query is executed, and the results are displayed in a table.
  4. The AI provides an explanation of the query and optimization tips.

Key Features

  • Dynamic Schema Retrieval: Automatically fetches database tables and columns.
  • AI-Generated SQL: Ensures accurate and optimized queries.
  • Real-Time Execution: Runs queries instantly and displays results.
  • Detailed Analysis: Explains the results and suggests optimizations.

Conclusion

This code demonstrates how to build an AI-powered SQL assistant that integrates database querying, AI, and a user-friendly interface. By following this guide, you can adapt the application to your own database and enhance it with additional features.

import os
import re
import sqlparse
from dotenv import load_dotenv
from typing import List, Dict, Any
from sqlalchemy import create_engine, inspect
from sqlglot import transpile # FIXED IMPORT
import pandas as pd
import streamlit as st
from datetime import datetime

# LangChain imports
from langchain_openai import ChatOpenAI
from langchain.memory import ConversationBufferMemory
from langchain.agents import create_sql_agent, AgentType
from langchain.callbacks import StreamlitCallbackHandler
from langchain_community.utilities import SQLDatabase
from langchain_community.agent_toolkits import SQLDatabaseToolkit
import sqlparse

# Load environment variables
load_dotenv()
os.environ['OPENAI_API_KEY'] = "" # Replace with your OpenAI key
class SQLAgentApp:
def __init__(self):
self._init_environment()
self._init_database()
self._init_llm()
self._init_agent()

def _init_environment(self):
self.openai_key = os.getenv("OPENAI_API_KEY")
if not self.openai_key:
st.error("OPENAI_API_KEY not found in environment variables")
st.stop()

def _init_database(self):
# Initialize database connection
db_config = {
"user": os.getenv("PG_USER", "postgres"),
"password": os.getenv("PG_PASSWORD", "postgres_password"),
"host": os.getenv("PG_HOST", "localhost"),
"port": os.getenv("PG_PORT", "5432"),
"database": os.getenv("PG_DATABASE", "postgres_db")
}

self.DATABASE_URL = (
f"postgresql+psycopg2://{db_config['user']}:{db_config['password']}"
f"@{db_config['host']}:{db_config['port']}/{db_config['database']}"
)

try:
self.engine = create_engine(self.DATABASE_URL)
self.db = SQLDatabase(self.engine)
self.inspector = inspect(self.engine)
except Exception as e:
st.error(f"Database connection failed: {str(e)}")
st.stop()

def _init_llm(self):
self.llm = ChatOpenAI(
model_name="gpt-4-turbo-preview",
temperature=0,
streaming=True,
api_key=self.openai_key
)

def _get_prompt_prefix(self) -> str:
schema = self._get_schema_info()
current_year = datetime.now().year
return f"""
You are a world-class SQL expert with 15+ years of experience. Follow these rules:
1. Generate PostgreSQL-compliant SQL
2. Validate queries with SQLGlot before execution
3. Use CTEs for complex logic
4. Optimize for query performance
5. Add comments for complex operations
6. Prevent SQL injection attacks
7. Handle NULLs with COALESCE
8. Include LIMIT unless explicitly disabled
9. For date queries:
- Use CURRENT_DATE for today's date
- Use CURRENT_TIMESTAMP for current time
- For relative dates (e.g., last month), use date arithmetic
- Format dates consistently (YYYY-MM-DD)
- Current year is {current_year}

Database Schema:
{schema}

Response Format:
1. Valid SQL Query
2. Execution Result Summary
3. Step-by-Step Explanation
4. Optimization Tips
5. Alternative Approaches

"""

def _init_agent(self):
self.toolkit = SQLDatabaseToolkit(db=self.db, llm=self.llm)
self.memory = ConversationBufferMemory()

self.agent = create_sql_agent(
llm=self.llm,
toolkit=self.toolkit,
agent_type=AgentType.OPENAI_FUNCTIONS,
verbose=True,
memory=self.memory,
prefix=self._get_prompt_prefix(), # Now this method exists before being used
format_instructions=self._get_format_instructions(),
handle_parsing_errors=True,
max_iterations=5,
early_stopping_method="generate"
)

def _get_format_instructions(self) -> str:
return """
Always respond with markdown containing:
- **SQL Query** (syntax-validated)
- **Result** (summarized output)
- **Explanation** (detailed reasoning)
- **Optimizations** (index/partition suggestions)
- **Alternatives** (different methodologies)
"""

def _get_schema_info(self) -> str:
schema = []
for table in self.inspector.get_table_names():
columns = [f"{col['name']} ({col['type']})"
for col in self.inspector.get_columns(table)]
schema.append(f"Table {table}:\n " + "\n ".join(columns))
return "\n\n".join(schema)

def run_ui(self):
st.set_page_config(page_title="AI SQL Agent", layout="wide")
st.title("🛠️ AI-Powered SQL Assistant")

# Initialize chat history if not exists
if "messages" not in st.session_state:
st.session_state.messages = []

# Sidebar with schema info and chat history
with st.sidebar:
st.header("📖 Database Schema")
for table in self.inspector.get_table_names():
with st.expander(f"📑 {table}"):
cols = [
f"{col['name']} ({col['type']})"
for col in self.inspector.get_columns(table)
]
st.code("\n".join(cols))

st.divider()
st.header("📝 Chat History")
# Display chat history in sidebar
for msg in st.session_state.messages:
role = "💬 User" if msg["role"] == "user" else "🤖 Assistant"
with st.expander(f"{role}: {msg['content'][:50]}...", expanded=False):
st.markdown(msg["content"])

# Main chat interface
# Display all previous messages in the main area
for msg in st.session_state.messages:
with st.chat_message(msg["role"]):
st.markdown(msg["content"])

# Chat input
if prompt := st.chat_input("Ask a data question (e.g., 'Show all active products'):"):
# Add user message to chat history
st.session_state.messages.append({"role": "user", "content": prompt})

# Display user message immediately
with st.chat_message("user"):
st.markdown(prompt)

# Create a placeholder for the assistant's response
with st.chat_message("assistant"):
try:
st_callback = StreamlitCallbackHandler(st.container())
response = self.agent.run(prompt, callbacks=[st_callback])
processed = self._process_response(response)

# Display SQL
if processed["sql"]:
with st.expander("🔍 Generated SQL"):
st.code(processed["sql"], language="sql")

# Display Data
if processed["data"] is not None:
st.subheader("📊 Results Preview")
st.dataframe(processed["data"].head(10), use_container_width=True)

# Display Analysis
if processed["explanation"]:
st.subheader("📝 Explanation")
st.markdown(processed["explanation"])

# Add assistant's response to chat history with the summary
st.session_state.messages.append({
"role": "assistant",
"content": processed["summary"]
})

# Force a rerun to update the display while maintaining history
st.rerun()

except Exception as e:
error_message = f" Error: {str(e)}"
st.error(error_message)
st.session_state.messages.append({
"role": "assistant",
"content": error_message
})
# Force a rerun to update the display while maintaining history
st.rerun()

def _process_response(self, response: str) -> dict:
result = {
"sql": "",
"data": None,
"explanation": "",
"optimizations": "",
"alternatives": "",
"summary": "" # Add summary field
}

try:
# Handle both string and tuple responses
if isinstance(response, tuple):
response = response[0]

# Extract SQL Query
if "```sql" in response:
sql_part = response.split("```sql")[1].split("```")[0].strip()
result["sql"] = sqlparse.format(sql_part, reindent=True)

# Execute SQL directly
try:
result["data"] = pd.read_sql_query(result["sql"], self.engine)

# Format the data into a clean text explanation
if result["data"] is not None and not result["data"].empty:
explanation_parts = []
explanation_parts.append(f"Found {len(result['data'])} records:")

# Format each row in a clean text way
for _, row in result["data"].iterrows():
record_info = []
for col in result["data"].columns:
if col != 'id': # Skip ID in the display
value = row[col]
if pd.isna(value):
value = 'None'
record_info.append(f"{value}")
explanation_parts.append(f"• {' - '.join(record_info)}")

result["explanation"] = "\n".join(explanation_parts)
except Exception as e:
st.error(f"SQL Execution Error: {str(e)}")
result["error"] = f"SQL Execution Error: {str(e)}"

# Extract other response parts
sections = ["Result", "Explanation", "Optimizations", "Alternatives", "Execution Result Summary"]
current_section = None

for line in response.split("\n"):
if line.startswith("**"):
current_section = line.strip("*: ").lower()
elif current_section and current_section in result:
result[current_section] += line + "\n"
elif "Execution Result Summary" in line:
current_section = "summary"
elif current_section == "summary":
result["summary"] += line + "\n"

# If no summary was found, use the explanation as summary
if not result["summary"].strip():
result["summary"] = result["explanation"]

return result

except Exception as e:
return {"error": f"Response processing error: {str(e)}"}

if __name__ == "__main__":
app = SQLAgentApp() # Create an instance of the class
app.run_ui() # Call the method to start the UI


Wednesday, June 18, 2025

Optimizing LLM Queries for CSV Files to Minimize Token Usage: A Beginner's Guide

When working with large CSV files and querying them using a Language Model (LLM), optimizing your approach to minimize token usage is crucial. This helps reduce costs, improve performance, and make your system more efficient. Here’s a beginner-friendly guide to help you understand how to achieve this.


What Are Tokens, and Why Do They Matter?

Tokens are the building blocks of text that LLMs process. A single word like "cat" or punctuation like "." counts as a token. Longer texts mean more tokens, which can lead to higher costs and slower query responses. By optimizing how you query CSV data, you can significantly reduce token usage.


Key Strategies to Optimize LLM Queries for CSV Files

1. Preprocess and Filter Data

Before sending data to the LLM, filter and preprocess it to retrieve only the relevant rows and columns. This minimizes the size of the input text.

How to Do It:

  • Use Python or database tools to preprocess the CSV file.
  • Filter for only the rows and columns necessary for your query.
    import pandas as pd
    
    # Load CSV file
    df = pd.read_csv("data.csv")
    
    # Filter relevant data
    filtered_df = df[df["Category"] == "AI"]
    filtered_df = filtered_df[["Name", "Score"]]
    
    # Save filtered data to a smaller CSV file
    filtered_df.to_csv("filtered_data.csv", index=False)
    

Benefit: Instead of sending an entire CSV file, you send only the required subset of data.


2. Summarize Data

Use aggregation or summarization techniques to condense the data before passing it to the LLM.

Example Use Case:
Instead of sending 1,000 rows of sales data, aggregate it at a higher level:

  • Compute totals, averages, or other metrics.
  • Pass only the summary (e.g., "Total sales for 2025: $1,000,000").
summary = df.groupby("Year")["Sales"].sum()
summary_text = f"Total sales by year: {summary.to_dict()}"

Benefit: A single line of summary text replaces thousands of rows, saving tokens.


3. Use Metadata for Efficient Queries

Add metadata to your CSV file, such as tags or categories, to make filtering easier and faster.

Example:
If your CSV contains transaction logs, include metadata like:

  • Date
  • Category
  • Priority

Instead of passing raw logs, query the metadata first:

  • Query: "Show me high-priority transactions from June 2025."
  • Result: Filtered data with only relevant rows.

4. Chunk Large Data

If your CSV file is too large to process at once, split it into smaller chunks and query the LLM sequentially.

How to Do It:

  • Divide the file into multiple parts.
  • Process each chunk individually.
  • Combine the results.
# Split CSV into chunks
chunk_size = 100  # Number of rows per chunk
for i, chunk in enumerate(pd.read_csv("large_file.csv", chunksize=chunk_size)):
    chunk.to_csv(f"chunk_{i}.csv", index=False)

Benefit: Smaller chunks fit within the LLM’s context window, reducing token usage.


5. Use Embedding-Based Search

Convert your CSV data into embeddings (vector representations). Store these embeddings in a vector database and perform similarity searches to retrieve the most relevant rows for your query.

Tools to Use:

  • Use libraries like sentence-transformers to generate embeddings.
  • Store them in vector databases like Pinecone or Weaviate.

Example:

  • Query: "Find rows similar to 'AI projects with a score above 90'."
  • Result: Only the top 5–10 most relevant rows are sent to the LLM.

6. Pre-Format Data

Format your data into a compact structure, like JSON or a short table, before sending it to the LLM.

Example:
Instead of sending raw CSV rows:

Name, Score, Category
Alice, 95, AI
Bob, 88, Data Science

Send this:

[{"Name": "Alice", "Score": 95, "Category": "AI"}]

Benefit: Structured data reduces unnecessary tokens and improves query clarity.


7. Leverage Caching

Cache the results of frequently used queries. If a query has already been processed, return the cached result instead of re-querying the LLM.

How to Implement:

  • Use a key-value store (e.g., Redis) to save query results.
  • Check the cache before querying the LLM.

Example:

  • Query: "Summarize sales data for 2025."
  • Cache lookup: Return the previously computed summary if available.

8. Ask Specific Questions

Frame your queries to be as specific as possible. Avoid open-ended or vague queries, as they may require processing more data.

Example:

  • Instead of: "Tell me about the CSV file."
  • Ask: "What are the top 5 products by sales in 2025?"

A Practical Workflow for CSV Optimization

  1. Load and Preprocess Data

    • Use Python or database tools to filter and preprocess the CSV.
  2. Summarize or Chunk Data

    • Aggregate or split the data into smaller sets.
  3. Query Efficiently

    • Use metadata or embeddings to retrieve only relevant rows.
  4. Send Compact Data to LLM

    • Format the filtered data into a compact structure.
  5. Cache Results

    • Store frequently queried results to avoid redundant token usage.

Example Query Optimization

Scenario: You have a CSV file with 1,000 rows of sales data. You want to know the top 3 products by sales.

Non-Optimized Query:

"Here is my CSV file: [entire 1,000 rows]. What are the top 3 products by sales?"

Optimized Query:

  1. Preprocess the CSV:
    top_products = df.groupby("Product")["Sales"].sum().nlargest(3)
    
  2. Send the result to the LLM:
    "These are the top 3 products by sales: {'Product A': 500, 'Product B': 450, 'Product C': 400}. Provide insights on these."
    

Conclusion

By filtering, summarizing, and structuring your data before querying an LLM, you can significantly reduce token usage and costs. Start small, experiment with these techniques, and gradually build an efficient workflow for handling large CSV files. Happy optimizing! 🚀

Friday, June 13, 2025

Enhancing Prompt Engineering Accuracy in Software Requirements Processing with Multi-Agent Systems and Multiple LLMs

When processing software requirements or testing documents using large language models (LLMs), ensuring the accuracy and reliability of the results is a critical challenge. One effective approach is using a multi-agent system that leverages multiple LLMs to cross-validate outputs, compare results, and improve overall accuracy. This approach reduces reliance on a single model and ensures robustness in processing, especially for tasks like test case generation, requirement validation, and data summarization.

This blog explores how to use multi-agent systems with different LLMs to process software requirements, compare results, and measure accuracy.


What is a Multi-Agent System?

A multi-agent system involves multiple independent AI agents (in this case, LLMs) working together to:

  1. Perform the same task independently.
  2. Cross-validate or refine each other’s outputs.
  3. Provide diverse perspectives to improve the quality and reliability of the final result.

In our case, the agents will be different LLMs (e.g., GPT-4, Claude, Cohere, or Google Bard).


Use Case Overview

We’ll process a software requirements CSV file to:

  1. Extract high-priority functional requirements.
  2. Generate test cases for these requirements.
  3. Use multiple LLMs to process the data and compare their outputs.

Finally, we’ll measure accuracy by comparing the results and identifying discrepancies.


Step-by-Step Guide

Step 1: Setup and Prepare the Data

We’ll begin by preparing the software requirements data in a CSV format.

Example CSV File (requirements.csv):

ID,Requirement,Type,Priority,Status
1,Users must be able to register and log in using their email and password,Functional,High,Approved
2,Search functionality must return relevant results within 2 seconds,Functional,Medium,Pending
3,The platform must handle 500 concurrent users,Non-Functional,High,Approved
4,Payment processing must support credit cards and PayPal securely,Functional,High,Approved
5,Daily backups of all data must be performed automatically,Non-Functional,Medium,Pending

Load the CSV Data in Python

import pandas as pd

# Load the CSV file
csv_file = 'requirements.csv'
df = pd.read_csv(csv_file)

# Convert the data to JSON format
json_data = df.to_json(orient='records')
print("JSON Data for AI Query:")
print(json_data)

Step 2: Define the Prompt

We’ll use the following prompt to query multiple LLMs. The task involves extracting high-priority functional requirements and generating test cases.

Prompt Template:

Here is the software requirements data in JSON format:

{json_data}

Task:
1. Extract all "High Priority" functional requirements.
2. Generate 2 test cases for each extracted requirement.
3. Return the output in a structured JSON format.

Step 3: Query Multiple LLMs

We’ll use Python to query multiple LLMs (e.g., GPT-4, Claude, Cohere) simultaneously. Here’s how you can set up the multi-agent system.

Install Required Libraries

pip install openai anthropic cohere

Python Code for Multi-Agent Queries

import openai
import requests
import json

# Set API Keys for different LLMs
openai.api_key = 'your-openai-api-key'
anthropic_api_key = 'your-anthropic-api-key'
cohere_api_key = 'your-cohere-api-key'

# Define the prompt
prompt = f"""
Here is the software requirements data in JSON format:

{json_data}

Task:
1. Extract all "High Priority" functional requirements.
2. Generate 2 test cases for each extracted requirement.
3. Return the output in a structured JSON format.
"""

# Query GPT (OpenAI)
def query_gpt(prompt):
    response = openai.Completion.create(
        engine="text-davinci-003",
        prompt=prompt,
        max_tokens=500,
        temperature=0
    )
    return response.choices[0].text.strip()

# Query Claude (Anthropic)
def query_claude(prompt):
    url = "https://api.anthropic.com/v1/complete"
    headers = {
        "x-api-key": anthropic_api_key,
        "Content-Type": "application/json"
    }
    data = {
        "prompt": prompt,
        "model": "claude-v1",
        "max_tokens_to_sample": 500,
        "temperature": 0
    }
    response = requests.post(url, headers=headers, json=data)
    return response.json()['completion'].strip()

# Query Cohere
def query_cohere(prompt):
    url = "https://api.cohere.ai/generate"
    headers = {"Authorization": f"Bearer {cohere_api_key}"}
    data = {
        "model": "command-xlarge-nightly",
        "prompt": prompt,
        "max_tokens": 500,
        "temperature": 0
    }
    response = requests.post(url, headers=headers, json=data)
    return response.json()['generations'][0]['text'].strip()

# Execute queries
gpt_output = query_gpt(prompt)
claude_output = query_claude(prompt)
cohere_output = query_cohere(prompt)

# Display outputs
print("GPT Output:")
print(gpt_output)

print("\nClaude Output:")
print(claude_output)

print("\nCohere Output:")
print(cohere_output)

Step 4: Compare and Validate Outputs

To measure the accuracy of the outputs, we’ll compare the results from all LLMs and identify discrepancies. This can be done by:

  1. Manual Comparison: Reviewing outputs side-by-side.
  2. Automated Comparison: Using Python to check for differences.

Automated Comparison Code

# Parse outputs into JSON
gpt_data = json.loads(gpt_output)
claude_data = json.loads(claude_output)
cohere_data = json.loads(cohere_output)

# Compare outputs
def compare_outputs(data1, data2):
    discrepancies = []
    for key in data1.keys():
        if data1[key] != data2[key]:
            discrepancies.append({
                "Key": key,
                "Data1": data1[key],
                "Data2": data2[key]
            })
    return discrepancies

# Compare GPT vs Claude
discrepancies_gpt_claude = compare_outputs(gpt_data, claude_data)

# Compare GPT vs Cohere
discrepancies_gpt_cohere = compare_outputs(gpt_data, cohere_data)

# Display discrepancies
print("Discrepancies between GPT and Claude:")
print(discrepancies_gpt_claude)

print("\nDiscrepancies between GPT and Cohere:")
print(discrepancies_gpt_cohere)

Step 5: Measure Accuracy

To evaluate the accuracy of each model:

  1. Define a ground truth (a set of manually verified correct outputs).
  2. Compare each model’s output against the ground truth using metrics like:
    • Precision: Correct results out of all results generated by the model.
    • Recall: Correct results out of all possible correct results.
    • F1-Score: Harmonic mean of precision and recall.

Accuracy Calculation Code

from sklearn.metrics import precision_score, recall_score, f1_score

def calculate_accuracy(predicted, ground_truth):
    # Convert results to sets for comparison
    predicted_set = set(predicted)
    ground_truth_set = set(ground_truth)
    
    # Calculate metrics
    tp = len(predicted_set & ground_truth_set)  # True positives
    fp = len(predicted_set - ground_truth_set)  # False positives
    fn = len(ground_truth_set - predicted_set)  # False negatives
    
    precision = tp / (tp + fp) if tp + fp > 0 else 0
    recall = tp / (tp + fn) if tp + fn > 0 else 0
    f1 = 2 * (precision * recall) / (precision + recall) if precision + recall > 0 else 0
    
    return {"Precision": precision, "Recall": recall, "F1-Score": f1}

# Example ground truth and predicted outputs
ground_truth = [
    "Users must be able to register with valid credentials.",
    "Payment can be processed securely via credit card."
]

predicted_gpt = [
    "Users must be able to register with valid credentials.",
    "Payment can be processed securely via credit card."
]

predicted_claude = [
    "Users must be able to register with valid credentials.",
    "Payments can be processed using PayPal securely."
]

# Calculate accuracy for GPT
accuracy_gpt = calculate_accuracy(predicted_gpt, ground_truth)
print("GPT Accuracy:", accuracy_gpt)

# Calculate accuracy for Claude
accuracy_claude = calculate_accuracy(predicted_claude, ground_truth)
print("Claude Accuracy:", accuracy_claude)

Benefits of Multi-Agent Systems

  1. Increased Reliability: Cross-validating outputs from multiple models ensures higher accuracy.
  2. Error Detection: Discrepancies highlight potential errors or ambiguities in the data or model behavior.
  3. Diverse Perspectives: Different LLMs may interpret prompts differently, providing complementary insights.

Conclusion

Using a multi-agent system with multiple LLMs (e.g., GPT, Claude, Cohere), you can improve the accuracy and reliability of software requirement processing tasks. By comparing outputs, identifying discrepancies, and calculating metrics like precision and recall, you can ensure robust results for tasks like test case generation, requirement extraction, and validation.

This approach is ideal for critical projects where accuracy is paramount, and errors can have significant consequences. With Python and Generative AI, you can build a scalable and reliable pipeline for processing technical documents. 🚀

Processing Software Requirements or Testing Documents in CSV Format with Python and Generative AI

 Software requirements or testing documents often contain structured data, and querying or processing these documents effectively can make tasks like test case generation, requirement analysis, and data summarization much easier. In this blog, we’ll explore how to use Python and Generative AI to process software requirements or testing documents stored in CSV files.

We’ll cover:

  1. Reading and preparing the CSV file.
  2. Writing queries for Generative AI using prompt engineering.
  3. Using Generative AI to extract, process, and generate additional data based on the CSV.
  4. Saving results back to a CSV file.

1. Understanding the Example CSV

Here’s an example of a software requirements CSV file (requirements.csv):

ID,Requirement,Type,Priority,Status
1,Users must be able to register and log in using their email and password,Functional,High,Approved
2,Search functionality must return relevant results within 2 seconds,Functional,Medium,Pending
3,The platform must handle 500 concurrent users,Non-Functional,High,Approved
4,Payment processing must support credit cards and PayPal securely,Functional,High,Approved
5,Daily backups of all data must be performed automatically,Non-Functional,Medium,Pending

2. Python Code to Read and Process the CSV

We’ll use Python to:

  1. Read the CSV file into a DataFrame.
  2. Convert it to JSON for Generative AI queries.
  3. Write prompts to query the data.
  4. Save the AI-generated results back to a CSV.

Setup

Install Required Libraries:

pip install pandas openai

Step 1: Read the CSV File

We’ll use pandas to read the CSV file into a DataFrame and convert it into JSON format for use with Generative AI.

import pandas as pd

# Read the CSV file
csv_file = 'requirements.csv'
df = pd.read_csv(csv_file)

# Display the DataFrame
print("Original DataFrame:")
print(df)

# Convert the DataFrame into JSON for AI queries
json_data = df.to_json(orient='records')
print("\nData in JSON format:")
print(json_data)

Output:

Original DataFrame:
   ID                                          Requirement           Type  \
0   1  Users must be able to register and log in using...    Functional   
1   2  Search functionality must return relevant resul...    Functional   
2   3     The platform must handle 500 concurrent users  Non-Functional   
3   4  Payment processing must support credit cards an...    Functional   
4   5  Daily backups of all data must be performed aut...  Non-Functional   

  Priority    Status  
0     High  Approved  
1   Medium   Pending  
2     High  Approved  
3     High  Approved  
4   Medium   Pending  

Data in JSON format:
[{"ID":1,"Requirement":"Users must be able to register and log in using their email and password","Type":"Functional","Priority":"High","Status":"Approved"}, ...]

Step 2: Write a Query Using Prompt Engineering

To interact with the JSON data, we’ll craft a prompt for Generative AI. For this example, we’ll filter all high-priority functional requirements and generate test cases for each.

Prepare the Prompt

import openai

# Set your OpenAI API key
openai.api_key = 'your-openai-api-key'

# Define the prompt
prompt = f"""
Here is the software requirements data in JSON format:

{json_data}

Task:
1. Extract all "High Priority" functional requirements.
2. Generate 2 test cases for each extracted requirement.
3. Return the output in a structured JSON format.
"""

# Display the prompt
print("Generated Prompt:")
print(prompt)

Step 3: Query the Generative AI Model

We’ll send the prompt to OpenAI’s API (e.g., GPT-4) and process the output.

# Query the OpenAI API
response = openai.Completion.create(
    engine="text-davinci-003",  # Use the appropriate engine
    prompt=prompt,
    max_tokens=500,
    temperature=0
)

# Extract the response text
ai_output = response.choices[0].text.strip()

# Display AI's Output
print("\nAI Output:")
print(ai_output)

Expected AI Output:

{
  "High Priority Requirements": [
    {
      "Requirement": "Users must be able to register and log in using their email and password",
      "Test Cases": [
        "Verify that the user can register with a valid email and password.",
        "Verify that the user cannot register with an invalid email format."
      ]
    },
    {
      "Requirement": "Payment processing must support credit cards and PayPal securely",
      "Test Cases": [
        "Verify that payment can be processed securely via credit card.",
        "Verify that payment can be processed securely via PayPal."
      ]
    }
  ]
}

Step 4: Save the Results Back to a CSV

Now, we’ll parse the AI’s JSON output and save the results in a structured CSV file.

Parse and Save the Results

import json

# Parse the AI output (assumes it is valid JSON)
output_data = json.loads(ai_output)

# Flatten the data for CSV storage
flattened_data = []
for item in output_data['High Priority Requirements']:
    for test_case in item['Test Cases']:
        flattened_data.append({
            "Requirement": item['Requirement'],
            "Test Case": test_case
        })

# Convert to a DataFrame
output_df = pd.DataFrame(flattened_data)

# Save to a new CSV file
output_file = 'high_priority_test_cases.csv'
output_df.to_csv(output_file, index=False)

print(f"\nGenerated test cases saved to {output_file}")

Generated CSV (high_priority_test_cases.csv):

Requirement,Test Case
Users must be able to register and log in using their email and password,Verify that the user can register with a valid email and password.
Users must be able to register and log in using their email and password,Verify that the user cannot register with an invalid email format.
Payment processing must support credit cards and PayPal securely,Verify that payment can be processed securely via credit card.
Payment processing must support credit cards and PayPal securely,Verify that payment can be processed securely via PayPal.

Complete Python Script

Here’s the complete code for the workflow:

import pandas as pd
import openai
import json

# Set OpenAI API key
openai.api_key = 'your-openai-api-key'

# Step 1: Read the CSV file
csv_file = 'requirements.csv'
df = pd.read_csv(csv_file)
json_data = df.to_json(orient='records')

# Step 2: Define the prompt
prompt = f"""
Here is the software requirements data in JSON format:

{json_data}

Task:
1. Extract all "High Priority" functional requirements.
2. Generate 2 test cases for each extracted requirement.
3. Return the output in a structured JSON format.
"""

# Step 3: Query the OpenAI API
response = openai.Completion.create(
    engine="text-davinci-003",
    prompt=prompt,
    max_tokens=500,
    temperature=0
)

# Extract and parse the AI output
ai_output = response.choices[0].text.strip()
output_data = json.loads(ai_output)

# Step 4: Flatten the data for saving to CSV
flattened_data = []
for item in output_data['High Priority Requirements']:
    for test_case in item['Test Cases']:
        flattened_data.append({
            "Requirement": item['Requirement'],
            "Test Case": test_case
        })

# Convert to DataFrame and save to CSV
output_df = pd.DataFrame(flattened_data)
output_file = 'high_priority_test_cases.csv'
output_df.to_csv(output_file, index=False)

print(f"Generated test cases saved to {output_file}")

Conclusion

With this workflow, you can:

  1. Load and process CSV data for software requirements or testing documents.
  2. Use Generative AI with prompt engineering to extract, analyze, or generate additional information (e.g., test cases).
  3. Save AI-generated results into a structured CSV file for further use.

This approach is highly modular and can be adapted for different tasks, such as summarizing requirements, identifying gaps, or validating completeness. Happy engineering! 🚀

Detailed Prompt Engineering for Processing Software Requirements or Testing Documents in CSV Format

When working with software requirements or testing documents in a CSV format, crafting detailed queries or instructions using prompt engineering becomes crucial. These documents often have structured data (e.g., requirements, test cases, priorities) that require targeted processing, such as extracting specific information, analyzing gaps, or generating additional data like test scenarios.

In this blog, we’ll explore how to write detailed queries and instructions to process software requirements or testing documents stored in CSV format using prompt engineering.

Understanding the CSV Structure

Before writing prompts, it’s essential to understand the structure of the CSV file. Here’s an example of how software requirements or testing data might look in a CSV:

Example CSV Data: requirements.csv

ID,Requirement,Type,Priority,Status
1,Users must be able to register and log in using their email and password,Functional,High,Approved
2,Search functionality must return relevant results within 2 seconds,Functional,Medium,Pending
3,The platform must handle 500 concurrent users,Non-Functional,High,Approved
4,Payment processing must support credit cards and PayPal securely,Functional,High,Approved
5,Daily backups of all data must be performed automatically,Non-Functional,Medium,Pending

In this structure:

  • ID: Unique identifier for each requirement.
  • Requirement: The actual description of the requirement.
  • Type: Whether it’s a functional or non-functional requirement.
  • Priority: Priority level (High, Medium, Low).
  • Status: Current status of the requirement (e.g., Approved, Pending).

Step-by-Step Guide for Writing Queries

Step 1: Define the Task

Clearly specify the goal of your query. Examples include:

  • Extracting requirements based on specific criteria (e.g., type, priority, status).
  • Generating test cases for functional requirements.
  • Analyzing requirements for missing details or ambiguities.
  • Summarizing the document for a quick overview.

Step 2: Include the Relevant CSV Data

Provide a subset of the CSV data in your prompt to help the AI understand the structure and context. If the CSV is large, include only a few rows as examples.

Example CSV Excerpt for Prompt:

Here is a sample of the software requirements document in CSV format:

ID,Requirement,Type,Priority,Status
1,Users must be able to register and log in using their email and password,Functional,High,Approved
2,Search functionality must return relevant results within 2 seconds,Functional,Medium,Pending
3,The platform must handle 500 concurrent users,Non-Functional,High,Approved
4,Payment processing must support credit cards and PayPal securely,Functional,High,Approved
5,Daily backups of all data must be performed automatically,Non-Functional,Medium,Pending

Step 3: Write a Detailed Query or Instruction

1. Extracting Specific Requirements

If you want to filter the requirements based on certain conditions (e.g., "High Priority" functional requirements), write the query like this:

Prompt:

Here is a sample of the software requirements document in CSV format:

ID,Requirement,Type,Priority,Status
1,Users must be able to register and log in using their email and password,Functional,High,Approved
2,Search functionality must return relevant results within 2 seconds,Functional,Medium,Pending
3,The platform must handle 500 concurrent users,Non-Functional,High,Approved
4,Payment processing must support credit cards and PayPal securely,Functional,High,Approved
5,Daily backups of all data must be performed automatically,Non-Functional,Medium,Pending

Query: Extract all "High Priority" functional requirements. Return them as a numbered list.

Expected Output:

1. Users must be able to register and log in using their email and password.
2. Payment processing must support credit cards and PayPal securely.

2. Generating Test Scenarios

To generate test cases for functional requirements, include an instruction for the AI to create test scenarios:

Prompt:

Here is a sample of the software requirements document in CSV format:

ID,Requirement,Type,Priority,Status
1,Users must be able to register and log in using their email and password,Functional,High,Approved
2,Search functionality must return relevant results within 2 seconds,Functional,Medium,Pending
3,The platform must handle 500 concurrent users,Non-Functional,High,Approved
4,Payment processing must support credit cards and PayPal securely,Functional,High,Approved

Query: Generate 2 test scenarios for each functional requirement. Return them as a list.

Expected Output:

1. Test Case for Requirement 1:
   a. Verify that the user can register with a valid email and password.
   b. Verify that the user cannot register with an invalid email format.
2. Test Case for Requirement 2:
   a. Verify that searching for a keyword returns relevant results.
   b. Verify that searching by category displays all matching results.
3. Test Case for Requirement 4:
   a. Verify that payment can be processed securely via credit card.
   b. Verify that payment can be processed securely via PayPal.

3. Summarizing Requirements

Sometimes, you may need a quick summary of all requirements.

Prompt:

Here is a sample of the software requirements document in CSV format:

ID,Requirement,Type,Priority,Status
1,Users must be able to register and log in using their email and password,Functional,High,Approved
2,Search functionality must return relevant results within 2 seconds,Functional,Medium,Pending
3,The platform must handle 500 concurrent users,Non-Functional,High,Approved
4,Payment processing must support credit cards and PayPal securely,Functional,High,Approved
5,Daily backups of all data must be performed automatically,Non-Functional,Medium,Pending

Query: Summarize the requirements by their type (Functional vs Non-Functional).

Expected Output:

Functional Requirements:
1. Users must be able to register and log in using their email and password.
2. Search functionality must return relevant results within 2 seconds.
3. Payment processing must support credit cards and PayPal securely.

Non-Functional Requirements:
1. The platform must handle 500 concurrent users.
2. Daily backups of all data must be performed automatically.

4. Analyzing for Missing Details

To identify missing or incomplete requirements:

Prompt:

Here is a sample of the software requirements document in CSV format:

ID,Requirement,Type,Priority,Status
1,Users must be able to register and log in using their email and password,Functional,High,Approved
2,Search functionality must return relevant results within 2 seconds,Functional,Medium,Pending
3,The platform must handle 500 concurrent users,Non-Functional,High,Approved
4,Payment processing must support credit cards and PayPal securely,Functional,High,Approved

Query: Analyze the requirements and identify any missing or incomplete details. Suggest improvements to make the requirements more comprehensive.

Expected Output:

1. Requirement 1: Add details about password complexity requirements (e.g., minimum length, special characters).
2. Requirement 2: Specify how relevance will be determined for search results (e.g., ranking algorithm).
3. Requirement 3: Include performance benchmarks for the 500 concurrent users (e.g., response time under load).

Step 4: Define the Output Format

Clearly specify the format for the output, such as lists, JSON, or plain text.

Example: JSON Format

Query: Extract all "High Priority" requirements and return them in JSON format.

Expected Output:
{
  "High Priority Requirements": [
    {"ID": 1, "Requirement": "Users must be able to register and log in using their email and password", "Type": "Functional", "Status": "Approved"},
    {"ID": 3, "Requirement": "The platform must handle 500 concurrent users", "Type": "Non-Functional", "Status": "Approved"},
    {"ID": 4, "Requirement": "Payment processing must support credit cards and PayPal securely", "Type": "Functional", "Status": "Approved"}
  ]
}

Tips for Writing Effective Prompts

  1. Be Specific:
    • Use clear and concise instructions, such as "Extract all high-priority functional requirements."
  2. Provide Context:
    • Include a sample of the CSV data to help the AI understand the structure.
  3. Break Down Complex Tasks:
    • For tasks like generating test cases, split the query into multiple steps if needed.
  4. Define Output Format:
    • Specify whether you want the output as plain text, a list, or JSON.
  5. Iterate and Refine:
    • Test your prompts with different queries and refine them to improve accuracy.

Conclusion

Using prompts to process software requirements or testing documents in CSV format requires a clear understanding of the data structure and the task at hand. By following the steps outlined above, you can:

  • Extract relevant information.
  • Generate test cases.
  • Analyze requirements for gaps.
  • Summarize large documents.

With practice, you’ll be able to write detailed and effective queries that leverage the full power of Generative AI to streamline your software development and testing workflows. 🚀

AI's Impact on the IT Industry 2026