Skip to main content

Processing Software Requirements or Testing Documents in CSV Format with Python and Generative AI

 Software requirements or testing documents often contain structured data, and querying or processing these documents effectively can make tasks like test case generation, requirement analysis, and data summarization much easier. In this blog, we’ll explore how to use Python and Generative AI to process software requirements or testing documents stored in CSV files.

We’ll cover:

  1. Reading and preparing the CSV file.
  2. Writing queries for Generative AI using prompt engineering.
  3. Using Generative AI to extract, process, and generate additional data based on the CSV.
  4. Saving results back to a CSV file.

1. Understanding the Example CSV

Here’s an example of a software requirements CSV file (requirements.csv):

ID,Requirement,Type,Priority,Status
1,Users must be able to register and log in using their email and password,Functional,High,Approved
2,Search functionality must return relevant results within 2 seconds,Functional,Medium,Pending
3,The platform must handle 500 concurrent users,Non-Functional,High,Approved
4,Payment processing must support credit cards and PayPal securely,Functional,High,Approved
5,Daily backups of all data must be performed automatically,Non-Functional,Medium,Pending

2. Python Code to Read and Process the CSV

We’ll use Python to:

  1. Read the CSV file into a DataFrame.
  2. Convert it to JSON for Generative AI queries.
  3. Write prompts to query the data.
  4. Save the AI-generated results back to a CSV.

Setup

Install Required Libraries:

pip install pandas openai

Step 1: Read the CSV File

We’ll use pandas to read the CSV file into a DataFrame and convert it into JSON format for use with Generative AI.

import pandas as pd

# Read the CSV file
csv_file = 'requirements.csv'
df = pd.read_csv(csv_file)

# Display the DataFrame
print("Original DataFrame:")
print(df)

# Convert the DataFrame into JSON for AI queries
json_data = df.to_json(orient='records')
print("\nData in JSON format:")
print(json_data)

Output:

Original DataFrame:
   ID                                          Requirement           Type  \
0   1  Users must be able to register and log in using...    Functional   
1   2  Search functionality must return relevant resul...    Functional   
2   3     The platform must handle 500 concurrent users  Non-Functional   
3   4  Payment processing must support credit cards an...    Functional   
4   5  Daily backups of all data must be performed aut...  Non-Functional   

  Priority    Status  
0     High  Approved  
1   Medium   Pending  
2     High  Approved  
3     High  Approved  
4   Medium   Pending  

Data in JSON format:
[{"ID":1,"Requirement":"Users must be able to register and log in using their email and password","Type":"Functional","Priority":"High","Status":"Approved"}, ...]

Step 2: Write a Query Using Prompt Engineering

To interact with the JSON data, we’ll craft a prompt for Generative AI. For this example, we’ll filter all high-priority functional requirements and generate test cases for each.

Prepare the Prompt

import openai

# Set your OpenAI API key
openai.api_key = 'your-openai-api-key'

# Define the prompt
prompt = f"""
Here is the software requirements data in JSON format:

{json_data}

Task:
1. Extract all "High Priority" functional requirements.
2. Generate 2 test cases for each extracted requirement.
3. Return the output in a structured JSON format.
"""

# Display the prompt
print("Generated Prompt:")
print(prompt)

Step 3: Query the Generative AI Model

We’ll send the prompt to OpenAI’s API (e.g., GPT-4) and process the output.

# Query the OpenAI API
response = openai.Completion.create(
    engine="text-davinci-003",  # Use the appropriate engine
    prompt=prompt,
    max_tokens=500,
    temperature=0
)

# Extract the response text
ai_output = response.choices[0].text.strip()

# Display AI's Output
print("\nAI Output:")
print(ai_output)

Expected AI Output:

{
  "High Priority Requirements": [
    {
      "Requirement": "Users must be able to register and log in using their email and password",
      "Test Cases": [
        "Verify that the user can register with a valid email and password.",
        "Verify that the user cannot register with an invalid email format."
      ]
    },
    {
      "Requirement": "Payment processing must support credit cards and PayPal securely",
      "Test Cases": [
        "Verify that payment can be processed securely via credit card.",
        "Verify that payment can be processed securely via PayPal."
      ]
    }
  ]
}

Step 4: Save the Results Back to a CSV

Now, we’ll parse the AI’s JSON output and save the results in a structured CSV file.

Parse and Save the Results

import json

# Parse the AI output (assumes it is valid JSON)
output_data = json.loads(ai_output)

# Flatten the data for CSV storage
flattened_data = []
for item in output_data['High Priority Requirements']:
    for test_case in item['Test Cases']:
        flattened_data.append({
            "Requirement": item['Requirement'],
            "Test Case": test_case
        })

# Convert to a DataFrame
output_df = pd.DataFrame(flattened_data)

# Save to a new CSV file
output_file = 'high_priority_test_cases.csv'
output_df.to_csv(output_file, index=False)

print(f"\nGenerated test cases saved to {output_file}")

Generated CSV (high_priority_test_cases.csv):

Requirement,Test Case
Users must be able to register and log in using their email and password,Verify that the user can register with a valid email and password.
Users must be able to register and log in using their email and password,Verify that the user cannot register with an invalid email format.
Payment processing must support credit cards and PayPal securely,Verify that payment can be processed securely via credit card.
Payment processing must support credit cards and PayPal securely,Verify that payment can be processed securely via PayPal.

Complete Python Script

Here’s the complete code for the workflow:

import pandas as pd
import openai
import json

# Set OpenAI API key
openai.api_key = 'your-openai-api-key'

# Step 1: Read the CSV file
csv_file = 'requirements.csv'
df = pd.read_csv(csv_file)
json_data = df.to_json(orient='records')

# Step 2: Define the prompt
prompt = f"""
Here is the software requirements data in JSON format:

{json_data}

Task:
1. Extract all "High Priority" functional requirements.
2. Generate 2 test cases for each extracted requirement.
3. Return the output in a structured JSON format.
"""

# Step 3: Query the OpenAI API
response = openai.Completion.create(
    engine="text-davinci-003",
    prompt=prompt,
    max_tokens=500,
    temperature=0
)

# Extract and parse the AI output
ai_output = response.choices[0].text.strip()
output_data = json.loads(ai_output)

# Step 4: Flatten the data for saving to CSV
flattened_data = []
for item in output_data['High Priority Requirements']:
    for test_case in item['Test Cases']:
        flattened_data.append({
            "Requirement": item['Requirement'],
            "Test Case": test_case
        })

# Convert to DataFrame and save to CSV
output_df = pd.DataFrame(flattened_data)
output_file = 'high_priority_test_cases.csv'
output_df.to_csv(output_file, index=False)

print(f"Generated test cases saved to {output_file}")

Conclusion

With this workflow, you can:

  1. Load and process CSV data for software requirements or testing documents.
  2. Use Generative AI with prompt engineering to extract, analyze, or generate additional information (e.g., test cases).
  3. Save AI-generated results into a structured CSV file for further use.

This approach is highly modular and can be adapted for different tasks, such as summarizing requirements, identifying gaps, or validating completeness. Happy engineering! 🚀

Comments

Popular posts from this blog

Transforming Workflows with CrewAI: Harnessing the Power of Multi-Agent Collaboration for Smarter Automation

 CrewAI is a framework designed to implement the multi-agent concept effectively. It helps create, manage, and coordinate multiple AI agents to work together on complex tasks. CrewAI simplifies the process of defining roles, assigning tasks, and ensuring collaboration among agents.  How CrewAI Fits into the Multi-Agent Concept 1. Agent Creation:    - In CrewAI, each AI agent is like a specialist with a specific role, goal, and expertise.    - Example: One agent focuses on market research, another designs strategies, and a third plans marketing campaigns. 2. Task Assignment:    - You define tasks for each agent. Tasks can be simple (e.g., answering questions) or complex (e.g., analyzing large datasets).    - CrewAI ensures each agent knows what to do based on its defined role. 3. Collaboration:    - Agents in CrewAI can communicate and share results to solve a big problem. For example, one agent's output becomes the input for an...

Optimizing LLM Queries for CSV Files to Minimize Token Usage: A Beginner's Guide

When working with large CSV files and querying them using a Language Model (LLM), optimizing your approach to minimize token usage is crucial. This helps reduce costs, improve performance, and make your system more efficient. Here’s a beginner-friendly guide to help you understand how to achieve this. What Are Tokens, and Why Do They Matter? Tokens are the building blocks of text that LLMs process. A single word like "cat" or punctuation like "." counts as a token. Longer texts mean more tokens, which can lead to higher costs and slower query responses. By optimizing how you query CSV data, you can significantly reduce token usage. Key Strategies to Optimize LLM Queries for CSV Files 1. Preprocess and Filter Data Before sending data to the LLM, filter and preprocess it to retrieve only the relevant rows and columns. This minimizes the size of the input text. How to Do It: Use Python or database tools to preprocess the CSV file. Filter for only the rows an...

Artificial Intelligence (AI) beyond the realms of Machine Learning (ML) and Deep Learning (DL).

AI (Artificial Intelligence) : Definition : AI encompasses technologies that enable machines to mimic cognitive functions associated with human intelligence. Examples : 🗣️  Natural Language Processing (NLP) : AI systems that understand and generate human language. Think of chatbots, virtual assistants (like Siri or Alexa), and language translation tools. 👀  Computer Vision : AI models that interpret visual information from images or videos. Applications include facial recognition, object detection, and self-driving cars. 🎮  Game Playing AI : Systems that play games like chess, Go, or video games using strategic decision-making. 🤖  Robotics : AI-powered robots that can perform tasks autonomously, such as assembly line work or exploring hazardous environments. Rule-Based Systems : Definition : These are AI systems that operate based on predefined rules or logic. Examples : 🚦  Traffic Light Control : Rule-based algorithms manage traffic lights by following fix...