
From CSV to JSON: Querying Data Using Prompt Engineering and Storing Results in a CSV


In this guide, we'll walk through the full process of reading data from a CSV file, converting it into JSON, using prompt engineering to query that JSON data, retrieving a single-row answer, and finally storing the result back into another CSV file. This workflow combines data processing and the power of Generative AI to interact with tabular data in a human-friendly way.

Overview of Steps

  1. Read the CSV file and convert it into JSON format.
  2. Use prompt engineering to query the JSON data.
  3. Retrieve a single-row answer based on the user’s query.
  4. Append the result as a new row to another CSV file.

Step 1: Read the CSV File and Convert It into JSON

Before querying anything, we'll read the data from the CSV file and convert it into a JSON format that the AI model can interpret.

Sample CSV File (data.csv)

Here’s an example of the data we’ll use:

Name,Age,Department,Salary
Alice,30,HR,50000
Bob,45,IT,70000
Charlie,28,Finance,60000
Diana,35,Marketing,65000
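
If you want to follow along, the sample file can be created with a few lines of pandas (a convenience sketch; any way of producing data.csv works):

import pandas as pd

# Build the sample data used throughout this guide and save it as data.csv
sample = pd.DataFrame([
    {"Name": "Alice",   "Age": 30, "Department": "HR",        "Salary": 50000},
    {"Name": "Bob",     "Age": 45, "Department": "IT",        "Salary": 70000},
    {"Name": "Charlie", "Age": 28, "Department": "Finance",   "Salary": 60000},
    {"Name": "Diana",   "Age": 35, "Department": "Marketing", "Salary": 65000},
])
sample.to_csv('data.csv', index=False)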

Python Code to Read and Convert CSV to JSON

We’ll use pandas to load the CSV file and convert it into JSON.

import pandas as pd

# Step 1: Read the CSV file
csv_file = 'data.csv'
df = pd.read_csv(csv_file)

# Convert the DataFrame to JSON format (records orientation)
json_data = df.to_json(orient='records')

# Display the JSON
print(json_data)

JSON Output (pretty-printed here for readability; to_json actually returns a compact, single-line string):

[
  {"Name":"Alice","Age":30,"Department":"HR","Salary":50000},
  {"Name":"Bob","Age":45,"Department":"IT","Salary":70000},
  {"Name":"Charlie","Age":28,"Department":"Finance","Salary":60000},
  {"Name":"Diana","Age":35,"Department":"Marketing","Salary":65000}
]
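
Note that to_json returns the records as a string. If you would rather work with Python objects directly (an optional variation, not needed for the rest of the guide), to_dict gives you a list of dictionaries that you can pretty-print with json.dumps:

import json

# Alternative: get a list of Python dictionaries instead of a JSON string
records = df.to_dict(orient='records')
print(json.dumps(records, indent=2))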

Step 2: Craft the Prompt to Query JSON Data

The next step is to ask questions about the JSON data using prompt engineering. A well-crafted prompt makes it much more likely that the AI interprets the data correctly and returns exactly the row you asked for.

Crafting the Prompt

We’ll include the JSON data in the prompt and ask a specific question to retrieve a single row.

Example Prompt

Here is the data in JSON format:

[
  {"Name":"Alice","Age":30,"Department":"HR","Salary":50000},
  {"Name":"Bob","Age":45,"Department":"IT","Salary":70000},
  {"Name":"Charlie","Age":28,"Department":"Finance","Salary":60000},
  {"Name":"Diana","Age":35,"Department":"Marketing","Salary":65000}
]

Query: Find the row where the 'Department' is 'IT'. Return the output as a single row in JSON format.

Expected AI Output

{"Name":"Bob","Age":45,"Department":"IT","Salary":70000}

Step 3: Use Generative AI to Query the Data

To execute the query, we’ll use an AI model via an API (e.g., OpenAI’s GPT API). The model will process the JSON data and return the requested row based on the query.

Python Code to Query the Data

Here’s how you can integrate a generative AI model to query the JSON data:

import json

from openai import OpenAI

# Step 1: Create the OpenAI client
# (pass your API key here or set the OPENAI_API_KEY environment variable)
client = OpenAI(api_key='your-openai-api-key')

# Step 2: Define the JSON data and the query
json_data = [
    {"Name": "Alice", "Age": 30, "Department": "HR", "Salary": 50000},
    {"Name": "Bob", "Age": 45, "Department": "IT", "Salary": 70000},
    {"Name": "Charlie", "Age": 28, "Department": "Finance", "Salary": 60000},
    {"Name": "Diana", "Age": 35, "Department": "Marketing", "Salary": 65000}
]
query = "Find the row where the 'Department' is 'IT'. Return the output as a single row in JSON format."

# Step 3: Craft the prompt
prompt = f"""
Here is the data in JSON format:

{json.dumps(json_data, indent=2)}

Query: {query}
"""

# Step 4: Send the prompt to the Chat Completions API
# (any current chat model works; gpt-4o-mini is used here as an example)
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
    max_tokens=100,
    temperature=0
)

# Step 5: Extract the AI's response
result = response.choices[0].message.content.strip()
print("AI Response (Row):", result)

# Convert the response back to a Python dictionary (JSON object)
row_data = json.loads(result)

Result:

{"Name": "Bob", "Age": 45, "Department": "IT", "Salary": 70000}

Step 4: Store the Result in a New CSV File

Finally, we’ll save the retrieved row into another CSV file (output.csv). If the file already exists, we’ll append the result as a new row.

Python Code to Save the Result

import os

# Step 1: Convert the row data into a DataFrame
result_df = pd.DataFrame([row_data])

# Step 2: Append the row to 'output.csv'
output_file = 'output.csv'

# Step 3: Write the header only if the file doesn't exist yet
# (mode='a' creates the file automatically when it is missing)
write_header = not os.path.exists(output_file)
result_df.to_csv(output_file, mode='a', index=False, header=write_header)

print(f"Row saved to {output_file}")

Output in output.csv

After running the script, the output.csv file will contain the following content:

Name,Age,Department,Salary
Bob,45,IT,70000

If you run additional queries, the new rows will be appended to the file.
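
To confirm what has been saved so far, you can simply read the file back (shown only for completeness):

# Read the output file back to verify the appended rows
saved = pd.read_csv('output.csv')
print(saved)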


Step 5: Automate the Workflow

To handle multiple queries or automate the workflow, you can wrap the steps into a function and call it with different queries.

Complete Workflow Script

import json
import os

import pandas as pd
from openai import OpenAI

client = OpenAI(api_key='your-openai-api-key')


def query_and_save_row(csv_file, query, output_file):
    # Read the CSV file and convert it to JSON
    df = pd.read_csv(csv_file)
    json_data = df.to_json(orient='records')

    # Craft the prompt
    prompt = f"""
    Here is the data in JSON format:

    {json_data}

    Query: {query}
    """

    # Query the AI model
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=100,
        temperature=0
    )

    # Extract the AI's response and parse it as JSON
    row_data = json.loads(response.choices[0].message.content.strip())

    # Save the result to the output CSV, writing the header only on the first write
    result_df = pd.DataFrame([row_data])
    result_df.to_csv(output_file, mode='a', index=False,
                     header=not os.path.exists(output_file))

    print(f"Row saved to {output_file}")


# Example usage
query_and_save_row('data.csv', "Find the row where the 'Department' is 'IT'. Return the output as a single row in JSON format.", 'output.csv')
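
The function can just as easily be driven by a list of queries (the queries below are only examples; any natural-language lookup over the same columns works):

queries = [
    "Find the row where the 'Department' is 'IT'. Return the output as a single row in JSON format.",
    "Find the row where the 'Name' is 'Diana'. Return the output as a single row in JSON format.",
    "Find the row with the lowest 'Salary'. Return the output as a single row in JSON format.",
]

for q in queries:
    query_and_save_row('data.csv', q, 'output.csv')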

Conclusion

This workflow demonstrates how to:

  1. Read data from a CSV file and convert it into JSON format.
  2. Ask questions using prompt engineering to query the JSON data.
  3. Retrieve a single-row answer from the AI model.
  4. Store the result back into another CSV file.

This method is versatile and can be adapted for various use cases, including data analysis, reporting, and automation. By combining the power of Generative AI and Python, you can create intelligent, human-like workflows for your data-driven tasks! 🚀
