Talk To Your CSV: How To Visualize Your Data With Langchain And Streamlit



 

 

https://www.canva.com/

Large model models (LLMS) have progressively grown powerful and competently. These models can be used for a wide array of applications, such as text generation, language translation, and providing answers to queries.

What is Langchain?

In my previous article, I made an introduction to Langchain and its inner workings. If you check my columns, you will find it offers an informative and detailed explanation of how Langchain works

In this tutorial, I will show you how to use Langchain and Streamlit to analyze CSV files, We will leverage the OpenAI API for GPT-3 access, and employ Streamlit for user interface development. This will enable users to upload CSV files and pose queries about the data. The system will generate the answers and illustrate the information with tables and charts.

Let’s start coding

1. Set up your environment

1. You want to start by creating a venv on your local machine.

First, open your terminal and create a virtual environment.

python -m venv venv

then activate it:

venv\Scripts\activate

You should see (Venv) in the terminal now.

Now, let’s install the required dependencies:

Pip install langchain==0.0.146 , python-dotenv==1.0.0 , 
streamlit== 1.22.0 , open ai == 0.27.7 , tabulate==0.9.0

Finally, we’ll need to set an environment variable for the OpenAI API key:

set OPENAI_API_KEY=<YOUR_API_KEY>

Now, that we’re all set, let’s start!

Create a file named “Talk_with_CSV.py”, where we will write the functions for answering questions.

let’s import the required dependencies:

from langchain import OpenAI
from langchain.agents import create_pandas_dataframe_agent
import pandas as pd
from dotenv import load_dotenv
import json
import streamlit as st

Read the file

load_dotenv()
openai_key = os.getenv("OPENAI_API_KEY")

Setting up the agent

Creating the agent is a simple process since we are going to use the create_pandas_dataframe_agent provided by Langchain! For those who might not be familiar with it, an agent is a component that can help to tap into a collection of tools, making decisions on which to use based on the users’ input. There are mainly 2 categories of agents: “Action Agents” and “Plan-and-Execute Agents”.

Action Agents determine a course of action and carry it out by order, one by one phase. Moreover, they can take out and manage data from various sources, such as databases, APIs, and in this particular scenario, a CSV file.

Create Tool_CSV Function

The Tool_CSV function allows a path of a CSV file as its input and a return agent that can access and use a large language model (LLM). this function generates an OpenAI object, reads the CSV file and then converts it into a Pandas DataFrame. Finally, it formulates a Pandas DataFrame agent which is then returned.

def csv_tool(filename : str):

df = pd.read_csv(filename)
return create_pandas_dataframe_agent(OpenAI(temperature=0), df, verbose=True)

Create Ask_agent Function

The query_agent function is the powerhouse of this process. It accepts a pandas_dataframe_agent and a query as input, returning the agent’s response in string format. the function generates a prompt for the agent, specifying the type of responses we’re seeking. The goal is to have the agent provide a string that can subsequently be converted into a dictionary. Depending on what’s contained within this dictionary, the program will manifest a graph, table, or simple text response.

def ask_agent(agent, query):
"""
Query an agent and return the response as a string.

Args:
agent: The agent to query.
query: The query to ask the agent.

Returns:
The response from the agent as a string.
"""
# Prepare the prompt with query guidelines and formatting
prompt = (
"""
Let's decode the way to respond to the queries. The responses depend on the type of information requested in the query.

1. If the query requires a table, format your answer like this:
{"table": {"columns": ["column1", "column2", ...], "data": [[value1, value2, ...], [value1, value2, ...], ...]}}

2. For a bar chart, respond like this:
{"bar": {"columns": ["A", "B", "C", ...], "data": [25, 24, 10, ...]}}

3. If a line chart is more appropriate, your reply should look like this:
{"line": {"columns": ["A", "B", "C", ...], "data": [25, 24, 10, ...]}}

Note: We only accommodate two types of charts: "bar" and "line".

4. For a plain question that doesn't need a chart or table, your response should be:
{"answer": "Your answer goes here"}

For example:
{"answer": "The Product with the highest Orders is '15143Exfo'"}

5. If the answer is not known or available, respond with:
{"answer": "I do not know."}

Return all output as a string. Remember to encase all strings in the "columns" list and data list in double quotes.
For example: {"columns": ["Products", "Orders"], "data": [["51993Masc", 191], ["49631Foun", 152]]}

Now, let's tackle the query step by step. Here's the query for you to work on:
"""
+ query
)

# Run the prompt through the agent and capture the response.
response = agent.run(prompt)

# Return the response converted to a string.
return str(response)

The function called ask_agent is the most important part. It takes in two things — an agent (pandas dataframe) and a question (query).

This function starts by creating a ‘prompt’, which is like a question or instruction for the agent. This prompt tells the agent what kind of answers we want.

The goal is for the agent to give back a string, which is a type of data that is text-based. Later, this string will be changed into a dictionary, another type of data that includes key-value pairs, kind of like a real-life dictionary that pairs a word (key) with its meaning (value).

Depending on what’s in that dictionary, the program will then show a graph, a table or just a text answer.

Setting Up The Streamlit Interface

Streamlit is a free and open-source framework to quickly create and share beautiful machine learning and data science web apps. It is a Python-based library designed to be quick and easy to use and build beautiful interactive apps without any JavaScript or CSS knowledge.

def decode_response(response: str) -> dict:
"""This function converts the string response from the model to a dictionary object.

Args:
response (str): response from the model

Returns:
dict: dictionary with response data
"""
return json.loads(response)

The decode_response function will change the agent’s response, a string, into a dictionary for easier use.

def write_answer(response_dict: dict):
"""
Write a response from an agent to a Streamlit app.

Args:
response_dict: The response from the agent.

Returns:
None.
"""

# Check if the response is an answer.
if "answer" in response_dict:
st.write(response_dict["answer"])

# Check if the response is a bar chart.
# Check if the response is a bar chart.
if "bar" in response_dict:
data = response_dict["bar"]
try:
df_data = {
col: [x[i] if isinstance(x, list) else x for x in data['data']]
for i, col in enumerate(data['columns'])
}
df = pd.DataFrame(df_data)
df.set_index("Products", inplace=True)
st.bar_chart(df)
except ValueError:
print(f"Couldn't create DataFrame from data: {data}")

# Check if the response is a line chart.
if "line" in response_dict:
data = response_dict["line"]
try:
df_data = {col: [x[i] for x in data['data']] for i, col in enumerate(data['columns'])}
df = pd.DataFrame(df_data)
df.set_index("Products", inplace=True)
st.line_chart(df)
except ValueError:
print(f"Couldn't create DataFrame from data: {data}")


# Check if the response is a table.
if "table" in response_dict:
data = response_dict["table"]
df = pd.DataFrame(data["data"], columns=data["columns"])
st.table(df)

This function accepts a response dictionary and uses it to display output on the Streamlit app. This could include answers, bar graphs, line graphs, and tables.

The function first sees if the response is an ‘answer’, meaning a regular text response for questions like ‘How many rows are in the document?’. If that’s the case, it will show the answer on the app.

Next, the function verifies if the response is meant to generate a bar chart. If so, it uses the response data to create the bar chart and displays it on the app.

The function also checks if the response is intended to create a line chart. If that’s the case, it forms a line chart using the response data and displays it on the app.

The function further checks if the response is meant for a table. If so, it builds a table from the response data and displays it on the app.

Last but not least, we’ll build the initial interface. Add the following lines:

st.set_page_config(page_title="‍ Talk with your CSV")
st.title("‍ Talk with your CSV")

st.write("Please upload your CSV file below.")

data = st.file_uploader("Upload a CSV" , type="csv")

query = st.text_area("Send a Message")

if st.button("Submit Query", type="primary"):
# Create an agent from the CSV file.
agent = csv_tool(data)

# Query the agent.
response = ask_agent(agent=agent, query=query)

# Decode the response.
decoded_response = decode_response(response)

# Write the response to the Streamlit app.
write_answer(decoded_response)

This code makes a Streamlit app where users can talk with their CSV files. First, the app asks the user to upload a CSV file. Then, it requests the user to type in a query. If the user hits the “Submit Query” button, the app will ask the agent a question and display the answer on the app.

The app uses the following functions:

Tool_csv(): It generates an agent from a CSV file.

ask_agent(): This function asks a question to an agent and gives back the answer.

// LAST UPDATED ON 29/09
Amazon Most Wished For

decode_response(): This function translates an agent’s response.

write_response(): This function generates a response on a Streamlit app.

Let’s try it out!

Now, run the application by typing ‘streamlit run Talk_with_csv.py’ in the terminal.

This command will open a new window in your browser that should appear like this:

For this tutorial, I am going to use data that can be found on my GitHub

Go ahead and upload your CSV file, and start entering your queries!

First query: Which Products have the highest orders?

Second query: Tabulate the first 5 Products. Include the Products and Order columns only.?

Third query: Create a line graph on the first 5 products. use the Products as the column and Orders as a data values

Forth query: Create a bar graph on the first 5 products. use the Products as the column and Orders as a data values

In Conclusion:

Langchain and Streamlit are useful tools that simplify the process for users to inquire about their data using Language Models. The application lets users see their data in visual forms

It’s a great tool for members who want a deeper understanding of their data or to make sense of it.

Don’t hesitate to contact us if you have any queries!

Reference :

https://python.langchain.com/en/latest/modules/agents/agents/examples/chat_conversation_agent.html?highlight=agent_chain

https://docs.streamlit.io/

https://www.datacamp.com/tutorial/streamlit

Stay tuned for more details on trending AI-related implementations and discussions on my personal blog.

‍♂️ We are AI application experts! If you want to collaborate on a project, drop an inquiry here, stop by our website, or shoot us a direct email.

Feel free to check out my other articles:

 

 

 

Original Post>

Understanding LangChain ️: PART 1

//Last UPDATE ON 02/10
Today's deals