Code
I am not going into the details of how all this works, as that would make this blog post longer and go against the title. You can refer to gpt-index.readthedocs.io/en/latest if you need to learn more.
- Create a folder and open it up in your favorite code editor. Create a virtual environment for this project if needed.
- For this tutorial, we need to have gpt-index installed.
pip install gpt-index
If your data sources are in the form of PDFs, also install PyPDF2:
pip install PyPDF2
Now create a new file main.py and add the following code:
import os
os.environ["OPENAI_API_KEY"] = 'YOUR_OPENAI_API_KEY'
from gpt_index import GPTSimpleVectorIndex, SimpleDirectoryReader
# load every file in the ./data directory
documents = SimpleDirectoryReader('data').load_data()
# build a vector index over the documents
index = GPTSimpleVectorIndex(documents)
# save to disk
index.save_to_disk('index.json')
For this code to run, your data sources, be they PDFs, text files, etc., must be inside a directory named data in the same folder. Run the code after adding your data.
Your project directory should look something like this:
project/
├─ data/
│ ├─ data1.pdf
├─ query.py
├─ main.py
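With the data in place, run the script from your terminal (assuming you saved it as main.py, as in the layout above); this creates index.json next to your script:
python main.py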
- Now create another file named query.py and add the following code:
import os
os.environ["OPENAI_API_KEY"] = 'YOUR_OPENAI_API_KEY'
from gpt_index import GPTSimpleVectorIndex
# load the previously saved index from disk
index = GPTSimpleVectorIndex.load_from_disk('index.json')
# query the index and print the response
print(index.query("Any Query You have in your datasets"))
If you run this code, you will get a response from OpenAI answering the query you sent.
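To try it, run the script from your terminal (assuming the filename query.py from the directory layout above):
python query.py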
I tried using this paper on arXiv as a data source and asked the following query:
Another example is given by this vlog:
Introduction
This notebook has all the code you need to create your own chatbot with a custom knowledge base using GPT-3.
Follow the instructions for each step, then run the code sample. To run the code, press the "play" button next to each code sample.
Download the data for your custom knowledge base
For demonstration purposes, we are going to use —– as our knowledge base. You can download them to your local folder from the GitHub repository by running the code below. Alternatively, you can put your own custom data into the local folder.
!git clone https://github.com/irina1nik/context_data.git
Cloning into 'context_data'...
remote: Enumerating objects: 30, done.
remote: Total 30 (delta 0), reused 0 (delta 0), pack-reused 30
Unpacking objects: 100% (30/30), 12.56 KiB | 218.00 KiB/s, done.
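As an optional sanity check (this step is not in the original notebook), you can list the downloaded files before indexing them:
!ls context_data/data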
Install the dependencies
Run the code below to install the dependencies we need for our functions.
!pip install llama-index
!pip install langchain
Define the functions
The following code defines the functions we need to construct the index and query it.
from llama_index import SimpleDirectoryReader, GPTListIndex, readers, GPTSimpleVectorIndex, LLMPredictor, PromptHelper
from langchain import OpenAI
import sys
import os
from IPython.display import Markdown, display

def construct_index(directory_path):
    # set maximum input size
    max_input_size = 4096
    # set number of output tokens
    num_outputs = 2000
    # set maximum chunk overlap
    max_chunk_overlap = 20
    # set chunk size limit
    chunk_size_limit = 600

    # define LLM
    llm_predictor = LLMPredictor(llm=OpenAI(temperature=0.5, model_name="text-davinci-003", max_tokens=num_outputs))
    prompt_helper = PromptHelper(max_input_size, num_outputs, max_chunk_overlap, chunk_size_limit=chunk_size_limit)

    documents = SimpleDirectoryReader(directory_path).load_data()

    index = GPTSimpleVectorIndex(
        documents, llm_predictor=llm_predictor, prompt_helper=prompt_helper
    )

    index.save_to_disk('index.json')
    return index

def ask_ai():
    index = GPTSimpleVectorIndex.load_from_disk('index.json')
    while True:
        query = input("What do you want to ask? ")
        response = index.query(query, response_mode="compact")
        display(Markdown(f"Response: <b>{response.response}</b>"))
Set OpenAI API Key
You need an OPENAI API key to be able to run this code.
If you don’t have one yet, get it by signing up. Then click your account icon on the top right of the screen and select “View API Keys”. Create an API key.
Then run the code below and paste your API key into the text input.
os.environ["OPENAI_API_KEY"] = input("Paste your OpenAI key here and hit enter:")
Construct an index
Now we are ready to construct the index. This will take every file in the folder ‘data’, split it into chunks, and embed it with OpenAI’s embeddings API.
Notice: running this code will cost you credits on your OpenAI account ($0.02 for every 1,000 tokens). If you've just set up your account, the free credits that you have should be more than enough for this experiment.
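For a rough sense of scale: at $0.02 per 1,000 tokens, indexing a knowledge base of about 100,000 tokens (roughly 75,000 words) would cost around $2.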
construct_index("context_data/data")
Ask questions
It’s time to have fun and test our AI. Run the function that queries GPT and type your question into the input.
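A minimal way to do that is to call the ask_ai() function defined earlier (it loops with while True, so it keeps asking until you interrupt the cell):
ask_ai()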
If you’ve used the provided example data for your custom knowledge base, here are a few questions that you can ask:
- Why do people cook at home? Make a classification
- Make a classification of what frustrates people about cooking
- Brainstorm marketing campaign ideas for an air fryer that would appeal to people who cook at home
- Which kitchen appliances do people use most often?
- What do people like about cooking at home?
Explore more on the LangChain library in another blog.