
Updated 2 years ago

I'm also getting far worse results from querying now

I'm also getting far worse results from querying now. I did some indexing and tests in version 0.3.2 with davinci 3 as it was at the time, and today the results are far worse when I try to make them better. @holodeck, I don't know why.
cc @holodeck @Sandkoan: it seems like some people are running into issues. Are you using the simple vector index without customizing the LLM (i.e., still using text-davinci-003)?
Or are you using ChatGPT?
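For context on what "the simple vector index" does at query time: it embeds the document chunks and the query, then passes the most similar chunks to the LLM. The sketch below shows only that retrieval step using cosine similarity, assuming toy 2-D vectors in place of real embeddings; `cosine_similarity` and `top_k` are hypothetical helpers, not llama_index's actual code.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k=1):
    """Return the indices of the k chunks most similar to the query."""
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]


# Toy "embeddings" for three chunks (hypothetical values).
chunk_vecs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
print(top_k([1.0, 0.1], chunk_vecs, k=2))  # [0, 2]
```

If the right chunk never ranks near the top, the LLM never sees it, which is one reason answer quality can shift with chunking or scraping changes even when the model is unchanged.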
I'm still using text-davinci-003 as far as I know.
Hmm, @Sandkoan, let me know if you're able to send me some code to help me diagnose this. I'll try taking a look at it too. Also, which version did you upgrade from?
Not too sure, but I think from something pre-0.4.11, maybe?
It's hard to say whether the issue is an OpenAI thing or a GPT-Index thing.
Are you saving/loading from disk by any chance?
Nope, just using qdrant now.
I lean towards the former.
I have tried them both. Back in 0.3.2 I more or less got the answers I was looking for, but now they don't seem accurate, and I get a lot of different answers when I change the tokens a little.

I have tried both davinci and gpt-3.5-turbo.

I'm using SimpleVector with loading/saving from disk, but I do want to learn other ways to do it if there are better choices.
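One alternative to files is to store the serialized index as a string in a database. In the llama_index 0.4.x era, indexes exposed `save_to_string()` and `load_from_string()` (assuming your version has them). The sketch below covers only the database side, with SQLite standing in for whatever database you use; the table name `indexes` and both helper functions are hypothetical, and the stored string is a placeholder.

```python
import sqlite3


def _ensure_table(conn):
    # Hypothetical schema: one row per named index, holding its serialized string.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS indexes (name TEXT PRIMARY KEY, data TEXT NOT NULL)"
    )


def save_index_string(conn, name, index_str):
    """Store (or overwrite) a serialized index under a name."""
    _ensure_table(conn)
    conn.execute(
        "INSERT OR REPLACE INTO indexes (name, data) VALUES (?, ?)", (name, index_str)
    )
    conn.commit()


def load_index_string(conn, name):
    """Return the stored string, or None if no index has that name."""
    _ensure_table(conn)
    row = conn.execute("SELECT data FROM indexes WHERE name = ?", (name,)).fetchone()
    return row[0] if row is not None else None


conn = sqlite3.connect(":memory:")
# Placeholder; in practice this string would come from index.save_to_string().
save_index_string(conn, "aml", '{"placeholder": true}')
print(load_index_string(conn, "aml"))
```

The loaded string would then be passed back to the index class's string loader; that step depends on your llama_index version, so check its API before relying on it.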

I have set up a Docker Compose stack with FastAPI and SvelteKit working together.
I'm saving the index to a string and loading it from a string, and that string is in turn saved to and loaded from a DB. However, after more analysis it looks like the web page scrape isn't picking up all the keywords I'm looking for. Maybe that's the cause for others too, but it needs more investigation. I'm using SimpleVectorIndex without customizing the LLM. Maybe I'll play with the playground later, since I'm not sure what the optimal settings might be. I also have LangChain tracing running, so I can monitor the backend queries. I'm staying away from ChatGPT until I get a grasp on what's going on.
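One way to test the hunch that the scrape is dropping content is to check the scraped text for the keywords you expect before indexing it. A minimal sketch, assuming the scraped page is already available as a plain string; `missing_keywords` and the sample text are hypothetical.

```python
import re


def missing_keywords(scraped_text, keywords):
    """Return keywords absent from the text (case-insensitive, whole-word match)."""
    lowered = scraped_text.lower()
    missing = []
    for keyword in keywords:
        # \b boundaries only behave as expected when the keyword starts and
        # ends with a word character (letters, digits, underscore).
        pattern = r"\b" + re.escape(keyword.lower()) + r"\b"
        if re.search(pattern, lowered) is None:
            missing.append(keyword)
    return missing


page_text = "Our refund policy covers annual plans. Contact support for details."
print(missing_keywords(page_text, ["refund policy", "annual plans", "invoices"]))
# ['invoices']
```

If expected terms are missing here, no amount of LLM tuning will recover them, so this check separates scraping problems from retrieval or model problems.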
What are you using to create the embedding?
I upgraded to the latest versions:

langchain==0.0.102rc0
llama_index==0.4.21

And now it gives really nice results.
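Since several replies hinge on exact versions (0.3.2, pre-0.4.11, 0.4.20, 0.4.21), it can help to print what an environment actually has installed. A small stdlib sketch using `importlib.metadata` (Python 3.8+); `installed_version` and `version_tuple` are hypothetical helpers, and `version_tuple` only handles plain numeric versions, so a pre-release such as `0.0.102rc0` would need the `packaging` library instead.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string, or None if the distribution is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def version_tuple(v):
    """Turn '0.4.21' into (0, 4, 21); raises ValueError on suffixes like 'rc0'."""
    return tuple(int(part) for part in v.split("."))


for dist in ("llama-index", "langchain"):
    print(dist, installed_version(dist))
print(version_tuple("0.4.21") > version_tuple("0.4.20"))  # True
```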

It's probably really basic, what I'm doing:

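# Imports as of llama_index 0.4.x / early langchain (assumed; later releases moved these)
from langchain.llms import OpenAIChat
from llama_index import (GPTSimpleVectorIndex, LLMPredictor, PromptHelper,
                         SimpleDirectoryReader)
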
# set maximum input size
max_input_size = 4096
# set number of output tokens
num_outputs = 512
# set maximum chunk overlap
max_chunk_overlap = 40
# set chunk size limit
chunk_size_limit = 600

# define LLM
llm_predictor = LLMPredictor(llm=OpenAIChat(
    temperature=0, model_name="gpt-3.5-turbo", max_tokens=num_outputs))
prompt_helper = PromptHelper(max_input_size,
                             num_outputs,
                             max_chunk_overlap,
                             chunk_size_limit=chunk_size_limit)

documents = SimpleDirectoryReader('amldata').load_data()

index = GPTSimpleVectorIndex(documents,
                             llm_predictor=llm_predictor,
                             prompt_helper=prompt_helper)

# save to disk
index.save_to_disk('./indexes/aml.json')
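The `chunk_size_limit` and `max_chunk_overlap` settings above control how documents are cut into overlapping pieces before embedding, and `max_input_size` minus `num_outputs` is roughly the token budget left for the prompt (here 4096 - 512 = 3584 tokens for the template, the question, and the retrieved text). The sketch below is a simplified sliding window over an already-tokenized list, assuming integers stand in for tokens; llama_index's real splitter also respects separators, so its chunk boundaries will differ. Both helpers are hypothetical.

```python
def sliding_window_chunks(tokens, chunk_size, overlap):
    """Split tokens into windows of at most chunk_size, sharing `overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window reaches the end, to avoid a redundant tail chunk.
        if start + chunk_size >= len(tokens):
            break
    return chunks


def prompt_budget(max_input_size, num_outputs, template_tokens=0):
    """Tokens left for the question and retrieved context after reserving output."""
    budget = max_input_size - num_outputs - template_tokens
    if budget <= 0:
        raise ValueError("no room left for the prompt")
    return budget


# Usage with the settings from the snippet above.
chunks = sliding_window_chunks(list(range(1500)), chunk_size=600, overlap=40)
print([len(c) for c in chunks])  # [600, 600, 380]
print(prompt_budget(4096, 512))  # 3584
```

Smaller chunks make retrieval more precise but give the LLM less surrounding context per chunk; the overlap reduces the chance that a key sentence is split across a boundary.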
Thanks for that, I'll upgrade to 0.4.21; I'm currently on 0.4.20.
I have updated my code to use these settings and will report back soon. All looks well for now. THANK YOU!
Thanks, everyone!!!