AndreaSel93
Offline, last seen 3 months ago
Joined September 25, 2024
Hey guys, since similarity top k does not always surface the most relevant data, I was thinking of doing the following:

  • Loop over the top n nodes from similarity top k, but use k=1 for each llm_call, asking whether the context is really pertinent to the question (the answer must be 0 for no and 1 for yes, to keep it fast, like a classification problem);
  • keep the nodes whose context is relevant and merge them (clearly separated);
  • use the qa template and (only if the merged chunk is long enough) the refine template only on the relevant nodes.
I'm following the OpenAI best practices for complex reasoning problems, where they suggest splitting the problem into smaller problems.

What do you think? This way I should get more flexibility and better responses. However, I'm worried about the overall execution time.
Do you have any suggestions?
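A minimal sketch of that loop, using hypothetical retrieve_top_k and llm_call callables as stand-ins for whatever retriever and LLM client is actually in use (this is not the real llama_index API):

from typing import Callable, List, Tuple

RELEVANCE_PROMPT = (
    "Question: {question}\n"
    "Context: {context}\n"
    "Reply with a single digit: 1 if the context is pertinent to the question, 0 if not."
)

def filter_relevant_nodes(
    question: str,
    retrieve_top_k: Callable[[str, int], List[str]],  # hypothetical retriever stand-in
    llm_call: Callable[[str], str],                    # hypothetical LLM client stand-in
    k: int = 20,
) -> Tuple[List[str], str]:
    # Step 1: pull a generous candidate set with plain similarity top k.
    candidates = retrieve_top_k(question, k)

    # Step 2: one cheap 0/1 classification call per candidate chunk.
    relevant = []
    for chunk in candidates:
        verdict = llm_call(RELEVANCE_PROMPT.format(question=question, context=chunk)).strip()
        if verdict.startswith("1"):
            relevant.append(chunk)

    # Step 3: merge the relevant chunks, clearly separated, ready for the QA/refine templates.
    merged_context = "\n\n---\n\n".join(relevant)
    return relevant, merged_context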
1 comment
In my case it finds:

1) doc “xyw” paragraph “123”
2) doc “xyw” paragraph “123”
3) doc “xyz” paragraph “456”

Wtf is going on? 😅
5 comments
Huge work! Do you think this new lower-level API (node) also affects a big text DB, or only JSON, images, etc.? Can you explain the implications in more detail?
10 comments
Hey! Any way to keep answers concise and short without truncating them? max_tokens just truncates. I'm using GPTSimpleVectorIndex.
2 comments
Is it possible with a Pinecone index, or only with GPTSimpleVectorIndex?
7 comments
Hey! Is it possible to use a LangChain vectorstore agent that takes a GPT vector store index as input?
7 comments
Does this apply only to the GPT simple vector index, or also to Pinecone, etc.?
2 comments
Hey! I saw the new release about the Pinecone index expansion. It says that a single Pinecone index is shareable among multiple vector indices. Can anyone explain more?
4 comments
Ah ok… I think OpenAI embeddings are the ones with the best performance, no?
11 comments
Hey. How does LlamaIndex split long texts? E.g. LangChain provides several options (TextSplitter, RecursiveCharacterTextSplitter, SpacyTextSplitter, and so on). What about LlamaIndex?
1 comment
Another Q! I have 1k docs ingested. I know the answer to my query appears in more than one doc, but specifically in one document. However, when I look for it in get_formatted_sources I see it's in 14th position (ordered by similarity, which IMO works poorly). Apart from writing a better query, what other methods can I apply to get that response without setting similarity top k to a large number?
1 comment
Hey! Does anyone know how I can load my data from a Pinecone index? To be more specific, I'm trying to keep everything in the cloud without saving all my docs locally, so I'd like to get the vectors directly.
2 comments
Same! Over a minute for a response with 1-2 nodes passed. I have GBs of data and I've used GPTSimpleVectorIndex so far. Today I initialized a Pinecone index to check performance. I was also wondering if RAM could be an issue, since the index is quite heavy to keep in memory. Glad to hear someone else has the same issues. Let's keep each other updated on improvements!
36 comments
Hey! Very important to me: Is it possible to get a portion of the embeddings of an already built simple vector index?
4 comments
Is anyone facing speed issues? Is this directly proportional to the index size? (Larger index = slower response?)
6 comments
Do you suggest editing some of the LlamaIndex prompts to better adapt them to different use cases?
1 comment
Hey guys, hugely important so I don't waste money. Maybe I missed it, but when I fine-tune an embedding model following the LlamaIndex guide, how can I save it? It's my first fine-tuning, so I'm pretty ignorant here. Thanks in advance!
4 comments
Or can I use the node parser through the service context?
12 comments
Hey again! Just a simple question: when I use davinci I can set max_tokens = -1 and this returns complete responses. However, when I use GPT-3.5 turbo I cannot use -1 as max_tokens, so I have to set 4096 or whatever. The problem is that this way, responses that are supposed to be long come back truncated instead. How can I solve this? Thank you!
7 comments
You have to pass your own function instead of the lambda; in it you can print or store all the info you need somewhere. I don't have the link right now, but @Logan M has an example!
1 comment
I don't know, but here's my code below, hope it helps:
import pinecone

pinecone_api_key = ""  # fill in your API key
pinecone.init(api_key=pinecone_api_key, environment="")  # fill in your environment
pinecone_idx = pinecone.Index("<name_of_your_index>")

And then define the GPTPineconeIndex
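For example, a rough sketch assuming the older GPTPineconeIndex constructor that accepts a pinecone_index argument (check the signature in your installed gpt_index/llama_index version):

from llama_index import GPTPineconeIndex, SimpleDirectoryReader

# Build the index on top of the existing Pinecone index defined above.
documents = SimpleDirectoryReader("./data").load_data()
index = GPTPineconeIndex(documents, pinecone_index=pinecone_idx)

response = index.query("your question here")
print(response)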
1 comment
Is it possible to print the response.source_node when using llama_index as a tool in a LangChain agent?
5 comments
A question not related to the new version: do the QA and REFINE prompts (and the similarity top_k) work with LangChain agents that use LlamaIndex as a tool? Or with GPTIndexChatMemory?
8 comments
I'm already obtaining high similarities (0.75+ for the first 10-20 nodes).
The difference is that I'm adding an LLM call. So instead of using the top k to create the responses directly, I'm using it to select around 10-20 nodes. Then I let gpt-turbo decide whether each one is truly relevant (this is the key: IMO I'd still get some irrelevant nodes even though the similarity is high). If they're relevant, I merge them and use the usual LlamaIndex solution.

Do you think I can incorporate all of that into the qa and refine prompts?
My use case is not as simple as “who won the 2022 World Cup”.
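One way to fold that relevance check into the prompts themselves would be a custom QA template. A rough sketch, assuming the older llama_index QuestionAnswerPrompt / text_qa_template API with {context_str} and {query_str} placeholders and an already-built index (verify against your version):

from llama_index import QuestionAnswerPrompt

# Ask the model to discard irrelevant chunks itself, instead of using a separate filtering call.
QA_TEMPLATE = QuestionAnswerPrompt(
    "Context information is below. Some chunks may be irrelevant.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "First discard any chunk that is not pertinent to the question, "
    "then answer using only the remaining chunks.\n"
    "Question: {query_str}\n"
)

response = index.query("your question here", text_qa_template=QA_TEMPLATE)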
2 comments
Guys, a question: why, when I use LangChain for QA over a Pinecone vectorstore, do I immediately hit the model's maximum context length, while this doesn't happen when I use gpt_index to query the same vector store? What kind of magic does gpt_index use? ahahah
4 comments