ShantanuNair
Joined September 25, 2024
Hey Jerry, any findings on the summary nodes not containing extra_info of the nodes they were created from?
4 comments
Saw the refactored composability API, looks great! But I had a question - previously, if I had a vector index on top of multiple ListIndex instances (each of which is a document), I could query just the vector index and stop at the ListIndex summaries by not passing mode=recursive (as we discussed in a thread here). Now it seems that may no longer be an option, or I don't know how to do it with the current API.

Do set_text nodes created on subindices not automatically get the index's metadata?
I'm not 100% sure, but I thought that was the case previously, and it looks like the summary nodes now don't carry any extra_info.
32 comments
Another q - I am composing a simple vector index over N ListIndices, each of which is a document. I have called set_text on each ListIndex with its own tree_summarize summary. My understanding is that if I query the vector index, the relevant documents (ListIndices) should be selected, and then their chunks (nodes) should be passed into the prompt, correct? If that is right, what I am actually seeing is that the traversal stops at the vector index, and the summary of the ListIndex is what's passed into the query. Is my understanding correct, or am I doing something wrong?
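To make the two behaviours in this question concrete, here is a toy sketch in plain Python. It does not use the library at all: `ToyListIndex`, `word_overlap`, and `query_composed` are hypothetical names made up for illustration, and word overlap stands in for embedding similarity. It only shows the difference between stopping at the selected sub-index's summary and descending into its chunks.

```python
from dataclasses import dataclass


@dataclass
class ToyListIndex:
    """Hypothetical stand-in for one document's list index (not the library class)."""
    chunks: list
    summary: str = ""  # plays the role of the text attached via set_text


def word_overlap(a, b):
    """Crude relevance score: number of shared lowercase words."""
    return len(set(a.lower().split()) & set(b.lower().split()))


def query_composed(sub_indices, query, top_k=1, recursive=False):
    """Pick the top_k sub-indices by summary relevance, then build the context.

    recursive=False: stop at the top level and return the summaries.
    recursive=True:  descend into each selected sub-index and return its chunks.
    """
    ranked = sorted(sub_indices, key=lambda s: word_overlap(s.summary, query), reverse=True)
    context = []
    for sub in ranked[:top_k]:
        if recursive:
            context.extend(sub.chunks)
        else:
            context.append(sub.summary)
    return context
```

With this toy model, the behaviour described above (only the summary reaching the prompt) corresponds to the `recursive=False` branch, and the expected behaviour (chunks reaching the prompt) to `recursive=True`. Whether the real API maps onto these two branches in the same way is exactly what the question is asking.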
44 comments
What would be the difference between using a ListIndex with query mode embedding vs a SimpleVectorIndex, with both using the same topK? ListIndex would run embeddings on text at query time whereas the vector index would run a similarity on already stored embeddings at query time?
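A toy sketch of the distinction I have in mind, in plain Python with no library calls. Everything here is hypothetical (`CountingEmbedder`, `query_time_embedding`, `StoredVectorIndex`), the "embedding" is just a letter-frequency vector, and it assumes nothing is cached between queries in the query-time case, which may not match what the library actually does.

```python
import math


class CountingEmbedder:
    """Toy embedder that counts how many times it is called."""

    def __init__(self):
        self.calls = 0

    def embed(self, text):
        self.calls += 1
        vec = [0.0] * 26  # letter-frequency vector, a stand-in for a real embedding
        for ch in text.lower():
            if "a" <= ch <= "z":
                vec[ord(ch) - ord("a")] += 1.0
        return vec


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k(query_vec, items, k):
    """items is a list of (text, vector) pairs; return the k most similar texts."""
    ranked = sorted(items, key=lambda tv: cosine(query_vec, tv[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def query_time_embedding(embedder, texts, query, k):
    """Embed every text on every query (the uncached assumption)."""
    items = [(t, embedder.embed(t)) for t in texts]
    return top_k(embedder.embed(query), items, k)


class StoredVectorIndex:
    """Embed every text once at build time; only the query is embedded later."""

    def __init__(self, embedder, texts):
        self.embedder = embedder
        self.items = [(t, embedder.embed(t)) for t in texts]

    def query(self, query, k):
        return top_k(self.embedder.embed(query), self.items, k)
```

Under these assumptions both approaches return the same top-k for the same query; what differs is when the embedding calls happen and how many there are per query (N + 1 for the query-time version versus 1 for the stored version, after N calls at build time).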
3 comments
Hey @jerryjliu0, fantastic work! I have read through all of the docs and tried to get the different query modes, response modes, and the overall style of using GPT Index down. I'm having a hard time figuring out what considerations matter wrt cost/performance/detail when choosing an index, and what style of composing is needed, if any.

My use case is that I have 100s of long meeting transcripts and am trying to run two types of queries: one that runs only on a given transcript, and a second that runs across transcripts. I decided to start by making a ListIndex out of each transcript document and doing set_text on each with tree_summarize. Now I'm not sure if that's better than making it a tree, and I'm not sure at all how to think about the impact if I change it to a tree index.

I then build a GPT SimpleVectorIndex over these document ListIndexes (this I understand uses openai embeddings) and save it to disk.

Finally, for querying across transcripts, I query the SimpleVectorIndex with, say, "How has the speakers' perception about the topic changed through the meetings?" and with response_mode tree_summarize and topK=50.

If you or others could help me understand what my considerations should be wrt choosing indices and composing styles, I would be so grateful, and would love to pay it forward by helping the community here as well 🙂
30 comments