
Updated 2 years ago

Another q I am composing a simple vector

Another q: I am composing a simple vector index over N ListIndices, each of which is a document. I have called set_text on each ListIndex with its own tree_summarize summary. My understanding is that when I query the vector index, the relevant documents (ListIndices) should be selected, and then their chunks (nodes) should be passed into the prompt, correct? If so, what I am actually seeing is that the traversal stops at the vector index, and the summary of the list index is what gets passed into the query. Is my understanding correct, or am I doing something wrong?
44 comments
are you seeing mode="recursive"?
you need to pass in the query_configs a bit differently as well
Yes, I was doing that.
Like so:
Plain Text
query_configs = [
    {
        "index_struct_type": "simple_dict",
        "query_mode": "default",
        "query_kwargs": {
            "similarity_top_k": 20,
            # "response_mode": "tree_summarize"
        }
    },
    {
        "index_struct_type": "list",
        "query_mode": "default",
        "query_kwargs": {
            "response_mode": "tree_summarize"
        }
    }
]
response = higherIndex.query("Was Firefox ever mentioned during these discussions?", mode='recursive', query_configs=query_configs, verbose=True)
This is how I initialized the indices
Plain Text
# Build one list index per document (d1..d3 are the loaded documents).
index1 = GPTListIndex(d1)
index2 = GPTListIndex(d2)
index3 = GPTListIndex(d3)

summary_prompt = (
    "Summarize this meeting transcript, with a brief intro on the participants, "
    "followed by a complete summary of everything discussed in their conversation."
)

# Store each list index's tree_summarize summary as its text.
for index in (index1, index2, index3):
    index.set_text(index.query(summary_prompt, response_mode="tree_summarize").response)

# index101..index103 are defined elsewhere (not shown here).
docIndices = [index1, index2, index3, index101, index102, index103]
higherIndex = GPTSimpleVectorIndex(docIndices)
Also, if I'm understanding correctly, the traversal should go through the entire list node's chunks after the vector index returns topK listIndices.
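For anyone following along, the expected two-step behaviour described above can be sketched with a toy model in plain Python. This is not the library's code: `ToyListIndex`, `overlap`, and `retrieve_context` are hypothetical names, and word overlap stands in for embedding similarity. It only illustrates the difference between stopping at the summaries and descending into the chunks.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyListIndex:
    """Hypothetical stand-in for one per-document list index."""
    summary: str                                      # the text stored via set_text()
    chunks: List[str] = field(default_factory=list)   # the document's nodes


def overlap(query: str, text: str) -> int:
    """Crude relevance score: count of shared lowercase words."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def retrieve_context(query: str, sub_indices: List[ToyListIndex],
                     top_k: int = 2, recursive: bool = True) -> List[str]:
    # Step 1: the top-level index ranks the sub-indices by their summary text.
    ranked = sorted(sub_indices, key=lambda s: overlap(query, s.summary), reverse=True)
    selected = ranked[:top_k]

    # Non-recursive: stop here and hand the summaries to the prompt.
    if not recursive:
        return [s.summary for s in selected]

    # Step 2 (recursive): descend into each selected sub-index and collect all of its chunks.
    context: List[str] = []
    for s in selected:
        context.extend(s.chunks)
    return context


if __name__ == "__main__":
    docs = [
        ToyListIndex("meeting about firefox browser support",
                     ["alice asked about firefox", "bob filed a bug"]),
        ToyListIndex("meeting about budget planning", ["carol presented numbers"]),
    ]
    print(retrieve_context("firefox support", docs, top_k=1))
    print(retrieve_context("firefox support", docs, top_k=1, recursive=False))
```

In this toy, a sub-index with an empty `chunks` list contributes nothing in recursive mode, which is roughly what "hitting the list index, but the list index has 0 nodes" would look like.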
Ooh ok let me take a look, there's a chance this is a bug
Hey @ShantanuNair, i'm taking a look right now. I think you also pasted the outputs in the other thread. Could you repaste the full output of the recursive query with verbose=True? (you can attach as a text file if you wish).

I'm trying to verify if it's hitting the list index, from my testing it seems to be working
Sure, I'll do that right now.
Here is the verbose output
and just checking, what is d1, d2, etc? I see it's "building index from nodes: 0 chunks"
The script that outputs the response
could you check to see if d1, d2 have data?
Yes, sure. I'd expect it would, since they are obtaining summaries
from the output it seems like it actually is hitting the list index, but the list index has 0 nodes
my data folder, if you wanted to have a look
one thing you could check is call index1.index_struct.nodes
This is the output of the script, where it builds indices, without the response included.
Okay, let me check that
I only save the higher vectorIndex to disk, and not the ListIndex, I'm not sure if that would affect anything though
oh, do you save to disk and then reload it?
I save just the vectorIndex to disk and reload it.
the output of index1.index_struct.nodes
got it...that might be it
i think i repro'ed your issue
i'll take a look soon
when you save and reload there's something going on with the underlying subindices probably
yeah i just tried saving and reloading, and it's not hitting the subindex
I added the code I'm running here, in case you missed it. I just load the documents, create a ListIndex on each doc, query their summary, set_text for each. Then create a vectorIndex over those. Save the vectorIndex, load it, and then query it with recursive.
thanks for the back and forth! hopefully will have this fixed by tomorrow. In the meantime, hopefully the part where you just build from scratch works (though not ideal from cost perspective).
Awesome, thanks so much. Until I dug in, it seemed like it was working as expected, and quickly.
yep. the main part is i think saving/loading broke it
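The thread does not confirm the actual cause, but one plausible way a save/reload could produce "sub-index with 0 nodes" is sketched below in plain Python. Everything here is an assumption for illustration: the dict layout and the `save_top_level` / `load_top_level` / `save_full` / `load_full` helpers are hypothetical and are not the library's serialization format.

```python
import json
from typing import Dict


def save_top_level(sub_indices: Dict[str, dict]) -> str:
    """Persist only what the top-level index needs for ranking: each child's summary."""
    return json.dumps({doc_id: {"summary": sub["summary"]}
                       for doc_id, sub in sub_indices.items()})


def load_top_level(blob: str) -> Dict[str, dict]:
    """Rebuild the top-level index; the children come back with no chunks."""
    return {doc_id: {"summary": entry["summary"], "chunks": []}
            for doc_id, entry in json.loads(blob).items()}


def save_full(sub_indices: Dict[str, dict]) -> str:
    """Persist the children together with their chunks."""
    return json.dumps(sub_indices)


def load_full(blob: str) -> Dict[str, dict]:
    """Rebuild the composed structure, chunks included."""
    return json.loads(blob)


if __name__ == "__main__":
    original = {"doc1": {"summary": "meeting about firefox",
                         "chunks": ["alice asked about firefox", "bob filed a bug"]}}
    lossy = load_top_level(save_top_level(original))
    print(len(lossy["doc1"]["chunks"]))               # chunks were never written out
    print(load_full(save_full(original)) == original)
```

Under this assumption, the summary-based ranking still works after a reload, so a query can look healthy until you check what the recursive step actually retrieved.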
Is there a way I could keep this summary only retrieval? If I don't want it to traverse into the ListIndices ?
just don't do mode="recursive"
Awesome! Thank you so much.
I can try and have a look at the source as well to debug/help with a PR, but I'm not too familiar with working on Python libs.
no worries! i'll have something by tmrw
@ShantanuNair the fix should be in