
Updated 2 years ago

Saw the refactored composability API

Saw the refactored composability API, looks great! But I had a question - previously, if I had a vector index on top of multiple ListIndex instances (each one a document), I could query just the vector index and stop at the ListIndex summaries by not passing mode=recursive (as we discussed in a thread here). But now it seems like that may not be an option, or I don't know how to do that with the current API.

set_text nodes created on subindices don't automatically get the index's metadata?
I'm not 100% sure but I thought that was the case previously and looks like now the summary nodes don't carry any extra_info
you should still be able to do that by just doing vector_index.query! Recursive calls would only come into play if you actually defined a ComposableGraph(vector_index).

re: "set_text for nodes created on subindices don't automatically get the index's metadata", i'm actually not entirely sure what you mean. could you clarify?
1) So previously my indices were composed as a VectorIndex over N ListIndices. Each ListIndex was its own transcript/document. Each one had its summary set. But if I'm trying to do that with just a vector index, how do I set the summaries? Create a ListIndex, then summarize on it and add the summary as an entry to the VectorIndex? But then that wouldn't save the ListIndex.
2) Again, with the previous example in mind, where I had a vectorIndex over multiple ListIndices: I would run
Plain Text
listIndex1.set_text(
    listIndex1.query(
        "Summarize this transcript:",
        response_mode="tree_summarize",
    ).response
)

for each listIndex. That would make it so that the vector index used the summaries of the ListIndex to decide whether to traverse it or not. Now, previously, if those ListIndex transcripts had extra_info, such as say a meeting_id, I believe that would also be included as extra_info in the summary node.
If now you ask for the source_nodes, you can see that the summary nodes (which were created from running set_text), have no extra_info, and only the leaf nodes do. Previously, the summary nodes created from set_text would show up with the same extra_info as the document they were summarizing.
Let me know if the language/terms I'm using are making it unclear.
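A quick way to check this without eyeballing the log is to scan the source nodes and list the ones that carry no extra_info. This is only a minimal sketch: FakeSourceNode is a hypothetical stand-in so the snippet runs without the library, and it assumes each source node exposes source_text and extra_info attributes, which may differ by library version.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeSourceNode:
    """Hypothetical stand-in for one entry of response.source_nodes."""
    source_text: str
    extra_info: Optional[dict] = None


def nodes_missing_extra_info(source_nodes):
    """Return the source nodes whose extra_info is None or empty."""
    return [node for node in source_nodes if not node.extra_info]


source_nodes = [
    # Summary node, as produced by set_text (no metadata attached).
    FakeSourceNode("Summary of the transcript"),
    # Leaf node from the underlying document (metadata attached).
    FakeSourceNode("Speaker A: hello", {"meeting_id": "m-1"}),
]
missing = nodes_missing_extra_info(source_nodes)
print([node.source_text for node in missing])
```

With a real response, the same function could be pointed at response.source_nodes, assuming those attribute names hold.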
@ShantanuNair re (1): the process for composing indices is largely the same as before. e.g.
Plain Text
index1 = GPTListIndex([doc1])
index2 = GPTListIndex([doc2])
index1.set_text("...")
index2.set_text("...")
index = GPTSimpleVectorIndex([index1, index2])

the only difference is now you'd do something like graph = ComposableGraph(index) and use this graph object for querying.

Re: (2) hmm interesting. you're basically saying that the source nodes corresponding to the summary nodes don't have extra_info right? it's possible this is a bug, i can take a look
1) I see. So if I wanted to get the same functionality as before, where I don't want to set mode='recursive', that style of querying is still the same? Sorry, I seem to have misunderstood and thought the style of querying had changed.
2) Yes, that's what I'm saying. I'll post an example log here in a bit.
Plain Text
response = graph.query(
    "What did the author do growing up, before his time at Y Combinator?", 
    query_configs=query_configs
)
That's the relevant bit I was looking at:
Apologies if I'm misunderstanding.
Plain Text
query_configs = [
    {
        # Top-level vector index
        "index_struct_type": "simple_dict",
        "query_mode": "default",
        "query_kwargs": {
            "similarity_top_k": 20,
            "response_mode": "tree_summarize",
        },
    },
    {
        # ListIndex subindices
        "index_struct_type": "list",
        "query_mode": "default",
        "query_kwargs": {
            "response_mode": "default",
        },
    },
]
if queryOptions["deep"]:
    print("Deep query")
    response = higherIndex.query(query, mode="recursive", query_configs=query_configs, verbose=True)
else:
    # Traverses and returns summary nodes. No further traversal.
    print("Shallow tree summarize query")
    response = higherIndex.query(query, response_mode="tree_summarize", verbose=True, similarity_top_k=20)
Here, the deep query passes recursive as the query mode along with query_configs, but for a shallow query I just pass tree_summarize as response_mode, without query_configs.
The shallow query stops at the summary nodes and doesn't traverse deeper into the list nodes. That is useful, and it's the pattern I'm trying to reproduce with the new graph querying.
yeah what i meant is if you want shallow querying, just use the index directly, you can still do higherIndex.query without mode="recursive"
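For anyone skimming, the shallow/deep switch discussed above boils down to which keyword arguments get passed to query. Here is a rough sketch using only the argument values quoted in this thread. build_query_kwargs is a hypothetical helper, not part of the library, and whether mode="recursive" is still accepted on the index itself (rather than going through the graph object) after the refactor is an assumption.

```python
def build_query_kwargs(deep, query_configs=None):
    """Return keyword arguments for a call like higherIndex.query(query, **kwargs).

    Hypothetical helper; the values mirror the two calls quoted in this thread.
    """
    if deep:
        # Deep: recurse into the subindices, configured per index type.
        return {
            "mode": "recursive",
            "query_configs": query_configs or [],
            "verbose": True,
        }
    # Shallow: answer from the top-level summary nodes only.
    return {
        "response_mode": "tree_summarize",
        "similarity_top_k": 20,
        "verbose": True,
    }


shallow = build_query_kwargs(deep=False)
deep = build_query_kwargs(deep=True, query_configs=[{"index_struct_type": "list"}])
print(shallow)
print(deep)
```

The call site would then look like higherIndex.query(query, **build_query_kwargs(queryOptions["deep"], query_configs)).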
Ahh I understand.
Thank you SO much.
This graph pattern makes a lot more sense
Here's an example for 2)
It's a truncated part of the source node output containing the ListIndex node and the summary node
You see the summary node has no extra_info. The node it's summarizing, does.
@ShantanuNair i took a deeper look. i'm adding a set_extra_info to the index (in addition to set_text). You can take a look at the screenshot i attached as an example.

If you do this, then you effectively set extra_info on the index similar to how you'd set extra_info on a Document. If you build a parent index on top of these subindices, the subindex metadata will be propagated to all nodes derived from it. Hope this more explicitly solves your use case
Attachment
Screen_Shot_2023-02-01_at_12.32.45_AM.png
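Since the screenshot isn't reproduced here, the behaviour described above can be pictured with a toy model. To be clear, ToySubindex, ToyNode and build_parent_nodes are hypothetical stand-ins written for illustration. They are not the library's classes and only mimic the idea that metadata set on a subindex is copied onto the node the parent index derives from it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyNode:
    """Hypothetical parent-level node derived from a subindex."""
    text: Optional[str]
    extra_info: Optional[dict] = None


@dataclass
class ToySubindex:
    """Hypothetical stand-in for a subindex; not the library's class."""
    text: Optional[str] = None
    extra_info: Optional[dict] = None

    def set_text(self, text):
        self.text = text

    def set_extra_info(self, extra_info):
        self.extra_info = extra_info


def build_parent_nodes(subindices):
    """Create one parent-level node per subindex, copying its metadata."""
    return [
        ToyNode(
            text=sub.text,
            extra_info=dict(sub.extra_info) if sub.extra_info else None,
        )
        for sub in subindices
    ]


with_meta = ToySubindex()
with_meta.set_text("Summary of meeting 1")
with_meta.set_extra_info({"meeting_id": "m-1"})

without_meta = ToySubindex()
without_meta.set_text("Summary of meeting 2")  # set_text only, no metadata

nodes = build_parent_nodes([with_meta, without_meta])
print(nodes[0].extra_info, nodes[1].extra_info)
```

In this model, a subindex that only had set_text called ends up with a summary node without extra_info, which matches the symptom reported earlier in the thread.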
@jerryjliu0 "If you build a parent index on top of these subindices, the subindex metadata will be propagated to all nodes derived from it."
Just making sure I understand correctly - a summary node isn't one of those 'derived nodes', correct? Since it has to have it set explicitly via set_extra_info. Could you give me an example of what you mean by a derived node/a node that would have the subindex metadata propagated to it?
Previously, the documents that made up each ListIndex subindex would each have a meeting_id. That meeting_id would be attached to the summaries further up the node tree. And then finally the vector index would use those summary nodes with the meeting_id metadata.
I'm unsure if you're now saying that the child nodes below the summary node would copy over the metadata from the parent. Say I had a GPT Tree Index per doc instead of List Indices and then summarized on that; would the child nodes then get the metadata of the summary node?
Sorry I can just check via some experiments. Let me do that
Or maybe that could be configurable - which way the extra_info should propagate, be it up or down the nodes. In my case, I want it to propagate up, from leaf nodes, to summary nodes.
Yeah so the thing is, a subindex could correspond to multiple documents, so i figured it made less sense for extra info from an underlying "document" to propagate to the subindex vs. just setting the extra info on the subindex directly.

For your use case, if each subindex does correspond to just one document, you may still need to add the extra step subindex.set_extra_info({"meeting_id": "<meeting_id>"}) even if the underlying document also has the meeting_id
That makes total sense, and I too was wondering about the case where a subindex doesn't refer to a whole document but, say, to different nodes, each with its own metadata.
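One way to handle both cases discussed above (one document per subindex, or several documents with differing metadata) is to propagate upward only the key/value pairs that every underlying document agrees on. This is a minimal sketch, not library behaviour: shared_extra_info is a hypothetical helper, and the only library call assumed is the set_extra_info method mentioned in this thread.

```python
def shared_extra_info(doc_infos):
    """Return the key/value pairs that every document's extra_info agrees on.

    doc_infos is a list of dicts (or None), one per underlying document.
    """
    infos = [info or {} for info in doc_infos]
    if not infos:
        return {}
    shared = dict(infos[0])
    for info in infos[1:]:
        # Keep a pair only if this document has the same key with the same value.
        shared = {k: v for k, v in shared.items() if k in info and info[k] == v}
    return shared


# Nodes from the same meeting: meeting_id survives, speaker does not.
same_meeting = shared_extra_info([
    {"meeting_id": "m-1", "speaker": "A"},
    {"meeting_id": "m-1", "speaker": "B"},
])
# Nodes from different meetings: nothing is safe to propagate.
mixed = shared_extra_info([{"meeting_id": "m-1"}, {"meeting_id": "m-2"}])
print(same_meeting, mixed)
```

The result could then be passed on explicitly, e.g. subindex.set_extra_info(same_meeting), skipping the call when the dict comes back empty.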