Updated 5 months ago

Looks like I need a hand, if someone so wise is around. 😎

I've been screwing up my node filtering all along. The only reason I've gotten such good results is that the queries embed a relevant cue and I'm getting lucky on filtering.

Here's my use case. I want to ingest lots of documents, an estimated 300,000 or so, into pgvector. During ingestion, I set a metadata key business_id. I can verify that each node in the table has .metadata['business_id'] set to the correct value.

At query time, I need to pull only those docs with metadata['business_id'] == some_value and then take the top_k from that set, NOT pull top_k from all nodes and then return the ones that match. Make sense? I just need a where clause on my SQL query. 🙂
10 comments
I'm creating a new FilteredQueryEngine that is doing this:
Plain Text
all_nodes = self.vector_retriever.retrieve(query_bundle)
# Filter nodes by the provided business_id metadata.
# Note: this runs after retrieval, so top_k has already been applied.
filtered_nodes = [
    node for node in all_nodes if node.node.metadata.get("business_id") == business_id
]

in its retrieve method. But this is wrong: it takes top_k from all nodes first and only then filters by business_id.
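To make the problem concrete, here is a minimal, library-free sketch of the two orderings. The toy nodes, the `cosine` helper, and the function names are all hypothetical and only illustrate the idea; this is not how LlamaIndex or pgvector implement it.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Hypothetical toy nodes: (embedding, metadata)
NODES = [
    ([1.0, 0.0], {"business_id": "A"}),
    ([0.9, 0.1], {"business_id": "A"}),
    ([0.8, 0.2], {"business_id": "A"}),
    ([0.0, 1.0], {"business_id": "B"}),
]

def top_k(nodes, query, k):
    """Return the k nodes most similar to the query."""
    return sorted(nodes, key=lambda n: cosine(n[0], query), reverse=True)[:k]

def post_filter(nodes, query, business_id, k):
    # top_k over everything, then filter: matching nodes can be crowded out.
    return [n for n in top_k(nodes, query, k) if n[1]["business_id"] == business_id]

def pre_filter(nodes, query, business_id, k):
    # Filter first (the "where clause"), then top_k within the subset.
    subset = [n for n in nodes if n[1]["business_id"] == business_id]
    return top_k(subset, query, k)

query = [1.0, 0.0]
print(len(post_filter(NODES, query, "B", k=2)))  # 0
print(len(pre_filter(NODES, query, "B", k=2)))   # 1
```

With post-filtering, business "B" gets no results because both top-2 slots went to business "A" nodes. Pre-filtering returns the one "B" node regardless of how many closer "A" nodes exist.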
I initially thought metadata filters were the solution, but I'm not getting them to work.
When you create the retriever, you can pass in the filters you need, e.g.:

Plain Text
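# Import path for llama-index >= 0.10; older releases used llama_index.vector_stores.types
from llama_index.core.vector_stores import ExactMatchFilter, MetadataFilters

filters = MetadataFilters(
    filters=[ExactMatchFilter(key="file_name", value="uber_2021.pdf")]
)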
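Applied to the business_id case from the question, passing the filters at retriever creation might look roughly like the sketch below. The helper name `make_business_retriever` and its parameters are hypothetical. The import path assumes llama-index 0.10 or later, and `index` is assumed to be a vector store index already built on the pgvector store.

```python
def make_business_retriever(index, business_id, top_k=5):
    """Return a retriever restricted to one business_id (illustrative sketch)."""
    # Imported lazily so this snippet loads even without LlamaIndex installed.
    # Assumes llama-index >= 0.10; older releases exposed these classes
    # from llama_index.vector_stores.types instead.
    from llama_index.core.vector_stores import ExactMatchFilter, MetadataFilters

    filters = MetadataFilters(
        filters=[ExactMatchFilter(key="business_id", value=business_id)]
    )
    # The filters go to the vector store along with the query, so the
    # restriction is meant to apply before top_k is taken, not afterwards.
    return index.as_retriever(similarity_top_k=top_k, filters=filters)
```

Usage would then be something like `make_business_retriever(index, "some_value").retrieve("my question")`, with no post-filtering step in the query engine.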
Thanks, @Logan M. I've tried that, which is one reason I came here. But I'll spend a few more hours on it early tomorrow. Appreciate the response. 👍
Interesting. It should be filtering before retrieving, which is what I think you want 👀
Yep, exactly. Rolling some new POC code to validate.
Aight. So, first problem solved. I renamed my embeddings table and updated my ENV vars. Apparently, llamaindex is prepending a data_ to pgvectorstore table names?
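For anyone checking the table by hand, the kind of query being asked for could be sketched as follows. This is an assumption-laden illustration, not the SQL LlamaIndex actually emits: the `data_` prefix comes from the observation above, while the `metadata_`, `embedding`, `node_id`, and `text` column names and the helper `build_filtered_query` are assumptions to verify against your own schema. `<=>` is pgvector's cosine distance operator.

```python
def build_filtered_query(table_name: str, metadata_key: str) -> str:
    """Build a parameterized SQL string that filters by metadata before top_k."""
    # Identifiers cannot be bound as query parameters, so validate them strictly.
    if not table_name.isidentifier() or not metadata_key.isidentifier():
        raise ValueError("table_name and metadata_key must be plain identifiers")
    return (
        # Assumed layout: table "data_<table_name>" with a JSON "metadata_"
        # column and a pgvector "embedding" column.
        f"SELECT node_id, text FROM data_{table_name} "
        f"WHERE metadata_->>'{metadata_key}' = %(value)s "
        "ORDER BY embedding <=> %(query_embedding)s::vector "
        "LIMIT %(top_k)s"
    )

print(build_filtered_query("embeddings", "business_id"))
```

The WHERE clause narrows the rows to one business before ORDER BY and LIMIT pick the nearest neighbours. The named placeholders follow psycopg's `%(name)s` style and would be supplied at execution time.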
Okay, and second problem figured out. Updated my old FilteredQueryEngine gist: https://gist.github.com/inchoate/fb0e6a2300180afc095da8415c625e9e
Thanks, again, Logan.