logiclord
Joined January 15, 2025
Understanding upserting documents in Qdrant. Suppose I have a Google Doc (GD) that has already been inserted into a MongoDB docstore and a Qdrant vector store. Now suppose the Google Doc is edited and we want to update it in both the MongoDB docstore and the Qdrant vector store. How do we make sure that we do not end up creating duplicate documents?

I have heard that we can use doc_id as a field in the metadata of a LlamaIndex document and that this helps with deduplication, but how does that work if the document is, say, 1000 pages long, i.e. it is broken into multiple nodes? How does doc_id translate into the identifiers of the individual nodes, so that we know which nodes to update in the MongoDB docstore and the Qdrant vector store?

If we as users are expected to set node_id directly, any guidance on how to generate node_id values would be super helpful.
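
One possible way to get stable node identifiers, sketched under assumptions rather than taken from the LlamaIndex docs: derive each node_id deterministically from the parent doc_id plus the chunk's position. Qdrant accepts only unsigned integers or UUIDs as point IDs, so the sketch uses uuid5. The namespace URL and the helper names (make_node_id, assign_node_ids, stale_node_ids) are hypothetical and only here for illustration; the sketch also assumes the vector store uses the node ID as the Qdrant point ID, which is worth verifying for your version.

```python
import uuid

# Hypothetical namespace for this application; any fixed UUID works,
# as long as it never changes between runs.
NODE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/my-app/nodes")


def make_node_id(doc_id: str, chunk_index: int) -> str:
    """Return the same UUID string every time for a given (doc_id, chunk_index)."""
    return str(uuid.uuid5(NODE_NAMESPACE, f"{doc_id}:{chunk_index}"))


def assign_node_ids(doc_id: str, chunks: list[str]) -> list[tuple[str, str]]:
    """Pair each chunk of a document with its deterministic node ID."""
    return [(make_node_id(doc_id, i), chunk) for i, chunk in enumerate(chunks)]


def stale_node_ids(doc_id: str, old_count: int, new_count: int) -> list[str]:
    """IDs of trailing chunks that existed before the edit but no longer do.

    If the edited document produces fewer chunks than before, these IDs
    still sit in the stores and have to be deleted explicitly.
    """
    return [make_node_id(doc_id, i) for i in range(new_count, old_count)]
```

With IDs like these, re-ingesting an edited document overwrites chunk 0, chunk 1, and so on in place instead of adding new entries; only the leftover tail from a shrinking document needs an explicit delete. In LlamaIndex you would assign the value to each node's ID before adding the nodes to the docstore and index. An alternative that avoids managing node IDs at all is to delete everything belonging to the old doc_id and re-insert the freshly split nodes.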
8 comments
How do I update (add or edit docs in) an existing index? I am not able to reuse the index.

Here is my code for saving data:
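import qdrant_client
from llama_index.core import Settings, StorageContext, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
from llama_index.storage.docstore.mongodb import MongoDocumentStore
from llama_index.storage.index_store.mongodb import MongoIndexStore
from llama_index.vector_stores.qdrant import QdrantVectorStore

# The constants, ModelType, process_emails_sync, filtered_unprocessed_emails and user are defined elsewhere in my project.
email_docs = process_emails_sync(filtered_unprocessed_emails, user)
docstore = MongoDocumentStore.from_uri(uri=LLAMAINDEX_MONGODB_STORAGE_SRV)
parser = SentenceSplitter()
nodes = parser.get_nodes_from_documents(email_docs)
docstore.add_documents(nodes)
Settings.llm = OpenAI(model=ModelType.OPENAI_GPT_4_o_MINI.value)
Settings.embed_model = OpenAIEmbedding(api_key=OPENAI_API_KEY)
client = qdrant_client.QdrantClient(url=QDRANT_API_URL, api_key=QDRANT_API_TOKEN)

vector_store = QdrantVectorStore(client=client, collection_name=LLAMAINDEX_QDRANT_COLLECTION_NAME)

index_store = MongoIndexStore.from_uri(uri=LLAMAINDEX_MONGODB_STORAGE_SRV)
storage_context = StorageContext.from_defaults(vector_store=vector_store, index_store=index_store, docstore=docstore)

index = VectorStoreIndex(nodes, storage_context=storage_context, show_progress=True)
index.storage_context.persist()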

When I try to load the index using the same storage context as above, I get an exception saying that I need to specify an index_id, because a new index is created every time I run the code above. How do I pass the index_id to the store so that it updates the existing index? Please note that I am already using doc_id correctly to ensure upserting of documents.

load_index_from_storage(storage_context=storage_context, index_id="8cebc4c8-9625-4a79-8544-4943b4182116")

I have tried using VectorStoreIndex(nodes, storage_context=storage_context, show_progress=True, index_id="<index_id>") but that approach didn't work.
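
A minimal sketch of a load-or-create pattern that might address this, under stated assumptions: the index is given a fixed ID right after it is first built (BaseIndex.set_index_id in llama_index.core), and later runs load it by that ID and insert the new nodes instead of constructing a second index. The helper name load_or_create_index is hypothetical, and the two library callables are passed in as parameters so the sketch does not depend on a particular LlamaIndex version. It also assumes that a missing index ID surfaces as a ValueError from load_index_from_storage, which you should confirm for your installed version.

```python
from typing import Any, Callable, Sequence


def load_or_create_index(
    storage_context: Any,
    nodes: Sequence[Any],
    index_id: str,
    *,
    load_index: Callable[..., Any],
    build_index: Callable[..., Any],
) -> Any:
    """Reuse the index stored under index_id, or create it on the first run.

    Intended wiring (assumption, not verified against your version):
      load_index  -> llama_index.core.load_index_from_storage
      build_index -> llama_index.core.VectorStoreIndex
    """
    try:
        # Later runs: the index struct already exists in the index store.
        index = load_index(storage_context, index_id=index_id)
    except ValueError:
        # First run: build the index once and pin it to a stable ID
        # so the next run can find it again.
        index = build_index(nodes, storage_context=storage_context)
        index.set_index_id(index_id)
        return index
    # Existing index: add the new or updated nodes to it.
    index.insert_nodes(nodes)
    return index
```

With the code from the post, the call would look something like load_or_create_index(storage_context, nodes, "email_index", load_index=load_index_from_storage, build_index=VectorStoreIndex), where "email_index" is a made-up ID. Since the vectors already live in Qdrant, another option that is often suggested is to skip the index store entirely and rebuild the index object with VectorStoreIndex.from_vector_store(vector_store).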
33 comments