
hello, i'm facing an issue: i want to index something like 150 documents in a row, but it takes too long (about 1h or more, or it never finishes). is there a way to make the code down here index one document, return status 200, and continue with the next document until all documents are indexed?


dossier = requestDTO.Index

# Initialize the parameters for the MongoDB Atlas requests

mongodb_client = pymongo.MongoClient(_mongoURI)
db_name = f"{dossier}"
store = MongoDBAtlasVectorSearch(mongodb_client, db_name=db_name)

storage_context = StorageContext.from_defaults(vector_store=store)

# Create or update an index from the documents in the 'Sources' folder
set_global_service_context(service_context)
documents = SimpleDirectoryReader("./Sources/Zephyr").load_data()
#index = VectorStoreIndex.from_documents(documents, service_context=service_context, storage_context=storage_context)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

#while documents:
# index.add_documents([documents], storage_context=storage_context)


responseDTO = IndexCreationResponse.IndexCreationResponseDTO(False, None, "The index was successfully created or updated.")

# Done, send the final response

return GenerateIndexResponse(requestDTO, responseDTO), 200
you can incrementally add documents, yeah

Python
index = VectorStoreIndex([], storage_context=storage_context)
for doc in documents:
  index.insert(doc)


However, you can probably speed up the embeddings step. What embedding model are you using? You can increase the batch size
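For context on why batching the embeddings step helps: instead of one API call per chunk, many chunks get embedded per call, so per-request overhead drops sharply. In LlamaIndex the embedding classes (e.g. `OpenAIEmbedding`) accept an `embed_batch_size` argument for this, if I remember right. Here is a minimal, library-free sketch of the idea; `batched` is a hypothetical helper written for illustration, not a LlamaIndex function:

```python
from typing import Iterable, List, TypeVar

T = TypeVar("T")

def batched(items: List[T], batch_size: int) -> Iterable[List[T]]:
    """Yield successive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# 150 documents embedded 10 at a time means 15 requests
# instead of 150 single-document requests.
docs = [f"doc-{i}" for i in range(150)]
print(len(list(batched(docs, 10))))  # 15
```

With a real embedding model you would pass the whole batch to one embedding call inside the loop, rather than calling the model once per document.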
hello, thank you, the problem is fixed
thank you for trying to help me