Hmmm, I don't think anything like that exists right now. Or at least nothing that isn't super hacky lol
Ok so basically choosing a vector index at the beginning is a very important decision lol?
And also just curious, I am currently using the SimpleVectorIndex. Does llama-index perform the query similarity search and retrieval manually under the hood, or is it using some other technique?
Because my model is a bit slow to respond, so I'm just wondering what the key components are so that I can speed things up
Well kinda I suppose lol. But thankfully embeddings are very cheap to calculate 🫠

Under the hood it's just using cosine similarity, nothing too special.

In my experience the slowest part by far is the LLM calls
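
For intuition, here's roughly what that cosine-similarity lookup boils down to (a minimal numpy sketch; the names are illustrative, not llama-index internals):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity = dot product of the two vectors, normalized by their lengths
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def top_k_chunks(query_emb, chunk_embs, chunks, k=2):
    # Score every stored chunk embedding against the query embedding,
    # then return the k best-matching chunks
    scores = [cosine_similarity(query_emb, emb) for emb in chunk_embs]
    best = np.argsort(scores)[::-1][:k]
    return [(chunks[i], scores[i]) for i in best]
```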
One issue I found is the LLM model. GPT-3.5 seems to be slow, especially when the servers are overloaded
Nice!! So when would we use things like FAISS and Pinecone?
And would using them in llama-index use their retrieval methods, or still the llama-index cosine search?
Only if you have an extreme amount of embeddings (like your index.json is over 5GB or something silly), or maybe you want something where it's a little easier to manage the embeddings per user (if applicable)

Using another vector store would use their retrieval methods. So it calculates an embedding for the query text and then ships that off to the vector store to do its thing
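
The flow looks roughly like this (a self-contained sketch with a fake in-memory store standing in for Pinecone/FAISS — the class and method names here are made up for illustration, not a real SDK):

```python
import numpy as np

class FakeRemoteVectorStore:
    """Stand-in for an external store like Pinecone -- purely illustrative."""

    def __init__(self):
        self.records = []  # (embedding, text) pairs live on the store's side

    def upsert(self, embedding, text):
        self.records.append((np.asarray(embedding), text))

    def query(self, vector, top_k=2):
        # The store, not llama-index, runs the nearest-neighbour search
        q = np.asarray(vector)
        scored = sorted(
            self.records,
            key=lambda r: -np.dot(q, r[0]) / (np.linalg.norm(q) * np.linalg.norm(r[0])),
        )
        return [text for _, text in scored[:top_k]]

store = FakeRemoteVectorStore()
store.upsert([1.0, 0.0], "chunk about apples")
store.upsert([0.0, 1.0], "chunk about bananas")

query_embedding = [0.9, 0.1]         # computed locally from the query text
print(store.query(query_embedding))  # the similarity search happens "remotely"
```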
And you can still save the index as JSON, right?
Irrespective of the vector index used
"maybe you want something where it's a little easier to manage the embeddings per user (if applicable)" What do you mean by this?
Well, with third party vector stores everything is actually in the vector store itself, so nothing to save or load really

When you initialize the index, you can connect to the existing documents and just pass in an empty list of documents
Some vector stores have ways of separating the data, i.e. per user in your application maybe.

You could totally do this yourself too with the simple vector index, just separating the index json per user
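
Something like this, assuming the save_to_disk/load_from_disk API from the llama-index versions of that era (method names may differ in newer releases):

```python
import os

from llama_index import GPTSimpleVectorIndex, SimpleDirectoryReader

def index_path(user_id: str) -> str:
    # One index.json per user keeps each user's embeddings separate
    return f"indexes/{user_id}/index.json"

def get_user_index(user_id: str) -> GPTSimpleVectorIndex:
    path = index_path(user_id)
    if os.path.exists(path):
        return GPTSimpleVectorIndex.load_from_disk(path)
    # First time for this user: build and persist their own index
    documents = SimpleDirectoryReader(f"data/{user_id}").load_data()
    index = GPTSimpleVectorIndex.from_documents(documents)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    index.save_to_disk(path)
    return index
```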
Ohh so it's kinda like stored on their servers?
And just a final question: when the chunks are stored in the index, isn't the embedding added in as a type of metadata/look-up data? So that every time you retrieve a chunk, it's not "de-embedded" or anything, right? haha
Yessir, at query time, only the query text needs to be embedded, the rest are saved
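
Conceptually each stored chunk keeps its embedding right next to its text, something like this (an illustrative record shape, not the exact index.json schema):

```python
# Illustrative record shape -- not the exact index.json schema
chunk_record = {
    "id": "node-42",
    "text": "The author grew up writing short stories...",
    # Computed once at index time and reused for every query thereafter
    "embedding": [0.012, -0.034, 0.071],  # truncated for readability
}
```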
Awesome thank you so much!! This has been a great learning opportunity!!!
One more quick q haha, what is this async query lol?
And when would it be useful?
It's useful so that the query call doesn't block your application, so you can call it and then fetch the actual answer when it's done

In a server setting this kind of makes sense
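
A hedged sketch, assuming the aquery coroutine that llama-index exposed around that time (check your version's docs for the exact method):

```python
import asyncio

async def answer(index, question: str) -> str:
    # aquery is the async counterpart of query; awaiting it frees the
    # event loop to serve other requests while embeddings + LLM calls run
    response = await index.aquery(question)
    return str(response)

async def main(index):
    # Two queries run concurrently instead of back-to-back
    a, b = await asyncio.gather(
        answer(index, "What is a vector index?"),
        answer(index, "Why are LLM calls slow?"),
    )
    print(a)
    print(b)
```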
Hmm, I might need to do a deep dive into this haha. Apologies if this was a stupid question lol, I don't have a background in compsci
Yea no worries! It's pretty non-obvious tbh haha