Updated 12 months ago

Why the RedisVectorStore adds a prefix

At a glance
Why does the RedisVectorStore add a prefix to the vectors?
30 comments
My problem is that when an index built with DocumentSummaryIndex.from_documents() is saved in the RedisVectorStore and RedisIndexStore, load_index_from_storage() fails because of the added prefixes.
I'm not sure how to solve it.
In this case, the index is built from the vector store and is a VectorStoreIndex.
Plain Text
import redis

from llama_index import (
    StorageContext,
    DocumentSummaryIndex,
    load_index_from_storage,
    ServiceContext,
)

from llama_index.storage.docstore import RedisDocumentStore
from llama_index.storage.index_store import RedisIndexStore
from llama_index.vector_stores import RedisVectorStore
from llama_index.embeddings import HuggingFaceEmbedding
from llama_index.schema import Document


REDIS_DB_HOST = "localhost"
REDIS_DB_PASSWORD = ""
REDIS_DB_PORT = 6380

redis_client = redis.Redis(
    host=REDIS_DB_HOST, password=REDIS_DB_PASSWORD, port=REDIS_DB_PORT, db=0
)


docstore = RedisDocumentStore.from_redis_client(
    redis_client=redis_client,
    namespace="Fail_doc_store",
)
index_store = RedisIndexStore.from_redis_client(
    redis_client=redis_client,
    namespace="Fail_index_store",
)
vector_store = RedisVectorStore(
    redis_url=f"redis://:{REDIS_DB_PASSWORD}@{REDIS_DB_HOST}:{REDIS_DB_PORT}/0",
    index_name="Fail_vector_store",
    index_prefix="llama",
    overwrite=True,
)
storage_context = StorageContext.from_defaults(
    docstore=docstore, index_store=index_store, vector_store=vector_store
)

service_context = ServiceContext.from_defaults(
    llm=<SOME_LLM_MODEL>,
    embed_model=HuggingFaceEmbedding(
        model_name="intfloat/multilingual-e5-large",
    ),
)

document = Document(text="This is a test document", id_="test_id")
index = DocumentSummaryIndex.from_documents(
    documents=[document],
    storage_context=storage_context,
    service_context=service_context,
)
index_id = index.index_id
index2 = load_index_from_storage(
    index_id=index_id, storage_context=storage_context, service_context=service_context
)
query_engine = index2.as_query_engine().query("This is a test query")
In that example, the llm in the service context must be replaced with a custom LLM.
@WhiteFang_Jr @Logan M I hope this example helps. Should I open a GitHub issue?
Plain Text
Traceback (most recent call last):
  File "/Users/HASHKELL/SHERPAS/Funds/fundssociety-rag/recreable_fail.py", line 56, in <module>
    query_engine = index2.as_query_engine().query("This is a test query")
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/HASHKELL/Library/Caches/pypoetry/virtualenvs/rag-IF6y9IJD-py3.11/lib/python3.11/site-packages/llama_index/core/base_query_engine.py", line 40, in query
    return self._query(str_or_query_bundle)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/HASHKELL/Library/Caches/pypoetry/virtualenvs/rag-IF6y9IJD-py3.11/lib/python3.11/site-packages/llama_index/query_engine/retriever_query_engine.py", line 171, in _query
    nodes = self.retrieve(query_bundle)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/HASHKELL/Library/Caches/pypoetry/virtualenvs/rag-IF6y9IJD-py3.11/lib/python3.11/site-packages/llama_index/query_engine/retriever_query_engine.py", line 127, in retrieve
    nodes = self._retriever.retrieve(query_bundle)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/HASHKELL/Library/Caches/pypoetry/virtualenvs/rag-IF6y9IJD-py3.11/lib/python3.11/site-packages/llama_index/core/base_retriever.py", line 224, in retrieve
    nodes = self._retrieve(query_bundle)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/HASHKELL/Library/Caches/pypoetry/virtualenvs/rag-IF6y9IJD-py3.11/lib/python3.11/site-packages/llama_index/indices/document_summary/retrievers.py", line 176, in _retrieve
    node_ids = self._index_struct.summary_id_to_node_ids[summary_id]
Maybe I'm confused. Is it adding a prefix to the IDs? Or to what?
It's adding a prefix to the IDs
But the same prefix is not added to the node_ids in the docstore
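To make the mismatch concrete, here is a toy sketch that uses a plain dict in place of the real stores. The prefix value, the underscore separator, and the sample IDs are all assumptions for illustration, not the actual keys RedisVectorStore writes; the only point is that an ID with a prefix attached no longer matches the key it was stored under.

```python
from typing import List, Optional

# Toy model: a plain dict stands in for the index struct's
# summary_id_to_node_ids mapping seen in the traceback.
ASSUMED_PREFIX = "llama/vector"  # assumption: illustrative key prefix only

summary_id_to_node_ids = {"summary-1": ["node-1", "node-2"]}

# ID as originally stored vs. the same ID with a prefix attached.
original_id = "summary-1"
prefixed_id = f"{ASSUMED_PREFIX}_{original_id}"


def lookup(summary_id: str) -> Optional[List[str]]:
    """Return the node IDs for a summary ID, or None if the key is unknown."""
    try:
        return summary_id_to_node_ids[summary_id]
    except KeyError:
        return None


found = lookup(original_id)    # key matches what was stored
missing = lookup(prefixed_id)  # prefixed key is unknown to the mapping
```

In the real index the lookup is a direct dict access, so the unknown key would raise instead of returning None.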
Hmm seems like a bug. It should be removing that prefix when retrieving the nodes then
Well, maybe. The problem is that the prefix is not added to the nodes because they are made from the DocumentSummaryIndex
You might be the first person to use redis with the doc summary index lol
But if the index is built with VectorStoreIndex.from_vector_store, I think they do get added
Tbh I'm open to changing it, but I don't think I can achieve the same results using only a vector_store, mainly because there is no document summary transformation when the nodes are created.
If something here is bad practice, let me know
I think in the query() method of the RedisVectorStore, it should remove the prefix from the ids. Otherwise, the ids won't line up with anything else.

It will work in a standalone vector index, since it doesn't have to use the resulting ids anywhere
"It will work in a standalone vector index, since it doesn't have to use the resulting ids anywhere". Why?
I can change it and make a pull request, but I wasn't sure if it would break another index
Because in a standalone vector index, once the nodes are retrieved, they are just sent to the LLM, there's no lookup anywhere else like the doc summary index, since we already have the node to give to the llm.

I'm thinking right here, we need to modify node.id_ to remove the prefix?
https://github.com/run-llama/llama_index/blob/dcef41ee67925cccf1ee7bb2dd386bcf0564ba29/llama_index/vector_stores/redis.py#L291
Unsure without testing lol
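A minimal sketch of the idea being discussed, stripping the prefix from a returned key before it is used as a node ID. The helper name strip_key_prefix, the underscore separator, and the sample key are hypothetical; this is not the code from the linked file or from the eventual PR.

```python
def strip_key_prefix(key: str, prefix: str, sep: str = "_") -> str:
    """Remove a leading `prefix + sep` from `key`, if present.

    Keys that do not start with the prefix are returned unchanged, so the
    helper is safe to call on IDs that were never prefixed.
    """
    full_prefix = prefix + sep
    if key.startswith(full_prefix):
        return key[len(full_prefix):]
    return key


# Hypothetical key shape: "<prefix>_<node id>"
redis_key = "llama/vector_test_id"
node_id = strip_key_prefix(redis_key, "llama/vector")
```

Only the leading prefix is removed, so underscores inside the ID itself (as in test_id) are preserved.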
Give me a sec to debug and make sure
Here seems a better option
PR done. Thank you very much @Logan M and @WhiteFang_Jr