Embeddings

I'm trying this, but I'm getting an error:
ValueError: shapes (1536,) and (768,) not aligned: 1536 (dim 0) != 768 (dim 0)
Did you already create an index with openai embeddings and then switch to huggingface?

The embeddings from different models can't mix, or you get this error πŸ‘€
Just need to make sure you start with a new index when switching embed_model
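To illustrate why (a minimal sketch, not from the thread): OpenAI's text-embedding-ada-002 vectors are 1536-dimensional while all-mpnet-base-v2 vectors are 768-dimensional, so the similarity dot product can't even be computed:

Plain Text
# Illustrative only: comparing a 1536-dim query vector against a 768-dim stored vector
import numpy as np

query_embedding = np.random.rand(1536)   # e.g. text-embedding-ada-002 output size
stored_embedding = np.random.rand(768)   # e.g. all-mpnet-base-v2 output size

# the similarity step reduces to a dot product, which raises:
# ValueError: shapes (1536,) and (768,) not aligned: 1536 (dim 0) != 768 (dim 0)
np.dot(query_embedding, stored_embedding)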
No! But I found an interesting case. I'm not sure why it's happening yet, but here it is:

The combination of HuggingFace embeddings plus OpenAI for response generation works when I'm not storing the generated index and then loading it back from storage.

Plain Text
#custom knowledge
from llama_index import (
    GPTVectorStoreIndex, LangchainEmbedding, ServiceContext, LLMPredictor,
    StorageContext, load_index_from_storage, SimpleDirectoryReader
)
import os
from langchain.chat_models import ChatOpenAI
from langchain.embeddings.huggingface import HuggingFaceEmbeddings

os.environ['OPENAI_API_KEY'] = "YOUR_OPENAI_API_KEY"

# OpenAI handles response generation
llm_predictor = LLMPredictor(llm=ChatOpenAI(temperature=0, max_tokens=1024, model_name="gpt-3.5-turbo"))

# HuggingFace handles the embeddings (768-dimensional)
embed_model = LangchainEmbedding(HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2"))

service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, chunk_size_limit=512, embed_model=embed_model)

# build the index in memory and query it directly (no persistence)
documents = SimpleDirectoryReader(input_files=["path to one doc"]).load_data()
open_index = GPTVectorStoreIndex.from_documents(documents, service_context=service_context)
query_engine = open_index.as_query_engine(similarity_top_k=3, service_context=service_context)

response = query_engine.query("summarize this document")
print(response.response)


This works!!!
But at the same time, if I
  • create the index,
  • store it, and
  • then load it back
and use the loaded index to query,

it gives me ValueError: shapes (1536,) and (768,) not aligned: 1536 (dim 0) != 768 (dim 0)

Here's the code for that:

Plain Text
#custom knowledge
from llama_index import (
    GPTVectorStoreIndex, LangchainEmbedding, ServiceContext, LLMPredictor,
    StorageContext, load_index_from_storage, SimpleDirectoryReader
)
import os
from langchain.chat_models import ChatOpenAI
from langchain.embeddings.huggingface import HuggingFaceEmbeddings

os.environ['OPENAI_API_KEY'] = "YOUR_OPENAI_API_KEY"

llm_predictor = LLMPredictor(llm=ChatOpenAI(temperature=0, max_tokens=1024, model_name="gpt-3.5-turbo"))
embed_model = LangchainEmbedding(HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2"))
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, chunk_size_limit=512, embed_model=embed_model)

# build the index and persist it to disk
documents = SimpleDirectoryReader(input_files=["path to a doc"]).load_data()
open_index = GPTVectorStoreIndex.from_documents(documents, service_context=service_context)
open_index.storage_context.persist(persist_dir="./hff_Storage")

# load the index back (note: no service_context is passed here)
storage_context = StorageContext.from_defaults(persist_dir="./hff_Storage")
index = load_index_from_storage(storage_context=storage_context)

query_engine = index.as_query_engine(similarity_top_k=3, service_context=service_context)

response = query_engine.query("summarize this document")
print(response.response)
I know I'm loading the index from storage even though I already have it in memory!

The case is: this combination does not work when I load the index from the storage context.
Could you try this at your end and let me know whether it works or not?
Thanks
What are the advantages of this combination, and for what kind of data? Thank you.
It's just that I'm thinking my entire data will not be passed to OpenAI's services.
In the actual setup I'm going to have around 150 docs to build the index on, so I'm just checking whether only the response-generation part is passed to OpenAI and not the entire vector-generation part.
And what about token consumption, is it acceptable?
While generating the response? Yes, since it won't be much compared to generating the embeddings.
Yes, when generating the response. Well, with the embeddings, are there any options to minimize the expense? I'm looking for a middle ground between the cost of the answer, the quality of the answer, and the cost of generating the embeddings πŸ™‚
You could use HuggingFace models end-to-end if you want to cut down completely on the token-consumption side, but OpenAI responses are considered the best among all the LLMs out there.


That is why I'm trying to see whether combining HF and OpenAI works for me or not.
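(For illustration only, a rough sketch of what a fully local setup could look like, swapping ChatOpenAI for langchain's HuggingFacePipeline; the model name below is just an example, not something tested in this thread.)

Plain Text
# Hypothetical fully local setup: embeddings AND response generation run locally,
# so no OpenAI tokens are consumed at all.
from llama_index import GPTVectorStoreIndex, LangchainEmbedding, ServiceContext, LLMPredictor, SimpleDirectoryReader
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from langchain.llms import HuggingFacePipeline

# local LLM for response generation (example model; pick whatever fits your hardware)
local_llm = HuggingFacePipeline.from_model_id(model_id="google/flan-t5-base", task="text2text-generation")
llm_predictor = LLMPredictor(llm=local_llm)

# local embeddings, same as in the snippets above
embed_model = LangchainEmbedding(HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2"))

service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, embed_model=embed_model)

documents = SimpleDirectoryReader(input_files=["path to a doc"]).load_data()
index = GPTVectorStoreIndex.from_documents(documents, service_context=service_context)
print(index.as_query_engine(service_context=service_context).query("summarize this document"))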
I will test it, but the quality of the answers is very important, otherwise why do all this. I use nodes, GPTVectorStoreIndex, and SentenceEmbeddingOptimizer.
The SBERT embeddings I'm using are also great. Yes, better responses are the key, but the data I'll be working with will be huge, so I just want to check whether this can work for me or not.
Do check both scenarios. My issue was that once we store the embeddings and then load them back, it fails while generating the response.


https://discord.com/channels/1059199217496772688/1109093304575983738/1110139260142620742
Yea this should definitely work. Let me see if I have time to debug later today πŸ’ͺ
Guys, can you share the working code later? Thanks
Hey, were any of you able to try the above code?
Hi. I haven't had time yet.
I just tried now with a very minimal example, but I wasn't able to reproduce πŸ‘€

Plain Text
>>> from langchain.embeddings.huggingface import HuggingFaceEmbeddings
>>> from llama_index import GPTVectorStoreIndex, LangchainEmbedding, ServiceContext, StorageContext, load_index_from_storage, Document
>>>
>>> embed_model = LangchainEmbedding(HuggingFaceEmbeddings())
>>> service_context = ServiceContext.from_defaults(embed_model=embed_model)
>>> doc = Document("this is a document lol!")
>>>
>>> new_index = GPTVectorStoreIndex.from_documents([doc], service_context=service_context)
>>> new_index.as_query_engine().query("hello world")
Response(response='\nHello World!', source_nodes=[NodeWithScore(node=Node(text='this is a document lol!', doc_id='7cace66a-1302-41ef-8fa6-98e6cf6feac3', embedding=None, doc_hash='57e74d18803a15a129af5ba1f71081081f50b4e7007689bd4205c0be84063aad', extra_info=None, node_info={'start': 0, 'end': 23}, relationships={<DocumentRelationship.SOURCE: '1'>: '1852954d-a584-4c8c-8f6d-201e901b0765'}), score=0.1624280677241592)], extra_info={'7cace66a-1302-41ef-8fa6-98e6cf6feac3': None})
>>>
>>> new_index.storage_context.persist(persist_dir="./newer")
>>> 
>>> newer_index = load_index_from_storage(StorageContext.from_defaults(persist_dir="./newer"), service_context=service_context)
>>> newer_index.as_query_engine().query("hello world")
Response(response='\nHello World!', source_nodes=[NodeWithScore(node=Node(text='this is a document lol!', doc_id='7cace66a-1302-41ef-8fa6-98e6cf6feac3', embedding=None, doc_hash='57e74d18803a15a129af5ba1f71081081f50b4e7007689bd4205c0be84063aad', extra_info=None, node_info={'start': 0, 'end': 23}, relationships={<DocumentRelationship.SOURCE: '1'>: '1852954d-a584-4c8c-8f6d-201e901b0765'}), score=0.1624280677241592)], extra_info={'7cace66a-1302-41ef-8fa6-98e6cf6feac3': None})
>>> 
Yes, it worked! And I found the missing part in my code as well: while loading the vectors, I was not passing the service_context.
Thanks a ton @Logan M
Nice, glad it works! :dotsCATJAM:
With the default setup I guess it would work without passing it, but since I'm using ChatOpenAI, it was required.
Did your code work? πŸ™‚
You have to pass the service_context while loading from storage if you're not using the default service context.
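For reference, the fix boils down to passing the same service_context (the one holding the HuggingFace embed_model) when loading; roughly, applied to the earlier snippet with the same paths and names as above:

Plain Text
# load the persisted index with the SAME service_context (i.e. the same
# HuggingFace embed_model) that was used to build it
storage_context = StorageContext.from_defaults(persist_dir="./hff_Storage")
index = load_index_from_storage(storage_context=storage_context, service_context=service_context)

query_engine = index.as_query_engine(similarity_top_k=3, service_context=service_context)
response = query_engine.query("summarize this document")
print(response.response)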