Updated 11 months ago

hello @Logan M ! I'm trying to be an early adopter of the new Nomic AI embedding model but I seem to be running into an error. Unfortunately I cannot use their API, so it must run locally. I am embedding around 100k nodes on a T4 machine into a Weaviate vector DB.

I am defining the model like this:
model = AutoModel.from_pretrained("nomic-ai/nomic-embed-text-v1", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

embed_model = HuggingFaceEmbedding(
    model=model,
    tokenizer=tokenizer,
    max_length=2048,
)


Trying to keep the index insert batch size small:
index = VectorStoreIndex(nodes, storage_context=storage_context, service_context=service_context, show_progress=True, insert_batch_size=512)


This is the error I'm getting:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 54.00 MiB. GPU 0 has a total capacity of 14.58 GiB of which 45.56 MiB is free. Including non-PyTorch memory, this process has 14.53 GiB memory in use. Of the allocated memory 14.08 GiB is allocated by PyTorch, and 335.53 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)


Any idea? 🙂
19 comments
try changing the embed batch size

embed_model = HuggingFaceEmbedding(..., embed_batch_size=2)
I actually haven't tested this model yet either lol
You might also want to change the pooling
embed_model = HuggingFaceEmbedding(..., pooling="mean")
I need to make that more automatic
yeah so batch size = 2 doesn't work, same error
same with the pooling
pooling is more about the quality of the resulting embeddings, it won't change memory use
does batch size of 1 work?
if not, I think the model is just too big for your GPU lol
that would be sad, I mean it's a T4 w 16GB VRAM
You can also lower the max length too
1024 is probably good enough
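For a rough sense of why both knobs matter, here is a back-of-the-envelope sketch of how one layer's attention score matrix scales with batch size and sequence length. It assumes a BERT-base-style layout with 12 attention heads, fp32 values, a fully materialised attention matrix, and inputs padded to the max length. None of that is confirmed for this model in the thread, and the batch size of 10 is only an illustrative starting point. It also ignores model weights and every other activation, so it shows the scaling, not the real footprint.

```python
GIB = 1024 ** 3


def attention_scores_bytes(batch_size, seq_len, num_heads=12, bytes_per_value=4):
    """Bytes for one layer's attention scores: batch x heads x seq x seq.

    num_heads=12 and bytes_per_value=4 (fp32) are assumptions, not values
    taken from the thread.
    """
    return batch_size * num_heads * seq_len * seq_len * bytes_per_value


# Compare the settings tried in this thread (plus an illustrative batch of 10).
for batch_size, seq_len in [(10, 2048), (2, 2048), (1, 2048), (1, 1024)]:
    size = attention_scores_bytes(batch_size, seq_len)
    print(f"batch={batch_size:>2} len={seq_len}: {size / GIB:.3f} GiB per layer")
```

The takeaway is that this term grows linearly with batch size but quadratically with length, so halving max length from 2048 to 1024 cuts it by 4x, while halving the batch size only cuts it by 2x.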
ok so batch size 1 seems to be ok (even though it's probably gonna take 2 centuries to finish)
batch size 1 and length 2048
ok I said yay too early, it crashed after about 10k nodes 😄
trying 1024 length + batch size 1
small update - I was able to do it with 1024 length and a batch size of 1, though it did take a while 🙂
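The manual trial and error above (batch size 2, then 1) could in principle be automated. Below is a minimal sketch, not anything LlamaIndex provides: `embed_with_backoff` and `embed_fn` are hypothetical names. It starts with a larger batch and halves it whenever the embedding call runs out of memory. The demo uses a fake embedding function and Python's built-in `MemoryError` so it runs without a GPU. With PyTorch you would presumably pass `oom_error=torch.cuda.OutOfMemoryError` (the exception named in the traceback above) and may also need to free cached GPU memory before retrying.

```python
from typing import Callable, List, Sequence, Type


def embed_with_backoff(
    texts: Sequence[str],
    embed_fn: Callable[[List[str]], List[List[float]]],  # hypothetical embedding callable
    batch_size: int = 8,
    oom_error: Type[BaseException] = MemoryError,
) -> List[List[float]]:
    """Embed texts in batches, halving the batch size when embed_fn runs out of memory."""
    results: List[List[float]] = []
    start = 0
    while start < len(texts):
        batch = list(texts[start:start + batch_size])
        try:
            results.extend(embed_fn(batch))
        except oom_error:
            if batch_size == 1:
                raise  # even one text does not fit; a shorter max length is needed
            batch_size = max(1, batch_size // 2)
            continue  # retry from the same position with a smaller batch
        start += len(batch)
    return results


def fake_embed(batch: List[str]) -> List[List[float]]:
    """Stand-in for a real model: fails on batches larger than 2."""
    if len(batch) > 2:
        raise MemoryError("pretend the GPU is full")
    return [[float(len(text))] for text in batch]


vectors = embed_with_backoff(["a", "bb", "ccc", "dddd", "eeeee"], fake_embed, batch_size=8)
print(vectors)
```

Note that this only helps with batch size. If a single long node does not fit at batch size 1, as seemed to happen after about 10k nodes, lowering the max length is still the fix.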

a separate question - I'm using the Sentence Window thing in my system, so the embedding is generally done on small chunks of text. Could that be why mpnet_base_v2 is literally the 🐐 among the embeddings I've tried so far? BGE / OpenAI / Jina / Nomic are absolutely terrible compared to mpnet_base_v2. Does my thinking make sense? If yes, then I guess the focus should be on trying embeddings that were trained on short input lengths (not sure what's the word for it)? Would you recommend something specific?
I'm experiencing a similar issue, but I'll wait until the new release this week before I post my own questions.