Token

ALSO it's driving me CRAZY that the LlamaIndex tokenizer only ENCODES and has no decode. I'm going to scream lol.
Why do you need decode? It's just for token counting 😅
How do I truncate a document? I can use Settings.encoder to detect that a document is too large, but how do I actually truncate it?
Unless I entirely remove the truncation feature from my class. 🙂
Which I could. Or let people pass their own encoder in, which I think is the best option. And of course you only need to pass an encoder if you are using the truncation feature; otherwise it doesn't matter.
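The pass-your-own-tokenizer idea could look something like this minimal sketch. Everything here is hypothetical (the function name `truncate_text` and the whitespace stand-in tokenizer are invented for illustration); the point is that truncation needs both `encode` and `decode` callables, which is exactly why an encode-only tokenizer is frustrating.

```python
# Hypothetical sketch: truncation with a user-supplied tokenizer.
# A real implementation would accept e.g. tiktoken's encode/decode pair;
# the whitespace tokenizer below is a stand-in for demonstration only.
from typing import Callable, List

def truncate_text(
    text: str,
    max_tokens: int,
    encode: Callable[[str], List[int]],
    decode: Callable[[List[int]], str],
) -> str:
    """Return `text` cut down to at most `max_tokens` tokens."""
    tokens = encode(text)
    if len(tokens) <= max_tokens:
        return text
    return decode(tokens[:max_tokens])

# Stand-in whitespace tokenizer (demonstration only, not a real encoder).
vocab: dict = {}

def ws_encode(s: str) -> List[int]:
    return [vocab.setdefault(w, len(vocab)) for w in s.split()]

def ws_decode(ids: List[int]) -> str:
    rev = {i: w for w, i in vocab.items()}
    return " ".join(rev[i] for i in ids)

print(truncate_text("one two three four five", 3, ws_encode, ws_decode))
# -> one two three
```

Making the decoder injectable keeps the class tokenizer-agnostic, and users who never enable truncation never have to supply one.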
What/where are you trying to truncate? Generally it's either handled for you automatically, or you've already chunked your data to a size.
The full documents, which can easily be over the 200k/128k context limits.
I give the user the option to warn, throw an error, or ignore... or truncate first or truncate last.
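Those overflow policies could be sketched like this. The policy names and the function are invented for illustration, and "truncate first/last" is interpreted here as dropping the beginning vs. dropping the end of the token list:

```python
# Hypothetical sketch of the overflow policies described above.
# Policy names are assumptions, not an existing API.
import warnings
from typing import List

def apply_overflow_policy(
    tokens: List[int],
    max_tokens: int,
    policy: str = "error",
) -> List[int]:
    if len(tokens) <= max_tokens:
        return tokens
    if policy == "error":
        raise ValueError(
            f"document has {len(tokens)} tokens, limit is {max_tokens}"
        )
    if policy == "warn":
        warnings.warn(f"document exceeds {max_tokens} tokens; passing through")
        return tokens
    if policy == "ignore":
        return tokens
    if policy == "truncate_last":   # drop the end, keep the beginning
        return tokens[:max_tokens]
    if policy == "truncate_first":  # drop the beginning, keep the end
        return tokens[-max_tokens:]
    raise ValueError(f"unknown policy: {policy}")

print(apply_overflow_policy(list(range(10)), 4, "truncate_first"))
# -> [6, 7, 8, 9]
```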
But maybe you're right; the best code is no code.
Yeah, tbh I would be using a splitter and setting the chunk size as needed.
I might just delete all my truncation code 🙂 If your documents are too large, it's the user's fault! Hah!