Token

ALSO it's driving me CRAZY that the LlamaIndex tokenizer only ENCODES and has no decode. I'm going to scream lol.
Why do you need decode? It's just for token counting 😅
How do I truncate a document? I can use Settings.encoder to detect that a document is too large, but how do I actually truncate it?
Unless I entirely remove the truncation feature from my class. 🙂
Which I could. Or let people pass their own encoder in, which I think is the best option. And of course you only need to pass an encoder if you are using the truncation feature; otherwise it doesn't matter.
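The pass-your-own-tokenizer idea could look something like this minimal sketch. Everything here is hypothetical (the function name `truncate_text` and the whitespace stand-in tokenizer are invented for illustration); the point is that truncation needs both `encode` and `decode` callables, which is exactly why an encode-only tokenizer is frustrating.

```python
# Hypothetical sketch: truncation with a user-supplied tokenizer.
# A real implementation would accept e.g. tiktoken's encode/decode pair;
# the whitespace tokenizer below is a stand-in for demonstration only.
from typing import Callable, List

def truncate_text(
    text: str,
    max_tokens: int,
    encode: Callable[[str], List[int]],
    decode: Callable[[List[int]], str],
) -> str:
    """Return `text` cut down to at most `max_tokens` tokens."""
    tokens = encode(text)
    if len(tokens) <= max_tokens:
        return text
    return decode(tokens[:max_tokens])

# Stand-in whitespace tokenizer (demonstration only, not a real encoder).
vocab: dict = {}

def ws_encode(s: str) -> List[int]:
    return [vocab.setdefault(w, len(vocab)) for w in s.split()]

def ws_decode(ids: List[int]) -> str:
    rev = {i: w for w, i in vocab.items()}
    return " ".join(rev[i] for i in ids)

print(truncate_text("one two three four five", 3, ws_encode, ws_decode))
# -> one two three
```

Making the decoder injectable keeps the class tokenizer-agnostic, and users who never enable truncation never have to supply one.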
What/where are you trying to truncate? Generally it's either handled for you automatically, or you've already chunked your data to a size.
The full documents, which can easily be over the 200k/128k context limits.
I give the user the option to warn, throw an error, or ignore... or truncate first or truncate last.
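Those overflow policies could be sketched like this. The policy names and the function are invented for illustration, and "truncate first/last" is interpreted here as dropping the beginning vs. dropping the end of the token list:

```python
# Hypothetical sketch of the overflow policies described above.
# Policy names are assumptions, not an existing API.
import warnings
from typing import List

def apply_overflow_policy(
    tokens: List[int],
    max_tokens: int,
    policy: str = "error",
) -> List[int]:
    if len(tokens) <= max_tokens:
        return tokens
    if policy == "error":
        raise ValueError(
            f"document has {len(tokens)} tokens, limit is {max_tokens}"
        )
    if policy == "warn":
        warnings.warn(f"document exceeds {max_tokens} tokens; passing through")
        return tokens
    if policy == "ignore":
        return tokens
    if policy == "truncate_last":   # drop the end, keep the beginning
        return tokens[:max_tokens]
    if policy == "truncate_first":  # drop the beginning, keep the end
        return tokens[-max_tokens:]
    raise ValueError(f"unknown policy: {policy}")

print(apply_overflow_policy(list(range(10)), 4, "truncate_first"))
# -> [6, 7, 8, 9]
```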
But maybe you're right; the best code is no code.
Yeah, tbh I would be using a splitter and setting the chunk size as needed.
I might just delete all my truncation code 🙂 If your documents are too large, it's the user's fault! Hah!