Quick question - Can someone clarify what num_outputs refers to in the PromptHelper class? Is that the number of tokens allowed in the output? The name is throwing me off
Also the only place I can see it used is here:
Plain Text
result = (
    self.max_input_size - num_prompt_tokens - self.num_output
) // num_chunks

but I don't get how that's useful...
num_output is used to make sure the inputs to the LLM are small enough that there is still room to generate num_output tokens
With models like openai, the input and output are connected

Tokens are generated one at a time, each one added to the input before generating the next token
So llama index needs to make sure the prompts to the LLM are small enough to have room to generate a response
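To picture why the input and output share one budget, here's a rough sketch of that token-by-token loop (plain Python, not library code; model_generate_next_token is a made-up stand-in for the actual model call):
Plain Text
import random

# Hypothetical stand-in for the real model call, just for illustration.
def model_generate_next_token(tokens: list[int]) -> int:
    return random.randint(0, 50256)

CONTEXT_WINDOW = 4096  # e.g. max_input_size for older OpenAI completion models
NUM_OUTPUT = 256       # tokens reserved for the response

def generate(prompt_tokens: list[int]) -> list[int]:
    # The prompt has to leave enough headroom for NUM_OUTPUT new tokens.
    assert len(prompt_tokens) <= CONTEXT_WINDOW - NUM_OUTPUT

    tokens = list(prompt_tokens)
    for _ in range(NUM_OUTPUT):
        next_token = model_generate_next_token(tokens)
        tokens.append(next_token)          # each output token becomes part of the input
        if len(tokens) >= CONTEXT_WINDOW:  # hard stop: the window is full
            break
    return tokens[len(prompt_tokens):]
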
hmm ok but then what does num_outputs refer to? Like how do I calculate what to put there?
It refers to how much the LLM might generate
By default, openai has max_tokens of 256, so num_output is also 256
I should also mention these numbers are all measuring tokens lol
ok gotcha so to clarify is num_output just the other side of max_input_size? Like maybe num_output should be called max_output_size?
and is 256 for an ADA model? So a GPT 3.0 model would be 4096?
It's not quite the other side. If I set num_output to 1000, then it takes max_input_size (4096 for openai) and tries to ensure that the prompts sent are at most 4096-1000=3096 tokens long
The term max_output_size kind of makes sense from a user's perspective yea
Ok so num_output is the remaining tokens available after you account for prompt length
Yea that's it πŸ’ͺ
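Spelling that out with the numbers from above (just the arithmetic, not library code):
Plain Text
max_input_size = 4096  # context window for the older OpenAI completion models
num_output = 1000      # tokens reserved for the response

# Any prompt llama_index builds has to fit in whatever is left over.
max_prompt_tokens = max_input_size - num_output
print(max_prompt_tokens)  # 3096
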
Is there a reason that num_output can't be calculated automatically? I'm just wishing it was optional because I don't really care how long the output is. I'd like it to be as long as it needs to be.
So, let's say you retrieve your top k nodes, and you have compact mode on.

How much text do you put in each call to the LLM as context? You can calculate the length of the prompt template and query tokens, and that's the minimum length since those can't be trimmed

But the context can be trimmed. So llama index uses max_input_size and num_output to figure out how big each piece of context should be
So... I don't see how this can be automatic πŸ˜… the bigger num_output gets, the less context you can include in each LLM call
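Here's a rough sketch of that calculation, the same shape as the snippet quoted above (the numbers are made up for illustration):
Plain Text
max_input_size = 4096    # model context window
num_output = 256         # tokens reserved for the response
num_prompt_tokens = 200  # prompt template + query tokens (can't be trimmed)
num_chunks = 2           # e.g. two retrieved nodes packed into one call in compact mode

# Whatever is left after the fixed prompt and the reserved output
# gets split evenly across the context chunks.
chunk_budget = (max_input_size - num_prompt_tokens - num_output) // num_chunks
print(chunk_budget)  # (4096 - 200 - 256) // 2 == 1820 tokens per chunk
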
hmmm ok yes I think I follow that
If you need to change it because your responses are getting cut off, usually somewhere around 512 is a good size for most use cases
Just make sure max_tokens and num_output are the same value and you are good to go
do you mean max_input_size?
or is there a max_tokens value I can set somewhere? In which case, not sure what that refers to haha
oh it's passed to the llm class
nevermind I got it πŸ‘ πŸ˜„
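For anyone landing here later, this is roughly how you'd wire the two together with the older ServiceContext-style setup; exact imports and argument names vary between llama_index versions, so treat it as a sketch rather than the official recipe:
Plain Text
from llama_index import LLMPredictor, PromptHelper, ServiceContext
from langchain.llms import OpenAI

# max_tokens on the LLM and num_output in PromptHelper should match (512 here).
llm_predictor = LLMPredictor(
    llm=OpenAI(temperature=0, model_name="text-davinci-003", max_tokens=512)
)
prompt_helper = PromptHelper(max_input_size=4096, num_output=512, max_chunk_overlap=20)

service_context = ServiceContext.from_defaults(
    llm_predictor=llm_predictor,
    prompt_helper=prompt_helper,
)
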