Monitoring Runtime Context and Memory Usage

@Logan M, can I find out the currently available context size, the memory used, and the max tokens used at runtime? That way, when usage approaches the limit, I can reset the variables and the chat engine before it hits the limit and breaks with an error.
You can check the total available context length for an OpenAI model here: https://github.com/run-llama/llama_index/blob/aa1f5776787b8b435f89d2c261fd7ca8002c1f19/llama-index-integrations/llms/llama-index-llms-openai/llama_index/llms/openai/utils.py#L39
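The linked `utils.py` is essentially a model-name-to-context-window lookup table. A minimal sketch of the same idea is below; the model names and sizes are illustrative only, so consult the linked file for the authoritative values:

```python
# Illustrative model -> context-window mapping, mirroring the idea of
# the lookup table in llama_index's OpenAI utils. The numbers here are
# assumptions and may be out of date; check the linked utils.py.
OPENAI_CONTEXT_SIZES = {
    "gpt-3.5-turbo": 16385,
    "gpt-4": 8192,
    "gpt-4-turbo": 128000,
}

def modelname_to_contextsize(model_name: str) -> int:
    """Return the max context window for a known model name."""
    if model_name not in OPENAI_CONTEXT_SIZES:
        raise ValueError(f"Unknown model: {model_name}")
    return OPENAI_CONTEXT_SIZES[model_name]
```

Once you know the window size, you can compare your running token count against it at each turn.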


For checking how many tokens remain, you can add the instrumentation module: https://docs.llamaindex.ai/en/stable/examples/instrumentation/instrumentation_observability_rundown/

Take the LLM event, extract the token counts, and then update the running token length based on the new values.
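Concretely, the bookkeeping side of this could be a small tracker that your event handler feeds after each LLM call. This is a hypothetical sketch, not a llama_index class: `TokenBudget`, `record`, and `near_limit` are names invented here, and the reset hook is where you would also reset your chat engine and memory variables.

```python
class TokenBudget:
    """Tracks cumulative token usage against a model's context window."""

    def __init__(self, context_window: int, safety_margin: int = 256):
        self.context_window = context_window
        self.safety_margin = safety_margin  # headroom kept before resetting
        self.used = 0

    def record(self, prompt_tokens: int, completion_tokens: int) -> None:
        # Call this from your instrumentation event handler with the
        # token counts extracted from each LLM event.
        self.used += prompt_tokens + completion_tokens

    @property
    def remaining(self) -> int:
        return max(self.context_window - self.used, 0)

    def near_limit(self) -> bool:
        return self.remaining <= self.safety_margin

    def reset(self) -> None:
        # Reset the counter alongside your chat engine / memory variables.
        self.used = 0

# Usage sketch: check after every recorded LLM call.
budget = TokenBudget(context_window=8192)
budget.record(prompt_tokens=3000, completion_tokens=500)
if budget.near_limit():
    budget.reset()  # also reset the chat engine here
```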