
Updated 6 months ago

where is the correct place to print out

At a glance

The community member asks where to print out the final prompt being sent to the large language model (LLM), and whether they need to use the callback manager in the RetrieverQueryEngine when assigning the vector_query_engine variable. They include example code for querying an index and creating a query engine.

In the comments, another community member suggests setting the global handler to "simple" at the top of the code, which lets the community member get printouts of the query trace. The community member confirms that this worked and asks whether they need to set the callback managers everywhere using Settings with the global handler, to which another community member responds "nope".

There is no explicitly marked answer in the post or comments.

Where is the correct place to print out the final prompt that's being sent to the LLM? Would I need to leverage the callback manager in the RetrieverQueryEngine when assigning the vector_query_engine variable? Any good examples?

Plain Text
    
# The methods below belong to a class that holds vector_store,
# response_synthesizer, and qa_prompt_tmpl; the class definition was not
# included in the post. Imports implied by the snippet:
from typing import Any, Dict

from llama_index.core import QueryBundle, Settings, VectorStoreIndex
from llama_index.core.base.response.schema import RESPONSE_TYPE
from llama_index.core.postprocessor import SimilarityPostprocessor
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.retrievers import VectorIndexRetriever


def _query_index(self, query_engine: RetrieverQueryEngine, query: str) -> RESPONSE_TYPE:
    embedded_query = Settings.embed_model.get_text_embedding(query)
    response = query_engine.query(QueryBundle(query_str=query, embedding=embedded_query))
    return response

def _create_query_engine(self) -> RetrieverQueryEngine:
    vector_index = VectorStoreIndex.from_vector_store(
        vector_store=self.vector_store,
        embed_model=Settings.embed_model,
    )

    vector_retriever = VectorIndexRetriever(index=vector_index, similarity_top_k=5)

    vector_query_engine = RetrieverQueryEngine(
        retriever=vector_retriever,
        response_synthesizer=self.response_synthesizer,
        node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.50)],
    )

    vector_query_engine.update_prompts(
        {"response_synthesizer:text_qa_template": self.qa_prompt_tmpl}
    )

    return vector_query_engine

def query_rag(self, query: str) -> Dict[str, Any]:
    vector_query_engine = self._create_query_engine()

    response = self._query_index(query_engine=vector_query_engine, query=query)
    # (snippet truncated in the original post)
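For the callback-manager route the question asks about, one option (a minimal sketch, not from the thread; it assumes a recent llama_index.core, and the event payload shape can vary by version) is to register a LlamaDebugHandler once on Settings and read the LLM events back after a query:

Plain Text
from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager, LlamaDebugHandler

# Register the handler once; engines built afterwards without an explicit
# callback_manager inherit it from Settings.
llama_debug = LlamaDebugHandler(print_trace_on_end=True)
Settings.callback_manager = CallbackManager([llama_debug])

# ... build the engine and run a query as in the snippet above, then:
for start_event, end_event in llama_debug.get_llm_inputs_outputs():
    print(start_event.payload)  # the final prompt/messages sent to the LLM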
5 comments
Plain Text
from llama_index.core import set_global_handler
set_global_handler("simple")
put that at the top of your code
I can get printouts of:

Plain Text
**********
Trace: query
    |_CBEventType.QUERY -> 9.163249 seconds
      |_CBEventType.RETRIEVE -> 0.006792 seconds
      |_CBEventType.SYNTHESIZE -> 9.156255 seconds
        |_CBEventType.TEMPLATING -> 6e-06 seconds
        |_CBEventType.LLM -> 2.308461 seconds
        |_CBEventType.TEMPLATING -> 1.1e-05 seconds
        |_CBEventType.LLM -> 1.531016 seconds
        |_CBEventType.TEMPLATING -> 7e-06 seconds
        |_CBEventType.LLM -> 1.433674 seconds
        |_CBEventType.TEMPLATING -> 6e-06 seconds
        |_CBEventType.LLM -> 1.585802 seconds
        |_CBEventType.TEMPLATING -> 7e-06 seconds
        |_CBEventType.LLM -> 2.289968 seconds
**********
That worked. Do I need to set the callback managers everywhere using Settings with that global handler then?
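For reference, the answer summarized above ("nope") matches how the global handler works: set_global_handler installs a process-wide handler that each new CallbackManager picks up on construction, so a single call at startup is enough. A minimal sketch of that behavior (assuming a recent llama_index.core; worth verifying against your installed version):

Plain Text
from llama_index.core import set_global_handler
from llama_index.core.callbacks import CallbackManager

set_global_handler("simple")  # one call, before any engines are built

# A freshly constructed manager already carries the global handler, so
# engines created after this point print their prompts without any
# per-component callback wiring.
manager = CallbackManager()
print(manager.handlers)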