
Updated 4 months ago

Return direct

At a glance

A community member is using ReActAgents and has run into an issue with the return_direct argument: the final answer is already visible in the backend, but the same answer is then streamed slowly to the front end. Other community members explain that this is the expected behavior when return_direct is combined with streaming chat, and suggest some workarounds, such as speeding up the dummy stream or returning the full response instead of streaming it.

Hi all, I was wondering if some of you have worked with ReActAgents. I have tried the return_direct argument. In the backend, I can see the "Observation" from the agent that contains the final answer, but I then have to wait for the exact same answer to be streamed to the front end. Do you understand why?

Python
from llama_index.core import Settings
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import QueryEngineTool

# qualitative_question_engine, chat_llm and chat_history are defined elsewhere
top_level_sub_tools = [
    QueryEngineTool.from_defaults(
        query_engine=qualitative_question_engine,
        name="qualitative_question_engine",
        description="""
            A query engine that can answer qualitative questions about documents
            """.strip(),
        return_direct=True,
    ),
]

chat_engine = ReActAgent.from_tools(
    tools=top_level_sub_tools,
    llm=chat_llm,
    chat_history=chat_history,
    verbose=True,
    callback_manager=Settings.callback_manager,
    max_function_calls=1,
)
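For illustration, a minimal sketch of how an agent like this is typically consumed with astream_chat; the ask helper and its print-based output are assumptions for the example, not part of the original post:

Python
# Hypothetical consumption of the agent above. With a return_direct tool, the
# answer first appears as "Observation" in the verbose logs and is then
# re-streamed token by token through this generator.
async def ask(question: str) -> None:
    streaming_response = await chat_engine.astream_chat(question)
    async for token in streaming_response.async_response_gen():
        print(token, end="", flush=True)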
    
5 comments
That's exactly what return_direct does. When that tool is called, its output is returned directly to the user.

It has to be streamed, because you are calling stream_chat.
Thanks for your answer. Yes, I'm using astream_chat. I might be misunderstanding something: the answer is already received in the backend (in the Observation), and then the streaming to the front end is quite slow (about 3 tokens/s). I'm using SSE, with the same code as in sec-insights.
Yeah, probably this sleep for the dummy stream should be faster:
https://github.com/run-llama/llama_index/blob/723c2533ed4b7b43b7d814c89af1838f0f1994c2/llama-index-core/llama_index/core/chat_engine/types.py#L92

But also, the response is technically fully there; you could just check for it and return the whole thing, or fake-stream it yourself.
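For illustration, a minimal sketch of the "return the whole thing" idea, assuming an SSE generator that receives the StreamingAgentChatResponse from astream_chat. The is_dummy_stream flag and response attribute follow the linked chat_engine/types.py and may differ between llama_index versions; the sse_events helper itself is hypothetical:

Python
from llama_index.core.chat_engine.types import StreamingAgentChatResponse


# Hypothetical SSE generator. If the agent hit a return_direct tool, the
# response is a "dummy stream" over an answer that already exists in full,
# so we can emit it in one event instead of replaying the throttled generator.
async def sse_events(chat_response: StreamingAgentChatResponse):
    if getattr(chat_response, "is_dummy_stream", False) and chat_response.response:
        yield f"data: {chat_response.response}\n\n"
        return
    # Otherwise stream real tokens as they arrive from the LLM.
    async for token in chat_response.async_response_gen():
        yield f"data: {token}\n\n"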
Thanks for pointing me in the right direction, I'll have a look at it!
I've tried something like that, and it works as a way to patch the delay.
Maybe not the most elegant approach, but it's enough for running some tests.
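One possible shape for such a patch, as a sketch only: rather than changing llama_index itself, re-chunk the already-complete answer and yield it with a much shorter delay. The fast_fake_stream name and the 5 ms delay are arbitrary choices for illustration, not the community member's actual code:

Python
import asyncio
from typing import AsyncGenerator


# Fake-stream a finished answer quickly so the front end still receives a
# stream, but without the slow per-token pacing of the built-in dummy stream.
async def fast_fake_stream(
    full_answer: str, delay: float = 0.005
) -> AsyncGenerator[str, None]:
    for word in full_answer.split(" "):
        yield word + " "
        await asyncio.sleep(delay)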