Find answers from the community

Ariel
Has anyone tried to use OpenAI's gpt-4o-audio-preview audio model with agents? The integration seems quite challenging because streaming events are not supported... 😥 I think the only way to get streaming events is to fall back to a text model instead, converting the incoming audio to text (STT) and synthesizing the reply with TTS. Any ideas?
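For context, the STT-first fallback I have in mind would look roughly like this (an untested sketch using the plain OpenAI Python SDK; the model names are just placeholders, and the agent part is omitted):
Python
# Untested sketch of the STT-first fallback, using the plain OpenAI Python SDK.
# Model names ("whisper-1", "gpt-4o") are placeholders for whatever you use.
from openai import AsyncOpenAI

client = AsyncOpenAI()

async def handle_audio_turn(audio_file):
    # 1. Speech-to-text: transcribe the incoming audio.
    transcript = await client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

    # 2. Run a text model with stream=True so deltas can be forwarded as events.
    stream = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": transcript.text}],
        stream=True,
    )
    async for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            yield delta  # forward each text delta to the client

    # 3. Optionally synthesize the final reply back to audio with a TTS call.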
13 comments
Hi everyone!
Has anyone here succeeded in deploying an agent workflow with human-in-the-loop and function calling in a FastAPI environment over websockets? My workflow runs just fine as long as the user stays in the main stream_events loop and completes the workflow in one session. If the user leaves in the middle (e.g. by navigating away) and the websocket gets disconnected, the entire workflow breaks.
I've been trying to implement the save-context scenario with no success. I've followed the documentation over and over, with the same result every time: after restoring the context, execution hangs at async for event in handler.stream_events(). I've attached a screenshot of my debugging session.

I've tried to create a minimal version in Google Colab as @Logan M suggested, but I can't reproduce a real asynchronous scenario there where a user exits the main loop and re-enters at a later stage.
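For reference, here's the persist/restore pattern I'm trying to follow, as a minimal sketch based on my reading of the Context serialization and human-in-the-loop docs. save_session/load_session are hypothetical storage helpers, and `workflow` stands for my existing AgentWorkflow instance:
Python
from llama_index.core.workflow import (
    Context,
    HumanResponseEvent,
    InputRequiredEvent,
    JsonSerializer,
)

# --- while streaming: pause when the workflow asks for human input ----------
handler = workflow.run(user_msg=user_msg)
async for event in handler.stream_events():
    if isinstance(event, InputRequiredEvent):
        # Persist the context before the websocket (possibly) disconnects.
        ctx_dict = handler.ctx.to_dict(serializer=JsonSerializer())
        save_session(session_id, ctx_dict)  # hypothetical storage helper
        break

# --- later, possibly on a new websocket connection ---------------------------
ctx_dict = load_session(session_id)  # hypothetical storage helper
restored_ctx = Context.from_dict(workflow, ctx_dict, serializer=JsonSerializer())

handler = workflow.run(ctx=restored_ctx)
# The resumed run is still waiting on the human answer, so deliver it as an
# event rather than as a fresh user message.
handler.ctx.send_event(HumanResponseEvent(response=user_answer))
async for event in handler.stream_events():
    ...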
60 comments
Ariel

Checkpoint

I'm currently developing an agent workflow with human-in-the-loop interaction and function calling. The workflow works great as long as the user stays in the session to complete it. I've tried both context serialization and checkpoints to persist context state, with no success. I save the context/checkpoint after each iteration and load it back when restarting the workflow, as suggested in the documentation. I think the problem is with tool calling: right after loading the checkpoint and adding the new user input, the agent gets stuck "thinking"... as if it didn't know what step comes next.
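For completeness, this is roughly how I understand the checkpointing API (a sketch only; the import path and signatures may vary between llama_index versions, and `workflow` stands for my existing AgentWorkflow):
Python
# Sketch only: import path and signatures may differ between versions.
from llama_index.core.workflow.checkpointer import WorkflowCheckpointer

ckptr = WorkflowCheckpointer(workflow=workflow)

# First part of the session: a checkpoint is stored after each completed step.
handler = ckptr.run(user_msg="Book a table for two tonight")
await handler

# Later: pick the last checkpoint of that run and resume from it.
run_id = next(iter(ckptr.checkpoints))
last_ckpt = ckptr.checkpoints[run_id][-1]
handler = ckptr.run_from(checkpoint=last_ckpt)
# If the run was paused waiting for human input, the answer presumably still
# has to be delivered as an event, e.g.
# handler.ctx.send_event(HumanResponseEvent(response=...)).
await handler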
5 comments
Hello all! I've managed to build an AgentWorkflow with human-in-the-loop support. The workflow works great; the only thing I haven't been able to achieve is streaming text down to the client. I'm using the suggested approach:
Python
async for event in handler.stream_events():
    if isinstance(event, AgentStream):
        message = ChatMessage(role="bot", type="delta", content=event.delta)
        await websocket.send_json(message.model_dump())
...

The message sent to the client always has an empty delta. I'm using OpenAI with streaming set to true; is there anything I'm missing?
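The only variation I've come up with so far is filtering out empty deltas, sketched below (AgentStream and ToolCallResult are the events from llama_index.core.agent.workflow; ChatMessage is my own websocket message model):
Python
# Sketch: skip empty deltas and handle tool events separately.
from llama_index.core.agent.workflow import AgentStream, ToolCallResult

async for event in handler.stream_events():
    if isinstance(event, AgentStream) and event.delta:
        # AgentStream events emitted while the model is producing tool calls
        # often carry an empty delta, so only forward non-empty text chunks.
        message = ChatMessage(role="bot", type="delta", content=event.delta)
        await websocket.send_json(message.model_dump())
    elif isinstance(event, ToolCallResult):
        # Optionally surface tool output to the client as well.
        pass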
10 comments