Has anyone used the `gpt-4o-audio-preview` model with agents? The integration seems quite challenging because streaming events are not supported... I think the only way to be able to stream events is to use speech-to-text instead and convert the audio to text first. Any ideas?

```python
@step
async def handle_audio(self, ctx: Context, ev: AudioInput) -> ProcessedInput:
    # Process the audio input
    result = self.process_audio(ev.audio_path)
    return ProcessedInput(result=result)

...

# Use this function to process audio
def process_audio(self, audio_path: str) -> str:
    """Process audio input and return a description."""
    messages = [
        ChatMessage(
            role="user",
            blocks=[
                AudioBlock(path=audio_path, format="wav"),
                TextBlock(text="Describe the content of this audio."),
            ],
        )
    ]
    llm = self.agents[self.root_agent].llm
    response = llm.chat(messages)
    return str(response)
```
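For context, the kind of event streaming the thread is discussing can be mimicked with plain asyncio: even when token-level streaming is unavailable (as with audio input), a step can still emit coarse workflow-level events that a consumer reads while the run is in flight. This is a self-contained sketch of that pattern only; the class and event names here are made up for illustration, not the real llama_index APIs.

```python
import asyncio


class Handler:
    """Toy stand-in for a workflow handler: steps push events onto a
    queue, and stream_events() yields them until a sentinel arrives."""

    def __init__(self) -> None:
        self._queue: asyncio.Queue = asyncio.Queue()
        self.result = None

    async def stream_events(self):
        while True:
            ev = await self._queue.get()
            if ev is None:  # sentinel: the run has finished
                break
            yield ev


async def run_workflow(handler: Handler) -> None:
    # Instead of streaming tokens, emit coarse progress events.
    await handler._queue.put({"type": "audio_received"})
    await handler._queue.put({"type": "description_ready", "text": "a dog barking"})
    handler.result = "a dog barking"
    await handler._queue.put(None)


async def main():
    handler = Handler()
    runner = asyncio.create_task(run_workflow(handler))
    events = [ev async for ev in handler.stream_events()]
    await runner
    return events, handler.result


events, result = asyncio.run(main())
print(events)
print(result)
```

The consumer sees progress as it happens, without the LLM call itself needing to support streaming.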
You can still stream workflow events with `handler.stream_events()` as we've seen before. Under the hood, `llm.astream_chat()` is called on the chat messages -- so if you have audio messages, this will probably cause an issue if OpenAI doesn't support streaming there: `await self.llm.astream_chat_with_tools(...)`
is being called. It's an `AgentOutput` step actually... hmmm, I haven't overridden that one. Maybe I need to override that too.

`run_agent_step` in the `AgentWorkflow` is just calling `agent.take_step()` -- so I think you'd want to override that last method, `run_agent_step`, such that it never calls `take_step`.
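The override being suggested can be sketched in miniature. This is a toy model, not llama_index internals: the class and method names mirror the discussion (`AgentWorkflow.run_agent_step` delegating to `agent.take_step()`), but the bodies are assumptions. The subclass replaces `run_agent_step` so the streaming path in `take_step` is never reached and a blocking chat call is used instead.

```python
class Agent:
    def take_step(self, messages):
        # Stand-in for the streaming path that fails on audio input.
        raise RuntimeError("streaming path: unsupported for audio input")

    def chat(self, messages):
        # Stand-in for a blocking, non-streaming chat call.
        return f"non-streaming reply to {len(messages)} message(s)"


class AgentWorkflow:
    def __init__(self, agent: Agent) -> None:
        self.agent = agent

    def run_agent_step(self, messages):
        # Default behavior: delegate to the (streaming) take_step.
        return self.agent.take_step(messages)


class AudioAgentWorkflow(AgentWorkflow):
    def run_agent_step(self, messages):
        # Override: never call take_step; use the blocking chat instead.
        return self.agent.chat(messages)


wf = AudioAgentWorkflow(Agent())
reply = wf.run_agent_step([{"role": "user", "content": "audio..."}])
print(reply)
```

With the override in place, the base class's streaming delegation is bypassed entirely, which is the behavior the thread is after.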