Does anyone have any findings on switching from GPT-3.5 to GPT-4o mini? I'm finding that structured output is quite a bit worse with GPT-4o mini than with 3.5; it often fails to return the values in the correct format. It's also quite a bit slower... But I feel we have to switch as the pricing is so much better...
Forcing JSON (either with JSON mode or function calling) won't help much either. It can still write bad tool inputs or hallucinate tool names.
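For what it's worth, a cheap guard is to validate every tool call before executing it. This is only a rough sketch: the `TOOLS` registry, the `get_weather` tool, and `validate_tool_call` are made-up names for illustration and not part of any SDK. It assumes you already have the tool name and the raw argument string from the model's response.

```python
import json

# Hypothetical registry: tool name -> required argument names and their types.
TOOLS = {
    "get_weather": {"city": str, "days": int},
}

def validate_tool_call(name, raw_arguments):
    """Return (args, None) if the call looks usable, else (None, error message)."""
    # Catch hallucinated tool names first.
    if name not in TOOLS:
        return None, f"unknown tool: {name}"
    # The arguments arrive as a string and may not be valid JSON.
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError as exc:
        return None, f"arguments are not valid JSON: {exc}"
    if not isinstance(args, dict):
        return None, "arguments must be a JSON object"
    # Check that every required argument is present with the expected type.
    for key, expected_type in TOOLS[name].items():
        if key not in args:
            return None, f"missing argument: {key}"
        if not isinstance(args[key], expected_type):
            return None, f"wrong type for {key}"
    return args, None
```

When the error is not `None` you could feed it back to the model and retry, or fall back to a default; which one makes sense depends on the app.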
There is a .parse endpoint that only works on a few select models and that we haven't integrated yet, but my understanding is that it's the same as function calling with tool_choice set (which we already do).
Hmm, okay. Quite disappointed with OAI on this one. We're also just seeing it follow instructions worse in general. For example, we prompt it to answer in a certain language and it sometimes ignores this and uses the language of the content we provide. Very annoying...
I've been slowly switching to Anthropic for a lot of new apps lately. It's a bit slower (probably similar to 4o-mini) and requires new prompts (you NEED to prompt/parse with XML imo), but it seems very reliable so far.
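To make the XML parsing part concrete, here is a minimal sketch of what I mean. It assumes the prompt asks the model to wrap each field in a tag such as `<answer>`; the `extract_tag` helper and the tag names are just illustrative, not anything from an SDK.

```python
import re

def extract_tag(text, tag):
    """Return the stripped contents of the first <tag>...</tag> pair, or None."""
    pattern = rf"<{re.escape(tag)}>(.*?)</{re.escape(tag)}>"
    # DOTALL lets the tag contents span multiple lines.
    match = re.search(pattern, text, re.DOTALL)
    return match.group(1).strip() if match else None

# Example reply with some chatter around the tagged fields.
reply = "Sure, here you go!\n<answer>\n42\n</answer>\n<language>fr</language>"
answer = extract_tag(reply, "answer")
language = extract_tag(reply, "language")
```

A regex is used instead of a real XML parser on purpose: the reply usually has free text around the tags and is not a well-formed XML document. A `None` result is the signal to retry or fall back.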