Validation Error With Multimodal Azure OpenAI Package

We are randomly getting this error with the multimodal Azure OpenAI package:

Plain Text
vision_service-1        | pydantic_core._pydantic_core.ValidationError: 2 validation errors for ChatMessage
vision_service-1        | blocks.0
vision_service-1        |   Unable to extract tag using discriminator 'block_type' [type=union_tag_not_found, input_value={'type': 'text', 'text': 'Describe what you see'}, input_type=dict]
vision_service-1        |     For further information visit https://errors.pydantic.dev/2.9/v/union_tag_not_found
vision_service-1        | blocks.1
vision_service-1        |   Unable to extract tag using discriminator 'block_type' [type=union_tag_not_found, input_value={'type': 'image_url', 'im...gg==', 'detail': 'low'}}, input_type=dict]

Does anyone have an idea what's going on? Nothing has changed in our code.
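
For context on what the error means: ChatMessage.blocks is a Pydantic discriminated union keyed on block_type, and the dicts being passed in carry an OpenAI-style "type" key instead, so Pydantic cannot pick a variant. Here is a minimal sketch of the same failure; the block classes are illustrative stand-ins, not the actual llama-index models:

Plain Text
from typing import Annotated, Literal, Union

from pydantic import BaseModel, Field, ValidationError

class TextBlock(BaseModel):
    block_type: Literal["text"] = "text"
    text: str

class ImageBlock(BaseModel):
    block_type: Literal["image"] = "image"
    url: str

class ChatMessage(BaseModel):
    # Pydantic selects the union variant by reading the "block_type" key.
    blocks: list[Annotated[Union[TextBlock, ImageBlock], Field(discriminator="block_type")]]

try:
    # A dict keyed on "type" instead of "block_type" has no discriminator tag.
    ChatMessage(blocks=[{"type": "text", "text": "Describe what you see"}])
except ValidationError as e:
    print(e)  # Unable to extract tag using discriminator 'block_type' ...
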
Do you have a traceback?
Curious what top-level code reproduced this
(also, whichever versions of llama-index-core and llama-index-multi-modal-llms-azure-openai you have would help too)
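
If it helps, a quick way to read the installed versions from Python (importlib.metadata is stdlib, and the names below are the PyPI distribution names):

Plain Text
from importlib.metadata import version

# Print the installed distribution versions.
print(version("llama-index-core"))
print(version("llama-index-multi-modal-llms-azure-openai"))
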
Plain Text
llama-index==0.12.4
llama-index-multi-modal-llms-azure-openai==0.3.0

Plain Text
"""Module for multimodal LLM calls."""

import json
import os

from llama_index.core import Document
from llama_index.core.program.multi_modal_llm_program import MultiModalLLMCompletionProgram
from llama_index.multi_modal_llms.azure_openai import AzureOpenAIMultiModal
from pydantic import BaseModel, ValidationError

MODEL = "gpt-4o-2024-05-13"
ENGINE = os.environ.get("AZURE_OPENAI_API_DEPLOYMENT_NAME_PREMIUM")
AZURE_API_KEY = os.environ.get("AZURE_OPENAI_API_KEY")
AZURE_ENDPOINT = os.environ.get("AZURE_OPENAI_API_ENDPOINT")
API_VERSION = os.environ.get("AZURE_OPENAI_API_VERSION")


multimodal_llm = AzureOpenAIMultiModal(
    model=MODEL,
    engine=ENGINE,
    api_key=AZURE_API_KEY,
    azure_endpoint=AZURE_ENDPOINT,
    api_version=API_VERSION,
    max_new_tokens=2000,
)


MAX_LLM_CALL_ATTEMPTS = 3


def multimodal_call[T: BaseModel](output_cls: type[T], prompt: str, image_documents: list[Document]) -> T | None:
    """Call the multimodal LLM with the given prompt and image documents."""
    print(multimodal_llm.complete("Describe this image.", image_documents))

    # Retry a few times; structured output can fail JSON parsing or validation.
    attempts = 0
    while attempts < MAX_LLM_CALL_ATTEMPTS:
        try:
            return MultiModalLLMCompletionProgram.from_defaults(
                output_cls=output_cls,
                prompt_template_str=prompt,
                multi_modal_llm=multimodal_llm,
                image_documents=image_documents,
            )()
        except (json.JSONDecodeError, ValidationError, ValueError):  # noqa: PERF203
            attempts += 1
    return None
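
For reference, a hypothetical call site; ImageDescription, the prompt, and the image path are placeholders, not our real code:

Plain Text
from llama_index.core.schema import ImageDocument
from pydantic import BaseModel

class ImageDescription(BaseModel):
    description: str

# ImageDocument subclasses Document, so it satisfies the signature above.
result = multimodal_call(
    output_cls=ImageDescription,
    prompt="Return a JSON object describing the image.",
    image_documents=[ImageDocument(image_path="example.png")],
)
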
multimodal_llm.complete(...) already breaks it
thanks for the code, taking a look now
Do you mind trying to bump a few versions for me, just to be sure? I was unable to reproduce with the latest

Plain Text
pip install -U llama-index-llms-openai llama-index-multi-modal-llms-openai llama-index-multi-modal-llms-azure-openai

There were a few refactors to the base chat message class and to how multimodal works; we probably need to rejig some of the min-version deps here.
Actually, my bad: my local code was a bit behind. llama-index-llms-openai==0.3.2 works, but llama-index-llms-openai==0.3.3 breaks.
Maybe it's because we only have these two deps?

Plain Text
llama-index==0.12.4
llama-index-multi-modal-llms-azure-openai==0.3.0


Maybe it resolves some weird version of core?
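
To see which version of core the resolver actually picked inside the container, something like:

Plain Text
pip freeze | grep llama-index
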
Not sure about that. But I do know that for me, installing those deps locally plus llama-index-llms-openai==0.3.2 works, while llama-index-llms-openai==0.3.3 breaks.

Working on a patch to straighten this out in any case 👍
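
In the meantime, pinning the last known-good version from above should work as a stopgap:

Plain Text
pip install "llama-index-llms-openai==0.3.2"
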
Thanks a lot!
Should be all fixed with the latest versions of things now 👍