Find answers from the community


Timeout

I am trying to figure out how to keep a higher timeout set on the OpenAI LLM from being clobbered by the retry decorator here:
retry = create_retry_decorator(
    max_retries=max_retries,
    random_exponential=True,
    stop_after_delay_seconds=60,
    min_seconds=1,
    max_seconds=20,
)

Even though I have set a higher timeout, the retry always kicks in at 60 seconds. Shouldn't the retry decorator use the same timeout value for stop_after_delay_seconds?

Maybe I am missing something?

For additional context, I am trying to call a chat completion with the DeepSeek R1 model, and the POST to https://api.deepseek.com/chat/completions always returns after 60 seconds with a 200 OK but an empty response.
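In case it helps: a minimal sketch of raising the client-side timeout, assuming the integration forwards timeout and max_retries to the underlying client (the model name is illustrative). Note that stop_after_delay_seconds=60 is set separately inside the retry decorator, so this alone may not move the 60-second cutoff.
Python
from llama_index.llms.openai import OpenAI

# Illustrative: an OpenAI-compatible endpoint with a higher request timeout.
# The retry decorator's stop_after_delay_seconds=60 is applied inside the
# integration, independently of this value.
llm = OpenAI(
    model="deepseek-chat",             # hypothetical model name
    api_base="https://api.deepseek.com",
    api_key="sk-...",                  # placeholder
    timeout=300.0,                     # request timeout in seconds
    max_retries=0,                     # disable retries to rule them out
)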
6 comments
Hi, I have a question about using tool calling. I found that when I add tools, the AI at first makes the decision properly, calling one tool or another according to the prompt. But after several questions, it stops calling a tool and answers on its own. My guess is that at some point the AI starts using the chat memory, which contains previously asked questions and answers, and gets its data from there. Am I right? If so, is there any way to prevent it from looking up the answer in memory? I thought passing an empty chat memory could be a solution, but unfortunately I can't do that, as the memory may contain specific information from the user that my chatbot should be aware of. Forcing it to use a tool doesn't work either, because that only applies when the condition described in the prompt is met. Thanks!
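One hedged idea short of wiping memory: shrink the memory window so stale question/answer pairs fall out of context sooner, while recent user-specific facts stay in. A minimal sketch, assuming a ChatMemoryBuffer backs the chat (the token limit is illustrative):
Python
from llama_index.core.memory import ChatMemoryBuffer

# A tighter buffer keeps recent user facts but drops older tool answers,
# making the model less likely to reuse them instead of calling the tool.
memory = ChatMemoryBuffer.from_defaults(token_limit=1500)  # tune for your chat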
5 comments
Am I being stupid, or has the way to change the underlying model changed? Why do I get this:

ImportError: cannot import name 'OpenAI' from 'llama_index.core.llms' (/home/runner/workspace/.pythonlibs/lib/python3.11/site-packages/llama_index/core/llms/__init__.py)
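For reference, a sketch of the import that works on recent versions, assuming the split-package layout introduced in llama_index 0.10 (LLM integrations moved out of core):
Python
# pip install llama-index-llms-openai
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI  # no longer in llama_index.core.llms

Settings.llm = OpenAI(model="gpt-4o-mini")  # model name illustrative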
7 comments
Hello, how can I count tokens when using LLM instances in a workflow, rather than a query engine as shown in the example in the docs?

Also, does the TokenCountingHandler have to be set globally in a callback manager?
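A minimal sketch of the documented pattern, set globally via Settings so it also counts LLM calls made from inside a workflow; it can alternatively be passed to an individual LLM's callback_manager instead of globally (the tokenizer model name is illustrative):
Python
import tiktoken
from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager, TokenCountingHandler

token_counter = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model("gpt-4o").encode  # match your model
)
Settings.callback_manager = CallbackManager([token_counter])

# ... run the workflow, then inspect the counts ...
print(token_counter.total_llm_token_count)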
2 comments
Here's my question: when using LlamaCloud, I set "capture page screenshot" and it does a good job of capturing an image of each parsed page. However, it uses the naming convention 'page_1.jpg' etc. and doesn't seem to have an option to customise this as e.g. documentname_page1.jpg, which would save a lot of pain. Also, there is no 'download all images' option that I can see... so I'm having to download hundreds of images individually = more pain. Am I missing something? Or is this a feature request? I'm a paying customer btw 🙂
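Until there's a native naming option, a hedged local workaround for the renaming half of the problem (the directory and document name are illustrative):
Python
from pathlib import Path

doc_name = "my_document"  # hypothetical source document name
for img in sorted(Path("downloads").glob("page_*.jpg")):
    # page_1.jpg -> my_document_page_1.jpg
    img.rename(img.with_name(f"{doc_name}_{img.name}"))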
1 comment
How can I pass a custom prompt template to the FunctionAgent, where I can prompt engineer the system prompt, tool outputs, etc.?
Plain Text
from llama_index.core.agent.workflow import FunctionAgent, AgentWorkflow
from llama_index.llms.vllm import Vllm

from prompts import (
    ORCHESTRATOR_SYSTEM_PROMPT,
    NOTION_SYSTEM_PROMPT,
    IMPLEMENTATION_SYSTEM_PROMPT,
)
from tools import notion_retrieval_tool, implementation_tool

llm = Vllm(
    model="model_name",
    tensor_parallel_size=4,
    max_new_tokens=100,
    vllm_kwargs={"swap_space": 1, "gpu_memory_utilization": 0.5},
)


orchestrator_agent = FunctionAgent(
    name="OrchestratorAgent",
    description=(
        "You are the OrchestratorAgent responsible for coordinating tasks between multiple agents. "
    ),
    system_prompt=ORCHESTRATOR_SYSTEM_PROMPT,
    llm=llm,
    tools=[],
    can_handoff_to=["NotionAgent", "ImplementationAgent"],
)
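The system prompt is already controllable via system_prompt= as above; for tool output, one hedged option is to wrap each tool so its return value is formatted before the agent sees it. A sketch (notion_search and the underlying call are hypothetical):
Python
from llama_index.core.tools import FunctionTool

def notion_search(query: str) -> str:
    """Search Notion and return results formatted for the agent."""
    raw = notion_retrieval_tool(query)  # assumed underlying callable
    return f"Notion results (cite page titles where possible):\n{raw}"

wrapped_notion_tool = FunctionTool.from_defaults(fn=notion_search)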
9 comments
I wonder how else I can make it call the tool...

llama_index.core.workflow.errors.WorkflowRuntimeError: Error in step 'struct': Expected at least one tool call, but got 0 tool calls.
Plain Text
class UserInfo(BaseModel):
    """
    Output containing the response. Structured information about the person.
    """
    name: str = Field(title="Full Name", description="Full name of the person")

sllm = llm_gemini.as_structured_llm(UserInfo)
query_engine = index.as_query_engine(doc_ids=doc_ids, use_async=True, llm=sllm)

answer = await query_engine.aquery(
    """
    Based on the context, generate a structured summary of the person.
    """
)
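If the model keeps producing zero tool calls through the query engine, a hedged fallback is to drive structured output directly with structured_predict, feeding it the retrieved context yourself (retrieved_text and the template wiring are illustrative):
Python
from llama_index.core.prompts import PromptTemplate

summary = llm_gemini.structured_predict(
    UserInfo,
    PromptTemplate(
        "Based on the context below, generate a structured summary of the person.\n"
        "Context:\n{context}"
    ),
    context=retrieved_text,  # assumed to hold the retrieved node text
)
print(summary.name)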
3 comments
Hi guys, is there anyone here who uses open-webui, llama-index, and ollama?
I want to get a description of an uploaded image, but the response is strange: it describes a different image to me. This is the code I use.
Thanks.
2 comments
Hi. I have implemented this Multi-Agent Workflow using Tavily https://docs.llamaindex.ai/en/stable/examples/agent/agent_workflow_multi/

But it is very slow compared to Tavily's own Research Assistant on their website.

This multi-agent workflow takes around a minute to provide a report on the same topic whereas Tavily's Research Assistant does it within seconds.

I am using Azure OpenAI's GPT-4o as the LLM in all agents.

Are there any bottlenecks that are present or any strategies I can use to speed up this workflow?
5 comments
Hi guys, quick question about modifying the reranking system. I implemented a couple of changes to _postprocess_nodes inside LLMRerank so that it gives priority to nodes with a more recent date and a specific subtype. However, since llm_rerank.py is part of an installed Python package, I'm having trouble getting my modified file used instead of the original version of LLMRerank. Is there a way to make these same node-reranking adjustments in my workflow file, or to override the original package with my new code? Here is my updated section of the code:
Plain Text
for node, relevance in zip(choice_nodes, relevances):
    # Date score
    date_str = node.metadata.get('date')
    date_score = datetime.strptime(date_str, "%Y-%m-%d").timestamp()
    # Subtype score
    subtype = node.metadata.get('subtype', '')
    subtype_score = 1.0 if subtype == "pro report" else 0.0
    node_score = relevance + date_score + subtype_score
    initial_results.append(NodeWithScore(node=node, score=node_score))
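Rather than editing the installed package, one option is to subclass LLMRerank and use the subclass in the workflow. A minimal sketch (the class name and constructor arguments are illustrative):
Python
from llama_index.core.postprocessor import LLMRerank

class DateAwareLLMRerank(LLMRerank):
    """Subclass so the installed package stays untouched."""

    def _postprocess_nodes(self, nodes, query_bundle=None):
        # Paste the modified copy of the original method body here;
        # delegating to the parent is only a placeholder.
        return super()._postprocess_nodes(nodes, query_bundle)

reranker = DateAwareLLMRerank(top_n=5)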
1 comment
@kapa.ai reranker = TEIR(top_n=10,model_name='BAAI/bge-reranker-large',base_url="http://10.10.10.50:8020")

2025-02-06 15:03:35.107 | ERROR | core.inference_retriver:query_index:338 - Error in query_index UUID: None token: None - 1 validation error for TextEmbeddingInference
top_n
Input should be a valid integer, unable to parse string as an integer [type=int_parsing, input_value='BAAI/bge-reranker-large', input_type=str]
For further information visit https://errors.pydantic.dev/2.9/v/int_parsing
2 comments
Hey, y'all! Does LlamaIndex not support filtering by boolean values in vector store metadata? Looking at the MetadataFilter class, it doesn't look like it works right now.
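If booleans turn out to be unsupported by the filter layer or the backing store, a hedged workaround is to store the flag as an int (or string) at ingest time and filter on that (the key name is illustrative):
Python
from llama_index.core.vector_stores import (
    FilterOperator,
    MetadataFilter,
    MetadataFilters,
)

filters = MetadataFilters(
    filters=[MetadataFilter(key="is_verified", value=1, operator=FilterOperator.EQ)]
)
retriever = index.as_retriever(filters=filters)  # index assumed built elsewhere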
7 comments
Hello all! I've managed to build an AgentWorkflow with human-in-the-loop support. The workflow works great; the only thing I'm not able to achieve is streaming text down to the client. I'm using the suggested approach:
Plain Text
async for event in handler.stream_events():
    if isinstance(event, AgentStream):
        message = ChatMessage(role="bot", type="delta", content=event.delta)
        await websocket.send_json(message.model_dump())
...

The message sent to the client always has an empty delta. I'm using OpenAI with streaming set to true; is there anything I'm missing?
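One hedged way to narrow it down: check that the LLM streams deltas at all, outside the workflow. If this prints tokens incrementally, the problem is in the workflow wiring rather than the OpenAI client:
Python
# Inside an async function, with `llm` being the same OpenAI instance:
resp = await llm.astream_complete("Say hello in five words.")
async for chunk in resp:
    print(chunk.delta, end="", flush=True)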
10 comments
hi all -

is there some kind of RAG which is not based on similarity but actually has a reasoning layer? So the model knows what data it has available to it and reasons about which parts it should retrieve.

is that covered here, or would it have to be done via chaining & prompting?
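Partly covered, yes: router-style retrieval inserts an LLM reasoning step that reads the descriptions of the available data sources and picks where to retrieve from, instead of relying on embedding similarity alone. A minimal sketch (the two query engines are hypothetical):
Python
from llama_index.core.query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector
from llama_index.core.tools import QueryEngineTool

# The LLM reasons over these descriptions to choose a source.
tools = [
    QueryEngineTool.from_defaults(
        query_engine=notes_engine,  # hypothetical
        description="Internal meeting notes from 2024.",
    ),
    QueryEngineTool.from_defaults(
        query_engine=sales_engine,  # hypothetical
        description="Quarterly sales figures per region.",
    ),
]
router = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=tools,
)
response = router.query("How did EMEA sales trend last quarter?")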
5 comments
@kapa.ai I'm getting this error, even though I can see the Node has a "Text:" element. Why?

Node: Doc ID: 928b426d-e942-4ca0-a4f6-5edd1be68e16
Text: They are battered and weary, but have successfully purged this
particular corruption from the sacred Brinewood...for now. Does this
accurately summarize the party's current situation and the resolution
of the climactic battle? Let me know if you need any clarification or
have additional details to add.darthus (Darthus) said: 'What's the
current...
Score: 0.967

Ignoring exception in command message:
Traceback (most recent call last):
  File "D:\Users\brian\miniconda3\envs\arcanum\Lib\site-packages\discord\commands\core.py", line 138, in wrapped
    ret = await coro(arg)
          ^^^^^^^^^^^^^^^
  File "D:\Users\brian\miniconda3\envs\arcanum\Lib\site-packages\discord\commands\core.py", line 1082, in _invoke
    await self.callback(ctx, **kwargs)
  File "D:\Users\brian\Onedrive\Llamaindex\arcanum-bot\ArcanumBotDev.py", line 909, in message
    response = await get_chat_response(arcanum_message, chat_engine, message_content=message)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Users\brian\Onedrive\Llamaindex\arcanum-bot\ArcanumBotDev.py", line 195, in get_chat_response
    Story_Prompt = summarizer.get_related_history(message_content)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Users\brian\Onedrive\Llamaindex\arcanum-bot\story_summarizer.py", line 188, in get_related_history
    nodes_text += node.text
                  ^^^^^^^^^
  File "D:\Users\brian\miniconda3\envs\arcanum\Lib\site-packages\llama_index\core\schema.py", line 955, in text
    raise ValueError("Node must be a TextNode to get text.")
ValueError: Node must be a TextNode to get text.
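The retriever can return non-text nodes, and .text raises for those. A hedged fix for the loop in get_related_history is to use get_content(), which is defined on every node type (the `nodes` variable is assumed to be the retrieved list):
Python
from llama_index.core.schema import MetadataMode

nodes_text = ""
for node_with_score in nodes:
    # .text raises unless the underlying node is a TextNode;
    # get_content() works for any node type.
    nodes_text += node_with_score.node.get_content(metadata_mode=MetadataMode.NONE)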
12 comments
I'm trying to implement FunctionCallingProgram for structured extraction using vLLM, Qwen-2.5, and Pydantic. However, I quickly get this error:
ValueError: Model name Qwen/Qwen2.5-3B does not support function calling API.
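A hedged fallback for models without a function-calling API is prompt-based structured extraction via LLMTextCompletionProgram, which parses the model's plain completion into the Pydantic object (the schema and prompt are illustrative):
Python
from pydantic import BaseModel, Field
from llama_index.core.program import LLMTextCompletionProgram

class Person(BaseModel):  # illustrative schema
    name: str = Field(description="Full name of the person")

program = LLMTextCompletionProgram.from_defaults(
    output_cls=Person,
    prompt_template_str="Extract the person from this text:\n{text}",
    llm=llm,  # the vLLM Qwen instance, assumed defined
)
result = program(text="Jane Doe joined the team in March.")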
8 comments

Timeout

Hi, I am building RAG with various LLMs, some OpenAI ones and some using llama_index.llms.azure_inference, and I just realized that the default LlamaIndex LLM class doesn't come with a timeout, which is built into the OpenAI one. I wonder if there is an easy way to implement the timeout, or have I missed something?
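One hedged, integration-agnostic option is to enforce the timeout client-side with asyncio, which works for any LLM class that exposes async methods:
Python
import asyncio

async def complete_with_timeout(llm, prompt: str, seconds: float = 30.0):
    """Raises asyncio.TimeoutError if the LLM doesn't answer in time."""
    return await asyncio.wait_for(llm.acomplete(prompt), timeout=seconds)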
1 comment
Does anyone have experience with the Vertex AI multimodal embedding here? I'm not sure how mature the implementation is, since I don't really see a lot of documentation regarding it. If anyone has any thoughts on how to use it for txt2img retrieval, that would be great! Thanks.
5 comments
How can I deploy a reranker on a GPU and call it as a service?
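One hedged approach: host a cross-encoder behind a small HTTP service on the GPU box and call it from the pipeline (Hugging Face's text-embeddings-inference server is another option, which LlamaIndex's TextEmbeddingInference postprocessor can talk to). A minimal sketch, all names illustrative:
Python
from fastapi import FastAPI
from pydantic import BaseModel
from sentence_transformers import CrossEncoder

app = FastAPI()
model = CrossEncoder("BAAI/bge-reranker-large", device="cuda")

class RerankRequest(BaseModel):
    query: str
    documents: list[str]

@app.post("/rerank")
def rerank(req: RerankRequest) -> list[float]:
    # One relevance score per (query, document) pair.
    return model.predict([(req.query, d) for d in req.documents]).tolist()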
14 comments
@kapa.ai Hi, I'm generating structured predicted output, and it throws an error: dumps_kwargs keyword arguments are no longer supported.
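That error typically comes from serializing a Pydantic v2 model with the v1-style .json(indent=2) call. A hedged sketch of the v2 equivalent (`output` stands for the predicted object):
Python
# Pydantic v2 removed keyword arguments on .json():
# print(output.json(indent=2))           # v1 style, raises on v2
print(output.model_dump_json(indent=2))  # v2 replacement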
34 comments
Hey fam! Quick question: what's the simplest way to instruct agents to exit sub-agent work and go back to orchestration? Curious if there's a simple "abort workflow" I'm missing from the docs. Thanks!
1 comment
A question about upserting documents in Qdrant. Suppose I have a Google Doc (GD) which was already inserted into the MongoDB docstore and Qdrant vector store. Now suppose the Google Doc (GD) is edited and we want to update it in the MongoDB docstore and Qdrant vector store. How do we make sure that we do not end up creating duplicate documents?

I heard that we can use doc_id as a field in the metadata of a LlamaIndex document and that it will help dedupe, but how does that work if the document is, say, 1000 pages, i.e. broken into multiple nodes? How does doc_id translate to the node identifiers needed to figure out which nodes to update in the MongoDB docstore and Qdrant vector store?

If we as users are expected to set node_id directly, any guidance on how to generate node_id would be super helpful.
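For what it's worth, a hedged sketch of the pattern I believe this maps to: give each source document a stable doc_id and let an IngestionPipeline with an upsert strategy handle the node-level bookkeeping, so node ids never need to be set by hand (the doc_id, document text, and store objects are illustrative):
Python
from llama_index.core import Document
from llama_index.core.ingestion import DocstoreStrategy, IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.storage.docstore import SimpleDocumentStore

# Stable doc_id per source document: re-running with changed text triggers
# an upsert that removes the old nodes for this doc_id and inserts new ones.
doc = Document(text=google_doc_text, doc_id="gdoc-<drive-file-id>")

pipeline = IngestionPipeline(
    transformations=[SentenceSplitter()],  # append your embed model here too
    docstore=SimpleDocumentStore(),        # swap in the MongoDB docstore
    docstore_strategy=DocstoreStrategy.UPSERTS,
    vector_store=qdrant_vector_store,      # assumed configured elsewhere
)
pipeline.run(documents=[doc])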
8 comments
When creating an app for multiple users, I obviously need memory to be individual per user, but what about retrievers and rerankers? Can I use them for multiple users at the same time, and is there added latency? If there is, what is, in your experience, a good balance for sharing retrievers and rerankers?

@kapa.ai tips on testing my chunking strategy?
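A hedged starting point for the chunking question: split with a few candidate sizes and eyeball the boundaries before indexing anything (the sizes and the `documents` variable are illustrative):
Python
from llama_index.core.node_parser import SentenceSplitter

for chunk_size in (256, 512, 1024):
    splitter = SentenceSplitter(chunk_size=chunk_size, chunk_overlap=50)
    nodes = splitter.get_nodes_from_documents(documents)  # documents assumed loaded
    print(f"{chunk_size}: {len(nodes)} chunks")
    print(nodes[0].get_content()[:200], "\n")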
3 comments