Updated 3 months ago

SimpleDirectoryReader throws an error when num_workers is set to 1

Hello folks, SimpleDirectoryReader.load_data() throws the following error when I set num_workers=1, but not when num_workers is greater than 1:

concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.


This happens inside a ProcessPoolExecutor task kicked off from a FastAPI/Starlette background task.

Any ideas? I can just set num_workers=4 for now, but I'd like to understand why this happens.

versions:
llama-index-core 0.11.9
llama-index-readers-file 0.2.1
Full stack trace:
Loading files:  25%|██▌       | 6/24 [00:00<00:00, 25.09file/s]
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/uvicorn/protocols/http/httptools_impl.py", line 401, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/uvicorn/middleware/proxy_headers.py", line 70, in __call__
    return await self.app(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.12/site-packages/starlette/applications.py", line 113, in __call__
    await self.middleware_stack(scope, receive, send)
... a bunch of starlette middleware callstack ...
    await response(scope, receive, send)
  File "/usr/local/lib/python3.12/site-packages/starlette/responses.py", line 156, in __call__
    await self.background()
  File "/usr/local/lib/python3.12/site-packages/starlette/background.py", line 41, in __call__
    await task()
  File "/usr/local/lib/python3.12/site-packages/starlette/background.py", line 26, in __call__
    await self.func(*self.args, **self.kwargs)
  File "/app/app/tasks/build_index_task.py", line 59, in async_run_index_build_process
    result = await run_in_process(executor, _build_index_task, config.model_dump(), index_id)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/app/tasks/build_index_task.py", line 55, in run_in_process
    return await loop.run_in_executor(executor, fn, *args)  # wait and return result
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.

I need to remove that option. It just straight-up breaks way too often.
I have no idea why, tbh.
Likely one file is breaking and causing the rest of the workers to crash?
If it's a bad file, shouldn't it also throw an exception with multiple workers? It works when num_workers=2, just not when it's 1.
No idea tbh. I read your issue backwards, honestly: num_workers=1 should be stable, so I'm not sure why it throws that error (it won't even use multiprocessing in that case 😅).
I suspect I'm still correct though. You said "This is within a ProcessPoolExecutor task kicked off within a fastAPI/starlette background_task". Try running it without that; you might get a more informative error.
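One way to try that suggestion is to make the process pool optional, so the same task can run inline while debugging. BrokenProcessPool generally means a worker process died without reporting a Python exception (for example, it was killed by the OS or crashed in native code), so taking the pool out of the picture may expose what actually went wrong. This is only a minimal sketch: `run_task` and `flaky_task` are hypothetical names, not part of the original code or of llama-index, and the real `_build_index_task` would take the place of `flaky_task`.

```python
import asyncio
from concurrent.futures import Executor
from typing import Any, Callable, Optional


async def run_task(
    fn: Callable[..., Any], *args: Any, executor: Optional[Executor] = None
) -> Any:
    """Run fn(*args) in `executor` if one is given, otherwise inline.

    Running inline blocks the event loop, so that path is meant for
    debugging only. Its benefit is that whatever fn raises (or however
    it crashes) shows up directly instead of behind the pool.
    """
    if executor is None:
        # Debug path: no worker process involved.
        return fn(*args)
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(executor, fn, *args)


def flaky_task(path: str) -> str:
    # Hypothetical stand-in for the real task (e.g. the index build).
    if not path:
        raise ValueError("empty path")
    return f"loaded {path}"
```

In the code from the stack trace, this would roughly correspond to calling `await run_task(_build_index_task, config.model_dump(), index_id)` with no executor while debugging, then passing the original executor back in once the underlying error is understood.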