Efficiently Handling Large CSV Files with LlamaIndex

I'm facing an issue while trying to read a large CSV file (20M+ rows) using SimpleDirectoryReader. It seems to struggle with a file that size.

Is it possible to read this file using CSVReader? Or are there any other recommended approaches within LlamaIndex for efficiently handling large CSV files?
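For reference, the call that struggles looks roughly like this (the path is a placeholder):

```python
from llama_index.core import SimpleDirectoryReader

# load_data materialises every document at once, which is painful at 20M rows
documents = SimpleDirectoryReader(input_files=["data/big.csv"]).load_data()
```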
2 comments
You can customise the Reader class and use pandas' chunked ("buffer") reading to load such a large file. Btw, do check whether your RAM can actually hold that much data in memory.
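A minimal sketch of that custom-reader idea, assuming the current llama_index core package layout; the `ChunkedCSVReader` name and the 100k `chunksize` default are illustrative, not built-ins:

```python
from pathlib import Path
from typing import List, Optional

import pandas as pd
from llama_index.core import Document
from llama_index.core.readers.base import BaseReader


class ChunkedCSVReader(BaseReader):
    """Stream a large CSV via pandas chunks instead of one giant read."""

    def __init__(self, chunksize: int = 100_000):
        # How many rows pandas pulls into memory at a time (tune to your RAM)
        self.chunksize = chunksize

    def load_data(self, file: Path, extra_info: Optional[dict] = None) -> List[Document]:
        docs: List[Document] = []
        # chunksize makes read_csv return an iterator of DataFrames, so the
        # full 20M-row file is never resident in memory all at once
        for chunk in pd.read_csv(file, chunksize=self.chunksize):
            docs.append(Document(text=chunk.to_csv(index=False), extra_info=extra_info or {}))
        return docs
```

You could then route `.csv` files through it with `SimpleDirectoryReader("data/", file_extractor={".csv": ChunkedCSVReader()})`.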
You might have to split the CSV into pieces; 20M rows is going to be a lot. I'm pretty sure the default CSV reader splits each row into its own document, which you probably also don't want (unless it's just a list of QA pairs or something similarly textual).
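A rough sketch of that batching, using only the standard-library `csv` module; `csv_to_batched_documents` and `rows_per_doc` are hypothetical names, and the right batch size depends on your chunking and retrieval setup:

```python
import csv

from llama_index.core import Document


def csv_to_batched_documents(path: str, rows_per_doc: int = 1000) -> list[Document]:
    """Group rows into batches so 20M rows don't become 20M documents."""
    docs: list[Document] = []
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = ", ".join(next(reader))
        batch: list[str] = []
        for row in reader:
            batch.append(", ".join(row))
            if len(batch) >= rows_per_doc:
                # Repeat the header in each document so every batch is self-describing
                docs.append(Document(text="\n".join([header] + batch)))
                batch = []
        if batch:  # flush the final partial batch
            docs.append(Document(text="\n".join([header] + batch)))
    return docs
```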