Node Parser

At a glance

The community member asks whether LlamaIndex has a built-in method to remove similar nodes or nodes with too few words, such as nodes extracted from a PDF that contain many symbols and few words. The comments suggest iterating over the nodes, dropping the symbol-heavy and very short ones, and inserting the remaining nodes into the index. The replier is not aware of any built-in method to remove similar nodes, so that has to be done manually by comparing the nodes. The original poster mentions that it would be great to have some built-in methods to "maintain" the vector store.

Leonardo Oliva

Hey guys, is there any method built into LlamaIndex to remove similar nodes or nodes that have too few words? For example, I imagine a node extracted from a PDF that has lots of symbols and few words; I'd like to delete those.

4 comments

Leonardo Oliva

thread.

WhiteFang_Jr

You can remove the symbols and small nodes by iterating over the nodes, removing the ones you don't want, and then inserting the rest into your index.


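# parse nodes (assumes `documents` was loaded earlier, e.g. with a reader)
from llama_index.core.node_parser import SentenceSplitter  # llama-index 0.10+ import path

parser = SentenceSplitter()
nodes = parser.get_nodes_from_documents(documents)

# keep only the nodes that pass your own check; the 5-word minimum is just an example
kept_nodes = []
for node in nodes:
    if len(node.get_content().split()) >= 5:
        kept_nodes.append(node)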
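As a rough illustration of what the "removing" step could look like, here is a minimal sketch that flags text with too few words or too many symbols. It works on plain strings so it can be tried without LlamaIndex; with real nodes you would presumably pass in `node.get_content()`. The helper names (`is_low_content`, `filter_texts`) and both thresholds are assumptions made for this example, not part of LlamaIndex.

```python
import re

# Matches runs of 2+ letters (Unicode-aware); single characters and digits are ignored.
_WORD_RE = re.compile(r"[^\W\d_]{2,}")


def is_low_content(text: str, min_words: int = 5, min_alpha_ratio: float = 0.5) -> bool:
    """Return True when the text has too few words or is mostly symbols.

    Both thresholds are illustrative assumptions; tune them on your own data.
    """
    words = _WORD_RE.findall(text)
    if len(words) < min_words:
        return True

    non_space = [ch for ch in text if not ch.isspace()]
    if not non_space:
        return True

    # Share of non-whitespace characters that are letters.
    alpha_ratio = sum(ch.isalpha() for ch in non_space) / len(non_space)
    return alpha_ratio < min_alpha_ratio


def filter_texts(texts):
    """Keep only the texts that are not flagged as low content."""
    return [t for t in texts if not is_low_content(t)]
```

For example, `filter_texts([n.get_content() for n in nodes])` would give you the surviving texts; to keep the node objects themselves, apply `is_low_content` inside the loop over `nodes` instead.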

For similar nodes, you'll have to do it manually as well, by comparing them. I'm not sure there is a built-in method for this!
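One possible way to do that manual comparison is sketched below. It uses Jaccard similarity over word sets, which is a technique chosen for this example rather than anything LlamaIndex provides; comparing embeddings with cosine similarity would be another option. The function names and the 0.8 threshold are assumptions, and the pairwise loop is quadratic, so it is only practical for a modest number of nodes.

```python
import re


def _tokens(text: str) -> set:
    """Lowercased set of word tokens in the text."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the word sets of two texts (1.0 means identical sets)."""
    ta, tb = _tokens(a), _tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def drop_near_duplicates(texts, threshold: float = 0.8):
    """Keep the first occurrence of each group of near-duplicate texts.

    A text is dropped when its similarity to an already kept text
    reaches the threshold (an illustrative value, not a recommendation).
    """
    kept = []
    for text in texts:
        if all(jaccard(text, k) < threshold for k in kept):
            kept.append(text)
    return kept
```

With nodes, you could run this over `node.get_content()` for each node and keep only the nodes whose text survives.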

Leonardo Oliva

It'd be great to have some built-in methods to "maintain" the vector store

Leonardo Oliva

but yeah, I'll iterate over all my nodes.
