
Updated 12 months ago

Node Parser

At a glance

The community member asks whether LlamaIndex has a built-in method to remove similar nodes or nodes with too few words, such as nodes extracted from a PDF that contain mostly symbols. The replies suggest iterating over the parsed nodes, removing the symbol-heavy and short ones, and then inserting the remaining nodes into the index. The replier isn't aware of a built-in method for removing similar nodes either, so those have to be found manually by comparing nodes. One community member notes that built-in methods to "maintain" the vector store would be a welcome addition.

Hey guys, is there any method built into LlamaIndex to remove similar nodes or nodes that have too few words? For example, I imagine a node extracted from a PDF that has lots of symbols and few words; I'd like to delete those.
4 comments
You can remove the symbol-heavy and small nodes by iterating over the nodes, dropping them, and then inserting the remaining nodes into your index.

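from llama_index.core import VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter

# parse nodes (`documents` is your already-loaded list of Documents)
parser = SentenceSplitter()
nodes = parser.get_nodes_from_documents(documents)

# build a new list rather than deleting from `nodes` while iterating over it
clean_nodes = []
for node in nodes:
    # perform the check here and `continue` to skip nodes you want to remove
    clean_nodes.append(node)

index = VectorStoreIndex(clean_nodes)
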
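As one way to fill in the removal check in that loop, here is a minimal sketch of a filter for short or symbol-heavy chunks. `is_low_quality` and its thresholds (`min_words`, `min_alpha_ratio`) are hypothetical, not part of LlamaIndex, and the word pattern assumes mostly English (ASCII) text, so tune or replace it for your PDFs.

```python
import re

# Hypothetical defaults; tune them for your own documents.
MIN_WORDS = 20
MIN_ALPHA_RATIO = 0.6

# Assumption: a "word" is 2+ ASCII letters, which suits mostly English text.
WORD_RE = re.compile(r"[A-Za-z]{2,}")


def is_low_quality(text: str, min_words: int = MIN_WORDS,
                   min_alpha_ratio: float = MIN_ALPHA_RATIO) -> bool:
    """Return True for chunks with too few words or too many symbols."""
    words = WORD_RE.findall(text)
    if len(words) < min_words:
        return True
    # Share of non-whitespace characters that are letters.
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return True
    alpha_ratio = sum(c.isalpha() for c in chars) / len(chars)
    return alpha_ratio < min_alpha_ratio
```

With LlamaIndex nodes you would pass each node's text, e.g. `clean_nodes = [n for n in nodes if not is_low_quality(n.get_content())]`, before building the index or inserting into it.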
For similar nodes, you'll have to do it manually as well, by comparing them. Not sure if there is a built-in method for this!
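For that manual comparison, one simple option is greedy near-duplicate removal using Jaccard similarity of word sets. This is a lexical technique swapped in for illustration, not a LlamaIndex feature; comparing embeddings with cosine similarity is a common alternative that also catches paraphrases. `dedupe_texts` and the 0.9 threshold are hypothetical.

```python
import re
from typing import List, Set


def word_set(text: str) -> Set[str]:
    """Lower-cased set of words in a chunk of text."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity len(a & b) / len(a | b); 1.0 for two empty sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def dedupe_texts(texts: List[str], threshold: float = 0.9) -> List[int]:
    """Greedily keep a text only if it is not too similar to any kept one.

    Returns the indices of the texts to keep, in their original order.
    Threshold 0.9 is a hypothetical default; lower it to drop more.
    """
    kept_indices: List[int] = []
    kept_sets: List[Set[str]] = []
    for i, text in enumerate(texts):
        words = word_set(text)
        if all(jaccard(words, other) < threshold for other in kept_sets):
            kept_indices.append(i)
            kept_sets.append(words)
    return kept_indices
```

Usage with nodes: `keep = dedupe_texts([n.get_content() for n in nodes])`, then `unique_nodes = [nodes[i] for i in keep]`. Checking each chunk against every kept one is O(n²), which is fine for a few thousand nodes but slow for very large collections.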
It'd be great to have some built-in methods to "maintain" the vector store.
But yeah, I'll iterate over all my nodes.