
Updated 2 months ago

Extracting Information from Websites to Answer Queries

What is the best way to extract information from a website and answer queries about that website and the pages it links to?
3 comments
If you want to use RAG:
  • Use a library such as BeautifulSoup to scrape your websites.
  • Filter the scraped data so only the useful content remains.
  • Convert it to Markdown or use the HTMLNodeParser directly.
  • Create your Document objects and build your VectorStore on top of them.
  • Enjoy!
(You can do a lot of extras to improve the performance, but this should give you a baseline to work from.)
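For the scrape-and-filter steps, here is a minimal sketch of what "filter your data" can look like. To keep it dependency-free it uses Python's standard-library `html.parser` in place of BeautifulSoup; the same idea carries over. The names `PageTextExtractor`, `extract_page`, and `SKIP_TAGS`, and the choice of tags to skip, are assumptions made for illustration, not part of any library.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Tags whose contents are rarely useful for question answering (assumed list).
SKIP_TAGS = {"script", "style", "nav", "footer", "noscript"}


class PageTextExtractor(HTMLParser):
    """Collects visible text and absolute link URLs from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.skip_depth = 0  # greater than 0 while inside a skipped tag
        self.parts = []      # visible text fragments
        self.links = []      # absolute URLs from <a href=...> outside skipped tags

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "a" and self.skip_depth == 0:
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.parts.append(text)


def extract_page(html, base_url):
    """Return a record with the cleaned text, the source URL, and outgoing links."""
    parser = PageTextExtractor(base_url)
    parser.feed(html)
    parser.close()
    return {"url": base_url, "text": "\n".join(parser.parts), "links": parser.links}
```

Each record would then typically become one Document (the text plus metadata such as the URL), and the collected links give you the next pages to fetch if you want to cover linked pages too.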
I have already tried this approach, but I am not getting good performance. So I am asking whether there is a better way to get information from a website and respond to questions. I am building a chatbot for a company website that answers questions about the company using the content of that site.
Just try different things: SubQuestionQueryEngine, reranking, fine-tuning. All of these are options to improve the performance. The ingestion step is always important, so ask yourself how good the data you are scanning really is. You always need to think about whether you, as a human, would be able to understand what is going on from that data. A good system prompt can also sometimes do wonders.
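One way to apply the "would a human understand this?" check is to look at the chunks exactly as the retriever will see them. The sketch below is only an illustration under assumptions: `make_chunks` is a hypothetical helper, the input is assumed to be a dict with `url` and `text` keys, and the 500-character limit is an arbitrary starting point.

```python
def make_chunks(record, max_chars=500):
    """Group a page's lines into chunks of roughly max_chars characters.

    Each chunk is prefixed with its source URL so it still makes sense
    when read on its own. A single line longer than max_chars is kept whole.
    """
    chunks, current = [], ""
    for para in record["text"].split("\n"):
        para = para.strip()
        if not para:
            continue
        # Start a new chunk when adding this line would exceed the limit.
        if current and len(current) + len(para) + 1 > max_chars:
            chunks.append(current)
            current = ""
        current = f"{current}\n{para}" if current else para
    if current:
        chunks.append(current)
    return [f"Source: {record['url']}\n{chunk}" for chunk in chunks]


if __name__ == "__main__":
    page = {
        "url": "https://example.com/about",
        "text": "About us\nWe build widgets for small businesses.",
    }
    # Read these the way a new employee would: is each one understandable alone?
    for chunk in make_chunks(page):
        print(chunk)
        print("-" * 40)
```

If the printed chunks are full of menu text, cut-off sentences, or content with no context, fixing that usually helps more than changing the query engine.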