Utilize the `llms.txt` file to improve the web crawler's experience #3864

zwpaper · 2025-02-18T03:27:20Z

Please describe the feature you want

llms.txt is gaining popularity, and companies such as Anthropic, Perplexity, and Cloudflare already support it. With llms.txt, a crawler can fetch a site's official plain-text documentation directly instead of scraping HTML, which makes it easier for large language models (LLMs) to retrieve the critical information.

https://directory.llmstxt.cloud/ is a directory of sites that support llms.txt. It lists links to the llms.txt files of various documentation sites, and following a link gives access to the corresponding documents.

We could add support for this directory to the web crawler (although the directory does not appear to offer an API), or we could provide our own list. When a user selects an llms.txt, Tabby can download and index the corresponding text document.
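To make the "download and index" step more concrete, here is a minimal sketch of how the link entries in an llms.txt file might be extracted before the linked documents are fetched. It assumes the commonly described layout: `## ` section headings followed by markdown list items of the form `- [title](url): notes`. The `LlmsLink` type and the `parse_llms_links` function are hypothetical names for illustration, not existing Tabby APIs.

```rust
/// One link entry from an llms.txt file.
/// Hypothetical type for illustration; not part of Tabby.
#[derive(Debug, PartialEq)]
pub struct LlmsLink {
    pub section: String,
    pub title: String,
    pub url: String,
}

/// Collects `[title](url)` list entries, tagging each one with the most
/// recent `## ` heading. Assumes the layout described in the lead-in.
pub fn parse_llms_links(text: &str) -> Vec<LlmsLink> {
    let mut section = String::new();
    let mut links = Vec::new();
    for line in text.lines() {
        let line = line.trim();
        if let Some(heading) = line.strip_prefix("## ") {
            section = heading.trim().to_string();
            continue;
        }
        // Accept both `-` and `*` list markers; skip every other line.
        let Some(item) = line.strip_prefix("- ").or_else(|| line.strip_prefix("* ")) else {
            continue;
        };
        if let Some(link) = parse_markdown_link(item, &section) {
            links.push(link);
        }
    }
    links
}

/// Parses a single `[title](url)` item; anything after `)` is treated as notes
/// and ignored. Returns `None` for list items that are not links.
fn parse_markdown_link(item: &str, section: &str) -> Option<LlmsLink> {
    let rest = item.trim_start().strip_prefix('[')?;
    let (title, rest) = rest.split_once("](")?;
    let (url, _notes) = rest.split_once(')')?;
    if url.is_empty() {
        return None;
    }
    Some(LlmsLink {
        section: section.to_string(),
        title: title.to_string(),
        url: url.to_string(),
    })
}
```

Each returned URL could then be handed to whatever download and indexing path the crawler already uses. The sketch deliberately ignores nested lists and URLs that contain parentheses.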

Additional context

For document support, you can refer to: https://github.com/TabbyML/tabby/blob/main/ee/tabby-webserver/src/service/web_documents.rs
For Web Crawler: https://github.com/TabbyML/tabby/blob/main/ee/tabby-webserver/src/service/background_job/web_crawler.rs

Tabby is already capable of parsing a crawled website into a structured document; we could use it to index the llms.txt file.

Please reply with a 👍 if you want this feature.

zwpaper added the enhancement (New feature or request) label Feb 18, 2025

zwpaper assigned Sma1lboy Feb 18, 2025

Sma1lboy mentioned this issue Feb 19, 2025

feat(crawler): add functionality to fetch and index LLMS files from WebDocuments #3880

Merged
