Added a plugin to make the documentation LLM friendly #413

BboyAkers · 2026-01-13T06:44:47Z

No description provided.

github-actions · 2026-01-13T06:49:29Z

🚀 Preview Deployment

Your preview deployment is ready!

🔗 Preview URL: https://preview.harper-docs.stage.harperfabric.com/pr-413

This preview will update automatically when you push new commits.

Ethan-Arrowood · 2026-01-13T11:46:36Z

package.json

 		"@docusaurus/theme-search-algolia": "3.9.1",
 		"@easyops-cn/docusaurus-search-local": "0.52.1",
 		"@mdx-js/react": "3.1.1",
+		"@signalwire/docusaurus-plugin-llms-txt": "^1.2.2",


Let's pin this version.

Do we have a process for ensuring this stays up-to-date? Does that process provide some safety when combined with pinning? I'd think that we want to keep this pretty up-to-date, and, as always, defaulting to pinning seems like a risky choice.

I think this is the same as our other repos that are relying on socket / renovate to update dependencies automatically. I just noticed many of the other dependencies here are pinned so wanted to stick to that pattern.

I hadn't noticed any socket or renovate PRs and https://github.com/HarperFast/documentation/commits/main/package.json looks pretty quiet, which makes me question this.

Good point: we don't have those tools set up here yet, but we do pin dependency versions. Let's get the tools enabled ASAP.
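Until Socket/Renovate are enabled here, a small CI check could at least keep the pinning convention honest by flagging range specifiers like `^1.2.2`. A minimal sketch, assuming "pinned" means an exact semver string; `findUnpinned` and the regex are hypothetical, not existing tooling in this repo:

```typescript
// Hypothetical check: report dependency specifiers that are not exact versions.
// Assumption: "pinned" means a bare semver such as "3.9.1" (optional prerelease/build tag).
const EXACT_SEMVER = /^\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?$/;

interface PackageJson {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

function findUnpinned(pkg: PackageJson): string[] {
  const unpinned: string[] = [];
  for (const deps of [pkg.dependencies ?? {}, pkg.devDependencies ?? {}]) {
    for (const [name, spec] of Object.entries(deps)) {
      // "^1.2.2", "~1.2.2", "latest", ">=1" all fail the exact-version match.
      if (!EXACT_SEMVER.test(spec)) unpinned.push(`${name}@${spec}`);
    }
  }
  return unpinned;
}

// Entries taken from the package.json diff in this PR.
const example: PackageJson = {
  dependencies: {
    "@docusaurus/theme-search-algolia": "3.9.1",
    "@mdx-js/react": "3.1.1",
    "@signalwire/docusaurus-plugin-llms-txt": "^1.2.2",
  },
};
console.log(findUnpinned(example)); // [ '@signalwire/docusaurus-plugin-llms-txt@^1.2.2' ]
```

Failing CI when the list is non-empty would make an unpinned addition visible in review rather than relying on someone spotting the caret.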

Ethan-Arrowood · 2026-01-13T11:48:54Z

docusaurus.config.ts

+        siteTitle: 'Harper Documentation',
+        siteDescription: 'Comprehensive guide to developing on and using the Harper platform',
+        depth: 2,
+        content: {
+          includeBlog: true,
+          includePages: true,
+          enableLlmsFullTxt: true  // Optional: generates llms-full.txt
+        }


Can you include some more comments, or add a section to a markdown doc, explaining what this configuration means? For example, what does depth 2 mean? Why not 3? And what about includeBlog? We don't really have a blog here.

Along those lines, how did the organization vary and improve at 2 vs 3? Claude is recommending 3 for docs (for better organization - in its 'opinion' 😄 )

oh, nvm, I see the comment now

here's claude's analysis after changing the depth to 3 and setting includeVersionedDocs: false:

⏺ Perfect! The changes are working beautifully. Here's the comparison:

📊 Before vs After Comparison

File Size Reduction

┌────────────────────────────┬─────────┬────────┬──────────────────────────┐
│ Metric                     │ Before  │ After  │ Change                   │
├────────────────────────────┼─────────┼────────┼──────────────────────────┤
│ llms.txt                   │ 220KB   │ 67KB   │ ✅ 69% smaller           │
├────────────────────────────┼─────────┼────────┼──────────────────────────┤
│ llms-full.txt              │ 5.4MB   │ 1.2MB  │ ✅ 78% smaller           │
├────────────────────────────┼─────────┼────────┼──────────────────────────┤
│ Line count (llms.txt)      │ 1,083   │ 1,528  │ More organized structure │
├────────────────────────────┼─────────┼────────┼──────────────────────────┤
│ Line count (llms-full.txt) │ 120,078 │ 27,732 │ ✅ 77% reduction         │
└────────────────────────────┴─────────┴────────┴──────────────────────────┘

Content Filtering

┌─────────────────────────────┬────────┬───────┬──────────────────────────┐
│ Metric                      │ Before │ After │ Result                   │
├─────────────────────────────┼────────┼───────┼──────────────────────────┤
│ Documents processed         │ ~730+  │ 317   │ ✅ Only current version  │
├─────────────────────────────┼────────┼───────┼──────────────────────────┤
│ Old version links (4.1-4.6) │ 610    │ 0     │ ✅ Completely removed    │
├─────────────────────────────┼────────┼───────┼──────────────────────────┤
│ Current version links       │ 114    │ 114   │ ✅ Preserved             │
├─────────────────────────────┼────────┼───────┼──────────────────────────┤
│ Routes excluded             │ 0      │ 616   │ ✅ Old versions filtered │
└─────────────────────────────┴────────┴───────┴──────────────────────────┘

Hierarchy Structure with depth: 3

The depth: 3 setting creates a much better organized structure:

## docs ← Level 2: Main section
### administration ← Level 3: Category
#### harper-studio ← Level 4: Subcategory
- [Create Account](...)
- [Login](...)
- [Manage Databases](...)
#### logging ← Level 4: Subcategory
- [Standard Logging](...)
- [Audit Logging](...)

Hierarchy breakdown:
- 5 Level 2 sections (## - top level: docs, fabric, learn, etc.)
- 21 Level 3 sections (### - categories: administration, developers, deployments)
- 219 Level 4 sections (#### - subcategories: harper-studio, logging, cloning)

Why depth: 3 is Perfect

With your URL structure like /docs/administration/harper-studio/create-account:
- depth: 1 → Would group by /docs only (too broad)
- depth: 2 → Would group by /docs/administration (okay, but loses subcategory structure)
- depth: 3 ✅ → Groups by /docs/administration/harper-studio (ideal - preserves logical structure)

Benefits for LLMs
1. Cleaner context - Only current (4.7) documentation, no confusion from old versions
2. Smaller files - 78% reduction means faster processing
3. Better organization - Deep hierarchy helps LLMs understand document relationships
4. Focused information - 317 current docs vs 730+ mixed versions

Recommendation

These changes look excellent! The output is:
- ✅ Much smaller and faster to process
- ✅ Only includes current documentation
- ✅ Well-organized with proper hierarchy
- ✅ No old version confusion

This is exactly what you want for LLM-friendly documentation. The PR should include these configuration settings in the final version.
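To make the depth question concrete, here is a rough illustration of the grouping described in the analysis above: each route is bucketed by its first `depth` path segments. This is an assumption based on that description, not the plugin's actual algorithm; `groupKey` and `groupRoutes` are hypothetical names.

```typescript
// Hypothetical illustration of "depth": bucket a route by its first `depth` path segments.
// Mirrors the description in the analysis, not necessarily the plugin's real implementation.
function groupKey(route: string, depth: number): string {
  const segments = route.split("/").filter((s) => s.length > 0);
  return "/" + segments.slice(0, depth).join("/");
}

// Group many routes under their depth-N prefix, as a section heading would.
function groupRoutes(routes: string[], depth: number): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const route of routes) {
    const key = groupKey(route, depth);
    const bucket = groups.get(key) ?? [];
    bucket.push(route);
    groups.set(key, bucket);
  }
  return groups;
}

const route = "/docs/administration/harper-studio/create-account";
for (const depth of [1, 2, 3]) {
  console.log(depth, groupKey(route, depth));
}
// 1 /docs
// 2 /docs/administration
// 3 /docs/administration/harper-studio
```

Under this reading, raising depth only adds finer headings; it does not change which pages are included (that is what the versioned-docs setting controls).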

kriszyp · 2026-01-13T16:22:26Z

At a high level, the whole concept of creating documentation in markdown, converting it to HTML, and then converting it back to markdown is... 🤔. Can we simply make the original markdown publicly available? How much longer do we expect AI to actually prefer markdown to HTML? Or is that advice over a month old and no longer relevant in modern AI?
Anyway, this does seem like a simple path, but would love more understanding how it is better.

github-actions · 2026-01-13T17:07:56Z

🚀 Preview Deployment

Your preview deployment is ready!

🔗 Preview URL: https://preview.harper-docs.stage.harperfabric.com/pr-413

This preview will update automatically when you push new commits.

BboyAkers · 2026-01-13T17:15:05Z

At a high level, the whole concept of creating documentation in markdown, converting it to HTML, and then converting it back to markdown is... 🤔. Can we simply make the original markdown publicly available? How much longer do we expect AI to actually prefer markdown to HTML? Or is that advice over a month old and no longer relevant in modern AI? Anyway, this does seem like a simple path, but would love more understanding how it is better.

I'm assuming markdown is preferred because it consumes fewer tokens than HTML. That'd be my only guess 🤷🏾

kriszyp · 2026-01-13T18:29:40Z

lower token consumption

There is token consumption in GEO? Whose tokens are being consumed?

Ethan-Arrowood · 2026-01-13T20:56:18Z

I don't think it has much to do with token consumption, but rather with the style of the input. These LLMs are only really good at words; they aren't servers. They can't necessarily parse HTML as easily as we assume. Since Markdown is much more readable, that is what LLMs prefer.

Ethan-Arrowood · 2026-01-13T20:57:10Z

I also agree, though: why can't we just expose the markdown source for our pages rather than something else? But I don't really know what this is all about anyway, so maybe it's just the latest wave of optimizing our site for AI robots.

kriszyp · 2026-01-13T22:06:35Z

https://www.longato.ch/llm-md-files/

Ethan-Arrowood · 2026-01-14T20:37:24Z

lol so what's with the craze to add markdown files for LLMs if this research says it's pointless?

Just social media noise? TBH if I hadn't seen this PR and actively searched for it on LinkedIn, I wouldn't have known this was a recent change for doc sites.

heskew · 2026-01-14T22:11:40Z

It's still just a proposed standard. We could use Anthropic as an example to follow: https://platform.claude.com/llms.txt

I asked Claude how it would discover Harper docs and it said it would only probe for an llms.txt if explicitly asked. E.g. "here's the llms.txt for harper, help me build something" (could be instructions in a Claude skill for example) or have it configured in an IDE somehow.

BboyAkers · 2026-01-15T00:19:28Z

lol so what's with the craze to add markdown files for LLMs if this research says it's pointless?

Just social media noise? TBH if I hadn't seen this PR and actively searched for it on LinkedIn, I wouldn't have known this was a recent change for doc sites.

It helps make our docs more discoverable by LLMs and more usable for folks who have a more integrated AI workflow.

kriszyp · 2026-01-15T00:45:43Z

I asked Claude how it would discover Harper docs and it said it would only probe for an llms.txt if explicitly asked.

These are publicly accessible docs, so LLMs should have already ingested this information, right? Giving LLMs directives about where to look would be for documentation that LLMs didn't previously know about?

It helps make our docs more discoverable by llms

You are pretty confident that Flavio is wrong? Or there are other LLM pathways?

lol so what's with the craze to add markdown files for LLMs

A myth like this could gain traction in our industry? Inconceivable :)
Sarcasm aside, curious what the LLM pathways are that we are trying to address.

BboyAkers · 2026-01-20T17:23:29Z

I'll revisit this PR today, y'all. Sorry for the delay!

socket-security · 2026-01-21T18:48:31Z

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

Diff	Package	Supply Chain Security	Vulnerability	Quality	Maintenance	License
	@signalwire/docusaurus-plugin-llms-txt@1.2.2

View full report

github-actions · 2026-01-21T19:40:27Z

🚀 Preview Deployment

Your preview deployment is ready!

🔗 Preview URL: https://preview.harper-docs.stage.harperfabric.com/pr-413

This preview will update automatically when you push new commits.

dawsontoth · 2026-01-21T19:44:08Z

package.json

 		"lint": "echo 0;",
 		"preview:pr": "node scripts/preview-pr.mjs"
 	},
+	"dependencies": {


If we go forward with this, I don't think these are dependencies; I think they're all dev dependencies.
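For reference, the change suggested here is mechanical: the same entries move from `dependencies` to `devDependencies`. A minimal sketch of that transformation, assuming a plain package.json object; `moveToDevDependencies` is a hypothetical helper, and the version shown is the pinned 1.2.2 reported by Socket later in this thread:

```typescript
// Hypothetical helper: move named packages from "dependencies" to "devDependencies".
// Returns a new object; the input package.json object is left untouched.
interface PackageJson {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
  [key: string]: unknown;
}

function moveToDevDependencies(pkg: PackageJson, names: string[]): PackageJson {
  const deps = { ...(pkg.dependencies ?? {}) };
  const devDeps = { ...(pkg.devDependencies ?? {}) };
  for (const name of names) {
    if (name in deps) {
      devDeps[name] = deps[name];
      delete deps[name];
    }
  }
  return { ...pkg, dependencies: deps, devDependencies: devDeps };
}

const before: PackageJson = {
  dependencies: {
    "@mdx-js/react": "3.1.1",
    "@signalwire/docusaurus-plugin-llms-txt": "1.2.2",
  },
};
// Since the suggestion is that all of them are build-time only, move every key.
const after = moveToDevDependencies(before, Object.keys(before.dependencies ?? {}));
console.log(after);
```

For a static docs site, the distinction mostly matters for anyone installing with production-only flags; the build itself needs both groups.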

BboyAkers · 2026-01-21T20:02:53Z

I asked Claude how it would discover Harper docs and it said it would only probe for an llms.txt if explicitly asked.

These are publicly accessible docs, so LLMs should have already ingested this information, right? Giving LLMs directives about where to look would be for documentation that LLMs didn't previously know about?

It helps make our docs more discoverable by llms

You are pretty confident that Flavio is wrong? Or there are other LLM pathways?

I'll be transparent and say I'm not confident Flavio is wrong. My assertion comes from digging into how Mintlify, Vercel, and Appwrite use and implement llms.txt files. It's also colored by my own experience of asking Claude questions about their products and getting accurate answers, with links to the specific docs I was looking for.

lol so what's with the craze to add markdown files for LLMs

A myth like this could gain traction in our industry? Inconceivable :) Sarcasm aside, curious what the LLM pathways are that we are trying to address.

I can do more research and get back to you. I'd love to know if @Ethan-Arrowood or @cb1kenobi know anyone at Vercel who could speak more to this. I can also reach out to the head of DevRel at Appwrite to see if they can share their metrics related to llms.txt files.

Ethan-Arrowood · 2026-01-21T22:55:00Z

I don't think I know anyone directly for you to speak with. If you identified someone in particular from Vercel, I could see if I can make an introduction (assuming I knew them).

Otherwise, I asked Claude (ironic, I know) to help facilitate some additional research. Here is its response:

I personally reviewed a few of the sources mentioned; but not all of them.

I've found a good collection of sources on the llms.txt debate. Here's what I found:

Against llms.txt effectiveness:

SE Ranking's 300k domain analysis - https://seranking.com/blog/llms-txt/
- Major study showing no effect of llms.txt on domain citations by LLMs, with removal of the variable actually improving their model's accuracy
Daydream's critical analysis - https://www.withdaydream.com/library/llms-txt-hype-vs-reality
- Warns about data divergence risks and lack of major platform support
Search Engine Journal coverage - https://www.searchenginejournal.com/llms-txt-shows-no-clear-effect-on-ai-citations-based-on-300k-domains/561542/
- Reports on the SE Ranking findings
Symphonic Digital - https://www.symphonicdigital.com/blog/llm-txt-why-this-geo-file-isnt-worth-your-dev-resources
- Argues it's a proposal without broad acceptance, comparing it to past SEO fads like AMP
Primary Position's "myth" analysis - https://primaryposition.com/blog/ai-llms-txt/
- Debunks common misconceptions about the standard

For/Neutral on llms.txt:

Jeremy Howard's original proposal - https://www.answer.ai/posts/2024-09-03-llmstxt.html and https://llmstxt.org/
- The foundational proposal arguing for LLM-friendly content delivery
Mintlify's implementation post - https://www.mintlify.com/blog/simplifying-docs-with-llms-txt
- From a platform that auto-generates llms.txt for all hosted docs
ScaleMath's analysis - https://scalemath.com/blog/llms-txt/
- More balanced take on adoption and use cases
Search Engine Land - https://searchengineland.com/llms-txt-proposed-standard-453676
- SEO perspective suggesting it's worth implementing despite uncertainty
Peec AI's practical guide - https://peec.ai/blog/llms-txt-md-files-important-ai-visibility-helper-or-hoax
- Includes testing methodologies

Key takeaway: The consensus from research (particularly the SE Ranking 300k domain study) shows no measurable impact on AI citations currently, though some argue it's worth implementing as low-risk future-proofing. Major platforms haven't confirmed support, and log analysis shows LLM crawlers aren't fetching these files.

Since there is no overly compelling research pointing to its effectiveness, my additional acceptance criteria are:

- What is the impact on build time?
- What is the impact on deploy time (i.e. size of site)?
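One low-effort way to answer the deploy-size question is to measure what share of the build output the generated files take up. A minimal sketch using Node's fs module, assuming the site builds into a `build/` directory and that llms.txt and llms-full.txt land at its root (an assumption based on the file names quoted earlier in this thread); `dirSizeBytes` and `llmsShare` are hypothetical helpers. Build time could be compared separately by timing the build with and without the plugin.

```typescript
import { mkdtempSync, readdirSync, statSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Recursively sum file sizes (in bytes) under a directory.
function dirSizeBytes(dir: string): number {
  let total = 0;
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    const full = join(dir, entry.name);
    total += entry.isDirectory() ? dirSizeBytes(full) : statSync(full).size;
  }
  return total;
}

// Share of the build output taken by the given files (paths relative to buildDir).
function llmsShare(buildDir: string, files: string[]) {
  const totalBytes = dirSizeBytes(buildDir);
  const llmsBytes = files.reduce((sum, f) => sum + statSync(join(buildDir, f)).size, 0);
  const percent = totalBytes === 0 ? 0 : (100 * llmsBytes) / totalBytes;
  return { llmsBytes, totalBytes, percent };
}

// Self-contained demo on a throwaway directory standing in for build/.
const demo = mkdtempSync(join(tmpdir(), "docs-build-"));
writeFileSync(join(demo, "index.html"), "x".repeat(300));
writeFileSync(join(demo, "llms.txt"), "x".repeat(100));
const share = llmsShare(demo, ["llms.txt"]);
console.log(share.percent); // 25

// Against a real build (assumed paths): llmsShare("build", ["llms.txt", "llms-full.txt"])
```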

As long as this is not going to be a major thing to maintain, I'm fine with including it on the grounds of "low-risk future-proofing".

Additionally, if we are to proceed with this, can we update it so it only uses this plugin for production deployments?
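A possible shape for "production only" in docusaurus.config.ts is to build the plugin list from an environment variable. This is a sketch under several assumptions: `DEPLOY_TARGET` is a hypothetical variable that only the production deploy job would set, the plugin is assumed to sit in the `plugins` array (the diff excerpt doesn't show the surrounding key), and the options are copied from the config diff in this PR. Preview builds presumably run the same build command as production, so NODE_ENV alone may not tell them apart.

```typescript
// Sketch: register the llms.txt plugin only for production deploys.
// Assumption: the production deploy job sets DEPLOY_TARGET=production; pr-NNN previews do not.
// DEPLOY_TARGET is a hypothetical name, not something the current workflows define.
type PluginEntry = string | [string, Record<string, unknown>];

const llmsTxtPlugin: PluginEntry = [
  "@signalwire/docusaurus-plugin-llms-txt",
  {
    // Options copied from the config diff in this PR.
    siteTitle: "Harper Documentation",
    siteDescription: "Comprehensive guide to developing on and using the Harper platform",
    depth: 2,
    content: {
      includeBlog: true,
      includePages: true,
      enableLlmsFullTxt: true,
    },
  },
];

function buildPlugins(env: Record<string, string | undefined>): PluginEntry[] {
  const isProductionDeploy = env.DEPLOY_TARGET === "production";
  return [
    // ...plugins the site already uses would stay here unconditionally...
    ...(isProductionDeploy ? [llmsTxtPlugin] : []),
  ];
}

// In the real config this would be used as: plugins: buildPlugins(process.env),
console.log(buildPlugins({ DEPLOY_TARGET: "production" }).length); // 1
console.log(buildPlugins({}).length); // 0
```

Spreading a conditional array keeps the plugin list a plain array, so no other config has to know the plugin may be absent.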

kriszyp · 2026-01-21T23:14:05Z

low-risk future-proofing

The SE Ranking conclusion suggests (and provides evidence) that there is actually some risk (of decreased accuracy) though, right?

I think this PR is actually conflating two different suggestions:

1. Adding a /llms.txt endpoint that returns some info for LLMs
2. Adding a plugin that outputs markdown content generated from HTML (which was generated from the markdown)... I think?

IMHO, #1 (just adding an llms.txt at the root, without any new plugins) does seem relatively low-risk, and I'm fine with that. On the other hand, the impact of #2 doesn't seem well defined or understood in this PR, and taking on additional dependencies, build overhead, and supply-chain risk without any clear evidence or understanding of what we are doing seems more ill-advised to me.
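Option #1 needs no plugin: Docusaurus copies everything under `static/` to the site root, so a hand-written `static/llms.txt` would be served at `/llms.txt`. The llmstxt.org proposal describes the file as an H1 title, a blockquote summary, optional free-form notes, then H2 sections listing markdown links with optional notes. A minimal sketch that renders that shape; `renderLlmsTxt` is hypothetical, the title and summary are taken from this PR's config, and the section, link, and note are placeholders rather than a proposed outline:

```typescript
// Hypothetical renderer for an llms.txt shaped like the llmstxt.org proposal:
// "# Title", "> summary", optional notes, then "## Section" lists of "- [name](url): note".
interface LinkEntry {
  name: string;
  url: string;
  note?: string;
}

interface LlmsTxt {
  title: string;
  summary: string;
  notes?: string[]; // where LLM-specific guidance (not a copy of the docs) would go
  sections: Record<string, LinkEntry[]>;
}

function renderLlmsTxt(doc: LlmsTxt): string {
  const lines: string[] = [`# ${doc.title}`, "", `> ${doc.summary}`, ""];
  for (const note of doc.notes ?? []) {
    lines.push(note, "");
  }
  for (const [heading, links] of Object.entries(doc.sections)) {
    lines.push(`## ${heading}`, "");
    for (const { name, url, note } of links) {
      lines.push(note ? `- [${name}](${url}): ${note}` : `- [${name}](${url})`);
    }
    lines.push("");
  }
  return lines.join("\n");
}

// Placeholder content; the path comes from the route example earlier in this thread.
const text = renderLlmsTxt({
  title: "Harper Documentation",
  summary: "Comprehensive guide to developing on and using the Harper platform",
  sections: {
    Docs: [
      { name: "Create Account", url: "/docs/administration/harper-studio/create-account", note: "placeholder note" },
    ],
  },
});
console.log(text);
```

Since the file is static, the generated text could just as well be written once by hand and committed.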

Ethan-Arrowood · 2026-01-21T23:18:07Z

That is a great point. I'm +1 to that plan.

BboyAkers · 2026-01-22T16:49:00Z

So the suggested direction is for me to create a script that appends all .md files into our own llms.txt file, without adding a plugin? @kriszyp @Ethan-Arrowood
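For comparison, the script described here (concatenating every markdown source into a single file, roughly what llms-full.txt is) could be quite small. A minimal sketch, assuming the sources live under a `docs/`-style directory; `collectMarkdown` and `concatMarkdown` are hypothetical helpers, and `.mdx` files would be included as raw source, JSX and frontmatter included:

```typescript
import { mkdirSync, mkdtempSync, readdirSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join, relative } from "node:path";

// Collect .md/.mdx files under `root`, sorted so output order is stable across builds.
function collectMarkdown(root: string): string[] {
  const found: string[] = [];
  const walk = (dir: string): void => {
    for (const entry of readdirSync(dir, { withFileTypes: true })) {
      const full = join(dir, entry.name);
      if (entry.isDirectory()) walk(full);
      else if (/\.mdx?$/.test(entry.name)) found.push(full);
    }
  };
  walk(root);
  return found.sort();
}

// Concatenate the files, prefixing each with its relative path so documents stay distinguishable.
function concatMarkdown(root: string): string {
  return collectMarkdown(root)
    .map((file) => `<!-- ${relative(root, file)} -->\n${readFileSync(file, "utf8").trim()}\n`)
    .join("\n");
}

// Self-contained demo on a throwaway tree standing in for the docs sources.
const root = mkdtempSync(join(tmpdir(), "docs-src-"));
mkdirSync(join(root, "administration"));
writeFileSync(join(root, "index.md"), "# Intro\n");
writeFileSync(join(root, "administration", "logging.md"), "# Logging\n");
console.log(concatMarkdown(root));
```

As the next comment points out, a file like this only repeats what the docs already say; it adds no LLM-specific guidance.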

kriszyp · 2026-01-22T16:53:53Z

creating a script to append all .md files into our own llms.txt file without adding a plugin?

Based on my reading, I would consider that to be a poor quality llms.txt since it doesn't provide any information that the docs don't already provide. The guidance link that Ethan provided points to https://www.fastht.ml/docs/llms.txt as a good example. I would think this is also potentially close to what we want: https://github.com/HarperFast/application-template/blob/77c9b7845a2e3fc1057f8c53c61cab4621f1b23e/AGENTS.md
This actually provides real LLM-specific guidance, not just the info that is already there.

BboyAkers · 2026-01-22T17:54:58Z

Closing this and creating a different PR based on the suggestions! Thanks for the feedback, y'all!

github-actions · 2026-01-22T17:55:46Z

🧹 Preview Cleanup

The preview deployment for this PR has been removed.

BboyAkers changed the title ~~dded a pluging to make the documenation llm friendly~~ Added a plugin to make the documentation LLM friendly Jan 13, 2026

github-actions bot temporarily deployed to pr-413 January 13, 2026 06:49

Ethan-Arrowood reviewed Jan 13, 2026


github-actions bot temporarily deployed to pr-413 January 13, 2026 17:07

BboyAkers added 2 commits January 21, 2026 13:45

added a pluging to make the documenation llm friendly

063da9f

updating plugin config and added pseudo code

90b3159

BboyAkers force-pushed the llm-friendly-docs branch from 875e453 to 90b3159 January 21, 2026 18:45

BboyAkers requested a review from a team as a code owner January 21, 2026 18:45

BboyAkers added 3 commits January 21, 2026 14:32

pinned package version and removed versioned docs in build

3399f49

fixed package-lock error

05d1aab

formatted docusarus config

d4a864d

github-actions bot temporarily deployed to pr-413 January 21, 2026 19:40

dawsontoth reviewed Jan 21, 2026


BboyAkers closed this Jan 22, 2026
