@BboyAkers
Member

No description provided.

@BboyAkers BboyAkers changed the title from "dded a pluging to make the documenation llm friendly" to "Added a plugin to make the documentation LLM friendly" Jan 13, 2026
@github-actions

🚀 Preview Deployment

Your preview deployment is ready!

🔗 Preview URL: https://preview.harper-docs.stage.harperfabric.com/pr-413

This preview will update automatically when you push new commits.

@github-actions github-actions bot temporarily deployed to pr-413 January 13, 2026 06:49 Inactive
package.json Outdated
"@docusaurus/theme-search-algolia": "3.9.1",
"@easyops-cn/docusaurus-search-local": "0.52.1",
"@mdx-js/react": "3.1.1",
"@signalwire/docusaurus-plugin-llms-txt": "^1.2.2",
Member

Let's pin this version.

Member

Do we have a process of ensuring this stays up-to-date? Does that process provide some safety when combined with pinning? I'd think that we want to keep this pretty up-to-date, and like always, defaulting to pinning seems like a risky choice.

Member

I think this is the same as our other repos that are relying on socket / renovate to update dependencies automatically. I just noticed many of the other dependencies here are pinned so wanted to stick to that pattern.

Member

I hadn't noticed any socket or renovate PRs and https://github.com/HarperFast/documentation/commits/main/package.json looks pretty quiet, which makes me question this.

Member

Good point - we don't have those tools set up here yet, but we have pinned dependency versions. Let's get the tools enabled asap
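
For reference, the difference between the caret range in the diff and a pinned version can be checked mechanically. The following is a minimal sketch, assuming "pinned" means a bare `x.y.z` string; `findUnpinned` is a hypothetical helper (not part of Socket, Renovate, or npm), and the regex deliberately ignores other range syntax such as tags or URLs.

```javascript
// Hypothetical helper for illustration; not part of any existing tool.
// Treats only a bare "x.y.z" (optionally with a prerelease suffix) as pinned.
const EXACT_VERSION = /^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?$/;

function findUnpinned(pkg) {
  const unpinned = [];
  for (const field of ['dependencies', 'devDependencies']) {
    // A missing field is treated as an empty object.
    for (const [name, range] of Object.entries(pkg[field] ?? {})) {
      if (!EXACT_VERSION.test(range)) unpinned.push({ field, name, range });
    }
  }
  return unpinned;
}

// Example using two of the entries shown in the diff above.
const unpinnedDeps = findUnpinned({
  dependencies: {
    '@docusaurus/theme-search-algolia': '3.9.1',
    '@signalwire/docusaurus-plugin-llms-txt': '^1.2.2',
  },
});
console.log(unpinnedDeps);
```

A check like this only reports ranges. It does not replace an update tool, which is the gap raised in this thread.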

Comment on lines 272 to 284
siteTitle: 'Harper Documentation',
siteDescription: 'Comprehensive guide to developing on and using the Harper platform',
depth: 2,
content: {
  includeBlog: true,
  includePages: true,
  enableLlmsFullTxt: true // Optional: generates llms-full.txt
}
Member

Can you include some more comments, or add a section to a markdown doc, explaining what all of this configuration means? For example, what does depth 2 do? Why not 3? And what about includeBlog? We don't really have a blog here.

Member

Along those lines, how did the organization vary and improve at 2 vs 3? Claude is recommending 3 for docs (for better organization - in its 'opinion' 😄 )

Member

oh, nvm, I see the comment now

Member

here's claude's analysis after changing the depth to 3 and setting includeVersionedDocs: false:

⏺ Perfect! The changes are working beautifully. Here's the comparison:

  📊 Before vs After Comparison

  File Size Reduction
  ┌────────────────────────────┬─────────┬────────┬──────────────────────────┐
  │           Metric           │ Before  │ After  │          Change          │
  ├────────────────────────────┼─────────┼────────┼──────────────────────────┤
  │ llms.txt                   │ 220KB   │ 67KB   │ ✅ 69% smaller           │
  ├────────────────────────────┼─────────┼────────┼──────────────────────────┤
  │ llms-full.txt              │ 5.4MB   │ 1.2MB  │ ✅ 78% smaller           │
  ├────────────────────────────┼─────────┼────────┼──────────────────────────┤
  │ Line count (llms.txt)      │ 1,083   │ 1,528  │ More organized structure │
  ├────────────────────────────┼─────────┼────────┼──────────────────────────┤
  │ Line count (llms-full.txt) │ 120,078 │ 27,732 │ ✅ 77% reduction         │
  └────────────────────────────┴─────────┴────────┴──────────────────────────┘
  Content Filtering
  ┌─────────────────────────────┬────────┬───────┬──────────────────────────┐
  │           Metric            │ Before │ After │          Result          │
  ├─────────────────────────────┼────────┼───────┼──────────────────────────┤
  │ Documents processed         │ ~730+  │ 317   │ ✅ Only current version  │
  ├─────────────────────────────┼────────┼───────┼──────────────────────────┤
  │ Old version links (4.1-4.6) │ 610    │ 0     │ ✅ Completely removed    │
  ├─────────────────────────────┼────────┼───────┼──────────────────────────┤
  │ Current version links       │ 114    │ 114   │ ✅ Preserved             │
  ├─────────────────────────────┼────────┼───────┼──────────────────────────┤
  │ Routes excluded             │ 0      │ 616   │ ✅ Old versions filtered │
  └─────────────────────────────┴────────┴───────┴──────────────────────────┘
  Hierarchy Structure with depth: 3

  The depth: 3 setting creates a much better organized structure:

  ## docs                          ← Level 2: Main section
    ### administration             ← Level 3: Category
      #### harper-studio           ← Level 4: Subcategory
        - [Create Account](...)
        - [Login](...)
        - [Manage Databases](...)
      #### logging                 ← Level 4: Subcategory
        - [Standard Logging](...)
        - [Audit Logging](...)

  Hierarchy breakdown:
  - 5 Level 2 sections (## - top level: docs, fabric, learn, etc.)
  - 21 Level 3 sections (### - categories: administration, developers, deployments)
  - 219 Level 4 sections (#### - subcategories: harper-studio, logging, cloning)

  Why depth: 3 is Perfect

  With your URL structure like /docs/administration/harper-studio/create-account:
  - depth: 1 → Would group by /docs only (too broad)
  - depth: 2 → Would group by /docs/administration (okay, but loses subcategory structure)
  - depth: 3 ✅ → Groups by /docs/administration/harper-studio (ideal - preserves logical structure)

  Benefits for LLMs

  1. Cleaner context - Only current (4.7) documentation, no confusion from old versions
  2. Smaller files - 78% reduction means faster processing
  3. Better organization - Deep hierarchy helps LLMs understand document relationships
  4. Focused information - 317 current docs vs 730+ mixed versions

  Recommendation

  These changes look excellent! The output is:
  - ✅ Much smaller and faster to process
  - ✅ Only includes current documentation
  - ✅ Well-organized with proper hierarchy
  - ✅ No old version confusion

  This is exactly what you want for LLM-friendly documentation. The PR should include these configuration settings in the final version.
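
The depth behaviour described in the analysis above can be sketched as grouping routes by their first N path segments. This is an illustration of the idea only, not the plugin's actual implementation; `groupKey` and `groupRoutes` are hypothetical names, and the second route is an assumed path inferred from the headings in the analysis.

```javascript
// Hypothetical illustration of depth-based grouping; the real plugin may differ.
function groupKey(routePath, depth) {
  // Drop empty segments produced by leading or trailing slashes.
  const segments = routePath.split('/').filter(Boolean);
  return '/' + segments.slice(0, depth).join('/');
}

function groupRoutes(routes, depth) {
  const groups = new Map();
  for (const route of routes) {
    const key = groupKey(route, depth);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(route);
  }
  return groups;
}

const sampleRoutes = [
  '/docs/administration/harper-studio/create-account',
  '/docs/administration/logging/audit-logging', // assumed path, for illustration
];

// At depth 2 both routes share one group; at depth 3 they split by subcategory.
console.log([...groupRoutes(sampleRoutes, 2).keys()]);
console.log([...groupRoutes(sampleRoutes, 3).keys()]);
```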

@kriszyp
Member

kriszyp commented Jan 13, 2026

At a high level, the whole concept of creating documentation in markdown, converting to HTML, and then converting to markdown is... 🤔. Can we simply make the original markdown publicly available? How much longer do we expect AI to actually prefer markdown to HTML? Or is that advice over a month old and no longer relevant in modern AI?
Anyway, this does seem like a simple path, but would love more understanding how it is better.

@github-actions github-actions bot temporarily deployed to pr-413 January 13, 2026 17:07 Inactive

@BboyAkers
Member Author

> At a high level, the whole concept of creating documentation in markdown, converting to HTML, and then converting to markdown is... 🤔. Can we simply make the original markdown publicly available? How much longer do we expect AI to actually prefer markdown to HTML? Or is that advice over a month old and no longer relevant in modern AI? Anyway, this does seem like a simple path, but would love more understanding how it is better.

I'm assuming markdown is preferred due to the lower token consumption vs html. That'd be my only guess 🤷🏾

@kriszyp
Member

kriszyp commented Jan 13, 2026

> lower token consumption

There is token consumption in GEO? Whose tokens are being consumed?

@Ethan-Arrowood
Member

I don't think it has much to do with token consumption but rather the style of the input. These LLMs are only really good at words; they aren't servers. They can't necessarily parse HTML as easily as we assume. Since Markdown is much more readable, that is what the LLMs prefer.

@Ethan-Arrowood
Member

I also agree, though: why can't we just expose the markdown source for our pages rather than something else? But I don't really know what this is all about anyway, so maybe it's just the latest wave of how we can optimize our site for AI robots.

@kriszyp
Member

kriszyp commented Jan 13, 2026

@Ethan-Arrowood
Member

lol so what's with the craze to add markdown files for LLMs if this research says it's pointless?

Just social media noise? TBH if I hadn't seen this PR and actively searched for it on LinkedIn I wouldn't have known this was a recent change for doc sites.

@heskew
Member

heskew commented Jan 14, 2026

It's still just a proposed standard. We could use Anthropic as an example to follow: https://platform.claude.com/llms.txt

I asked Claude how it would discover Harper docs and it said it would only probe for an llms.txt if explicitly asked. E.g. "here's the llms.txt for harper, help me build something" (could be instructions in a Claude skill for example) or have it configured in an IDE somehow.

@BboyAkers
Member Author

> lol so what's with the craze to add markdown files for LLMs if this research says it's pointless?
>
> Just social media noise? TBH if I hadn't seen this PR and actively searched for it on LinkedIn I wouldn't have known this was a recent change for doc sites.

It helps make our docs more discoverable by LLMs and more usable for folks who have a more integrated AI workflow.

@kriszyp
Member

kriszyp commented Jan 15, 2026

> I asked Claude how it would discover Harper docs and it said it would only probe for an llms.txt if explicitly asked.

These are publicly accessible docs, so LLMs should have already ingested this information, right? Giving LLMs directives about where to look would be for documentation that LLMs didn't previously know about?

> It helps make our docs more discoverable by llms

You are pretty confident that Flavio is wrong? Or there are other LLM pathways?

> lol so what's with the craze to add markdown files for LLMs

A myth like this could gain traction in our industry? Inconceivable :)
Sarcasm aside, curious what the LLM pathways are that we are trying to address.

@BboyAkers
Member Author

I'll revisit this PR today, y'all. Sorry for the delay!

@BboyAkers BboyAkers requested a review from a team as a code owner January 21, 2026 18:45
@socket-security

socket-security bot commented Jan 21, 2026

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

Diff  | Package                                      | Supply Chain Security | Vulnerability | Quality | Maintenance | License
Added | @signalwire/docusaurus-plugin-llms-txt@1.2.2 | 87                    | 100           | 100     | 90          | 100

View full report

@github-actions github-actions bot temporarily deployed to pr-413 January 21, 2026 19:40 Inactive

"lint": "echo 0;",
"preview:pr": "node scripts/preview-pr.mjs"
},
"dependencies": {
Contributor

If we go forward with this I don't think these are dependencies. I think they're all dev dependencies.

@BboyAkers
Member Author

BboyAkers commented Jan 21, 2026

> I asked Claude how it would discover Harper docs and it said it would only probe for an llms.txt if explicitly asked.
>
> These are publicly accessible docs, so LLMs should have already ingested this information right? Giving LLMs directives about where to look, would be for documentation that LLMs didn't previously know about?
>
> It helps make our docs more discoverable by llms
>
> You are pretty confident that Flavio is wrong? Or there are other LLM pathways?

I'll be transparent and state I'm not confident Flavio is wrong. My assertion comes from diving into how Mintlify, Vercel, and Appwrite use and implement llms.txt files, along with my bias from being able to ask Claude questions about their products and get accurate answers with links to the pieces of docs I was looking for.

> lol so whats with the craze to add markdown files for LLMs
>
> A myth like this could gain traction in our industry? Inconceivable :) Sarcasm aside, curious what the LLM pathways are that we are trying to address.

I can do more research and get back to you. I'd love to know if @Ethan-Arrowood or @cb1kenobi know anyone at Vercel who could speak more to this. I can reach out to the head of DevRel at Appwrite to see if they can share metrics related to llms.txt files.

@Ethan-Arrowood
Member

I don't think I know anyone directly for you to speak with. If you identified someone in particular from Vercel, I could see if I can make an introduction (assuming I knew them).

Otherwise, I asked Claude (ironic, I know) to help facilitate some additional research. Here is its response:

I personally reviewed a few of the sources mentioned; but not all of them.


I've found a good collection of sources on the llms.txt debate. Here's what I found:

Against llms.txt effectiveness:

  1. SE Ranking's 300k domain analysis - https://seranking.com/blog/llms-txt/

    • Major study showing no effect of llms.txt on domain citations by LLMs, with removal of the variable actually improving their model's accuracy
  2. Daydream's critical analysis - https://www.withdaydream.com/library/llms-txt-hype-vs-reality

    • Warns about data divergence risks and lack of major platform support
  3. Search Engine Journal coverage - https://www.searchenginejournal.com/llms-txt-shows-no-clear-effect-on-ai-citations-based-on-300k-domains/561542/

    • Reports on the SE Ranking findings
  4. Symphonic Digital - https://www.symphonicdigital.com/blog/llm-txt-why-this-geo-file-isnt-worth-your-dev-resources

    • Argues it's a proposal without broad acceptance, comparing it to past SEO fads like AMP
  5. Primary Position's "myth" analysis - https://primaryposition.com/blog/ai-llms-txt/

    • Debunks common misconceptions about the standard

For/Neutral on llms.txt:

  1. Jeremy Howard's original proposal - https://www.answer.ai/posts/2024-09-03-llmstxt.html and https://llmstxt.org/

    • The foundational proposal arguing for LLM-friendly content delivery
  2. Mintlify's implementation post - https://www.mintlify.com/blog/simplifying-docs-with-llms-txt

    • From a platform that auto-generates llms.txt for all hosted docs
  3. ScaleMath's analysis - https://scalemath.com/blog/llms-txt/

    • More balanced take on adoption and use cases
  4. Search Engine Land - https://searchengineland.com/llms-txt-proposed-standard-453676

    • SEO perspective suggesting it's worth implementing despite uncertainty
  5. Peec AI's practical guide - https://peec.ai/blog/llms-txt-md-files-important-ai-visibility-helper-or-hoax

    • Includes testing methodologies

Key takeaway: The consensus from research (particularly the SE Ranking 300k domain study) shows no measurable impact on AI citations currently, though some argue it's worth implementing as low-risk future-proofing. Major platforms haven't confirmed support, and log analysis shows LLM crawlers aren't fetching these files.


Since there is no overly compelling research pointing to its effectiveness, my additional acceptance criteria are:

  • what is the impact on build time
  • what is the impact on deploy time (i.e. size of site)

As long as this is not going to be a major thing to maintain, I'm fine with including it on the grounds of "low-risk future-proofing".

Additionally, if we are to proceed with this, can we update it so it only uses this plugin for production deployments?
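
One way the production-only request could be approached is to build the plugin list conditionally. This is a sketch under assumptions: `DOCS_DEPLOY_TARGET` is a hypothetical environment variable that the deploy workflow would have to set, and `buildPlugins` is a hypothetical helper. As far as I know, `docusaurus build` sets `NODE_ENV` to `production` for preview builds too, so `NODE_ENV` alone may not distinguish them. The options simply mirror the snippet reviewed earlier in this PR.

```javascript
// Sketch only. DOCS_DEPLOY_TARGET is a hypothetical variable that the
// deployment workflow would need to set; it is not defined by Docusaurus.
function buildPlugins(deployTarget = process.env.DOCS_DEPLOY_TARGET) {
  const plugins = [];
  if (deployTarget === 'production') {
    // Docusaurus accepts plugins as [name, options] tuples.
    plugins.push([
      '@signalwire/docusaurus-plugin-llms-txt',
      {
        siteTitle: 'Harper Documentation',
        siteDescription: 'Comprehensive guide to developing on and using the Harper platform',
        depth: 2,
        content: { includeBlog: true, includePages: true, enableLlmsFullTxt: true },
      },
    ]);
  }
  return plugins;
}

console.log(buildPlugins('production').length); // 1
console.log(buildPlugins('preview').length); // 0
```

In the site config this would be used as `plugins: buildPlugins()`, so preview builds skip the extra generation step entirely.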

@kriszyp
Member

kriszyp commented Jan 21, 2026

> low-risk future-proofing

The SE Ranking conclusion suggests (and provides evidence) that there is actually some risk (of decreased accuracy) though, right?

I think this PR is actually conflating two different suggestions:

  1. Adding a /llms.txt endpoint that returns some info for LLMs
  2. Adding a plugin that outputs markdown content generated from HTML (which was generated from the markdown)... I think?

IMHO, #1, just adding a llms.txt at the root, without any new plugins, does seem relatively low-risk and I'm fine with that. On the other hand, the impact of #2 doesn't seem well defined or understood in this PR, and the risks in additional dependencies, build overhead, and supply-chain risks without any clear evidence or understanding of what we are doing seems more ill-advised to me.

@Ethan-Arrowood
Member

That is a great point. I'm +1 to that plan.

@BboyAkers
Member Author

I guess the suggested direction is me possibly creating a script to append all .md files into our own llms.txt file without adding a plugin? @kriszyp @Ethan-Arrowood

@kriszyp
Member

kriszyp commented Jan 22, 2026

> creating a script to append all .md files into our own llms.txt file without adding a plugin?

Based on my reading, I would consider that to be a poor quality llms.txt since it doesn't provide any information that the docs don't already provide. The guidance link that Ethan provided points to https://www.fastht.ml/docs/llms.txt as a good example. I would think this is also potentially close to what we want: https://github.com/HarperFast/application-template/blob/77c9b7845a2e3fc1057f8c53c61cab4621f1b23e/AGENTS.md
This actually provides real LLM-specific guidance, not just the info that is already there.
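
To make the contrast concrete, here is a minimal sketch of a hand-curated llms.txt following the layout in the llmstxt.org proposal: an H1 title, a blockquote summary, then H2 sections of annotated links. `renderLlmsTxt` is a hypothetical helper, and the section name, notes, and example.com URLs are placeholders rather than real Harper docs paths.

```javascript
// Hypothetical helper: renders a curated llms.txt from a small data structure.
function renderLlmsTxt({ title, summary, sections }) {
  const lines = [`# ${title}`, '', `> ${summary}`, ''];
  for (const { heading, links } of sections) {
    lines.push(`## ${heading}`, '');
    for (const { name, url, note } of links) {
      // The note is where LLM-specific guidance goes, beyond a bare link.
      lines.push(note ? `- [${name}](${url}): ${note}` : `- [${name}](${url})`);
    }
    lines.push('');
  }
  return lines.join('\n');
}

// Placeholder content; URLs and notes are illustrative assumptions.
const llmsTxt = renderLlmsTxt({
  title: 'Harper Documentation',
  summary: 'Comprehensive guide to developing on and using the Harper platform',
  sections: [
    {
      heading: 'Docs',
      links: [
        { name: 'Getting started', url: 'https://example.com/docs/getting-started', note: 'Read this first when scaffolding a new application' },
        { name: 'Reference', url: 'https://example.com/docs/reference' },
      ],
    },
  ],
});
console.log(llmsTxt);
```

The point of curating by hand is the `note` field: it carries guidance that the docs themselves do not, which a concatenation of all .md files would lack.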

@BboyAkers
Member Author

BboyAkers commented Jan 22, 2026

Closing and creating a different PR related to the suggestions! Thanks for the feedback, y'all!

@BboyAkers BboyAkers closed this Jan 22, 2026
@github-actions

🧹 Preview Cleanup

The preview deployment for this PR has been removed.
