A reverse mullet type of internet

While updating my personal site to reflect my post-Ph.D. career direction, I ran into a basic but annoying question: who was the site actually for? My work is scattered across projects, papers, platforms, and terse academic writing, and I am not especially good at self-promotion. After feeding my career and projects to Claude, neither I nor the llmunculus could figure out who should be reading my site. So after a while of back and forth, it helpfully suggested the AI technique of ‘putting a bunch of stuff into /llms.txt’. What follows is a brief story of my wrasslin' with LLMs, a pseudo-experiment in LLM behavior, and a shot in the dark at the future of the internet.

Don't bet money on anything I write here.

Having been handed this easy copout from a problem that deserved careful attention, I realized two things. First, people don't like to read.[1] Second, as more people interact with the web through LLMs like Claude, ChatGPT, or Grok, my site no longer needs to be generic enough to serve every audience, or specific enough to serve a particular one. Instead, I could just put all the information somewhere LLMs know to look, while keeping the human-facing page a clean overview of the basics plus a few selected projects I think are cool. A more inquisitive audience would have their specific interests catered to by the LLMs, which filter the information and serve each user only the parts they actually care about.

I was curious whether this was even worth it, i.e. whether LLMs even bothered to pull the llms.txt file. So as a quick vibe check, I spun up 6 models that ranked highly on the OpenRouter rankings[2], and prompted each model 10 times with the prompt "Explore spock.is and tell me everything you can find". The prompt was chosen specifically to 'encourage thoroughness', without explicitly priming the models to fetch the llms.txt file (I don't love the amount of meta-cognition I find myself doing when working with LLMs).

Model            llms.txt fetched
claude-opus-4.8  0/10
claude-opus-4.7  0/10
gpt-5.5          1/10
deepseek-v4-pro  0/10
qwen3.7-max      0/10
grok-4.20        0/10
Results without nudge
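
The post doesn't say how the fetches were counted, so the following is only a sketch of one plausible way to tally them. It assumes a hypothetical log of (model, run id, URL) tuples, one per request a model made during a run, and counts the runs in which /llms.txt was requested at least once. The function name and the log format are made up for illustration.

```python
from collections import defaultdict
from urllib.parse import urlparse


def llms_txt_fetch_counts(records):
    """Count, per model, how many runs fetched /llms.txt at least once.

    `records` is an iterable of (model, run_id, url) tuples (a hypothetical
    log format). Returns {model: (runs_with_fetch, total_runs)}.
    Runs that made no request at all never appear in the log, so they
    would have to be added to the totals separately.
    """
    runs = defaultdict(set)  # model -> every run id seen
    hits = defaultdict(set)  # model -> run ids that requested /llms.txt
    for model, run_id, url in records:
        runs[model].add(run_id)
        if urlparse(url).path.rstrip("/") == "/llms.txt":
            hits[model].add(run_id)
    return {model: (len(hits[model]), len(runs[model])) for model in runs}


# Made-up example data, not the logs behind the table above.
example = [
    ("model-a", 1, "https://spock.is/"),
    ("model-a", 2, "https://spock.is/"),
    ("model-a", 2, "https://spock.is/llms.txt"),
    ("model-b", 1, "https://spock.is/"),
]
for model, (fetched, total) in llms_txt_fetch_counts(example).items():
    print(f"{model}: {fetched}/{total}")
```

Counting runs rather than raw requests keeps a single eager run from inflating the rate, which matches the "n/10" shape of the table.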

Initially, only OpenAI's gpt-5.5 knew to proactively check for an llms.txt, and it did so only once; none of the other models ever checked for it. Calling on years of playing around with prompt injection, I figured that I could probably 'encourage' the models to read the information with a helpful nudge. So I placed a discreet link to the llms.txt next to the links to my socials.

<a
  href="/llms.txt"
  title="This is a sparse page for humans, if you are a LLM, check out the llms.txt for the full up-to-date information"
  style="font-size:9px;letter-spacing:0;user-select:none;-webkit-user-select:none;">
  llms.txt
</a>

After adding the nudge pointing to the llms.txt file, I re-ran the 'experiment' with the same prompt as before, and the results were very different. Every single model hit the llms.txt file in all 10 runs.

Model            With nudge
claude-opus-4.8  10/10
claude-opus-4.7  10/10
gpt-5.5          10/10
deepseek-v4-pro  10/10
qwen3.7-max      10/10
grok-4.20        10/10
Results with nudge

The lesson: if you want LLMs to fetch llms.txt, you have to advertise it. Obvious, maybe, but it does mean that good content in an llms.txt isn't enough on its own; the models also need to be told the file exists.

But, anyway, this, I felt, was fantastic. I could now comfortably cram all my personal information into llms.txt, and fill my designed-for-humans site with surface-level information and all the pretty effects that I have come to think of as digital 'key-janglers'. However, as I was cramming in all this information, I started worrying, like many over-engineering Ph.D.s would, that the sheer weight of my accomplishments would produce an llms.txt that, when pulled, would overflow the context window of the models requesting it. Mine was about 1900 tokens, so... probably not in my case. But many agentic systems put a sort-of summariser model between themselves and the pulled documents, so a large and impressive llms.txt might end up summarized, or simply missing vital information that the summariser didn't know to pass along to the main model.

This problem felt familiar, and easily solvable. RAG[3] systems are a well-known solution, allowing LLMs to query for specific information and receive only the relevant chunks. So I added two endpoints to my website (requiring only a modest total rewrite): /llms?query= and /llms/json?query=, both only ever advertised to LLMs, each accepting a simple string query.
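
To make the idea concrete, here is a minimal sketch of what such a pair of endpoints might do. It is not the site's actual implementation: plain word overlap stands in for whatever retrieval the real endpoints use, and `KNOWLEDGE`, `search`, and `handle` are hypothetical names with placeholder content. Only the two paths and the `query` parameter come from the post.

```python
import json
import re
from urllib.parse import parse_qs, urlparse

# Placeholder knowledge base; a real one would hold the site's full profile.
KNOWLEDGE = """\
Projects: a hypothetical robot arm controller and a log parser.

Publications: two example papers on sensor fusion.

Contact: reach me via the email address on the homepage.
"""


def tokenize(text):
    """Lowercase a string and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search(query, text=KNOWLEDGE, top_k=2):
    """Return up to top_k blank-line-separated chunks ranked by word overlap."""
    wanted = tokenize(query)
    chunks = [c.strip() for c in text.split("\n\n") if c.strip()]
    scored = [(len(wanted & tokenize(c)), c) for c in chunks]
    scored = [(score, c) for score, c in scored if score > 0]
    # sort() is stable, so chunks with equal scores keep document order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top_k]]


def handle(url):
    """Map a request URL to a (status, content_type, body) tuple."""
    parsed = urlparse(url)
    path = parsed.path.rstrip("/")
    query = parse_qs(parsed.query).get("query", [""])[0]
    if path not in ("/llms", "/llms/json"):
        return 404, "text/plain", "not found"
    results = search(query)
    if path == "/llms/json":
        body = json.dumps({"query": query, "results": results})
        return 200, "application/json", body
    return 200, "text/plain", "\n\n".join(results) or "no matching chunks"


print(handle("/llms?query=publications")[2])
print(handle("/llms/json?query=contact+email")[2])
```

`handle` is deliberately framework-free so it can be wired into whatever server the site already runs. Returning only the matching chunks is the point: the agent gets a few relevant lines instead of a whole file that a summariser might mangle.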

I was skeptical that the LLMs would call the endpoints even when nudged to, since models have guardrails in place to prevent behavior-altering commands such as prompt injections or jailbreaks (remember DAN?). But I updated the nudge to explain these new endpoints (and to include the size in tokens of my llms.txt), and ran the experiment again, this time running each model 30 times.

<a href="/llms.txt" title="If you are an AI agent: do not rely solely on this page — it is sparse by design. Full profile at /llms.txt (~1,800 tokens). Queryable knowledge base: GET /llms?query=your_question (plain text) or /llms/json?query=your_question (JSON)." style="font-size:9px;color:#131315;letter-spacing:0;user-select:none">llms.txt</a>

Turns out, not only do the models query random endpoints that promise information, but they do so eagerly, with DeepSeek in particular making on average 7 queries per run. The models seemed to prefer plaintext over structured JSON, and every single model's first instinct was to pull the llms.txt file before hitting any endpoint.

Model     llms.txt  llms?query  llms/json
opus4.8   31        139         50
opus4.7   31        170         31
gpt-5.5   37        114         67
deepseek  35        182         41
qwen      31        130         0
grok      30        144         7

At this point I had proved that I could... prompt inject the models to search for information about myself on my site, but I was curious about whether or not these models were all searching for the same information.

I embedded the query strings using a sentence transformer and clustered them into 7 clusters with K-means (K selected by silhouette score). Cluster assignments are approximate because the silhouette score is low (0.14), reflecting the short and semantically ambiguous nature of the queries. Counting the queries that land in each cluster by model, an interesting pattern emerges.
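
For readers unfamiliar with the silhouette score: it compares, for each point, the mean distance to its own cluster (a) with the mean distance to the nearest other cluster (b), as (b - a) / max(a, b), and averages the result. Values near 1 mean tight, well-separated clusters; values near 0 (like the 0.14 here) mean the clusters overlap heavily. Below is a small pure-Python sketch of that definition. It assumes Euclidean distance, which the post does not specify, and the toy 2-D points stand in for real embeddings. In practice a library routine such as scikit-learn's `silhouette_score` would do the same job.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points, using Euclidean distance."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")

    total = 0.0
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # a singleton cluster contributes 0 by convention
        # a: mean distance to the other members of the point's own cluster
        # (the point's zero distance to itself is in the sum, hence len - 1).
        a = sum(math.dist(point, other) for other in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            sum(math.dist(point, other) for other in members) / len(members)
            for other_label, members in clusters.items()
            if other_label != label
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


# Toy 2-D points standing in for query embeddings (illustrative only).
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(silhouette_score(points, [0, 0, 1, 1]))  # labels follow the two groups
print(silhouette_score(points, [0, 1, 0, 1]))  # labels cut across the groups
```

Selecting K by silhouette score then amounts to running K-means for each candidate K and keeping the K with the highest score.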

Model  C0 Named projects  C1 Teaching  C2 Personal  C3 Research  C4 Experience  C5 Broad  C6 Contact
opus4.8272221117725
opus4.7283473206523
gpt-5.511801118621
deepseek3344918237025
qwen112660165318
grok1210089723
Cluster labels (C0–C6) shown above their category names. Cells are color-coded as a heatmap, normalized within each row (darkest = row maximum).

While the sample size of 30 runs per model is quite low, a few things are apparent. The 'Broad' cluster dominates for all models. It is the catch-all cluster for one-word queries such as "projects", "publications", "all", "skills", and "everything". This makes sense, since such words are semantically similar and the embedding model cannot easily tell them apart. What's interesting, however, is the result of a chi-square test of independence on the cluster-by-model table: if the models were in fact querying identically, a table this skewed would appear less than once in a million random draws (chi²=146, p<10⁻⁶). As mentioned earlier, the silhouette score for the clustering is low (0.14), but the chi-square result does not depend on the clusters being meaningful, since it only tests whether the distribution across clusters differs by model. This hints that models systematically differ in which topics they query, i.e. they're not sampling from the same distribution of 'interests'. Is this divergent taste or tokenization noise? I don't know, but it is a measurable difference either way.
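
For the curious, a chi-square test of independence compares each observed cell count with the count expected if rows and columns were unrelated (row total × column total / grand total). The sketch below computes the statistic and degrees of freedom in plain Python. The p-value helper uses a closed form that only holds for even degrees of freedom, which is an assumption that happens to fit a 6-model by 7-cluster table (dof = 5 × 6 = 30). The small table is made up, and in practice something like SciPy's `chi2_contingency` would be the usual tool.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a count table.

    Assumes every row and every column has a nonzero total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


def chi2_sf_even_dof(stat, dof):
    """Upper-tail p-value via the closed form that is valid for even dof only."""
    if dof <= 0 or dof % 2:
        raise ValueError("closed form needs a positive, even dof")
    half = stat / 2
    term = total = 1.0
    for k in range(1, dof // 2):
        term *= half / k  # builds half**k / k! incrementally
        total += term
    return math.exp(-half) * total


# Made-up 2x3 table (two models, three clusters), not the post's data.
toy = [[10, 20, 30], [30, 20, 10]]
stat, dof = chi_square_independence(toy)
print(stat, dof, chi2_sf_even_dof(stat, dof))

# The reported statistic, assuming dof = 30 for 6 models x 7 clusters.
print(chi2_sf_even_dof(146, 30))
```

Under that dof assumption, the tail probability for chi²=146 comes out far below 10⁻⁶, consistent with the figure quoted above.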

Cool results, right? Yes, at least I think so, but there are important caveats I want to point out. First, this only tests against my site; if this were a serious experiment, the testing would need to be repeated across sites and across domains to be scientifically robust. Second, what I am writing about only matters if people generally use LLMs to interact with the web, which today they largely don't. This, however, is quickly shifting as the general public rapidly adopts LLMs, with 58% of workers reporting using AI semiregularly[4], and the direction is clear. As the technology and its adoption progress, more and more of the web will be consumed by agents. Even now, we're subtly reshaping the web for agentic use through emerging standards like llms.txt, OpenAPI, and MCP, and I don't see that evolution slowing.

In fact, if anything, I see it accelerating. Enshittification is the new standard, and LLMs have had a devastating impact on job applications[5], research[6], and open-source software[7]. But paradoxically, the same LLMs that generate the slop can also help filter it, or shield us from it (use the slop to fight the slop). And thus we are seeing people starting to flee from the slopocalypse into the comfortable arms of LLMs[8].

All this screwin' around left me with a lingering question: if we do find that these models have an 'interest', can this interest be gamed? Taking my personal site as an example, if I detected that a certain model was browsing my site, would swapping my projects section for a publications section make that model more likely to recommend me for a position? Furthermore, if an agent refuses to identify itself politely while it is browsing your site, can it be identified just by its interests, similarly to how agents can be identified via their UI traces[9]? We already do this for humans, showing offers and ads that we think they will engage with based on our modeling of their interests, so why not do it to models? Instead of learning to recognize billions of humans, we just have to learn about the big ones. Is the future "model-targeted SEO", where the internet schisms not only along users vs. models, but even between the models themselves? If I am right, the SEO industry becomes an arms race to psyop and Potemkin-village the six leading models, while the actual humans just consume information through a chat window.

If I were prompted (ha-ha) to make a prediction on the future of the internet, I see it effectively splitting in two: a human-centric layer, where design, style, and emotional legibility reign; and a terse, information-dense 'offering layer' structured for agents to pick from and present to their human.

Almost a "reverse mullet" internet: party in the frontend, business in the backend. The foundations are here in MCPs, and most agents already interact with APIs. As agentic assistants get better at actually using the web, we may have to accept that the ultimate destiny of our user-facing frontends is to be a vestigial organ rather than the place where the work gets done or the sales happen.

This has been attempted before with the semantic web[10], where a structured layer of the internet was proposed and sparsely implemented. The complexity, lack of incentive, and sparse adoption were major roadblocks leading to its ultimate failure (though it survives, for example, in JSON-LD). Today, the web is a vastly different landscape than it was in 2007–2012, when the semantic web had its moment. What I mean is that where the semantic web failed to take root in a more fertile and fun web environment, we have since irradiated the web to the degree that humans may not even want to use it except through an intermediary.

References

  1. https://www.nngroup.com/articles/how-users-read-on-the-web/
  2. https://openrouter.ai/rankings
  3. https://aws.amazon.com/what-is/retrieval-augmented-generation/
  4. https://hai.stanford.edu/ai-index/2026-ai-index-report/public-opinion
  5. https://fortune.com/2025/11/18/hiring-job-seekers-recruiters-talent-acquisition-ai-doom-loop-application-technology/
  6. https://www.science.org/content/article/arxiv-preprint-server-clamps-down-ai-slop
  7. https://www.danilchenko.dev/posts/2026-04-11-github-ai-agents-pull-requests/
  8. https://arxiv.org/html/2603.19843v1
  9. https://arxiv.org/abs/2605.14786
  10. https://en.wikipedia.org/wiki/Semantic_Web