<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
  xmlns:atom="http://www.w3.org/2005/Atom"
  xmlns:content="http://purl.org/rss/1.0/modules/content/"
  xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Agnel Nieves - LLMs</title>
    <link>https://agnelnieves.com/blog/tag/llms</link>
    <description>Blog posts on LLMs by Agnel Nieves.</description>
    <language>en-US</language>
    <lastBuildDate>Fri, 15 May 2026 01:12:17 GMT</lastBuildDate>
    <atom:link href="https://agnelnieves.com/blog/tag/llms/feed.xml" rel="self" type="application/rss+xml" />
    
    <item>
      <title><![CDATA[Optimizing Your Website for AI Agents and LLMs]]></title>
      <link>https://agnelnieves.com/blog/optimizing-your-website-for-ai-agents-and-llms</link>
      <guid isPermaLink="true">https://agnelnieves.com/blog/optimizing-your-website-for-ai-agents-and-llms</guid>
      <description><![CDATA[Your website has human visitors and AI visitors. Here's how to serve both — with llms.txt, inline LLM instructions, structured data, and machine-readable feeds.]]></description>
      <content:encoded><![CDATA[<p>Your website has two audiences now. Humans, obviously. But also AI agents — LLMs that crawl, summarize, cite, and recommend your content to millions of people. If your site isn&#39;t optimized for both, you&#39;re leaving visibility on the table.</p>
<p>I just finished optimizing <a href="/">this site</a> for AI consumption, and the process revealed something interesting: most of what makes a site good for AI also makes it better for humans. Clear structure, machine-readable content, and explicit metadata benefit everyone.</p>
<p>Here&#39;s what I did and why it matters.</p>
<h2>What Are AI Agents Actually Doing with Your Site?</h2>
<p>When someone asks ChatGPT, Claude, Perplexity, or Google&#39;s AI Overview a question, those systems don&#39;t just generate answers from training data. Increasingly, they fetch and cite live web content. Your site might get:</p>
<ul>
<li><strong>Crawled for training data</strong> by bots like GPTBot, ClaudeBot, and Google-Extended</li>
<li><strong>Fetched at query time</strong> by Perplexity, ChatGPT browsing, and similar agents</li>
<li><strong>Cited as a source</strong> in AI-generated responses</li>
<li><strong>Summarized in featured snippets</strong> and AI overviews</li>
<li><strong>Navigated by autonomous agents</strong> that interact with your APIs</li>
</ul>
<p>Each of these has different needs, but they all benefit from the same foundation: structured, discoverable, machine-readable content.</p>
<h2>The llms.txt Standard</h2>
<p>The <a href="https://llmstxt.org">llms.txt spec</a> is the equivalent of <code>robots.txt</code> for AI agents. While <code>robots.txt</code> tells crawlers what they <em>can</em> access, <code>llms.txt</code> tells them what your site <em>is</em> — a structured markdown index served at your domain root.</p>
<p>The format is simple:</p>
<pre><code class="language-markdown"># Your Name or Site

&gt; A one-line summary of what this site is.

A longer description paragraph.

## Section Name

- [Link Title](https://url): Description of what&#39;s at this link
</code></pre>
<p>I implemented two variants:</p>
<ul>
<li><strong><code>/llms.txt</code></strong> — the index. A table of contents with links to all pages, blog posts, projects, social profiles, and feeds. Think of it as a menu for AI agents to browse selectively.</li>
<li><strong><code>/llms-full.txt</code></strong> — the full dump. Every blog post&#39;s complete markdown content, every project description, biographical context. For agents that want to load everything into context at once.</li>
</ul>
<p>Both are served as <code>text/plain</code> with markdown formatting. Both are generated dynamically from the same data sources that power the site, so they never go stale.</p>
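<p>As a sketch of that dynamic generation (the data shapes and function names here are hypothetical, not this site's actual code), a small TypeScript helper can render the index from the same content objects that drive the pages:</p>

```typescript
// Hypothetical sketch: render an llms.txt index from site data.
// The section/entry shapes are illustrative, not this site's real types.
interface LlmsEntry {
  title: string;
  url: string;
  description: string;
}

interface LlmsSection {
  name: string;
  entries: LlmsEntry[];
}

function buildLlmsTxt(
  siteName: string,
  summary: string,
  sections: LlmsSection[]
): string {
  // Follows the llms.txt format: H1 title, blockquote summary, H2 sections.
  const lines = [`# ${siteName}`, "", `> ${summary}`, ""];
  for (const section of sections) {
    lines.push(`## ${section.name}`, "");
    for (const e of section.entries) {
      lines.push(`- [${e.title}](${e.url}): ${e.description}`);
    }
    lines.push("");
  }
  return lines.join("\n");
}
```

<p>In a Next.js App Router project this could back a route handler (for example <code>app/llms.txt/route.ts</code>) that returns the string with a <code>Content-Type: text/plain</code> header.</p>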
<h2>Inline LLM Instructions in HTML</h2>
<p>This one comes from a <a href="https://vercel.com/blog/a-proposal-for-inline-llm-instructions-in-html">Vercel proposal</a> and it&#39;s clever: embed AI-readable instructions directly in your page&#39;s <code>&lt;head&gt;</code> using a script tag browsers ignore.</p>
<pre><code class="language-html">&lt;script type=&quot;text/llms.txt&quot;&gt;
# Your Site Name

This is the personal website of [name], a [role] based in [location].

## Site Structure
- / — Home: Description
- /blog — Blog: Description
- /about — About: Description

## Key Facts
- Name: Your Name
- Role: Your Role
- Specialties: Thing 1, Thing 2, Thing 3
&lt;/script&gt;
</code></pre>
<p>Browsers ignore <code>&lt;script&gt;</code> tags with unrecognized types (the content stays in the DOM but never executes), while LLMs reading the raw HTML can still process them. It&#39;s a zero-cost way to give every page on your site a machine-readable context block. I added one to my root layout that describes who I am, the site structure, and where to find machine-readable content.</p>
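<p>One practical wrinkle: if the instructions are generated from data, the closing tag must be escaped so the content can&#39;t break out of the script element. A minimal helper (hypothetical, not this site&#39;s actual code) might look like:</p>

```typescript
// Hypothetical sketch: wrap markdown instructions in an inline
// text/llms.txt script tag, escaping any closing tag in the content
// so it cannot terminate the script element early.
function llmsInstructionTag(markdown: string): string {
  const safe = markdown.replace(/<\/script/gi, "<\\/script");
  return `<script type="text/llms.txt">\n${safe}\n</script>`;
}
```

<p>In React/Next.js, one common way to emit this is <code>dangerouslySetInnerHTML</code> on the script element, since JSX escapes plain text children.</p>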
<h2>Structured Data That AI Engines Actually Use</h2>
<p><a href="https://json-ld.org/">JSON-LD</a> structured data has always been important for Google. It&#39;s now equally important for AI engines. When an LLM encounters schema.org markup, it understands the <em>semantics</em> of your content — not just the text, but what the text represents.</p>
<p>I already had structured data for my blog posts (<code>BlogPosting</code> schema with breadcrumbs). What I added was <code>CreativeWork</code> schema for my <a href="/work">portfolio projects</a>, giving each project a machine-readable identity:</p>
<pre><code class="language-json">{
  &quot;@context&quot;: &quot;https://schema.org&quot;,
  &quot;@type&quot;: &quot;CreativeWork&quot;,
  &quot;name&quot;: &quot;Project Name&quot;,
  &quot;description&quot;: &quot;What this project is&quot;,
  &quot;url&quot;: &quot;https://project-url.com&quot;,
  &quot;creator&quot;: {
    &quot;@type&quot;: &quot;Person&quot;,
    &quot;name&quot;: &quot;Your Name&quot;
  }
}
</code></pre>
<p>The more schema types you cover, the better AI engines can understand your content and cite your work with proper attribution.</p>
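<p>Generating that markup from data keeps it in sync with the pages. A sketch (the <code>Project</code> shape is illustrative, not this site&#39;s actual data model):</p>

```typescript
// Hypothetical sketch: serialize CreativeWork JSON-LD for a project.
interface Project {
  name: string;
  description: string;
  url: string;
}

function projectJsonLd(project: Project, creatorName: string): string {
  return JSON.stringify({
    "@context": "https://schema.org",
    "@type": "CreativeWork",
    name: project.name,
    description: project.description,
    url: project.url,
    creator: { "@type": "Person", name: creatorName },
  });
}
```

<p>Each project page can then embed the result in a <code>&lt;script type=&quot;application/ld+json&quot;&gt;</code> tag.</p>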
<h2>Machine-Readable Feeds</h2>
<p>RSS is great, but it&#39;s XML — not the most natural format for AI agents to parse. I added a <a href="https://www.jsonfeed.org/">JSON Feed</a> endpoint alongside my existing RSS feed:</p>
<ul>
<li><strong><code>/feed.xml</code></strong> — RSS 2.0 for traditional feed readers</li>
<li><strong><code>/feed.json</code></strong> — JSON Feed 1.1 for programmatic consumption</li>
</ul>
<p>JSON Feed is cleaner for AI agents to parse and reference. Both feeds are registered as <code>&lt;link rel=&quot;alternate&quot;&gt;</code> entries in the site&#39;s metadata, so clients can auto-discover them.</p>
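<p>For reference, a minimal JSON Feed 1.1 payload looks like this (URLs and titles are placeholders, not this site&#39;s actual feed):</p>

```typescript
// Hypothetical minimal JSON Feed 1.1 payload for a /feed.json endpoint.
const feed = {
  version: "https://jsonfeed.org/version/1.1",
  title: "Example Blog",
  home_page_url: "https://example.com",
  feed_url: "https://example.com/feed.json",
  items: [
    {
      id: "https://example.com/blog/first-post",
      url: "https://example.com/blog/first-post",
      title: "First Post",
      // content_text lets agents consume the body without HTML parsing.
      content_text: "Plain-text body that agents can consume directly.",
      date_published: "2026-04-14T00:00:00Z", // RFC 3339 timestamp
    },
  ],
};

const body = JSON.stringify(feed, null, 2);
```

<p>JSON Feed&#39;s registered media type is <code>application/feed+json</code>; serving the endpoint with that <code>Content-Type</code> helps clients identify it.</p>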
<h2>Making robots.txt AI-Aware</h2>
<p>Most sites already have a <code>robots.txt</code>. The key addition is explicitly allowing AI crawlers and pointing them to your <code>llms.txt</code>:</p>
<pre><code>User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

Sitemap: https://yoursite.com/sitemap.xml

# AI/LLM Content
# llms.txt: https://yoursite.com/llms.txt
# llms-full.txt: https://yoursite.com/llms-full.txt
</code></pre>
<p>Many sites block AI crawlers by default. If you <em>want</em> your content cited and discovered by AI, explicitly allow the major bots: <code>GPTBot</code>, <code>ChatGPT-User</code>, <code>Google-Extended</code>, <code>ClaudeBot</code>, <code>anthropic-ai</code>, <code>PerplexityBot</code>, <code>Applebot-Extended</code>, <code>Bytespider</code>, and <code>cohere-ai</code>.</p>
<h2>Why This Matters for Creators</h2>
<p>As a design engineer with 15+ years of building products, I&#39;ve watched SEO evolve from keyword stuffing to the semantic web to AI-native discovery. We&#39;re at an inflection point. The sites that get cited by AI aren&#39;t necessarily the ones with the best domain authority — they&#39;re the ones with the clearest, most structured, most machine-readable content.</p>
<p>This is especially important for personal sites and portfolios. When someone asks an AI &quot;who are the best design engineers in Miami?&quot; or &quot;what&#39;s a good article about design tokens?&quot;, you want your site to be citable. That requires more than good content — it requires content that AI can <em>find</em>, <em>understand</em>, and <em>attribute</em>.</p>
<h2>The Full Stack of AI Optimization</h2>
<p>Here&#39;s the complete checklist of what I now have in place:</p>
<table>
<thead>
<tr>
<th>Layer</th>
<th>What</th>
<th>Why</th>
</tr>
</thead>
<tbody><tr>
<td><code>robots.txt</code></td>
<td>Explicitly allow AI bots</td>
<td>Let them crawl</td>
</tr>
<tr>
<td><code>sitemap.xml</code></td>
<td>Dynamic sitemap with all content</td>
<td>Let them discover</td>
</tr>
<tr>
<td><code>llms.txt</code></td>
<td>Markdown index of the site</td>
<td>Let them understand structure</td>
</tr>
<tr>
<td><code>llms-full.txt</code></td>
<td>Full content in one file</td>
<td>Let them ingest everything</td>
</tr>
<tr>
<td>Inline <code>&lt;script&gt;</code></td>
<td>Page-level LLM instructions</td>
<td>Let them understand context</td>
</tr>
<tr>
<td>JSON-LD</td>
<td>Structured data on every page</td>
<td>Let them understand semantics</td>
</tr>
<tr>
<td>RSS + JSON Feed</td>
<td>Machine-readable content feeds</td>
<td>Let them subscribe</td>
</tr>
<tr>
<td>Meta tags</td>
<td>OpenGraph, Twitter, canonical</td>
<td>Let them cite accurately</td>
</tr>
</tbody></table>
<p>None of these changes affect how the site looks or feels for human visitors. They&#39;re invisible additions that make the site dramatically more useful for AI.</p>
<h2>What&#39;s Next</h2>
<p>The AI web is evolving fast. Standards like <code>llms.txt</code> are still emerging, and new patterns will appear. But the fundamentals won&#39;t change: structure your content clearly, make it discoverable, and give machines the metadata they need to understand it.</p>
<p>If you want to replicate this setup, I&#39;ve published a <a href="/guides/ai-optimization-guide.md">full implementation guide</a> with code examples for Next.js. The approach works for any framework — the concepts are universal.</p>
<hr>
<p><em>Building something and want to talk AI optimization? <a href="/connect">Let&#39;s connect</a>.</em></p>
]]></content:encoded>
      <pubDate>Tue, 14 Apr 2026 00:00:00 GMT</pubDate>
      <author>agnel@agnelnieves.com (Agnel Nieves)</author>
      <dc:creator><![CDATA[Agnel Nieves]]></dc:creator>
      <category>AI</category>
      <category>SEO</category>
      <category>Web Development</category>
      <category>LLMs</category>
    </item>
  </channel>
</rss>