Here are the six things I look for first when I audit a newly built site.
| Pattern | What you'll see | How to fix it |
|---|---|---|
| 1. Heading hierarchy | Multiple H1s on a page, H3s where H2s should be | Audit and rebuild the H tag structure |
| 2. Schema missing or disconnected | No schema at all, or isolated tags that aren't connected | Add a connected @graph schema |
| 3. Empty HTML to crawlers | View Source shows mostly empty divs and JavaScript | Switch to server-side rendering or static generation |
| 4. Same page at multiple URLs | www and non-www both load the same page | Pick a canonical version, 301 redirect the rest |
| 5. Orphan pages | Pages exist that no other page links to | Link from contextually relevant content |
| 6. Migration didn't preserve URLs | Old URLs 404 or redirect to the homepage | Map every old URL to its closest match with a 301 |
Pattern 1: The heading hierarchy is wrong
A client's site I started auditing recently had H3 tags where H2 tags should be. Some pages had multiple H1s. The site was built by a designer who made it look great, and visually it works. To a crawler, it's confusing.
Heading hierarchy is one of the simplest, most ignored signals on the web. It tells search engines and AI systems what the page is about and how the content is organized. Multiple H1s send a search engine two contradictory signals about the page's primary topic. H3s standing in for H2s break the topical chain and make the page look like a fragment instead of a complete answer.
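For a concrete picture, here is a sketch of a clean hierarchy on a hypothetical service page: one H1 naming the primary topic, H2s for the major sections, and H3s only nested under an H2 (the indentation is just to make the nesting visible; page content is invented).

```html
<h1>Email Deliverability Consulting</h1>
  <h2>What we fix</h2>
    <h3>SPF, DKIM, and DMARC alignment</h3>
    <h3>List hygiene</h3>
  <h2>How engagements work</h2>
    <h3>Audit phase</h3>
    <h3>Ongoing monitoring</h3>
```

A crawler reading this sees one topic, two subtopics, and supporting detail under each, in order. That's the whole job of the hierarchy.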
I've been cleaning these up by hand, page by page, in WordPress. The fix is mechanical and invisible to anyone reading the page. The signal it sends is loud.
Pattern 2: Schema markup is missing or disconnected
Schema is the structured data that tells search engines and AI systems what kind of business you are, what services you offer, and how the pieces of your site relate to each other. Without it, those systems have to guess from context. If your brand name has a common-noun meaning, you're going to lose that guess.
By far the most common version I see is no schema at all. None on the homepage, none on service pages, none anywhere. The site goes live in this state and usually stays that way until someone audits it and flags the gap.
When schema does exist on a newly built site, it's usually isolated tags in the head of each page, none of them connected to each other. The right pattern is a connected @graph: Organization, WebSite, WebPage, Article, FAQPage, all referencing each other through @id values. That's what tells an AI system that the article on this URL was written by this person who works for this organization, which is located here, and offers these services. Without the connections, you have data without relationships, and the systems that decide whether to surface you for a buyer's question are working from a blank slate.
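Here is a minimal sketch of what a connected @graph looks like. The domain, names, and @id fragments are placeholders; a real implementation would add the Article, FAQPage, and Person nodes the same way, each referencing the others by @id.

```json
{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "Organization",
      "@id": "https://example.com/#org",
      "name": "Example Co",
      "url": "https://example.com/"
    },
    {
      "@type": "WebSite",
      "@id": "https://example.com/#website",
      "url": "https://example.com/",
      "publisher": { "@id": "https://example.com/#org" }
    },
    {
      "@type": "WebPage",
      "@id": "https://example.com/services/#webpage",
      "url": "https://example.com/services/",
      "isPartOf": { "@id": "https://example.com/#website" },
      "about": { "@id": "https://example.com/#org" }
    }
  ]
}
```

The @id references are what make this a graph rather than a pile of tags: the WebPage points at the WebSite, the WebSite points at the Organization, and a parser can walk the whole chain.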
There's a related signal worth knowing about. Schema talks to AI through the page itself. There's also a file called llms.txt that gives AI crawlers a curated index of what's worth reading on your site. Most newly built sites don't have one. (What Is llms.txt?)
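An llms.txt file is plain markdown served at the site root. A minimal sketch, with hypothetical pages and a placeholder domain:

```text
# Example Co

> Email deliverability consulting for B2B SaaS teams.

## Services
- [Deliverability audits](https://example.com/services/audits/): what we check and why
- [Ongoing monitoring](https://example.com/services/monitoring/): monthly reporting

## Articles
- [What Is a 301 Redirect?](https://example.com/blog/301-redirects/)
```

It lives at `/llms.txt`, the same way robots.txt lives at `/robots.txt`, and gives AI crawlers a curated reading list instead of making them guess.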
Pattern 3: The HTML is empty to crawlers
This one is the vibe-coding signature, but it isn't unique to vibe coding. Plenty of dev-shop sites built on React, Vue, or Next.js without server-side rendering ship the same near-empty HTML to crawlers.
Try this on your own site. Right-click anywhere on your homepage and choose View Page Source. If what you see is mostly empty divs and a JavaScript bundle that assembles the content in the browser, that's what crawlers see when they visit. Google can sometimes work through this, slowly. AI crawlers usually don't run JavaScript at all. If your content lives in JavaScript, you're invisible to ChatGPT, Perplexity, Gemini, and the AI Mode answers in Google.
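This is roughly what View Source shows on a client-rendered site (the bundle filename is a made-up example):

```html
<body>
  <!-- This is everything a crawler that doesn't run JavaScript will ever see -->
  <div id="root"></div>
  <script src="/static/js/main.3f2a1b.js"></script>
</body>
```

All of the headings, copy, and links exist only after the browser downloads and executes that script. A crawler that skips the script sees an empty page.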
The fix is server-side rendering or static generation, depending on the framework. The audit step is a thirty-second View Source check anyone can do.
Pattern 4: The same page is live at multiple URLs
The version I see most often is www and non-www both loading. Type the URL with www, the page loads. Type it without, the page loads. Both versions are reachable, both get crawled, and search engines treat them as duplicate content. Whatever ranking signals each version earns get split across the two, and neither one ranks as well as a unified version would.
There are other variations of the same problem. Trailing slash and no-slash both resolving. HTTP and HTTPS both serving. The same page reachable at /services and /services/. Each one looks like duplicate content to a crawler.
The fix is to pick one canonical version, 301 redirect everything else to it, and add a self-referencing canonical tag to every page. (What Is a 301 Redirect?)
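On an Apache host, the redirect half of that fix can look like this .htaccess sketch, which picks HTTPS non-www as canonical. The domain is a placeholder, and hosts differ in how they expose the HTTPS variable, so treat this as a starting point rather than a drop-in:

```apache
RewriteEngine On
# Send HTTP and www traffic to the single canonical host in one 301
RewriteCond %{HTTPS} off [OR]
RewriteCond %{HTTP_HOST} ^www\. [NC]
RewriteRule ^ https://example.com%{REQUEST_URI} [R=301,L]
```

The canonical-tag half is a single line in each page's head, e.g. `<link rel="canonical" href="https://example.com/services/">`, pointing at that page's own canonical URL.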
Pattern 5: Orphan pages exist that nothing links to
Internal linking is what tells search engines and AI crawlers how your site fits together. Pages that nothing links to are usually called orphan pages, and they're often functionally invisible. The crawler can't find them by following links because nothing points there. They might be in the sitemap. They might still get indexed eventually. But they aren't building any of the relational authority that connected pages do.
This usually happens when content gets added in batches and the linking step never gets done. (What Are Orphan Pages?) Every page on your site should be linked from at least one other page that makes contextual sense. Not from the global nav. From the body of related content where the link belongs.
Pattern 6: The migration didn't preserve what came before
If the new site replaced an old one, and most do, the question is what happened to the old URLs. If they 404, every backlink and every bit of search authority pointing to those old pages drops to the floor. If they 302 redirect (which means temporary), search engines hold onto the old version. If they were never mapped at all, you're starting from zero on a domain that didn't have to.
Every old URL needs a 301 redirect to the closest matching new URL. Not the homepage. The closest match. A blog post about email deliverability redirects to the new blog post about email deliverability, not to the homepage. (What Is a 301 Redirect?)
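In Apache terms, the mapping looks like a list of per-URL rules, one per old page. The paths here are hypothetical:

```apache
# Each old URL goes to its closest new equivalent
Redirect 301 /blog/email-deliverability-tips /blog/email-deliverability
Redirect 301 /our-services /services

# Not this: a blanket rule that dumps everything on the homepage
# RedirectMatch 301 ^/blog/.* /
```

The commented-out blanket rule is the shortcut that loses the authority. The per-URL list is tedious to build, which is exactly why migrations skip it.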
Most migrations I've audited skip this step or do it badly, and the founder is left wondering why their organic traffic fell off a cliff after the redesign.
Why these patterns persist
The patterns aren't exotic. Most of them are twenty-year-old SEO hygiene with a couple of newer additions for AI search. They persist because the people building most websites are good at building websites. Making a website findable is a different job. It requires asking different questions during the build, looking at the site through a different lens, and knowing what to check before launch and after.
If your designer or developer didn't ask you about heading hierarchy, schema, canonical URLs, or what would happen to your old URLs during the migration, that isn't a failing on their part. It isn't their job. It's the job of whoever in the room is responsible for whether the site can be found in six months when somebody who's never heard of you searches for what you do.
Most rooms don't have that person. That's how these patterns end up live on the internet.
What to do if you've already built a site
Run an audit. There are free tools that will surface most of the technical issues in an afternoon. View the page source on your homepage. Check that one URL serves your homepage and the other variations redirect. Look at your sitemap and ask whether every page on it has at least one inbound internal link. If you migrated from an older site, dig out the old sitemap and spot-check that the old URLs redirect somewhere relevant.
If you find things, fix them. If you find a lot of things, that's where I come in. Sometimes a single short engagement is enough to sort out everything in this list. Sometimes it surfaces deeper questions about content, AI visibility, or topical authority that take longer to work through. Either way, you'll know.
A live site is not the same as a findable site. The first one is a deliverable. The second one is a system you maintain.
Frequently asked questions
How long does it take a new site to show up in search?
It depends on what state the site went live in. A site with strong technical foundations, schema, internal linking, and solid content can start showing up in search within weeks. A site with the patterns above can sit on the internet for months without ever being properly indexed, and even then, only for terms it accidentally matches.
Can these problems be fixed without rebuilding the site?
Almost all of them can be fixed incrementally. Heading hierarchy, schema, canonical tags, and internal linking are page-by-page changes. Server-side rendering is the bigger lift depending on the framework, but it usually doesn't require a full rebuild. Migration redirects are the one place where the longer you wait, the more authority you lose, so handle those first if you've recently changed domains or URLs.
What's the difference between a live site and a findable site?
A live site is one that loads in a browser. A findable site is one that search engines can crawl, AI systems can interpret, and buyers can discover when they search for what you do. Most websites are live without being findable, and the gap usually doesn't show up until traffic fails to materialize.
How do I check my own site for these problems?
A free site audit tool will surface most of them in an afternoon. View the page source on your homepage to check whether your content is in the HTML or rendered by JavaScript. Check whether www and non-www both load. Look at your sitemap and check whether every page has at least one inbound internal link. (What Are Orphan Pages?) If you migrated from an older site, spot-check that the old URLs redirect somewhere relevant.
Can AI tools build a site that avoids these problems?
They can build the site. They build what you ask for. The challenge is knowing what to ask for, which is why AI-built sites often ship without schema, with empty HTML, or with the heading hierarchy in the wrong order. It isn't the AI's fault. It's a gap in the brief.