The Most Important Elements to Check in a Technical SEO Audit

When I run a technical SEO audit, I follow my checklist from top to bottom, because the truth is that most websites have the same problems. The checklist just makes sure I don’t miss anything obvious, and the obvious stuff is usually what’s breaking the site.

Not every audit uncovers some exotic bug or unique server misconfiguration. Most of the time, it’s robots.txt blocking something it shouldn’t, duplicate content hiding in the CMS, or canonicals pointing to the wrong URLs. So that’s what I’m covering here.

Crawlability Is Where Most Problems Start

Google can’t rank pages it can’t crawl. That’s basic, but it’s also the first thing I check—because if crawlability is broken, everything else is pointless.

The robots.txt file is usually the culprit. I’ve seen sites accidentally block CSS or JavaScript files, which prevents Google from rendering the page properly. 

Another thing I look for: are there pages disallowed in robots.txt that are still showing up in Google’s index? This happens when external links point to those pages, and Google indexes the URLs even though it can’t crawl them. Search Console lists these under “Indexed, though blocked by robots.txt”.
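Both of those checks come down to asking robots.txt whether a given URL is allowed. Here’s a minimal Python sketch using the standard library’s `urllib.robotparser`, assuming you’ve saved the robots.txt content and have a list of asset URLs (from a crawl) and indexed URLs (for example, exported from Search Console). The robots.txt rules and URLs are hypothetical. One caveat: `urllib.robotparser` does plain prefix matching and applies the first matching rule, so it doesn’t replicate Google’s `*`/`$` wildcards or longest-match precedence. Treat its answers as a first pass.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; in a real audit, paste in the live file.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /wp-includes/
Disallow: /private/
"""


def disallowed(robots_txt: str, urls: list[str], user_agent: str = "Googlebot") -> list[str]:
    """Return the URLs that robots.txt blocks for the given user agent.

    Note: urllib.robotparser uses plain prefix matching, not Google's
    wildcard and longest-match rules, so double-check anything surprising.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Check 1: CSS/JS files Google needs for rendering (hypothetical URLs from a crawl).
assets = [
    "https://example.com/wp-includes/js/jquery.min.js",
    "https://example.com/wp-content/themes/site/style.css",
]
print(disallowed(SAMPLE_ROBOTS, assets))

# Check 2: URLs that are indexed even though robots.txt blocks them (hypothetical export).
indexed = ["https://example.com/private/pricing-draft/", "https://example.com/about/"]
print(disallowed(SAMPLE_ROBOTS, indexed))
```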

I also review crawl stats in Search Console. If Google is requesting hundreds of 404s or hitting redirect chains over and over, that’s crawl budget being wasted on junk instead of pages that should be indexed. For bigger sites, this becomes a real bottleneck.
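Search Console’s Crawl stats report shows this in aggregate. If you also have server access logs, a quick script can show which specific paths Googlebot keeps hitting with 404s and redirects. This Python sketch assumes logs in the common Apache/Nginx “combined” format (the sample lines are made up), and it matches Googlebot by user-agent string only. That string can be spoofed, and verifying real Googlebot traffic needs a reverse DNS check, which this skips.

```python
import re
from collections import Counter

# Matches request, status, and user agent in a "combined" format log line (assumed format).
LOG_PATTERN = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Hypothetical sample lines; in practice, read these from your access log file.
GOOGLEBOT_UA = '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
SAMPLE_LOG = [
    '66.249.66.1 - - [10/Oct/2024:13:55:36 +0000] "GET /old-page HTTP/1.1" 404 512 "-" ' + GOOGLEBOT_UA,
    '66.249.66.1 - - [10/Oct/2024:14:02:11 +0000] "GET /old-page HTTP/1.1" 404 512 "-" ' + GOOGLEBOT_UA,
    '66.249.66.1 - - [10/Oct/2024:14:05:42 +0000] "GET /moved-page HTTP/1.1" 301 0 "-" ' + GOOGLEBOT_UA,
    '203.0.113.7 - - [10/Oct/2024:14:06:00 +0000] "GET /old-page HTTP/1.1" 404 512 "-" "Mozilla/5.0 (Windows NT 10.0)"',
]


def googlebot_waste(log_lines):
    """Count Googlebot requests per path that returned 404 or a 3xx redirect."""
    not_found, redirected = Counter(), Counter()
    for line in log_lines:
        match = LOG_PATTERN.search(line)
        if not match or "Googlebot" not in match.group("agent"):
            continue  # skip unparseable lines and non-Googlebot traffic
        status = int(match.group("status"))
        if status == 404:
            not_found[match.group("path")] += 1
        elif 300 <= status < 400:
            redirected[match.group("path")] += 1
    return not_found, redirected


not_found, redirected = googlebot_waste(SAMPLE_LOG)
print("Most-requested 404s:", not_found.most_common(10))
print("Most-requested redirects:", redirected.most_common(10))
```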

Indexability Problems Show Up Constantly

Even when Google crawls a page, that doesn’t mean it indexes it. There are a bunch of signals that can stop a page from making it into the index, and canonical tags are one of the biggest. 

If your canonical tags point to the wrong URL, or if there are conflicting signals between the canonical and the robots meta tag, Google gets confused. I check this in my crawling tool and cross-check it in Search Console under Pages > Not indexed, looking at the “Alternate page with proper canonical tag” and “Duplicate, Google chose different canonical than user” statuses.
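Crawling tools report these signals for you, but the underlying check is simple. This Python sketch uses the built-in `html.parser` to pull the canonical link and robots meta directives out of a page’s HTML, then flags a missing canonical, more than one canonical, a canonical pointing at a different URL (worth a manual look, since that’s only wrong if the target is the wrong page), and the conflicting case of `noindex` plus a canonical to another URL. The sample HTML and URLs are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class HeadSignals(HTMLParser):
    """Collects rel="canonical" hrefs and robots meta directives from a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []
        self.robots = []

    def handle_starttag(self, tag, attrs):
        attr = {name: (value or "") for name, value in attrs}
        if tag == "link" and "canonical" in attr.get("rel", "").lower().split():
            self.canonicals.append(attr.get("href", ""))
        elif tag == "meta" and attr.get("name", "").lower() == "robots":
            self.robots += [d.strip().lower() for d in attr.get("content", "").split(",")]


def canonical_issues(page_url, html):
    signals = HeadSignals()
    signals.feed(html)
    # Resolve relative canonicals against the page URL before comparing.
    targets = {urljoin(page_url, href) for href in signals.canonicals}
    issues = []
    if not targets:
        issues.append("missing canonical")
    elif len(targets) > 1:
        issues.append("multiple canonicals: " + ", ".join(sorted(targets)))
    else:
        target = targets.pop()
        if target != page_url:
            issues.append(f"canonicalized to {target}")
            if "noindex" in signals.robots:
                issues.append("conflict: noindex plus canonical to another URL")
    return issues


SAMPLE_HTML = """<head>
<link rel="canonical" href="/shoes/trail/">
<meta name="robots" content="noindex, follow">
</head>"""

print(canonical_issues("https://example.com/shoes/trail/?color=red", SAMPLE_HTML))
```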

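Another red flag: pages that are both disallowed in robots.txt AND noindexed. The two signals work against each other: because Google can’t crawl the page, it never sees the noindex tag, so the URL can still end up in the index. That combination usually means someone didn’t understand what they were doing when they set things up.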

Redirect chains are another common issue. When a URL redirects to another URL, which then redirects again, you’re slowing down crawlers and potentially losing link equity. A single 301 redirect is fine—anything beyond that needs to be fixed.
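If your crawler can export each redirecting URL with its target, finding chains is a short graph walk. This Python sketch assumes that export has been loaded into a dict mapping source URL to redirect target (the paths are hypothetical) and reports every source that takes more than one hop to resolve, plus redirect loops.

```python
def trace_redirect(start, redirects, max_hops=10):
    """Follow start through the redirect map; return (path, is_loop)."""
    path, seen = [start], {start}
    current = start
    while current in redirects and len(path) <= max_hops:
        current = redirects[current]
        path.append(current)
        if current in seen:
            return path, True  # we've been here before: redirect loop
        seen.add(current)
    return path, False


def chains_to_fix(redirects):
    """Map each source that needs more than one hop (or loops) to its full path."""
    report = {}
    for source in redirects:
        path, is_loop = trace_redirect(source, redirects)
        if is_loop or len(path) > 2:
            report[source] = path
    return report


# Hypothetical crawl export: source URL -> redirect target.
REDIRECTS = {
    "/old-blog/post": "/blog/post",
    "/blog/post": "/blog/post/",
    "/promo": "/sale",
    "/a": "/b",
    "/b": "/a",
}

for source, path in chains_to_fix(REDIRECTS).items():
    print(" -> ".join(path))
```

For chains that aren’t loops, the usual fix is to point the first URL straight at the last URL in its path and update internal links to the final destination.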

Then there’s the XML sitemap. It needs to exist, be referenced in robots.txt with a Sitemap: line, and only contain URLs that return 200 status codes and aren’t blocked from indexing. If your sitemap lists 5,000 URLs but Google only indexes 2,000, that’s worth investigating.

I also check whether there’s a discrepancy between what’s in the sitemap and what’s actually indexed. Sometimes the sitemap includes pages that shouldn’t be there—like paginated URLs, filtered results, or internal search pages.
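Here’s a rough Python sketch of those sitemap checks. It assumes you’ve already crawled the site and have each URL’s status code in a dict and the noindexed URLs in a set. The sample sitemap, status codes, and the `SUSPECT_PARAMS` list are all hypothetical; adapt the list to however your CMS builds pagination, filter, and search URLs. It only handles a plain `<urlset>` sitemap, not a sitemap index.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse, parse_qs

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
# Hypothetical parameters that usually signal pagination, sorting, filtering, or search.
SUSPECT_PARAMS = {"page", "sort", "filter", "q", "s"}


def robots_lists_sitemap(robots_txt):
    """True if robots.txt has at least one Sitemap: line."""
    return any(line.strip().lower().startswith("sitemap:") for line in robots_txt.splitlines())


def audit_sitemap(xml_text, status_by_url, noindexed):
    """Return {url: [reasons]} for sitemap URLs that probably shouldn't be listed."""
    root = ET.fromstring(xml_text)
    problems = {}
    for loc in root.findall("sm:url/sm:loc", NS):
        url = loc.text.strip()
        parsed = urlparse(url)
        reasons = []
        status = status_by_url.get(url)
        if status != 200:
            reasons.append(f"status {status}" if status else "not found in crawl data")
        if url in noindexed:
            reasons.append("noindex")
        if SUSPECT_PARAMS & set(parse_qs(parsed.query)):
            reasons.append("pagination/filter/search parameter")
        if "search" in parsed.path.split("/"):
            reasons.append("internal search path")
        if reasons:
            problems[url] = reasons
    return problems


SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/old-post/</loc></url>
  <url><loc>https://example.com/blog/?page=2</loc></url>
  <url><loc>https://example.com/thank-you/</loc></url>
</urlset>"""
SAMPLE_STATUS = {
    "https://example.com/": 200,
    "https://example.com/old-post/": 404,
    "https://example.com/blog/?page=2": 200,
    "https://example.com/thank-you/": 200,
}

print(audit_sitemap(SAMPLE_SITEMAP, SAMPLE_STATUS, noindexed={"https://example.com/thank-you/"}))
```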

Internal Linking Gets Ignored But Shouldn’t

Internal linking is one of the easiest ways to signal which pages matter, but most sites do it poorly. I look for orphan pages—pages with zero internal links pointing to them. Google might find them through the sitemap, but they’re not getting any link equity or contextual signals from the rest of the site.
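Finding orphans is a set difference: every URL you know about minus every URL that some other page links to. Because a crawler that starts from the homepage can’t discover orphans by definition, the list of known URLs has to come from somewhere else too, such as the sitemap. This Python sketch assumes your crawler can export internal links as (source, target) pairs; the URLs are hypothetical, and the homepage is excluded since it’s the crawl’s starting point.

```python
def find_orphans(known_urls, internal_links, homepage="/"):
    """Return known URLs that no *other* page links to."""
    linked_to = {target for source, target in internal_links if source != target}
    return sorted(set(known_urls) - linked_to - {homepage})


# Hypothetical data: URLs from the sitemap and crawl, plus internal links from the crawl.
KNOWN_URLS = ["/", "/about/", "/blog/post-1/", "/landing/spring-promo/"]
INTERNAL_LINKS = [
    ("/", "/about/"),
    ("/", "/blog/post-1/"),
    ("/landing/spring-promo/", "/landing/spring-promo/"),  # a self-link doesn't count
]

print(find_orphans(KNOWN_URLS, INTERNAL_LINKS))
```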

Another issue: are internal links using descriptive anchor text? Generic phrases like “click here” or “read more” don’t help Google understand what the linked page is about. Use keyword-rich, natural anchor text that describes the destination page.
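A crawl export of internal links with their anchor text makes this easy to scan. This Python sketch flags links whose anchor, after normalizing case, whitespace, and trailing punctuation, matches a short list of generic phrases. That list is a hypothetical starting point, not an exhaustive one.

```python
# Hypothetical starter list of generic anchors; extend it for your site.
GENERIC_ANCHORS = {"click here", "here", "read more", "learn more", "more", "this page"}


def generic_anchor_links(links):
    """links: (source, target, anchor_text) tuples. Return the ones with generic anchors."""
    flagged = []
    for source, target, anchor in links:
        # Collapse whitespace, lowercase, and trim surrounding arrows/punctuation.
        cleaned = " ".join(anchor.lower().split()).strip(" .!:>»")
        if cleaned in GENERIC_ANCHORS:
            flagged.append((source, target, anchor))
    return flagged


SAMPLE_LINKS = [
    ("/", "/pricing/", "Click here"),
    ("/", "/pricing/", "Compare pricing plans"),
    ("/blog/", "/blog/crawl-budget/", "Read more »"),
]

print(generic_anchor_links(SAMPLE_LINKS))
```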

One more thing I see constantly—internal links with UTM parameters. Those are meant for external campaigns, not internal navigation, and they can cause tracking issues and even create duplicate content problems.
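Checking for this only needs the link URLs from a crawl. This Python sketch treats a link as internal if it’s relative or points at one of the site’s own hosts, and flags any internal link carrying a `utm_` parameter. The host names and links are hypothetical; list both www and non-www hosts in `site_hosts` if both appear in your links.

```python
from urllib.parse import urlparse, parse_qsl


def internal_utm_links(links, site_hosts):
    """links: (source_page, href) pairs. Return internal links that carry utm_ parameters."""
    flagged = []
    for source, href in links:
        parsed = urlparse(href)
        is_internal = parsed.netloc == "" or parsed.netloc.lower() in site_hosts
        params = parse_qsl(parsed.query, keep_blank_values=True)
        if is_internal and any(key.lower().startswith("utm_") for key, _ in params):
            flagged.append((source, href))
    return flagged


SAMPLE_LINKS = [
    ("/", "/pricing/?utm_source=homepage&utm_medium=banner"),
    ("/", "https://example.com/about/"),
    ("/blog/", "https://partner.example.org/?utm_source=blog"),  # external: fine
]

print(internal_utm_links(SAMPLE_LINKS, site_hosts={"example.com", "www.example.com"}))
```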

Structured Data Is Usually Missing or Misconfigured

Structured data doesn’t fix a broken site, but it does help Google understand your content better. When implemented correctly, it can make your pages eligible for rich snippets, which can increase click-through rates.

The problem I see most often isn’t syntax errors—it’s that structured data is missing entirely. Sites just don’t have it implemented, or they’re only using it on a handful of pages.

When structured data is there, the issue is usually that it contains the wrong data. For example, a CMS might be configured to pull the wrong URL for the schema, so every page’s structured data points to the homepage instead of the actual page URL. Or the wrong image gets pulled in, or the product price is formatted incorrectly.

I check this using Google’s Rich Results Test or Schema Markup Validator. Common mistakes include missing required fields or using the wrong value type—like using text where a URL is expected.
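Google’s tools are the authority here, but for a site-wide pass it helps to script the common mistakes. This Python sketch pulls JSON-LD blocks out of a page with `html.parser`, then flags pages with no JSON-LD at all, invalid JSON, fields missing from a required-fields table, `url`/`image` values that aren’t absolute URLs (a rough proxy for text where a URL is expected), and a `url` that doesn’t match the page. The `REQUIRED` table is a hypothetical, simplified stand-in, so check Google’s structured data documentation for each type’s real requirements. The sketch ignores Microdata, RDFa, and `@graph` wrappers.

```python
import json
from html.parser import HTMLParser

# Hypothetical, simplified required-field table; Google's docs define the real rules per type.
REQUIRED = {"Product": ["name", "offers"], "Article": ["headline", "image"]}
URL_FIELDS = ("url", "image")


class JsonLdExtractor(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self.in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and (dict(attrs).get("type") or "").lower() == "application/ld+json":
            self.in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_jsonld = False

    def handle_data(self, data):
        if self.in_jsonld:
            self.blocks[-1] += data


def schema_issues(page_url, html):
    extractor = JsonLdExtractor()
    extractor.feed(html)
    if not extractor.blocks:
        return ["no JSON-LD found"]
    issues = []
    for raw in extractor.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            issues.append(f"invalid JSON-LD: {exc.msg}")
            continue
        for item in data if isinstance(data, list) else [data]:
            if not isinstance(item, dict):
                continue
            raw_type = item.get("@type", "?")
            kind = raw_type[0] if isinstance(raw_type, list) and raw_type else str(raw_type)
            issues += [f"{kind}: missing {f}" for f in REQUIRED.get(kind, []) if f not in item]
            for field in URL_FIELDS:
                value = item.get(field)
                if isinstance(value, str) and not value.startswith(("http://", "https://")):
                    issues.append(f"{kind}: {field} is not an absolute URL")
            if isinstance(item.get("url"), str) and item["url"] != page_url:
                issues.append(f"{kind}: url points to {item['url']}, not this page")
    return issues


SAMPLE_PAGE = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Product", "name": "Trail Shoe",
 "url": "https://example.com/", "image": "trail-shoe.jpg"}
</script>
</head></html>"""

print(schema_issues("https://example.com/products/trail-shoe/", SAMPLE_PAGE))
```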

I also look for structured data opportunities that aren’t being used. For example, if you’re running an e-commerce site and not using Product schema, you’re leaving money on the table.

Duplicate Content and Thin Pages Are Everywhere

The real problem with duplicate content is that it’s usually invisible—generated by the CMS without anyone realizing it. Maybe the CMS creates category pages, tag pages, and author archive pages that all list the same blog posts. Or maybe there are filter and sort URLs that create dozens of near-identical pages.

Another common issue: URL parameters. A CMS might allow the same product page to be accessed at multiple URLs—one with a tracking parameter, one with a sort parameter, one with a filter parameter. Each of those is a separate URL to Google, even though the content is identical.
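One way to surface these is to normalize every crawled URL by dropping parameters that don’t change the content, then group URLs that collapse to the same address. In this Python sketch, `IGNORABLE_PARAMS` is a hypothetical list: which parameters are safe to ignore is site-specific, so a filter parameter that really changes the product list should stay out of it. The crawled URLs are made up.

```python
from collections import defaultdict
from urllib.parse import urlparse, urlunparse, parse_qsl, urlencode

# Hypothetical: parameters assumed NOT to change page content on this site.
IGNORABLE_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "gclid", "sort", "sessionid"}


def normalize(url):
    """Drop ignorable parameters and fragments, and sort the rest so order doesn't matter."""
    parts = urlparse(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORABLE_PARAMS
    )
    return urlunparse(parts._replace(query=urlencode(kept), fragment=""))


def duplicate_groups(urls):
    """Return {normalized_url: [crawled variants]} for groups with more than one member."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize(url)].append(url)
    return {key: members for key, members in groups.items() if len(members) > 1}


CRAWLED = [
    "https://example.com/shoes/trail?utm_source=newsletter",
    "https://example.com/shoes/trail?sort=price",
    "https://example.com/shoes/trail",
    "https://example.com/shoes/road?size=10",
]

for normalized, members in duplicate_groups(CRAWLED).items():
    print(normalized, "<-", members)
```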

I also look for thin content pages—stuff like empty category pages, tag pages with two blog posts, or auto-generated internal search results that got indexed. These are what I call “zombie pages”. They don’t drive traffic, they don’t help users, and they waste crawl budget.

The fix is simple: add real content to those pages, consolidate them, or delete them and set up 301 redirects.

Another duplicate content issue: www vs. non-www, and HTTP vs. HTTPS. If more than one version of your site is accessible without being redirected, that’s duplicate content. I check this by making sure every variant 301 redirects to a single preferred version.

Same goes for trailing slash issues. Some sites serve the same page at both /page and /page/—that’s duplicate content unless one version canonicals or redirects to the other.
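To test all of these at once, you can generate every protocol, host, and trailing-slash variant of a preferred URL and confirm each one returns a single 301 straight to that URL. In this Python sketch, the `responses` dict stands in for what you’d collect by requesting each variant with an HTTP client set not to follow redirects; the URLs and status codes are hypothetical, and query strings are ignored. It only checks redirects, so a variant that canonicalizes to the preferred URL instead would be flagged and need a manual look.

```python
from urllib.parse import urlparse


def variants(preferred):
    """Every http/https, www/non-www, slash/no-slash version of preferred, except itself."""
    parsed = urlparse(preferred)
    host = parsed.netloc.removeprefix("www.")
    path = parsed.path or "/"
    if path == "/":
        paths = {path}
    else:
        paths = {path, path.rstrip("/") if path.endswith("/") else path + "/"}
    urls = {
        f"{scheme}://{netloc}{p}"
        for scheme in ("http", "https")
        for netloc in (host, "www." + host)
        for p in paths
    }
    urls.discard(preferred)
    return sorted(urls)


def check_variants(preferred, responses):
    """responses maps URL -> (status, Location header). Flag variants that don't 301 directly."""
    problems = []
    for url in variants(preferred):
        status, location = responses.get(url, (None, None))
        if status != 301 or location != preferred:
            problems.append((url, status, location))
    return problems


PREFERRED = "https://example.com/page/"
# Hypothetical responses collected with redirects disabled.
RESPONSES = {
    "http://example.com/page/": (301, "https://example.com/page/"),
    "http://www.example.com/page/": (301, "https://www.example.com/page/"),  # extra hop
    "https://www.example.com/page/": (301, "https://example.com/page/"),
    "https://example.com/page": (200, None),  # duplicate served without a redirect
}

for url, status, location in check_variants(PREFERRED, RESPONSES):
    print(url, status, location)
```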

Quick Checklist

Here’s what I focus on in every technical SEO audit:

  • Check robots.txt for accidental blocks and review crawl stats
  • Find pages blocked in robots.txt but still indexed
  • Fix canonical tag issues and redirect chains
  • Audit XML sitemap accuracy and discrepancies
  • Find and fix orphan pages and crawl depth problems
  • Check for internal links with UTM parameters
  • Look for missing or misconfigured structured data
  • Hunt down CMS-generated duplicate content and thin pages
  • Fix www/non-www and HTTP/HTTPS duplication
  • Remove or consolidate tag pages, filter URLs, and internal search results

If you want to see some of these concepts in action, I recommend checking out this video from fatjoe on YouTube: Technical SEO: What Is It & How To Fix The Most Common Issues. They walk through the basics with clear examples and real-world fixes.

For more context on when to run these checks, take a look at my post on when to perform a technical SEO audit. And if you’re not sure what a technical SEO audit even is, I’ve got a breakdown here: what is a technical SEO audit.

If you’d like me to run a full technical audit on your site and fix these issues for you, reach out and let’s talk.

