Search Off the Record

Google

  • Analysing Robots.txt at scale with HTTP Archive and BigQuery - transcript
    23 April 2026, 2:05 pm
  • 27 minutes 40 seconds
    Analysing Robots.txt at scale with HTTP Archive and BigQuery

    In this episode of Search Off the Record, Martin and Gary turn a simple robots.txt question into a data‑driven deep dive using HTTP Archive, WebPageTest, custom JavaScript metrics, and BigQuery. They explore how millions of real robots.txt files are actually written in 2025–2026, which directives and user‑agents are most common, and what that means for modern crawling and AI bots.

    Aimed at beginner to mid‑level developers and SEOs, this episode shows how large‑scale web measurement works (HTTP Archive, Chrome UX Report, Web Almanac) and how to turn raw crawl data into actionable SEO insights. Subscribe for more candid conversations about crawling, indexing, and the data behind how Google Search and the web really work.

    Resources:

    Web Almanac → https://almanac.httparchive.org/en/2025/
    Robotstxt custom metric for the HTTP Archive → https://github.com/HTTPArchive/custom-metrics/pull/191
    robots.txt parser change → https://github.com/google/robotstxt/commit/4af32e54b715442bb04cd0470e99192f0ffb9792#commitcomment-178586774

    Episode transcript → https://goo.gle/sotr108-transcript

    Listen to more Search Off the Record → https://goo.gle/sotr-yt

    Subscribe to Google Search Channel → https://goo.gle/SearchCentral

    Search Off the Record is a podcast series that takes you behind the scenes of Google Search with the Search Relations team.

    #SOTRpodcast #SEO #GoogleSearch

    Speakers: Martin Splitt, Gary Illyes

    23 April 2026, 2:01 pm
  • Are websites getting "fat"? Page weight, HTML size & Googlebot limits explained - transcript
    30 March 2026, 12:18 pm
  • 32 minutes 12 seconds
    Are websites getting "fat"? Page weight, HTML size & Googlebot limits explained

    In this episode of Search Off the Record, Gary and Martin dig into what "page size" and "page weight" actually mean for developers, users, and search engines.

    They discuss how fast web pages are growing: according to the 2025 Web Almanac, the median mobile homepage reached 2.3 MB, up 3x from 2015. They also cover how page weight is defined, Googlebot's crawl limits, HTML bloat from structured data and images, and why size still hurts UX on slow connections despite faster networks.

    If you build or maintain websites, this conversation will help you rethink how much data your pages ship, where bloat really comes from, and why page weight still matters even as connections get faster.

    Resources:

    Web Almanac → https://almanac.httparchive.org/en/2025/
    HTML living standard → https://html.spec.whatwg.org/multipage/
    How page speed helps with conversions → https://www.thinkwithgoogle.com/marketing-strategies/app-and-mobile/mobile-page-speed-data/

    Episode transcript → https://goo.gle/sotr106-transcript

    Speakers: Martin Splitt, Gary Illyes

    30 March 2026, 11:59 am
  • Google crawlers behind the scenes - transcript
    12 March 2026, 3:08 pm
  • 25 minutes 7 seconds
    Google crawlers behind the scenes

    Developers often talk about Googlebot as if it were a single program you could just run as "googlebot.exe", but that is not how Google's crawling actually works. In this episode of Search Off the Record, Martin and Gary from the Search Relations team unpack how Google's crawling infrastructure is really built and operated. They cover why "Googlebot" is a misnomer, how it relates to a central crawling software-as-a-service used by many Google products, and how crawl behavior is controlled centrally to avoid overwhelming sites (throttling, handling 503s, and "don't break the internet" safeguards). If you build for the web, work on SEO, or just want a more accurate mental model of how Google crawls pages, this behind‑the‑scenes discussion is for you.

    Resources:

    Crawlers → https://developer.google.com/crawling

    Episode transcript → https://goo.gle/sotr107-transcript

    Speakers: Martin Splitt, Gary Illyes

    12 March 2026, 3:07 pm
  • How Browsers Really Parse HTML (and What That Means for SEO) - transcript
    26 February 2026, 6:59 pm
  • 32 minutes 32 seconds
    How Browsers Really Parse HTML (and What That Means for SEO)

    Martin and Gary unpack how HTML parsing really works, why the HTML standard is so lenient, and how messy markup can silently break key SEO signals like hreflang and rel=canonical. They revisit validators and cross‑browser hacks from the Netscape/IE days, and discuss whether semantic HTML and strict validity truly matter for search. You'll also hear when link hints like preload, prefetch, and DNS prefetch help performance (and indirectly SEO), and where meta and link tags really belong.

    Resources:

    HTML Living Standard → https://html.spec.whatwg.org/

    Episode transcript → https://goo.gle/sotr105-transcript

    Speakers: Martin Splitt, Gary Illyes

    26 February 2026, 6:49 pm
  • Do You Still Need a Website in 2026? (Transcript)
    12 February 2026, 8:47 pm
  • 28 minutes 27 seconds
    Do You Still Need a Website in 2026?

    In this episode of Search Off the Record, Martin and Gary from the Google Search Relations team tackle a deceptively simple question: do you still need a website in 2026? Starting from the recurring industry claim that "the web is dead," they explore how the web has evolved through the rise of apps, AI chatbots, and social platforms, and why the answer almost always ends up being "it depends." Tune in for an engaging discussion on how websites remain relevant and what it means for content creation and discovery.

    Episode transcript → https://goo.gle/sotr103-transcript

    Speakers: Martin Splitt, Gary Illyes

    12 February 2026, 8:42 pm
  • Crawling Challenges: What the 2025 Year-End Report Tells Us - Transcript
    3 February 2026, 9:58 am