Published 2026-05-21 · Honolulu

We killed our own thesis.

NeverRanked sold a product that did not work. This is the full story, written by the person who sold it.

The thesis

Until May 2026, NeverRanked sold a small JavaScript snippet. You pasted it once on your website. We claimed it made the structured-data signals on your pages legible to AI answer engines (ChatGPT, Perplexity, Google AI Overviews, Claude) and that being legible was the thing that earned citations in their answers.

It was a clean pitch. It had a clear mechanism. It had a customer who could be referenced by name. It had paying agencies considering reseller tiers. The framing was: SEO is the wrong dial to turn for AI search, AEO is the right one, and we ship the AEO infrastructure faster than any agency can build it in-house.

The thesis was wrong. We can prove it, because we tested it.

The kill test

Before testing, we wrote down what would count as a pass and what would count as a fail. This matters. The single most common way founders fool themselves is moving the goalposts after the data lands. So the criterion was locked, in writing, in a file we hash-stamped:

Hypothesis: shipping our snippet on neverranked.com drives
            AI citation share for AEO-category queries on
            Perplexity within seven days.

Test: 18 hash-locked queries, three repetitions each, on
      Perplexity sonar API. Domain: neverranked.com.
      Snippet deployed and verified rendering.

Pass:   >= 30% of slots cite neverranked.com.
Fail:   < 10% of slots cite neverranked.com.
Gray:   10-30% triggers a follow-up before any decision.

The pre-registration file lives in the research repo at dryrun/schema-causal/PRE-REGISTRATION.md. Hash: 87bd4abefb331bc9. That hash is in the file, and the runner printed the same hash on the day the test ran. You can read the inputs without trusting the outputs.
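A minimal sketch of how a hash-stamp like this can block a run when the criteria have changed. The function names are hypothetical, and the use of SHA-256 truncated to 16 hex characters is an assumption: the post does not say which hash function produced 87bd4abefb331bc9 or exactly which bytes it covers.

```python
import hashlib


def stamp(text: str, length: int = 16) -> str:
    """Return a short hex fingerprint of the locked pre-registration text."""
    # Assumption: SHA-256 truncated to `length` hex characters.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:length]


def verify_before_run(text: str, recorded: str) -> None:
    """Refuse to run the test if the locked text no longer matches its stamp."""
    actual = stamp(text)
    if actual != recorded:
        raise RuntimeError(f"pre-registration changed: {actual} != {recorded}")


# Illustrative criteria text, not the real file contents.
criteria = "Pass: >= 30% of slots cite neverranked.com.\n"
recorded = stamp(criteria)

verify_before_run(criteria, recorded)  # unchanged text: no error

try:
    # Moving the goalposts after stamping changes the fingerprint.
    verify_before_run(criteria.replace("30%", "5%"), recorded)
except RuntimeError as err:
    print("blocked:", err)
```

The point of the design is that the check runs before any data is collected, so an edited threshold fails loudly instead of silently producing a friendlier result.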

We ran the test on 2026-05-19. The result was zero.

Not "below the pass bar." Not "in the gray zone." Not "weak signal that needs more data." Zero of 409 citation slots cited neverranked.com. The snippet was on the page. The crawlers had been to the page. The structured data was rendered in the DOM. None of it produced a single citation.
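The thresholds from the pre-registration can be applied mechanically. The sketch below is illustrative only: the function name is hypothetical, and the thresholds are copied from the criteria quoted above. It shows where 0 of 409 lands and where the boundaries fall for that slot count.

```python
from fractions import Fraction

PASS_BAR = Fraction(30, 100)  # Pass: >= 30% of slots
FAIL_BAR = Fraction(10, 100)  # Fail: < 10% of slots


def verdict(cited: int, slots: int) -> str:
    """Classify a citation share against the pre-registered thresholds."""
    if slots <= 0:
        raise ValueError("slots must be positive")
    share = Fraction(cited, slots)  # exact arithmetic, no float rounding
    if share >= PASS_BAR:
        return "pass"
    if share < FAIL_BAR:
        return "fail"
    return "gray"


print(verdict(0, 409))    # fail
print(verdict(40, 409))   # fail  (40/409 is just under 10%)
print(verdict(41, 409))   # gray
print(verdict(123, 409))  # pass  (123/409 is just over 30%)
```

With 409 slots, a pass would have needed at least 123 citations and the gray zone started at 41. The observed count was 0.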

Why it failed

The failure had a single technical cause we should have tested first. The LLM crawlers that AI engines use to gather pages for answer-grounding (GPTBot, ClaudeBot, PerplexityBot) do not execute JavaScript. They fetch the raw HTML. Our snippet ran in the browser, after the page loaded, to inject JSON-LD structured data into the DOM. That injection was invisible to every crawler that mattered. The structured data we shipped, the central mechanism the entire product was built on, was visible only to human visitors and to search engines, such as Google, that render JavaScript before indexing.
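The assumption could have been checked by inspecting what a non-rendering crawler sees: the raw HTML, with no scripts executed. The sketch below uses only the Python standard library and two inline sample pages. The page contents and the /snippet.js path are made up for illustration, and the real test would fetch the live page instead.

```python
from html.parser import HTMLParser


class JsonLdFinder(HTMLParser):
    """Collect the bodies of <script type="application/ld+json"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.in_jsonld = False
        self.blocks: list[str] = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self.in_jsonld = True
            self.blocks.append("")

    def handle_data(self, data):
        if self.in_jsonld:
            self.blocks[-1] += data

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_jsonld = False


def jsonld_in_raw_html(html: str) -> list[str]:
    """Return JSON-LD blocks present without executing any JavaScript."""
    finder = JsonLdFinder()
    finder.feed(html)
    finder.close()
    return finder.blocks


# Hypothetical page that relies on a script to inject JSON-LD at runtime.
injected = '<html><head><script src="/snippet.js"></script></head><body></body></html>'
# Hypothetical page with JSON-LD written into the served HTML.
static = ('<html><head><script type="application/ld+json">'
          '{"@type": "Organization"}</script></head><body></body></html>')

print(len(jsonld_in_raw_html(injected)))  # 0
print(len(jsonld_in_raw_html(static)))    # 1
```

A page whose structured data exists only after a script runs returns nothing here, which is the view a crawler that fetches raw HTML would have.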

If we had tested this single assumption six months earlier, the product would never have been built. The discipline failure was not technical. It was that we let conviction outrun the test.

What the product was actually doing

The named customer reference we had been using publicly was Hawaii Theatre Center. The story we told about HTC was that their NeverRanked AEO score moved from 45 to 95 in ten days after we deployed the snippet, and that Perplexity cited them on 14 of 19 tracked queries the same week. Both numbers were real measurements. The causal link between them and our work was not.

The 45-to-95 score was a measurement of structured-data presence in the DOM, which the snippet did move. The 14-of-19 Perplexity citation count reflected authority-driven, pre-existing behavior that we observed but did not cause. Both are useful diagnostics. Neither one was evidence of the product working the way we claimed.

The diagnostic work for HTC was real, valuable, and the customer kept it. We surfaced an expired Charity Navigator profile that had not been updated since 2023, a BBB profile last touched in 1999, no presence on Bing Business Profile, and misconfigured authority backlinks. We also collaborated with their team on meta description rewrites. None of that required the snippet. All of it is the kind of gap a standard SEO scan misses and a forensic measurement engagement surfaces.

What we did when the test came back zero

What NeverRanked is now

A research engagement that measures what AI answer engines actually cite for a category. The deliverable is a forensic memo plus a prepped punch list. The customer's team or their agency executes the work. We do not. That separation is structural.

We watch seven AI surfaces every day. Five citation-grade engines that search the live web (Perplexity, ChatGPT search, Gemini grounded, Microsoft Copilot via Bing, Google AI Overviews). Two model-knowledge engines that answer from training data alone (Claude, Gemma). The two layers measure different failure modes a brand can have.

Pricing is $4,500 kickoff per category, one time. $1,500 per month per category, ongoing. Per category, not per client. There is no SaaS dashboard.

Why publish this

The honest reason is that our buyers can read. Anyone evaluating NeverRanked seriously can dig into the GitHub repo, find the pre-registration file, see the test results, and notice the retracted content in the git history. We can either own the retraction in our own voice, or wait for someone else to surface it. Owning it is better business and more honest.

The harder reason is that the category is full of vendors making the same claim we just retracted. Every AEO tool on the market promises some version of "ship our structured data and AI cites you." Some of them may eventually prove it. Most of them have not run the test. We do not believe a vendor in this space should be trusted on the citation-causation claim without a pre-registered, hash-locked, reproducible kill test against their own domain. We ran ours. We failed. We stopped selling. The honest move now is to say so.

What we changed about how we work

The discipline failure that produced the snippet product was: build first, validate the bet never. The reverse pattern is now structurally enforced. Three things in particular:

1. Pre-registration before measurement

Any claim that could become a product gets a pre-registration document first. Hypothesis, criterion, pass/fail thresholds, threats to validity. The file is written and hash-stamped before the test runs. Moving the goalposts is impossible because the goalposts are in the file.

2. The grader is fail-closed

Any prospect-facing artifact (cold email, preview page, research memo, dashboard report) is graded by a separate model against a canonical fact list before it ships. If an artifact mentions something we cannot substantiate, the grader rejects it and the send is held. The canonical fact list lives in the codebase. Anyone can read it.
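"Fail-closed" means the default answer is "hold": an artifact ships only on an explicit pass, and a grader error counts as a rejection. The sketch below illustrates that gate under stated assumptions. The function names, the "pass" return value, and the toy line-matching grader are all hypothetical stand-ins for the separate model and the real fact list.

```python
from typing import Callable

# Hypothetical stand-in for the canonical fact list in the codebase.
CANONICAL_FACTS = {
    "Zero of 409 citation slots cited neverranked.com.",
}


def toy_grader(artifact: str) -> str:
    """Stand-in grader: every non-empty line must be a canonical fact."""
    lines = [line.strip() for line in artifact.splitlines() if line.strip()]
    return "pass" if lines and all(l in CANONICAL_FACTS for l in lines) else "reject"


def broken_grader(artifact: str) -> str:
    """Simulates the grader being unavailable."""
    raise TimeoutError("grader did not respond")


def may_send(artifact: str, grade: Callable[[str], str]) -> bool:
    """Fail-closed gate: only an explicit 'pass' releases the artifact."""
    try:
        result = grade(artifact)
    except Exception:
        return False  # grader error: hold the send rather than ship ungraded
    return result == "pass"


fact = "Zero of 409 citation slots cited neverranked.com."
print(may_send(fact, toy_grader))                       # True
print(may_send("We guarantee citations.", toy_grader))  # False
print(may_send(fact, broken_grader))                    # False
```

The third call is the important one: a fail-open gate would have let the artifact through when the grader was down.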

3. Pattern-readiness has a numeric bar

The cross-category dataset our engagements feed is governed by a rule: claim a pattern in a category only when three or more usable runs in that category agree. A run with zero successful API captures does not count, even if its log is the right length. This is in dryrun/forensic/MOAT.md in the research repo. The discipline is enforced by the catalog tool, not by us remembering.
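One reading of that rule, sketched in code: a run is usable only if it has at least one successful API capture, and a pattern is claimable only when three or more usable runs in the same category report the same finding. The `Run` fields and function name are hypothetical. The actual rule is defined in dryrun/forensic/MOAT.md and may differ in detail.

```python
from collections import Counter
from dataclasses import dataclass

MIN_AGREEING_RUNS = 3  # "three or more usable runs"


@dataclass
class Run:
    category: str
    finding: str
    successful_captures: int  # API captures that returned usable data


def ready_patterns(runs: list[Run], category: str) -> list[str]:
    """Return findings backed by enough usable, agreeing runs in a category."""
    usable = [
        r for r in runs
        if r.category == category and r.successful_captures > 0
    ]
    counts = Counter(r.finding for r in usable)
    return sorted(f for f, n in counts.items() if n >= MIN_AGREEING_RUNS)


# Illustrative data only; the category and finding labels are made up.
runs = [
    Run("theatres", "directory-gap", 12),
    Run("theatres", "directory-gap", 9),
    Run("theatres", "directory-gap", 0),  # zero captures: does not count
    Run("theatres", "schema-gap", 7),
]
print(ready_patterns(runs, "theatres"))  # []

runs.append(Run("theatres", "directory-gap", 5))
print(ready_patterns(runs, "theatres"))  # ['directory-gap']
```

The run with zero captures is filtered out before counting, so a log that merely looks complete cannot push a pattern over the bar.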

The honest current state

If you want to dig in
