Discovery and selection are now two different optimization games. In the AI-commerce era, ChatGPT, Perplexity, and Gemini pre-filter the candidate set before the shopper sees it — and UCPScore's 1,741-store May 2026 benchmark shows zero stores cleared the AI-ready threshold. Stores invisible to the agent aren't ranked low; they're absent.
For two decades, the optimization problem in commerce was discovery: rank high in search, win the click, fight for above-the-fold attention. Discovery was the bottleneck because the shopper saw the candidate set — the SERP, the category page, the related-products carousel — and made the selection herself. The store's job was to be in the candidate set shown to the shopper.
The AI shopping era inverts the architecture. The agent pre-filters the candidate set before the shopper sees it. Stores that aren't structurally legible to the agent — that the agent can't confidently rank, compare, or recommend — never enter the candidate set in the first place. The shopper never sees them. Not ranked low. Absent.
The agent never shows the shopper a store it cannot parse
That's what selection vs. discovery means as a category framing. Stores still need to win discovery for the slice of human-search traffic that persists. But the larger and growing slice — agent-mediated commerce through ChatGPT, Claude, Perplexity, Gemini, and the new agent surfaces native to UCP, ACP, MCP, and AP2 — is gated on selection. And selection is gated on structural legibility: whether the agent can parse the store accurately enough to confidently include it.
"Structural legibility" in plain English: whether your product pages have machine-readable answers (in JSON-LD or structured fields) to the questions an AI agent asks before it recommends you — what does this work with, what are the alternatives, what are the trust signals. Not your copy; your structure.
Both eras still run. The agent era is the one that's growing.
Both models run in parallel right now.
Discovery model — the human is the filter. The shopper opens search, browses category, scrolls feeds. The store's job is to be in the candidate set the shopper sees. The optimization surface is SEO, ad bids, image quality, copy hooks; the success metric is click-through rate, conversion, attention. Legibility-to-machine is secondary because the machine isn't choosing.
Selection model — the agent is the filter. The agent receives shopper intent and pre-filters the candidate set. The store's job is to be in the candidate set the agent can confidently include. The optimization surface is structured attributes, compatibility metadata, substitute graphs, JSON-LD density, protocol compliance; the success metric is selection rate — the percentage of agent queries where the store enters the candidate set. Legibility-to-machine is the primary axis. Visibility flows from legibility.
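As a rough illustration of the selection-rate metric, the sketch below computes the share of agent queries in which a store appears in the candidate set. The `AgentQuery` record, the log format, and the `.example` store names are hypothetical; the post does not describe how such a log would be collected.

```python
from dataclasses import dataclass


@dataclass
class AgentQuery:
    """One shopper intent sent to an agent, plus the stores the agent kept."""
    intent: str
    candidate_stores: set[str]


def selection_rate(queries: list[AgentQuery], store: str) -> float:
    """Share of queries (0.0 to 1.0) whose candidate set includes `store`."""
    if not queries:
        return 0.0
    hits = sum(1 for q in queries if store in q.candidate_stores)
    return hits / len(queries)


# Hypothetical log: three intents; store-a appears in one candidate set.
log = [
    AgentQuery("dog joint supplement", {"store-a.example", "store-b.example"}),
    AgentQuery("grain-free dog treats", {"store-b.example"}),
    AgentQuery("calming chews for puppies", {"store-c.example"}),
]
rate = selection_rate(log, "store-a.example")  # one hit out of three queries
```

Unlike click-through rate, this metric has no notion of rank: a store is either in the candidate set for a query or it is not.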
Discovery still matters for the human-search slice. But discovery isn't growing — agent-mediated commerce is. And the two optimization surfaces don't collapse into each other. A store can rank #1 on Google for a query and still be absent from the agent's candidate set for the same intent. That's not a bug in the agent; it's the agent doing its job — refusing to recommend a store it can't confidently parse. We documented that exact pattern in the CogniPaws case study across eight scans.
Nine dimensions. Three carry most of the score loss.
UCPScore's rubric scores nine dimensions across 18 deterministic checks. Across all 1,741 stores scanned May 2026, the dimension averages reveal exactly where the legibility gap lives — and it's narrower and more structural than "stores need better content."
| Dimension | Ecosystem average | Status |
|---|---|---|
| freshness | 100.00 / 100 | Solved |
| visuals | 98.72 / 100 | Solved |
| functional_language | 93.28 / 100 | Strong |
| data_completeness | 91.82 / 100 | Strong |
| protocol | 69.35 / 100 | Gap |
| fulfillment | 65.39 / 100 | Gap |
| semantic_richness | 40.29 / 100 | Big gap |
| comparison_readiness | 34.58 / 100 | Big gap |
| trust | 25.75 / 100 | Biggest gap |
Trust (25.75), comparison readiness (34.58), and semantic richness (40.29) account for most of the score loss across the ecosystem: roughly 71% of the unweighted points lost in the table above. The other six dimensions average about 86.4, and four of them sit above 90. This is the moat insight: stores aren't failing on content, visuals, freshness, or functional language. They're failing on the three structural surfaces an agent needs to confidently include them: trust signals, comparison-ready attributes, and semantic density per product.
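The arithmetic behind those figures can be checked directly from the table. The sketch below treats every dimension as equally weighted, which is an assumption: the rubric's actual dimension weights are not given in this post.

```python
# Ecosystem averages copied from the dimension table (out of 100).
averages = {
    "freshness": 100.00,
    "visuals": 98.72,
    "functional_language": 93.28,
    "data_completeness": 91.82,
    "protocol": 69.35,
    "fulfillment": 65.39,
    "semantic_richness": 40.29,
    "comparison_readiness": 34.58,
    "trust": 25.75,
}
big_three = ("trust", "comparison_readiness", "semantic_richness")

# Mean of the six dimensions outside the three biggest gaps.
others = [score for name, score in averages.items() if name not in big_three]
other_mean = sum(others) / len(others)

# Points lost per dimension, assuming equal weights (an assumption).
loss = {name: 100 - score for name, score in averages.items()}
big_three_share = sum(loss[name] for name in big_three) / sum(loss.values())

print(f"other six average: {other_mean:.2f}")
print(f"share of unweighted loss from the big three: {big_three_share:.0%}")
```

Under equal weighting, protocol and fulfillment supply most of the remaining loss, so a weighted rubric could shift the split in either direction.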
99.94% of stores fail the same three checks. Universal.
Three specific checks from the rubric's 18 fail in 1,740 of 1,741 stores. Enterprise, mid-market, small — all three tiers, 99%+ fail rate. Beauty, apparel, food & beverage, home, pet health — all five verticals, 99%+ fail rate. US and non-US — both locales, 99%+ fail rate. This is not a "some stores are behind" problem. It's a structural gap the entire scanned ecosystem shares.
1. ATTR_COMPATIBILITY_PRESENT — compatibility metadata is missing
Stores don't tell the agent what their products work with. No structured compatibility list, no fit/spec metadata, no "best-for" tags an agent can extract. When a shopper asks the agent "will this work with my X?" — there's nothing to extract. The agent has to guess from prose, and usually doesn't. Rubric dimension: semantic richness (40.29 / 100 ecosystem average).
2. ATTR_MACHINE_READABLE_DENSITY — attributes exist in prose, not in structure
A product page can have a 600-word description that mentions everything the agent needs (color, material, weight, occasion, dimensions, allergen flags), but if none of those facts live in structured attributes (JSON-LD, microdata, Shopify metafields), the agent has to extract them from natural language and frequently gets them wrong. Density is the gap. Rubric dimension: data completeness (91.82 / 100 ecosystem average). The high average reflects that the facts are usually present somewhere on the page; the check fails because they are not in structured fields.
3. ATTR_SUBSTITUTES_PRESENT — substitute graphs are absent
When a product is out of stock or doesn't fit a shopper's constraint, the agent has no graceful fallback. There's no substitute graph — no "if this doesn't fit, try X." The agent sends the shopper away. The conversion opportunity vanishes silently. Rubric dimension: comparison readiness (34.58 / 100 ecosystem average).
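To make the three checks concrete, here is a minimal sketch of how a page could be inspected for them. It is not UCPScore's scanner. It assumes compatibility and other attributes are expressed as schema.org `additionalProperty` entries and substitutes as `isSimilarTo`; both are real schema.org properties on `Product`, but the mapping to these checks and the property names in `COMPAT_NAMES` are hypothetical.

```python
import json
from html.parser import HTMLParser

# Hypothetical property names treated as compatibility metadata.
COMPAT_NAMES = {"compatible with", "works with", "best for"}


class JsonLdExtractor(HTMLParser):
    """Collects parsed <script type="application/ld+json"> objects."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buffer: list[str] = []
        self.documents: list[dict] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                doc = json.loads("".join(self._buffer))
            except json.JSONDecodeError:
                return  # malformed JSON-LD is skipped
            if isinstance(doc, dict):
                self.documents.append(doc)


def _as_list(value):
    """JSON-LD allows a single object or a list; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def legibility_report(html: str) -> dict:
    """Report which of the three structural signals a page exposes."""
    parser = JsonLdExtractor()
    parser.feed(html)
    products = [d for d in parser.documents if d.get("@type") == "Product"]
    props = [
        p
        for d in products
        for p in _as_list(d.get("additionalProperty"))
        if isinstance(p, dict)
    ]
    names = {str(p.get("name", "")).lower() for p in props}
    return {
        "has_compatibility": bool(names & COMPAT_NAMES),
        "structured_attribute_count": len(props),
        "has_substitutes": any(_as_list(d.get("isSimilarTo")) for d in products),
    }


SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Product", "name": "Example Chew",
 "additionalProperty": [
   {"@type": "PropertyValue", "name": "Compatible with", "value": "Adult dogs"},
   {"@type": "PropertyValue", "name": "Material", "value": "Natural rubber"}
 ],
 "isSimilarTo": {"@type": "Product", "name": "Example Chew Mini"}}
</script>
</head><body><p>A long prose description.</p></body></html>
"""
report = legibility_report(SAMPLE_HTML)
```

A page whose facts live only in the description would report no compatibility, zero structured attributes, and no substitutes, however complete its copy.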
Twelve DTC brands. $1B+ aggregate raised. None cleared 75.
Receipt-grade citations are the proof set. Every brand below was scanned in the same May 2026 window with the same rubric SHA. Anyone can re-run the public scanner against the same domains and produce the same scores. The pattern is striking — even at the enterprise tier, with hundreds of millions in cumulative venture capital, no named brand clears the 75 AI-ready threshold.
| Brand | Score | Vertical | Tier |
|---|---|---|---|
| Hyperice | 70 | Recovery tech | Enterprise |
| Gymshark | 68 | Athleisure DTC | Enterprise |
| Kith | 68 | Streetwear | Enterprise |
| Allbirds | 66 | Footwear (public co.) | Mid-market |
| Liquid I.V. | 66 | Hydration CPG | Enterprise |
| Fashion Nova | 63 | Fast fashion | Enterprise |
| Brooklinen | 62 | Home / bedding | Enterprise |
| ColourPop | 62 | Cosmetics | Enterprise |
| Liquid Death | 60 | Beverage / CPG | Enterprise |
| Casper | 59 | Sleep / mattress | Enterprise |
| Feastables | 59 | Snacks / CPG | Enterprise |
| Olipop | 57 | Beverage / CPG | Mid-market |
Scale doesn't buy AI-readiness. Allbirds (public company, ~50 SKUs) sits at 66. Casper (mattress brand with $1B+ peak valuation) sits at 59. Olipop (mid-market beverage darling) sits at 57. The brands at the top of the named set (Hyperice 70, Gymshark 68, Kith 68) are enterprise recovery-tech, athleisure, and streetwear brands, categories where compatibility metadata (fit, size, recovery use) is already partly category-trained. But none cleared 75. The shift didn't spare anyone.
How we know
Every score in this post is the output of a receipt-grade audit: same scanner SHA, same rubric SHA, same store state produces the same score. The 1,741-store sample frame was lock-listed pre-scan in sample-phase2.json on 2026-05-10; the scan ran under run-id phase2-2026-05-11. Named-brand citations — Hyperice, Gymshark, Allbirds, Casper, Olipop — are reproducible against the same store states by anyone running the public scanner. Full methodology is documented in Receipt-grade audit methodology — the same chain that backs every claim on this page.
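One way to picture the reproducibility property is a receipt that hashes the scanner version, the rubric version, a digest of the store state, and the resulting score. The sketch below is illustrative only: the function name, field names, and receipt format are hypothetical and are not taken from UCPScore's methodology.

```python
import hashlib
import json


def _digest(obj) -> str:
    """SHA-256 over a canonical JSON encoding (sorted keys, fixed separators)."""
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def audit_receipt(scanner_sha: str, rubric_sha: str, store_state: dict, score: int) -> str:
    """Hypothetical receipt: identical inputs always yield the same hash."""
    return _digest(
        {
            "scanner_sha": scanner_sha,
            "rubric_sha": rubric_sha,
            "store_state_sha": _digest(store_state),
            "score": score,
        }
    )


# Hypothetical inputs: the same store state written with different key order.
state_a = {"domain": "store.example", "products": 3}
state_b = {"products": 3, "domain": "store.example"}

receipt_1 = audit_receipt("scanner-abc", "rubric-123", state_a, 66)
receipt_2 = audit_receipt("scanner-abc", "rubric-123", state_b, 66)
receipt_3 = audit_receipt("scanner-abc", "rubric-456", state_a, 66)
```

Here `receipt_1` and `receipt_2` match because the encoding is canonical, while `receipt_3` differs because the rubric version changed. That is the sense in which a score is only comparable across scans that share the same scanner and rubric.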
Three structural fixes. ~10–15 points of lift each.
Fix 1: add compatibility metadata. Put Best-for, Compatible-with, and Works-with properties on every product. Even three structured values per SKU give the agent something to extract and lift compatibility scoring. Shopify metafields emitted as JSON-LD are the cleanest implementation.
Fix 2: convert prose facts into attributes. Audit each product description for facts that should live in structured fields: color, material, weight, fit, occasion, dimensions. Move the top 5 prose facts per SKU into Shopify metafields, then emit them as JSON-LD for agent ingestion.
Fix 3: build a substitute graph. For every product, list 1–3 structured alternatives. Shopify's related-products API is a start but typically lacks the JSON-LD wrapper. Add an agent-readable substitute list so the agent has a graceful fallback when constraints don't match.
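The three fixes can be sketched as a single JSON-LD payload. The example below is a minimal illustration, not a confirmed recipe for passing any specific check: it uses the schema.org `Product` properties `additionalProperty` (with `PropertyValue`) and `isSimilarTo`, which do exist, but the "Compatible with" property name, the product data, and the URLs are hypothetical.

```python
import json


def product_jsonld(
    name: str,
    url: str,
    attributes: dict[str, str],
    compatible_with: list[str],
    substitutes: list[dict[str, str]],
) -> str:
    """Build a schema.org Product JSON-LD string covering the three fixes."""
    # Fix 2: prose facts become structured PropertyValue entries.
    properties = [
        {"@type": "PropertyValue", "name": key, "value": value}
        for key, value in attributes.items()
    ]
    # Fix 1: compatibility metadata ("Compatible with" is a hypothetical name).
    properties += [
        {"@type": "PropertyValue", "name": "Compatible with", "value": item}
        for item in compatible_with
    ]
    doc = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "url": url,
        "additionalProperty": properties,
        # Fix 3: a small substitute list the agent can fall back on.
        "isSimilarTo": [
            {"@type": "Product", "name": s["name"], "url": s["url"]}
            for s in substitutes
        ],
    }
    return json.dumps(doc, indent=2)


# Hypothetical product data for illustration only.
snippet = product_jsonld(
    name="Example Joint Chew",
    url="https://store.example/products/joint-chew",
    attributes={"Weight": "250 g", "Flavor": "Chicken"},
    compatible_with=["Adult dogs", "Senior dogs", "Large breeds"],
    substitutes=[
        {"name": "Example Joint Chew Mini",
         "url": "https://store.example/products/joint-chew-mini"},
    ],
)
```

On a Shopify theme the resulting string would be rendered inside a `<script type="application/ld+json">` tag on the product page, with the values sourced from metafields rather than hard-coded.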
The CogniPaws case study documents the exact fix path from 37/100 to 100/100 in three weeks, closing all three universal-failure checks. No engineering team. No re-platforming. Stock Dawn theme. Three SKUs. The fixes generalize because the gaps are structural, not category-specific. UCPScore self-applied the same rubric to its own surfaces and documents the result on the self-applied rubric page — practice-what-we-preach in action.

