METHODOLOGY

Agent Commerce Stack™

A six-dimension framework for measuring how ready a business is for AI agent commerce. 54 sub-checks, geometric aggregation, published methodology. Built on OECD composite index standards, empirically validated against 3,598 sites across 17 global markets (as of March 2026).

Version 1.0 · Published 16 February 2026

At a glance

Dimensions

6 (Discovery & Access, Structured Data, Commerce Data, Protocol Support, Security & Trust, Technical Performance)

Sub-checks

54

Aggregation

Weighted geometric mean

Scale

1–100, five bands from Not Ready to Agent-Ready

Data source

Publicly observable signals only

Methodology version

1.0 (scoring engine v1.0.1)

API access

/docs/api · OpenAPI spec at /openapi.json

Scan now

/scan — free, instant results

The landscape is shifting

AI agents are becoming a primary channel for product discovery, evaluation, and purchasing. Not in theory. In production, now.

Google’s Shopping Graph processes over 50 billion product listings with 2 billion hourly refreshes, feeding AI Mode and Business Agents already live with retailers including Lowe’s, Reebok, and Poshmark. OpenAI’s ChatGPT offers Instant Checkout via the Agentic Commerce Protocol, live with Etsy sellers and rolling out to over a million Shopify merchants. Amazon’s Buy for Me agent purchases products from third-party sites on behalf of consumers, selecting merchants based on structured data quality, pricing accuracy, and fulfilment reliability. Visa, Mastercard, and PayPal have each launched agent payment protocols in the past six months.

McKinsey estimates generative AI could unlock $2.6–4.4 trillion annually across use cases (McKinsey, 2023). Morgan Stanley estimates approximately 25% of consumer spending will flow through AI agents by the end of the decade. Deloitte’s 2026 Retail Industry Outlook found 68% of retailers deploying agentic AI within 24 months. The first academic model of AI agent commerce behaviour, the ACES simulator from Columbia University (December 2025), has already documented position biases, choice homogeneity, and model update instability in how agents select merchants.

The question is no longer whether agents will transact on behalf of consumers. It is whether they can transact with you.

Most businesses have no way to answer that. No established standard exists for measuring whether a commerce site is machine-readable, protocol-compatible, and transactionally accessible to AI agents. Existing tools tend to evaluate a single dimension: AI visibility, SEO readiness, or one specific protocol. But agent commerce requires capability across multiple independent dimensions at once. A merchant might have excellent structured data but no protocol support, or implement protocols while blocking the agents that would use them.

The Zeodyn Score™ measures readiness across six dimensions. Scan your site at /scan and get results in seconds.

A different approach

The Agent Commerce Stack™ evaluates six independent dimensions simultaneously and aggregates them using geometric mean, a method that penalises imbalance rather than allowing strength in one area to mask critical weakness in another. The result is a composite score that reflects genuine, balanced readiness rather than partial compliance.

Zeodyn is the only multi-dimensional agent commerce assessment with a published, OECD-aligned methodology, geometric aggregation, and fail gates for critical capabilities.

Most existing tools evaluate a single signal or protocol. The Agent Commerce Stack™ evaluates 54 signals across six dimensions and publishes the framework openly.

We do this because the biggest criticism of rating systems is opacity. Berg et al. (“Aggregate Confusion: The Divergence of ESG Ratings”, 2022) found that ESG ratings from different providers correlate at only about 60%, compared to roughly 90% for credit ratings, driven primarily by differences in scope, weighting, and measurement. The common thread: methodological opacity. If you cannot scrutinise a methodology, you cannot trust or improve it. We publish scope (six dimensions, 54 sub-checks) and aggregation method (weighted geometric mean) while keeping calibration parameters (exact weights, scoring curve parameters) confidential. That balance is consistent with established practice at Google Lighthouse, MSCI, S&P, and other index providers.

The score is also gaming-resistant. Every sub-check maps to a capability AI agents actually need. There is no way to improve your score without improving your actual readiness. That matters because of a documented problem in rating systems: Goodhart’s Law, formalised in the Manheim and Garrabrant taxonomy, holds that when a measure becomes a target it ceases to be a good measure. The exception is when improvement on the measure and improvement on the underlying construct are the same thing. That is the case here. Improving your Zeodyn Score™ is becoming more ready.

The framework is built on global standards: schema.org, GS1 identifiers operating across 245 countries, platform-agnostic commerce protocols (UCP, ACP, MCP), and universal security specifications (TLS, HSTS, CSP). Scores are directly comparable across markets and geographies.

Who is this for

Merchants and retailers scan to understand how their site appears to AI agents and what to fix first. The score breaks down into six dimensions and 54 individual sub-checks, each with specific, actionable recommendations ranked by effort and impact. Scan your site.

Agencies and consultants use the framework to assess client sites, benchmark against competitors, and add agent commerce optimisation to their service offering. Batch scanning, competitive benchmarking, and team workspaces support agency workflows at scale.

Platform providers (e-commerce platforms, payment processors, technology partners) use aggregate score data to understand how their merchants perform and identify where platform-level improvements would have the greatest impact.

Researchers and analysts can reference the published methodology, cite the framework with attribution, and access scores programmatically via the Zeodyn API.

The framework

The Agent Commerce Stack™ measures six dimensions, mapped to the agent commerce pipeline: the sequence an AI agent follows when attempting to find, evaluate, trust, and purchase from a business.

An agent must first discover a business and gain access to its pages. Then understand products through structured, machine-readable data. Then trust the operational commerce data (prices, availability, shipping, returns) enough to make transactional decisions on behalf of a consumer. Then transact via programmatic protocols rather than navigating a human checkout flow. Then verify the merchant’s legitimacy and security posture. And finally parse pages efficiently enough to extract data at scale.

Six pipeline stages. Six dimensions. Each independently scored and actionable.

Why six

The six dimensions emerged from three waves of research across 23 investigative angles and 150+ sources (detailed in the Research Foundation section below). During that process, several additional dimensions were formally evaluated and rejected.

Agent Experience was considered as a seventh dimension, an analogue to user experience for AI agents. It was rejected because agent experience is an emergent outcome of all six dimensions combined, not an independent measurable construct. Including it would violate the OECD requirement for dimension independence and introduce double-counting in the geometric mean.

Content Quality was considered as a standalone dimension. It was rejected because the signals that matter for agent comprehension are already distributed across Structured Data (schema completeness), Technical Performance (semantic HTML, heading hierarchy), and Discovery & Access (crawlability and rendering). Separating them would fragment related signals without adding explanatory power.

Regulatory Compliance was considered, covering GDPR consent mechanisms, EU AI Act obligations, and PSD2/SCA payment requirements. The EU AI Act (2024) regulates AI system operators, not the merchant sites those systems interact with. The UK’s Data Use and Access Act takes a pro-innovation approach, with the ICO monitoring agentic AI developments in 2026. The externally scannable compliance signals that matter (privacy policy, cookie consent, terms of service) are already captured in Security & Trust. Internal compliance posture cannot be assessed from a public scan.

Industry-Specific Scoring (separate profiles for retail, travel, B2B, services) was deferred to v2. The six-dimension framework is universal: every commerce site needs discoverability, structured data, commerce data, protocols, security, and performance regardless of vertical. Industry profiles would adjust emphasis, not structure.

Multi-page scanning (Pro & Growth)

For a more accurate assessment, paid tiers scan multiple pages from your site. Site-level infrastructure — how discoverable you are, which protocols you support, and how secure your site is — is measured from your homepage, where these signals live. Commerce capability — your structured data quality and transaction readiness — is measured from your actual product pages, where products, prices, and availability naturally reside. Technical performance is assessed across all pages scanned, giving a more reliable picture of your server’s behaviour.

This approach ensures each dimension of the Agent Commerce Stack™ is measured from the page type where that dimension’s data most naturally exists.

The six dimensions

The scanner runs 54 sub-checks across six dimensions. Exact weights are proprietary. Each dimension below shows its relative importance using a qualitative label (Very High, High, or Moderate), consistent with the disclosure practice used by Google Lighthouse and MSCI.

1. Discovery & Access

High · 9 checks

Can AI agents find and access your commerce capabilities?

Discovery is the entry point of the pipeline. If agents cannot find you, nothing else matters. AI agent traffic grew 1,300% in nine months according to HUMAN Security’s 2025 analysis, and how businesses manage that traffic has become a significant commercial decision: block it and you lose a sales channel; leave it unmanaged and you risk cost and security exposure.

What we check

  • robots.txt AI agent policy (GPTBot, ClaudeBot, Google-Extended, PerplexityBot, OAI-SearchBot, Anthropic-AI: allow, block, or selective access)
  • Sitemap.xml presence, validity, and lastmod recency
  • llms.txt / llms-full.txt presence for LLM-optimised content discovery
  • Server-side rendering detection (most AI crawlers cannot execute JavaScript; pure client-side applications are invisible to the majority of agents)
  • Agent commerce opt-in signals (platform-level agent policies, checkout integration declarations)
  • Hreflang tags for multi-language and multi-region sites
  • Canonical URL structure and crawlability
  • Heading hierarchy quality (H1 presence, heading nesting, structural clarity for agent parsers)
  • Viewport meta tag (responsive content signals for multi-device agent access)

Fail gate: robots.txt blocks all known AI agents → dimension score capped. A site that actively prevents agent access cannot score well on discoverability.
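As an illustration of the robots.txt sub-check, the sketch below classifies a site’s AI-agent policy as allow, block, or selective, using the agent names listed above. It deliberately simplifies real robots.txt group semantics (for example, a specific user-agent group overriding `*`), so it is a teaching sketch rather than the production scanner.

```python
# Known AI crawler user-agents from the sub-check list above.
AI_AGENTS = ["GPTBot", "ClaudeBot", "Google-Extended", "PerplexityBot", "OAI-SearchBot"]

def classify_ai_policy(robots_txt: str) -> str:
    """Classify a robots.txt AI-agent policy as "allow", "block", or "selective"."""
    blocked = set()
    group_agents, in_rules = [], False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        if ":" not in line:
            continue
        field, value = (p.strip() for p in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:                      # a new group begins
                group_agents, in_rules = [], False
            group_agents.append(value.lower())
        elif field == "disallow":
            in_rules = True
            if value == "/":                  # whole-site block for this group
                for agent in AI_AGENTS:
                    if agent.lower() in group_agents or "*" in group_agents:
                        blocked.add(agent)
    if len(blocked) == len(AI_AGENTS):
        return "block"
    return "selective" if blocked else "allow"
```

A site that blocks only GPTBot would classify as "selective"; one that blocks every known agent trips the fail gate above.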

2. Structured Data

Very High · 10 checks

Can AI agents understand your products in machine-readable form?

The most heavily weighted dimension, for a simple reason: agents cannot evaluate, compare, or recommend products they cannot parse.

Google’s Shopping Graph, the data layer behind AI Mode, Business Agents, and Google Shopping, relies entirely on structured product data. Without it, products are invisible to the largest agent commerce ecosystem in existence. The Universal Commerce Protocol (UCP) specifically requires OfferShippingDetails and MerchantReturnPolicy schemas for agent-mediated transactions. Syndigo’s Commerce Readiness 2026 framework identifies structured product data as the foundation of their five-stage maturity model.

What we check

  • JSON-LD schema presence and validation (the format preferred by Google and expected by all major agent platforms)
  • Product schema completeness (name, description, price, availability, images, brand, SKU, GTIN: the minimum viable product record for agent evaluation)
  • Offer schema (price specification, currency, availability status within Product offers)
  • BreadcrumbList schema (navigation hierarchy for agent crawl context)
  • Organization schema (business identity, contact information, address, social profiles)
  • Open Graph metadata (og:title, og:description, og:image, og:url, og:type for social and agent previews)
  • Twitter Card metadata (card type, title, image, description)
  • GS1/GTIN product identifiers (the global standard for unambiguous product identification across 245 countries)
  • Semantic HTML elements (article, section, nav, header, footer, aside, figure, time, address, main)
  • Image alt text quality (descriptive alt attributes for multimodal agents processing product imagery)

Fail gate: Zero Product or Offer schema detected → dimension score capped. Without structured product data, agents have nothing to work with.
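A minimal sketch of the Product schema completeness check: extract JSON-LD blocks from the page and measure how many of the required fields are present. The regex-based extraction and the particular field subset here are simplifications for illustration; a production scanner would use a full HTML parser and the complete field list above.

```python
import json
import re

# A subset of the required Product fields from the check list above (illustrative).
REQUIRED = {"name", "description", "image", "brand", "offers"}

def product_schema_completeness(html: str) -> float:
    """Return the best fraction (0.0-1.0) of required Product fields found in JSON-LD."""
    pattern = re.compile(
        r'<script[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
        re.DOTALL | re.IGNORECASE,
    )
    best = 0.0
    for block in pattern.findall(html):
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue                      # invalid JSON-LD scores nothing
        items = data if isinstance(data, list) else [data]
        for item in items:
            if isinstance(item, dict) and item.get("@type") == "Product":
                best = max(best, len(REQUIRED & set(item)) / len(REQUIRED))
    return best
```

A page with no Product schema at all returns 0.0, which is exactly the condition that trips the fail gate above.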

3. Commerce Data

High · 10 checks

Can AI agents trust your operational data for transactional decisions?

Goes beyond structured data presence to evaluate transactional reliability. An agent recommending a product based on stale pricing or inaccurate inventory creates a poor outcome for the consumer and erodes trust in the whole channel.

McKinsey identifies data quality as foundational: clean inventory, predictable fulfilment, and transparent policies determine which merchants become default agent-selected suppliers. Amazon’s Buy for Me agent failure modes, observed since its early 2026 launch, centre on exactly these signals: price inconsistency between page and schema, ambiguous availability status, missing shipping information.

What we check

  • Price data presence and consistency (visible prices match structured data prices across sources)
  • Currency formatting (valid ISO 4217 currency codes in schema)
  • Availability and inventory status indicators (InStock, OutOfStock, PreOrder in structured data)
  • Shipping information accessibility (cost, methods, delivery estimates; OfferShippingDetails schema)
  • Returns policy machine-readability (MerchantReturnPolicy schema; conditions, timeframes, costs)
  • Payment methods declared (accepted payment methods in structured data)
  • Multi-variant product handling (size, colour, options in structured form)
  • Data freshness signals (dateModified in schema, sitemap lastmod recency)
  • Checkout flow detection (checkout signals, ACP integration, payment delegation indicators)
  • Cart detection (add-to-cart functionality, cart navigation, purchase flow entry points)

Fail gate: No price detectable on product pages (neither structured data, microformats, nor visible content) → dimension score capped. Price is the minimum viable data point for any transactional decision.
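The price consistency sub-check can be sketched as a comparison between the structured-data price and the prices visible in page text. The currency-symbol regex and tolerance below are illustrative assumptions; the real check also compares microformats and handles locale-specific number formats.

```python
import re

def price_consistency(visible_text: str, schema_price: str) -> bool:
    """True if the schema price also appears as a visible price on the page."""
    def to_number(s: str) -> float:
        # Normalise strings like "£1,299.00" to a float for comparison.
        return float(re.sub(r"[^\d.]", "", s.replace(",", "")))

    target = to_number(schema_price)
    visible_prices = re.findall(r"[£$€]\s?[\d,]+(?:\.\d{2})?", visible_text)
    return any(abs(to_number(p) - target) < 0.005 for p in visible_prices)
```

A mismatch here is exactly the Buy-for-Me failure mode described above: the page says one price, the schema says another, and the agent cannot trust either.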

4. Protocol Support

High · 7 checks

Can AI agents programmatically transact with your commerce infrastructure?

Measures adoption of the agent commerce protocols that enable programmatic discovery, capability negotiation, and checkout. Not screen-scraping a human checkout flow, but purpose-built machine-to-machine protocols.

The protocol ecosystem is developing across three layers:

Commerce protocols handle agent discovery and transaction execution. The Universal Commerce Protocol (UCP), launched by Google and Shopify in January 2026, enables structured capability discovery via /.well-known/ucp manifests with modular declarations (catalogue, checkout, orders, fulfilment). UCP is transport-agnostic, working over REST, MCP, and A2A, and includes a human escalation model for transactions requiring consumer intervention. Endorsers include Adyen, American Express, Best Buy, Flipkart, Macy’s, Mastercard, Stripe, Home Depot, Visa, and Zalando. The Agentic Commerce Protocol (ACP), from OpenAI and Stripe, powers Instant Checkout in ChatGPT, live with Etsy sellers and rolling out to over a million Shopify merchants. PayPal adopted ACP in October 2025. Google’s Agent Payments Protocol (AP2) uses Verifiable Digital Credentials with 60+ partners.

Payment protocols handle authentication and financial delegation. Visa’s Trusted Agent Protocol (TAP) uses three-layer signature verification (agent, consumer, payment credential) built on Web Bot Auth and HTTP Message Signatures (RFC 9421). Mastercard’s Agent Pay, developed with the FIDO Payments Working Group, launched with Fiserv as the first major processor in December 2025. PayPal’s agentic framework includes Agent Ready, Store Sync, and a native MCP server.

Infrastructure protocols provide the communication layer. The Model Context Protocol (MCP), now under the Linux Foundation’s AI & Data Foundation with 10,000+ published servers, standardises how AI models connect to external tools and data. Google’s Agent-to-Agent (A2A) protocol supports multi-agent orchestration via agent cards at /.well-known/agent-card.json. Web Bot Auth (Cloudflare) provides Ed25519 cryptographic agent identity verification, with IETF drafts in progress.

What we check

  • UCP manifest presence and capability declarations (checkout, catalogue, orders, fulfilment)
  • ACP endpoint detection
  • MCP server exposure
  • Platform-specific agent commerce integrations (Shopify Checkout Kit, WooCommerce, BigCommerce, Magento)
  • API endpoint indicators (OpenAPI specification presence)
  • Checkout flow navigability (multi-step vs single-page, programmatic accessibility)
  • Agent payment capability signals (delegated payment protocol indicators)

Fail gate: All AI agents actively blocked with no programmatic alternative → dimension score capped.

On platform ecosystems: The scanner measures what is observable on the domain being scanned. If a platform supports UCP but the merchant hasn’t enabled it, the score reflects the merchant’s current state, not the platform’s potential. Agent commerce readiness is about what an agent encounters today, not what could theoretically be activated. Rewarding platform choice over actual implementation would introduce bias.
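To make the UCP manifest sub-check concrete: given a parsed /.well-known/ucp document, a checker can report which of the four capability modules named above are declared. Note that the manifest shape assumed here (a "capabilities" key listing module names) is a hypothetical illustration, not the actual UCP schema.

```python
# The four UCP capability modules named in the check list above.
UCP_MODULES = ("catalogue", "checkout", "orders", "fulfilment")

def declared_ucp_capabilities(manifest: dict) -> dict:
    """Map each UCP module to whether the (hypothetical) manifest declares it."""
    declared = {str(c).lower() for c in manifest.get("capabilities", [])}
    return {module: module in declared for module in UCP_MODULES}
```

A merchant declaring only catalogue and checkout would score on those capabilities but not on orders or fulfilment.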

5. Security & Trust

Moderate · 9 checks

Can AI agents verify your legitimacy and operate safely?

When an agent commits a consumer’s payment credentials to a transaction, it needs reasonable confidence in the merchant’s legitimacy and security posture. The checks here measure observable signals that provide that confidence, drawn from OWASP API Security Top 10 recommendations and the Web Bot Auth specification’s security requirements. Trust signals align with Google’s E-E-A-T framework as documented in GEO research on how AI systems evaluate source credibility.

What we check

  • HTTPS enforcement and TLS configuration (TLS 1.2+, no mixed content)
  • Security headers (HSTS, Content-Security-Policy, X-Frame-Options, X-Content-Type-Options, Referrer-Policy, Permissions-Policy)
  • CORS headers (relevant for API and cross-origin agent access)
  • Privacy policy presence and accessibility
  • Terms of service presence
  • Cookie consent mechanism
  • Organization schema (business identity signals — physical address, contact details, registration information)
  • Trust signals (third-party verification, industry certifications, review platform presence)
  • Contact information (visible email, phone, physical address — agents verify business legitimacy)

Fail gate: No HTTPS → dimension score capped. TLS encryption is the baseline for any secure transaction.
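The security headers sub-check reduces to checking a response-header mapping against the expected set. The header list below matches the check list above; scoring it as a simple fraction is an illustrative simplification (the production check also validates header values, such as HSTS max-age).

```python
# Security headers from the check list above, lowercased for comparison.
EXPECTED_HEADERS = [
    "strict-transport-security",
    "content-security-policy",
    "x-frame-options",
    "x-content-type-options",
    "referrer-policy",
    "permissions-policy",
]

def security_header_score(headers: dict) -> float:
    """Fraction (0.0-1.0) of expected security headers present, case-insensitive."""
    present = {name.lower() for name in headers}
    return sum(h in present for h in EXPECTED_HEADERS) / len(EXPECTED_HEADERS)
```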

6. Technical Performance

Moderate · 9 checks

Can AI agents parse your pages quickly and efficiently?

AI agents don’t wait for animations, don’t scroll, and many can’t execute JavaScript at all. A page that works well for humans in a browser may be entirely opaque to an agent parser.

Architecture matters here. The shift toward headless and composable commerce (commercetools, 2026) creates both opportunities and risks: decoupled architectures can serve responses optimised for agent parsing, but pure client-side rendering makes content invisible to crawlers. The framework is architecture-agnostic. It measures outcomes (can agents parse the page?) rather than prescribing technology. Sites on Shopify Hydrogen, Next.js with SSR, static generators, or traditional server-rendered stacks can all score well if agents can read the rendered output. There is also significant overlap between WCAG accessibility standards and agent comprehension: semantic HTML, heading hierarchy, alt text, and logical structure benefit both human assistive technology and AI agent parsers.

What we check

  • Time to First Byte (TTFB — server response latency)
  • HTML document size and markup efficiency (page weight for agent parsing)
  • Server-side rendering detection (whether meaningful content is available without JavaScript execution)
  • Semantic HTML structure (heading hierarchy, landmark roles, logical document structure)
  • Image alt text coverage (descriptive alt attributes for multimodal agent comprehension)
  • Response compression (gzip/brotli — reduces transfer size for agent parsers)
  • Mobile responsive (viewport meta tag — responsive content signals)
  • Render-blocking resources (scripts and stylesheets that delay content availability)
  • HTTP protocol version (HTTP/2 or HTTP/3 support for efficient multiplexed requests)

Fail gate: Pure client-side SPA with no server-side rendering → dimension score capped. Most AI agent crawlers do not execute JavaScript.
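The server-side rendering sub-check can be approximated by asking whether meaningful text survives once scripts and markup are stripped, which is what a non-JavaScript crawler effectively sees. The 200-character threshold below is an illustrative assumption, not the scanner’s actual cutoff.

```python
import re

def looks_server_rendered(html: str, min_chars: int = 200) -> bool:
    """Heuristic: True if visible text remains after removing scripts and tags."""
    no_scripts = re.sub(r"<script\b.*?</script>", "", html,
                        flags=re.DOTALL | re.IGNORECASE)
    text = re.sub(r"<[^>]+>", " ", no_scripts)          # strip remaining tags
    return len(" ".join(text.split())) >= min_chars     # collapse whitespace
```

A pure client-side SPA shipping an empty root `<div>` plus a JavaScript bundle fails this heuristic, which is the condition behind the fail gate above.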

The Zeodyn Score™

The six dimension scores are aggregated into a single composite: the Zeodyn Score™, scaled 1–100.

Geometric aggregation

A simple arithmetic average allows full compensability: strength in one area directly offsets weakness in another. A merchant with strong structured data but zero protocol support scores around 50 under an arithmetic mean, misleadingly suggesting “halfway ready” despite being unable to transact with any agent.

Geometric aggregation penalises imbalance. A near-zero score in any dimension pulls the composite toward zero, reflecting how the agent commerce pipeline actually works: a break at any stage stops the transaction.

The UN Human Development Index adopted geometric mean for exactly this reason in 2010. As the UNDP put it: “A poor achievement in one dimension is not linearly compensated by a higher achievement in another dimension.” Munda (2005) provides the theoretical basis, and De Muro et al. (2011) demonstrate that geometric aggregation produces more stable rankings when weights are perturbed, which matters for any index that claims to be credible.
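The penalty for imbalance is easy to see in a small sketch. Because the actual dimension weights are proprietary, equal weights are used here purely for illustration; the floor of 1 reflects the 1–100 scale.

```python
import math

def zeodyn_composite(dimension_scores, weights=None):
    """Weighted geometric mean of dimension scores (illustrative equal weights)."""
    if weights is None:
        weights = [1.0 / len(dimension_scores)] * len(dimension_scores)
    floored = [max(s, 1.0) for s in dimension_scores]  # 1 is the scale floor
    return math.exp(sum(w * math.log(s) for w, s in zip(weights, floored)))

balanced = zeodyn_composite([70, 70, 70, 70, 70, 70])     # stays near 70
imbalanced = zeodyn_composite([95, 95, 95, 1, 95, 95])    # one broken dimension
```

Under an arithmetic mean the imbalanced site would score around 79; under the geometric mean the single broken dimension drags it below 50, matching the pipeline logic: a break at any stage stops the transaction.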

No other agent commerce assessment uses geometric aggregation.

Fail gates

When a critical sub-check fails (no HTTPS, no structured product data, no detectable price, all agents blocked), the affected dimension score is capped regardless of other results within that dimension.

The precedent is NIST Cybersecurity Framework 2.0 (2024), where certain controls are mandatory regardless of maturity elsewhere. It mirrors ISO 27001, where specific clauses cannot be excluded from scope. Credit rating agencies apply the same principle: structural weaknesses cap the rating regardless of financial performance.

Fail gates prevent a common problem in rating systems: accumulating minor positives to mask the absence of something fundamental.
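Mechanically, a fail gate is a cap applied after the dimension’s sub-checks are aggregated. The cap value of 20 below is an illustrative assumption; actual gate thresholds are part of the proprietary calibration.

```python
def apply_fail_gate(dimension_score: float, gate_failed: bool,
                    cap: float = 20.0) -> float:
    """Cap a dimension score when a critical sub-check has failed."""
    return min(dimension_score, cap) if gate_failed else dimension_score
```

A site scoring 85 on Security & Trust but serving no HTTPS would be capped to 20, and that capped value then feeds the geometric mean, pulling the composite down with it.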

Scoring curves

Raw sub-check results are transformed through scoring curves before aggregation. Going from zero to basic implementation counts for more than going from excellent to perfect, reflecting the reality that initial adoption has the highest marginal value. Google Lighthouse takes the same approach, calibrating curves against real-world distributions from the HTTP Archive. v1.0 uses expert-derived curves, empirically validated against 3,500+ sites.
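Any concave curve has the property described above: early gains count for more than late ones. The production curve parameters are confidential, so the square-root exponent below is purely an illustrative stand-in.

```python
def curve(raw: float, exponent: float = 0.5) -> float:
    """Map a raw 0-100 sub-check result through a concave (diminishing-returns) curve."""
    return 100.0 * (max(raw, 0.0) / 100.0) ** exponent

first_step = curve(20) - curve(0)     # going from nothing to basic
last_step = curve(100) - curve(80)    # going from excellent to perfect
```

With this curve, moving from 0 to 20 raw earns roughly four times the scored points of moving from 80 to 100, which is the intended shape: initial adoption has the highest marginal value.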

Scale: 1–100

Geometric mean involves multiplication, so any factor of zero produces a composite of zero regardless of everything else. The minimum score is 1 (“no meaningful capability detected”), preserving mathematical integrity. Standard practice: the HDI and COINr composite indicator framework both use this approach.

Point-in-time scoring

The Zeodyn Score™ reflects the state of a site at the moment of scanning. Credit rating agencies distinguish between point-in-time (current state) and through-the-cycle (averaged over time). For agent commerce, point-in-time is the right model: an agent visiting your site today encounters it as it is today. Trend tracking via score history and weekly re-scans (trend charts) provides the longitudinal view.

Score bands

Range    Band         Meaning
90–100   Agent-Ready  Commerce fully accessible to AI agents.
70–89    Strong       Most agent interactions will succeed.
50–69    Developing   Agents can discover but not fully transact.
25–49    Limited      Significant gaps block agent commerce.
1–24     Not Ready    Agents cannot meaningfully interact.
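The band classification is a straightforward threshold lookup over the ranges above:

```python
# Band floors and names from the score-band table above.
BANDS = [
    (90, "Agent-Ready"),
    (70, "Strong"),
    (50, "Developing"),
    (25, "Limited"),
    (1, "Not Ready"),
]

def score_band(score: int) -> str:
    """Return the band name for a 1-100 Zeodyn Score."""
    for floor, name in BANDS:
        if score >= floor:
            return name
    return "Not Ready"
```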

The method

What we scan

Publicly observable signals only: the same information available to any AI agent visiting your site. HTTP headers, HTML content, schema markup, robots.txt, /.well-known/ endpoints, SSL certificates, and page performance metrics.

We do not access login-protected content, circumvent access controls, scrape personal data, or store copyrighted content. The legal basis for scanning publicly available web data is established under hiQ Labs v. LinkedIn (9th Circuit, 2022) and Van Buren v. United States (Supreme Court, 2021).

Free tier scans assess the homepage. Pro and Growth tiers include multi-page scanning, which automatically discovers and scores product pages alongside the homepage. Site-wide infrastructure signals — such as robots.txt policies, security headers, protocol manifests, and server performance — are consistent across all pages and fully captured from the homepage. Commerce-specific signals like product schema, pricing data, and availability markup are page-dependent and benefit from multi-page scanning.

Scoring pipeline

1. Sub-checks execute — 54 individual checks producing binary (pass/fail) or continuous (0–100) results
2. Scoring curves applied — raw scores transformed to reflect diminishing marginal returns
3. Dimension scores calculated — sub-check results aggregated within each dimension
4. Fail gates evaluated — critical failures cap the affected dimension score
5. Weighted geometric mean — six dimension scores combined into composite
Result: Zeodyn Score™ — final score on 1–100 scale with band classification

How we measure

All scans are conducted from cloud infrastructure located in the United Kingdom (Microsoft Azure, UK South region). The Zeodyn Score™ reflects how a website responds to requests originating from this geographic location.

This means that geographic restrictions, CDN routing, and bot-detection behaviour are captured as they would be experienced by a UK-based AI agent. We consider this a legitimate signal: if an AI agent operating from the UK cannot access a site’s content or structured data, that is a real limitation on agent commerce readiness.

The sub-checks themselves are based on international standards (see Fairness > Geography below). Measurement geography affects how a site responds to our scanner, not what we look for.

Decomposability

Composite indicators must be interpretable at every level (OECD requirement). The Zeodyn Score™ decomposes fully:

  1. Composite — single Zeodyn Score™
  2. Dimensions — six scores, each 1–100
  3. Sub-checks — pass/fail or scored, with specific recommendations
  4. Fail gates — flagged when active, with remediation guidance
  5. Radar chart — visual balance across all dimensions

The composite tells you where you stand. Dimensions tell you where to focus. Sub-checks tell you what to fix. Every recommendation is prioritised by effort and impact.

Fairness

Three potential sources of bias need addressing in any composite assessment like this.

Size. The framework measures capability, not scale. A small merchant on Shopify with complete structured data and platform-enabled protocol support can score higher than a large retailer with a custom-built site that lacks these signals. The bar for “good structured data” is the same whether you have 10 products or 10 million.

Platform. The scanner measures what is observable on the site, not what the underlying platform could support. A Shopify merchant with UCP enabled scores well on Protocol Support; one who hasn’t enabled it does not. Platform choice is not a proxy for readiness. We acknowledge in recommendations that different platforms create different effort curves, but that affects guidance, not scoring.

Geography. Sub-checks are based on global standards (schema.org, GS1, TLS, HTTP specifications) and internationally deployed protocols. No sub-check requires a region-specific certification or payment method. EU GDPR consent signals are assessed under the universal “cookie consent mechanism” check rather than as a Europe-specific requirement.

Scanner quality

We apply data quality principles from ISO 8000 and ISO/IEC 25012 to our own measurement instrument:

  • Accuracy: Planned formal expert audit (target: >95% agreement per 100 scans)
  • Consistency: Planned formal test-retest study (target: r > 0.90, same site rescanned within 24 hours)
  • Completeness: All 54 sub-checks execute for every scan; partial results are flagged
  • Timeliness: Scans reflect the site’s state at scan time; stale cached results are never used
  • Transparency: Every sub-check result is visible in the full results breakdown

Methodology evolution

v1.0.1 (current) — Structural readiness. Evaluates whether the signals, data, and protocols an AI agent needs are present and correctly implemented. Think fire safety: the sprinklers are installed, the exits are marked, the alarms are wired.

v1.1 (on hold) — Empirical calibration. Originally planned to refine dimension weights and scoring curve parameters using empirical distributions. Weight sensitivity analysis (±5%, all ρ > 0.96) and equal-weight comparison (ρ = 0.997) demonstrate that v1.0 rankings are data-driven, not weight-dependent. Empirical recalibration remains a future option but is not a current priority. Any future calibration will follow the change management process below, including 30-day public notice and 6-month transition overlap.

v1.0.1 multi-page scoring (current). Multi-page scanning discovers product pages via sitemap and structured data, then assesses them alongside the homepage. Site infrastructure dimensions (Discovery & Access, Protocol Support, Security & Trust) are sourced from the homepage. Commerce dimensions (Structured Data, Commerce Data) are sourced from discovered product pages, where these signals naturally live. Technical Performance is averaged across all pages scanned. This produces a more accurate assessment because a homepage can appear strong while product pages are blocked or empty. Available on Pro (2 pages) and Growth (up to 5 pages) tiers.

v2.0 (planned) — Behavioural testing. Simulating an actual agent attempting to discover, evaluate, and transact. That is the fire drill, testing whether the systems work under real conditions. Structural readiness comes first. Without the right signals in place, there is nothing to behaviourally test.

Research foundation

The Agent Commerce Stack™ was built through three waves of research covering 23 investigative angles and over 150 sources.

Wave 1: Protocols and standards

The first wave mapped the agent commerce protocol ecosystem as it stood in January–February 2026. Without understanding what protocols exist, how they work, and how they relate to each other, the Protocol Support and Discovery & Access dimensions could not have been designed.

Commerce protocols: Universal Commerce Protocol (UCP, Google/Shopify, January 2026, 20+ endorsers including Visa, Mastercard, Stripe, American Express), Agentic Commerce Protocol (ACP, OpenAI/Stripe, September 2025–present, live in ChatGPT), Agent Payments Protocol (AP2, Google, 60+ partners).

Payment protocols: Visa Trusted Agent Protocol (TAP, Web Bot Auth + RFC 9421), Mastercard Agent Pay (FIDO Payments Working Group, Fiserv adoption December 2025), PayPal agentic framework (Agent Ready, Store Sync, MCP server).

Infrastructure protocols: Model Context Protocol (MCP, AAIF/Linux Foundation, 10,000+ servers), Agent-to-Agent (A2A, Google), Web Bot Auth (Cloudflare, Ed25519, IETF drafts).

Data standards: GS1 Global Trade Item Numbers, GDSN (100 million items, 245 countries), Syndigo Commerce Readiness 2026, Google Merchant Center conversational commerce attributes (January 2026).

A key finding from this wave was that the protocol landscape has three distinct layers (commerce, payment, and infrastructure) and a meaningful assessment must cover all three. Single-protocol tools miss the majority of the picture.
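
Each layer exposes publicly observable signals, which is what makes a layered assessment scannable at all. A minimal sketch of such a probe, covering only two illustrative signals: the /.well-known/ucp location follows the UCP manifest convention described above, but the layer labels and the 200-means-present rule are simplifying assumptions, not Zeodyn's actual sub-check logic.

```python
from urllib.error import URLError
from urllib.request import urlopen

# One illustrative path per layer. Only /.well-known/ucp comes from the
# UCP convention; everything else here is a simplified assumption.
PATHS = {"commerce": "/.well-known/ucp", "discovery": "/robots.txt"}

def signals_from_statuses(statuses: dict) -> dict:
    """Pure step: a path counts as a positive signal only on HTTP 200."""
    return {layer: code == 200 for layer, code in statuses.items()}

def probe(base: str, timeout: float = 5.0) -> dict:
    """Fetch each path and map HTTP statuses to boolean signals."""
    statuses = {}
    for layer, path in PATHS.items():
        try:
            with urlopen(base.rstrip("/") + path, timeout=timeout) as r:
                statuses[layer] = r.status
        except (URLError, TimeoutError, ValueError):
            statuses[layer] = 0  # unreachable or blocked: no signal
    return signals_from_statuses(statuses)
```

A real multi-layer scanner would add payment and infrastructure endpoints and distinguish valid manifests from mere 200 responses; the point of the sketch is that all three layers can be checked from the outside.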

Wave 2: Composite index theory

The second wave answered a different question: given that we know what to measure, how do we combine 54 signals into a defensible composite score? This required going deep on index construction methodology, a field with decades of published standards and documented mistakes to learn from.

Core frameworks: OECD/JRC Handbook on Constructing Composite Indicators (2008, the global standard), European Commission Knowledge for Policy toolkit, Greco et al. (2018), Rogge (2017), Springer methodology review (2018).

Aggregation theory: UN Human Development Index (geometric mean, 2010), Munda (2005) on compensability, Färe and Zelenyuk (2003), De Muro et al. (2011) on PCA weight instability. The HDI’s 2010 switch from arithmetic to geometric mean was the single most important precedent for the Agent Commerce Stack™’s aggregation design.

Scoring benchmarks: Google Lighthouse performance scoring (log-normal curves, HTTP Archive calibration), NIST CSF 2.0 (non-compensatory tiers), ISO 27001.

Rating methodology critique: Berg et al. (2022) on ESG divergence, Billio et al. (2021), MSCI ESG Methodology (2024). The central finding, that ESG ratings diverge primarily because of methodological opacity, directly informed the decision to publish the Agent Commerce Stack™ framework openly.

Financial index construction: MSCI, S&P, and FTSE Russell methodology standards for transparency, rebalancing, and change management governance.

Credit rating methodology: Moody’s, S&P, and Fitch approaches to qualitative-quantitative integration, point-in-time versus through-the-cycle assessment, and the non-compensatory treatment of structural weaknesses. These directly produced the point-in-time scoring model and the fail gate mechanism.
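
The Lighthouse precedent cited above is concrete enough to sketch: Lighthouse maps a raw metric to a score via a log-normal curve anchored at two control points (0.5 at the median, 0.9 at the p10 value). The implementation below follows that published shape; Zeodyn's own curve parameters are proprietary, so any numbers passed in are placeholders.

```python
import math

Z_P90 = 1.2815515655446004  # standard-normal quantile at p = 0.90

def lognormal_score(value: float, median: float, p10: float) -> float:
    """Lighthouse-style curve: scores 0.5 at the median and 0.9 at p10.
    Lower raw values score higher, as with latency metrics."""
    mu = math.log(median)
    sigma = (mu - math.log(p10)) / Z_P90
    z = (math.log(value) - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2))  # complement of the normal CDF
```

For example, with an assumed median of 1000 ms and p10 of 500 ms, a 1000 ms measurement scores 0.5 and a 500 ms measurement scores 0.9; the curve degrades smoothly rather than in steps.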

Wave 3: Academic validation

The third wave stress-tested the framework from 23 angles, looking for gaps, weaknesses, and missed considerations.

Academic research: ACES simulator (Columbia, December 2025) provided the first empirical evidence of how AI agents actually behave in commerce, documenting position biases and choice homogeneity. GEO papers informed Discovery & Access and Security & Trust signals. AAIF/Linux Foundation frameworks validated interoperability assumptions. Goodhart’s Law and the Manheim-Garrabrant gaming taxonomy shaped the “gaming-resistant by design” principle.

Measurement science: COSMIN framework, AERA/APA/NCME Standards (content, construct, and criterion validity), Classical Test Theory, ISO 8000 data quality series, ISO/IEC 25012, DAMA DMBOK. These informed the validation roadmap and scanner quality targets.

Market validation: Amazon Buy for Me failure modes (validates Commerce Data dimension), Google Shopping Graph signals (validates Structured Data weighting), Deloitte 2026 Retail Outlook (market sizing), OWASP API Security Top 10 (validates Security & Trust), WCAG/agent comprehension overlap (validates Technical Performance), headless commerce implications (validates architecture-agnostic design), content freshness patterns (validates data recency sub-checks).

Regulatory review: EU AI Act, UK Data Use and Access Act, ICO agentic AI monitoring. Conclusion: no new dimension needed; scannable compliance signals already captured.

Edge cases: Marketplace sellers, B2B sites, subscription services, digital products, multi-language sites. Conclusion: the framework is universal; industry-specific profiles are a v2 enhancement.

Additional frameworks: Forrester DX Index, Gartner Digital Commerce Maturity Model, IDC MaturityScape, HUMAN Security bot management, information retrieval metrics (NDCG, MAP, precision/recall), ISAE 3000 and SOC 2 Type II assurance standards. These informed target-setting and the future validation and third-party assurance roadmap rather than framework structure.

Design decisions

Decision | Alternatives considered | Rationale
Geometric mean | Arithmetic mean, multi-criteria analysis, Mazziotta-Pareto index | Prevents compensatory scoring. HDI 2010 precedent. Munda (2005), De Muro (2011).
Six dimensions | Five (merge Security/Performance), seven (+Agent Experience), eight (+Content Quality, +Regulatory) | Maps to agent pipeline. Rejected dimensions are emergent or already distributed.
Fail gates | Continuous scoring only, maturity tiers, penalty functions | Captures prerequisite relationships. NIST CSF 2.0, ISO 27001, credit rating precedent.
Expert weights v1.0 | Equal weights, PCA-derived, stakeholder survey, budget allocation | Standard launch practice (Lighthouse, MSCI, HDI). Empirically validated: all ±5% perturbations ρ > 0.96.
1–100 scale | 0–100, letter grades, percentile ranks | Geometric mean requires non-zero. HDI, COINr precedent.
Point-in-time | Through-the-cycle, rolling window | Matches agent reality. Credit rating precedent.
Public methodology | Fully proprietary, fully open including weights | Addresses ESG opacity. Builds trust. Establishes prior art.
Platform-neutral | Platform-aware boosts | Measures implementation over potential. Avoids bias.
Gaming-resistant | Anti-gaming penalties, obfuscation | Per Goodhart: align measure with construct. More durable.

Empirical validation

In March 2026, the Agent Commerce Stack™ v1.0 methodology was empirically validated against 3,598 unique e-commerce sites across 17 global markets and three verticals (fashion, grocery, electronics). Every score was produced by automated scanning of live websites — no self-reported data, no editorial selection.

All scans are conducted from UK-based infrastructure (Azure UK South, London region). Results reflect site behaviour as observed from this location. Sites that geo-restrict access from certain regions may score differently when scanned from local infrastructure. Multi-region scanning is planned as a future infrastructure enhancement.

Formula verification

The weighted geometric mean formula was independently verified against the full corpus. Manual calculations matched stored scores exactly across all test cases, confirming the scoring engine implements the published methodology without error.
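
A minimal sketch of the verified formula. The six weights below are illustrative equal placeholders: the published v1.0 weights are proprietary, disclosed only as Very High / High / Moderate bands.

```python
import math

# Illustrative equal weights; the real v1.0 weights are proprietary.
WEIGHTS = {
    "discovery_access": 1 / 6,
    "structured_data": 1 / 6,
    "commerce_data": 1 / 6,
    "protocol_support": 1 / 6,
    "security_trust": 1 / 6,
    "technical_performance": 1 / 6,
}

def composite(dimension_scores: dict) -> float:
    """Weighted geometric mean: exp(sum of w_i * ln(x_i)).
    Every dimension score must sit on the 1-100 scale; the geometric
    mean is undefined at zero, hence the 1-100 floor noted elsewhere."""
    assert all(1 <= s <= 100 for s in dimension_scores.values())
    return math.exp(sum(w * math.log(dimension_scores[d])
                        for d, w in WEIGHTS.items()))
```

With one dimension collapsed to 1 and the other five at 100, this composite falls to roughly 46, where an arithmetic mean would report 83.5: the non-compensatory property in action.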

Dimensional structure

Principal Component Analysis on 3,598 scan distributions confirms a dominant “general e-commerce quality” factor explaining 56% of variance — sites that invest in one area tend to invest in all areas. This is the same pattern reported for established composite indices including the UN Human Development Index and Google Lighthouse.

Each dimension still measures a theoretically distinct capability. Discovery & Access, Structured Data, Commerce Data, Protocol Support, Security & Trust, and Technical Performance capture genuinely different aspects of agent commerce readiness. No pair of dimensions correlates above 0.85, confirming no two dimensions are redundant.

Weight robustness

Every dimension weight was perturbed by ±5% and site rankings recalculated. Rank-order stability remained above Spearman ρ = 0.96 for all perturbations — exceeding the standard OECD threshold for defensibility.

This means the ranking of sites is driven by the underlying data, not by the specific weight choices. Equal weights (1/6 each) produce near-identical corpus-level rankings (ρ = 0.997), further confirming data-driven scoring.

Geometric aggregation

The non-compensatory geometric aggregation works as designed — sites cannot mask a critical gap in one dimension by scoring well in others. This follows the precedent set by the UN Human Development Index’s 2010 switch from arithmetic to geometric mean, which was adopted precisely to prevent compensatory scoring.
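
A toy comparison makes the non-compensatory property concrete. Equal weights and made-up dimension scores are used for illustration; the fail-gate trigger (no Product/Offer schema) is the published one, but the cap value here is a placeholder.

```python
import math

# Five strong dimensions and one collapsed one, e.g. Protocol Support.
dims = [90, 90, 90, 90, 90, 5]

arithmetic = sum(dims) / len(dims)                         # ~75.8, masks the gap
geometric = math.prod(d ** (1 / len(dims)) for d in dims)  # ~55.6, exposes it

def apply_fail_gate(dim_score: float, has_product_schema: bool,
                    cap: float = 20.0) -> float:
    """Hypothetical cap: with no Product/Offer schema detected, the
    dimension cannot exceed `cap` regardless of other signals.
    The cap value is a placeholder, not the published parameter."""
    return dim_score if has_product_schema else min(dim_score, cap)
```

The arithmetic mean rewards the five strong dimensions enough to hide the collapsed one; the geometric mean does not, and a fail gate goes further by bounding the dimension outright.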

What the data shows

The global average Zeodyn Score™ across 3,598 sites is 21.2 out of 100. 60% of sites scored Not Ready and 40% scored Limited. No site scored Developing, Strong, or Agent-Ready. The highest-scoring site achieved 49 out of 100. These results are based on scanning conducted in March 2026.

Protocol Support averages just 7.1 out of 100 — almost no site has adopted agent commerce protocols. Multi-page scanning assesses product pages alongside the homepage, measuring what AI agents would actually discover through automated navigation.

Criterion validity: what’s next

The final OECD validation step (Step 9: “back to real data”) requires correlating Zeodyn Scores™ with actual AI agent transaction success rates. This is not yet possible because agent commerce is too nascent — very few real agent transactions exist to validate against.

As agent commerce adoption grows and transaction data becomes available, Zeodyn will correlate scores with real-world agent success rates. This is the same path followed by every major composite index: the UN HDI correlated with outcomes over decades, not at launch.

Methodology lineage

  • OECD/JRC Handbook on Constructing Composite Indicators (2008) — the global standard for index methodology
  • UN Human Development Index — geometric mean precedent (adopted 2010)
  • Google Lighthouse — empirical calibration evolution precedent

Rigour and governance

OECD ten-step alignment

The OECD Handbook defines a ten-step process for constructing defensible composite indicators:

Step | Requirement | Status
1. Theoretical framework | Define the concept | Agent commerce pipeline, six dimensions
2. Data selection | Choose indicators | 54 sub-checks, publicly observable
3. Imputation | Handle missing data | N/A: scanner detects or records absence
4. Multivariate analysis | Test statistical structure | PCA confirms 6 dimensions (3,598 sites)
5. Normalisation | Scale consistently | Sub-check curves validated empirically
6. Weighting | Assign importance | Expert v1.0, empirically validated (ρ > 0.96)
7. Aggregation | Combine into composite | Weighted geometric mean
8. Robustness & sensitivity | Test stability | All ±5% perturbations ρ > 0.96
9. Back to real data | Validate against outcomes | Planned: awaiting agent transaction data at scale
10. Presentation | Communicate effectively | Radar chart, decomposed scores, recommendations

Validity

Three forms of validity, per AERA/APA/NCME measurement standards:

Content validity asks whether the 54 sub-checks cover all essential aspects of the domain. Established through three research waves, 23 angles, 150+ sources, with documentation of what was included, excluded, and why.

Construct validity asks whether the six dimensions represent the right underlying constructs. The agent commerce pipeline provides the theoretical basis. PCA on 3,598 real scan distributions confirms that no pair of dimensions is redundant (no pair correlates above 0.85), supporting the six-dimension structure.

Criterion validity asks whether a higher score predicts better outcomes. Correlating Zeodyn Score™ with agent transaction success rates requires real-world transaction data that does not yet exist at scale. As agent commerce matures and transaction data becomes available, this correlation study will be conducted. Target: r > 0.50.

Weighting

Expert-derived weights for v1.0, following a four-pillar justification:

  1. Theoretical rationale — what does the agent pipeline demand?
  2. Empirical evidence — what do platforms actually prioritise? (Google Shopping Graph, Amazon Buy for Me, ChatGPT Instant Checkout)
  3. Failure mode analysis — what happens at zero for each dimension?
  4. Sensitivity analysis — how stable are rankings under ±5% weight shifts? (OECD requirement; confirmed: all perturbations ρ > 0.96)

Google Lighthouse started with expert weights and evolved to empirical calibration. The HDI did the same over three decades. MSCI reviews its expert-committee weights annually. Weight sensitivity analysis confirms v1.0 weights are robust: all ±5% perturbations produce ρ > 0.96, and equal weights yield ρ = 0.997. Empirical recalibration remains a future option as scan data grows.

Validation roadmap

Completed (March 2026, 3,598 sites):

  • Formula verification: Weighted geometric mean calculations match stored scores exactly.
  • PCA (no redundancy): Six dimensions confirmed, no pair correlates above 0.85.
  • Sensitivity analysis: ±5% weight perturbation, all Spearman ρ > 0.96.
  • Score distribution: No floor or ceiling clustering observed.

Planned (as agent commerce adoption grows):

  • Predictive validity: Score vs agent transaction success rates. Target: r > 0.50.
  • Test-retest reliability: 20 merchants, 7-day intervals, no changes. Target: r > 0.90.
  • Scanner accuracy: Expert audit sample. Target: >95% agreement.

Change management

Methodology updates follow financial index provider practice:

  • Version labelling on all scores (currently v1.0.1)
  • 30-day public notice for material changes, with rationale
  • 6-month transition overlap with both versions available
  • Changelog with rationale and impact assessment
  • Historical scores never retroactively modified

Independence

Fully automated. No editorial judgment or commercial influence on individual scores. Scores cannot be purchased, sponsored, or manually adjusted.

Changelog

v1.0.1 — 9 March 2026

Structured Data fail gate cap adjusted based on empirical validation against 3,598 unique sites. The trigger condition (no Product/Offer schema detected) is unchanged. This refinement improves score differentiation among sites with non-product structured data while maintaining the non-compensatory penalty for missing commerce-relevant schema. Rank correlation with v1.0: ρ = 0.9931.

v1.0 — February 2026

Initial release. Six dimensions, expert-derived weights, geometric mean aggregation, fail gates. Published methodology.

Frequently asked questions

What is agent commerce?

AI agents discovering, evaluating, comparing, and purchasing products on behalf of consumers, programmatically. Google AI Mode, ChatGPT Instant Checkout, and Amazon Buy for Me are live examples.

What is the Zeodyn Score™?

A composite metric from 1 to 100 measuring AI agent commerce readiness across six dimensions, aggregated using a weighted geometric mean. Scan your site at /scan.

How is the score calculated?

54 sub-checks → scoring curve transformation → dimension scoring → fail gate evaluation → weighted geometric mean → Zeodyn Score™. The full pipeline is described above.

Why a geometric mean rather than an average?

Arithmetic averages let strength in one area mask weakness in another. Geometric aggregation penalises imbalance, reflecting the reality that a break at any pipeline stage blocks the transaction. Same approach as the UN Human Development Index.

What are fail gates?

Caps on dimension scores when a prerequisite capability is missing: no HTTPS, no product schema, no detectable prices, or all agents blocked. Other signals cannot compensate for these gaps.

Are the exact weights published?

Relative importance is disclosed (Very High, High, Moderate). Exact weights are proprietary, consistent with Lighthouse, MSCI, and S&P practice.

Does site size affect the score?

No. It measures capability, not scale. A small Shopify merchant with complete structured data can outscore a large custom-built retailer that lacks it.

Does my e-commerce platform matter?

The scanner measures what’s observable on your site, not what your platform could theoretically do. A Shopify store with UCP enabled scores well. One without it doesn’t.

How often should I re-scan?

After any significant site change. Free accounts get 10 scans/day. Pro and Growth include automated weekly re-scans with trend tracking. See pricing at /pricing.

Is the methodology validated?

Yes. The methodology has been empirically validated against 3,598 unique e-commerce sites across 17 global markets (as of March 2026). Principal component analysis confirms that no pair of dimensions is redundant. Weight sensitivity analysis confirms rank stability under perturbation (Spearman ρ > 0.96 for all ±5% weight variations).

Is there an API?

Yes. Full programmatic access to scanning, results, watched sites, webhooks, and batch processing. API documentation at /docs/api. OpenAPI spec at /openapi.json.

How do I improve my score?

Every scan includes prioritised recommendations per dimension, ranked by effort and impact. Common high-impact fixes: add Product/Offer JSON-LD (Structured Data), include prices in schema (Commerce Data), allow AI agents in robots.txt (Discovery & Access), implement SSR (Technical Performance). Scan your site at /scan.

Glossary

Agent Commerce Stack™
The six-dimension framework used by Zeodyn™ to assess AI agent commerce readiness.
Zeodyn Score™
The composite metric (1–100) produced by the Agent Commerce Stack™ methodology.
Geometric mean
An aggregation method in which values are multiplied together and the n-th root taken, rather than summed and averaged. Penalises imbalance more heavily than the arithmetic mean.
Fail gate
A scoring cap triggered when a critical capability is absent, preventing other signals from compensating for a fundamental gap.
UCP (Universal Commerce Protocol)
A commerce protocol from Google and Shopify enabling AI agents to discover merchant capabilities and execute transactions via structured manifests at /.well-known/ucp.
ACP (Agentic Commerce Protocol)
A commerce protocol from OpenAI and Stripe enabling agent-initiated checkout, powering Instant Checkout in ChatGPT.
MCP (Model Context Protocol)
An infrastructure protocol, now under the Linux Foundation, standardising how AI models connect to external tools and data sources.
A2A (Agent-to-Agent)
Google’s protocol for multi-agent communication and orchestration.
JSON-LD
JavaScript Object Notation for Linked Data. The structured data format preferred by Google and used across major agent platforms.
Schema.org
A collaborative vocabulary for structured data markup, maintained by Google, Microsoft, Yahoo, and Yandex.
GS1/GTIN
Global Trade Item Numbers, the international standard for unambiguous product identification.

Intellectual property

The Agent Commerce Stack™ framework, the Zeodyn Score™ metric, and all associated methodology, documentation, scoring algorithms, weightings, and software are the exclusive intellectual property of Virtual Factory Solutions Ltd., registered in England and Wales. All rights reserved worldwide.

Zeodyn™, Zeodyn Score™, and Agent Commerce Stack™ are trademarks of Virtual Factory Solutions Ltd., in use since February 2026. Usage governed by our Trademark Usage Guidelines.

All content on this page is protected by copyright under the Berne Convention and applicable national laws. © 2026 Virtual Factory Solutions Ltd. All rights reserved worldwide.

This methodology was first published at zeodyn.com/methodology on 16 February 2026, establishing date and authorship of the framework, its six-dimension structure, geometric aggregation, fail gates, scoring pipeline, and governance model as published prior art under international intellectual property law.

The framework, dimensions, sub-checks, aggregation method, and governance processes are disclosed openly. Proprietary elements (exact dimension weights, sub-check weights, scoring curve parameters, calibration data, scanner implementation) remain confidential trade secrets of Virtual Factory Solutions Ltd. and may not be reproduced, reverse-engineered, derived, or disclosed without prior written consent.

Scan results, score data, and aggregated datasets constitute a database protected under the Copyright and Rights in Databases Regulations 1997 (EU Directive 96/9/EC) and equivalent international protections.

Researchers, analysts, journalists, and other third parties are welcome to cite this methodology with attribution to Virtual Factory Solutions Ltd. Contact us for licensing, permissions, or IP enquiries.

Dispute and re-scan

The Zeodyn Score™ is a technical diagnostic based on objectively verifiable signals. Re-scan at any time. Contact us with specific signal disputes.

Scores are point-in-time. They change as your site changes and the ecosystem evolves.

See your score

Scan your site and get your Zeodyn Score™ in seconds. Full dimension breakdown, sub-check detail, and prioritised recommendations.

Scan your site →

Free. No account required. For historical tracking, automated re-scans, API access, batch scanning, and team features, see pricing.

© 2026 Virtual Factory Solutions Ltd. All rights reserved worldwide.

Agent Commerce Stack™ and Zeodyn Score™ are trademarks of Virtual Factory Solutions Ltd. First published 16 February 2026. Published prior art. Methodology version 1.0.