METHODOLOGY
A six-dimension framework for measuring how ready a business is for AI agent commerce. 54 sub-checks, geometric aggregation, published methodology. Built on OECD composite index standards, validated across three waves of research and 150+ sources.
Version 1.0 · Published 16 February 2026
Dimensions
6 (Discovery & Access, Structured Data, Commerce Data, Protocol Support, Security & Trust, Technical Performance)
Sub-checks
54
Aggregation
Weighted geometric mean
Scale
1–100, five bands from Not Ready to Agent-Ready
Data source
Publicly observable signals only
Methodology version
1.0
API access
/docs/api · OpenAPI spec at /openapi.json
Scan now
/scan — free, instant results
AI agents are becoming a primary channel for product discovery, evaluation, and purchasing. Not in theory. In production, now.
Google’s Shopping Graph processes over 50 billion product listings with 2 billion hourly refreshes, feeding AI Mode and Business Agents already live with retailers including Lowe’s, Reebok, and Poshmark. OpenAI’s ChatGPT offers Instant Checkout via the Agentic Commerce Protocol, live with Etsy sellers and rolling out to over a million Shopify merchants. Amazon’s Buy for Me agent purchases products from third-party sites on behalf of consumers, selecting merchants based on structured data quality, pricing accuracy, and fulfilment reliability. Visa, Mastercard, and PayPal have each launched agent payment protocols in the past six months.
McKinsey estimates generative AI could unlock $2.6–4.4 trillion annually across use cases (McKinsey, 2023). Morgan Stanley estimates approximately 25% of consumer spending will flow through AI agents by the end of the decade. Deloitte’s 2026 Retail Industry Outlook found that 68% of retailers plan to deploy agentic AI within 24 months. The first academic model of AI agent commerce behaviour, the ACES simulator from Columbia University (December 2025), has already documented position biases, choice homogeneity, and model update instability in how agents select merchants.
The question is no longer whether agents will transact on behalf of consumers. It is whether they can transact with you.
Most businesses have no way to answer that. No established standard exists for measuring whether a commerce site is machine-readable, protocol-compatible, and transactionally accessible to AI agents. Existing tools tend to evaluate a single dimension: AI visibility, SEO readiness, or one specific protocol. But agent commerce requires capability across multiple independent dimensions at once. A merchant might have excellent structured data but no protocol support, or implement protocols while blocking the agents that would use them.
The Zeodyn Score™ measures readiness across six dimensions. Scan your site at /scan and get results in seconds.
The Agent Commerce Stack™ evaluates six independent dimensions simultaneously and aggregates them using geometric mean, a method that penalises imbalance rather than allowing strength in one area to mask critical weakness in another. The result is a composite score that reflects genuine, balanced readiness rather than partial compliance.
Zeodyn is the only multi-dimensional agent commerce assessment with a published, OECD-aligned methodology, geometric aggregation, and fail gates for critical capabilities.
Most existing tools evaluate a single signal or protocol. The Agent Commerce Stack™ evaluates 54 signals across six dimensions and publishes the framework openly.
We do this because the biggest criticism of rating systems is opacity. Berg et al. (“Aggregate Confusion: The Divergence of ESG Ratings”, 2022) found that ESG ratings from different providers correlate at only about 60%, compared to roughly 90% for credit ratings, driven primarily by differences in scope, weighting, and measurement. The common thread is methodological opacity: if you cannot scrutinise a methodology, you cannot trust or improve it. We publish scope (six dimensions, 54 sub-checks) and aggregation method (weighted geometric mean) while keeping calibration parameters (exact weights, scoring curve parameters) confidential. That balance is consistent with established practice at Google Lighthouse, MSCI, S&P, and other index providers.
The score is also gaming-resistant. Every sub-check maps to a capability AI agents actually need. There is no way to improve your score without improving your actual readiness. That matters because of a documented problem in rating systems: Goodhart’s Law, formalised in the Manheim and Garrabrant taxonomy, holds that when a measure becomes a target it ceases to be a good measure. The exception is when improvement on the measure and improvement on the underlying construct are the same thing. That is the case here. Improving your Zeodyn Score™ is becoming more ready.
The framework is built on global standards: schema.org, GS1 identifiers operating across 245 countries, platform-agnostic commerce protocols (UCP, ACP, MCP), and universal security specifications (TLS, HSTS, CSP). Scores are directly comparable across markets and geographies.
Merchants and retailers scan to understand how their site appears to AI agents and what to fix first. The score breaks down into six dimensions and 54 individual sub-checks, each with specific, actionable recommendations ranked by effort and impact. Scan your site.
Agencies and consultants use the framework to assess client sites, benchmark against competitors, and add agent commerce optimisation to their service offering. Batch scanning, competitive benchmarking, and team workspaces support agency workflows at scale.
Platform providers (e-commerce platforms, payment processors, technology partners) use aggregate score data to understand how their merchants perform and identify where platform-level improvements would have the greatest impact.
Researchers and analysts can reference the published methodology, cite the framework with attribution, and access scores programmatically via the Zeodyn API.
The Agent Commerce Stack™ measures six dimensions, mapped to the agent commerce pipeline: the sequence an AI agent follows when attempting to find, evaluate, trust, and purchase from a business.
An agent must first discover a business and gain access to its pages. Then understand products through structured, machine-readable data. Then trust the operational commerce data (prices, availability, shipping, returns) enough to make transactional decisions on behalf of a consumer. Then transact via programmatic protocols rather than navigating a human checkout flow. Then verify the merchant’s legitimacy and security posture. And finally parse pages efficiently enough to extract data at scale.
Six pipeline stages. Six dimensions. Each independently scored and actionable.
The six dimensions emerged from three waves of research across 23 investigative angles and 150+ sources (detailed in the Research Foundation section below). During that process, several additional dimensions were formally evaluated and rejected.
Agent Experience was considered as a seventh dimension, an analogue to user experience for AI agents. It was rejected because agent experience is an emergent outcome of all six dimensions combined, not an independent measurable construct. Including it would violate the OECD requirement for dimension independence and introduce double-counting in the geometric mean.
Content Quality was considered as a standalone dimension. It was rejected because the signals that matter for agent comprehension are already distributed across Structured Data (schema completeness), Technical Performance (semantic HTML, heading hierarchy), and Discovery & Access (crawlability and rendering). Separating them would fragment related signals without adding explanatory power.
Regulatory Compliance was considered, covering GDPR consent mechanisms, EU AI Act obligations, and PSD2 strong customer authentication (SCA) payment requirements. The EU AI Act (2024) regulates AI system operators, not the merchant sites those systems interact with. The UK’s Data Use and Access Act takes a pro-innovation approach, with the ICO monitoring agentic AI developments in 2026. The externally scannable compliance signals that matter (privacy policy, cookie consent, terms of service) are already captured in Security & Trust. Internal compliance posture cannot be assessed from a public scan.
Industry-Specific Scoring (separate profiles for retail, travel, B2B, services) was deferred to v2. The six-dimension framework is universal: every commerce site needs discoverability, structured data, commerce data, protocols, security, and performance regardless of vertical. Industry profiles would adjust emphasis, not structure.
For a more accurate assessment, paid tiers scan multiple pages from your site. Site-level infrastructure — how discoverable you are, which protocols you support, and how secure your site is — is measured from your homepage, where these signals live. Commerce capability — your structured data quality and transaction readiness — is measured from your actual product pages, where products, prices, and availability naturally reside. Technical performance is assessed across all pages scanned, giving a more reliable picture of your server’s behaviour.
This approach ensures each dimension of the Agent Commerce Stack™ is measured from the page type where that dimension’s data most naturally exists.
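As an illustration, this routing can be expressed as a simple mapping from dimension to source page type. This is a descriptive sketch of the design described above, not production configuration; the labels are placeholders.

```python
# Illustrative sketch of the multi-page routing described above: each
# dimension is measured from the page type where its signals live.
# Labels are descriptive placeholders, not production identifiers.
PAGE_SOURCES = {
    "Discovery & Access":    ["homepage"],      # robots.txt, llms.txt
    "Protocol Support":      ["homepage"],      # /.well-known/ endpoints
    "Security & Trust":      ["homepage"],      # TLS, HSTS, CSP headers
    "Structured Data":       ["product_page"],  # Product/Offer schema
    "Commerce Data":         ["product_page"],  # prices, availability, shipping
    "Technical Performance": ["homepage", "product_page"],  # assessed on all pages
}
```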
The scanner runs 54 sub-checks across six dimensions. Exact weights are proprietary. Each dimension below shows its relative importance using a qualitative label (Very High, High, or Moderate), consistent with the disclosure practice used by Google Lighthouse and MSCI.
“Can AI agents find and access your commerce capabilities?”
Discovery is the entry point of the pipeline. If agents cannot find you, nothing else matters. AI agent traffic grew 1,300% in nine months according to HUMAN Security’s 2025 analysis, and how businesses manage that traffic has become a significant commercial decision: block it and you lose a sales channel; leave it unmanaged and you risk cost and security exposure.
What we check
robots.txt AI agent policy (GPTBot, ClaudeBot, Google-Extended, PerplexityBot, OAI-SearchBot, Anthropic-AI: allow, block, or selective access)
llms.txt and agent discovery file presence
Fail gate: robots.txt blocks all known AI agents → dimension score capped. A site that actively prevents agent access cannot score well on discoverability.
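A minimal sketch of the robots.txt sub-check, using Python’s standard-library parser. The agent tokens match those named above; the return shape and policy labels are illustrative, not the scanner’s actual output format.

```python
from urllib.robotparser import RobotFileParser

# Agent tokens named in the sub-check above.
AI_AGENTS = ["GPTBot", "ClaudeBot", "Google-Extended",
             "PerplexityBot", "OAI-SearchBot", "Anthropic-AI"]

def robots_policy(domain: str) -> dict:
    """Classify a site's robots.txt stance towards known AI agents."""
    rp = RobotFileParser()
    rp.set_url(f"https://{domain}/robots.txt")
    rp.read()  # a missing robots.txt is treated as allow-all by the parser
    allowed = {ua: rp.can_fetch(ua, f"https://{domain}/") for ua in AI_AGENTS}
    if not any(allowed.values()):
        # Trips the fail gate: the dimension score is capped.
        return {"policy": "block_all", "fail_gate": True, "agents": allowed}
    policy = "allow_all" if all(allowed.values()) else "selective"
    return {"policy": policy, "fail_gate": False, "agents": allowed}
```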
“Can AI agents understand your products in machine-readable form?”
The most heavily weighted dimension, for a simple reason: agents cannot evaluate, compare, or recommend products they cannot parse.
Google’s Shopping Graph, the data layer behind AI Mode, Business Agents, and Google Shopping, relies entirely on structured product data. Without it, products are invisible to the largest agent commerce ecosystem in existence. The Universal Commerce Protocol (UCP) specifically requires OfferShippingDetails and MerchantReturnPolicy schemas for agent-mediated transactions. Syndigo’s Commerce Readiness 2026 framework identifies structured product data as the foundation of their five-stage maturity model.
What we check
Fail gate: Zero Product or Offer schema detected → dimension score capped. Without structured product data, agents have nothing to work with.
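A minimal sketch of the fail-gate input, assuming JSON-LD is the markup form (the production scanner may also read microdata; that path is not shown). Handling of `@graph` containers and list-valued `@type` is simplified.

```python
import json
from html.parser import HTMLParser

class JSONLDExtractor(HTMLParser):
    """Collect parsed <script type="application/ld+json"> blocks."""
    def __init__(self):
        super().__init__()
        self.in_jsonld = False
        self.blocks = []
    def handle_starttag(self, tag, attrs):
        if tag == "script" and ("type", "application/ld+json") in attrs:
            self.in_jsonld = True
    def handle_endtag(self, tag):
        if tag == "script":
            self.in_jsonld = False
    def handle_data(self, data):
        if self.in_jsonld:
            try:
                self.blocks.append(json.loads(data))
            except json.JSONDecodeError:
                pass  # malformed JSON-LD is itself a negative signal

def _types(node: dict) -> set:
    t = node.get("@type", [])
    return set(t) if isinstance(t, list) else {t}

def has_product_schema(html: str) -> bool:
    """Fail-gate input: any Product or Offer node present in JSON-LD."""
    extractor = JSONLDExtractor()
    extractor.feed(html)
    nodes = []
    for block in extractor.blocks:
        if isinstance(block, list):
            nodes.extend(block)
        elif isinstance(block, dict):
            nodes.extend(block.get("@graph", [block]))
    return any(isinstance(n, dict) and _types(n) & {"Product", "Offer"}
               for n in nodes)
```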
“Can AI agents trust your operational data for transactional decisions?”
Goes beyond structured data presence to evaluate transactional reliability. An agent recommending a product based on stale pricing or inaccurate inventory creates a poor outcome for the consumer and erodes trust in the whole channel.
McKinsey identifies data quality as foundational: clean inventory, predictable fulfilment, and transparent policies determine which merchants become default agent-selected suppliers. Amazon’s Buy for Me agent failure modes, observed since its early 2026 launch, centre on exactly these signals: price inconsistency between page and schema, ambiguous availability status, missing shipping information.
What we check
Fail gate: No price detectable on product pages (neither structured data, microformats, nor visible content) → dimension score capped. Price is the minimum viable data point for any transactional decision.
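A minimal sketch of the price-consistency signal, assuming the schema price has already been extracted (for example via the JSON-LD sketch under Structured Data). The regex, exact-equality comparison, and absence of currency normalisation are all simplifications.

```python
import re

# Matches currency-prefixed amounts as rendered in page text.
PRICE_PATTERN = re.compile(r"[£$€]\s?(\d+(?:[.,]\d{2})?)")

def visible_prices(page_text: str) -> set:
    return {float(m.replace(",", ".")) for m in PRICE_PATTERN.findall(page_text)}

def price_signal(schema_price, page_text: str) -> dict:
    """Compare the schema-declared price against prices visible on the page.
    Exact equality is used here for brevity; a real check would allow a
    small tolerance and handle currency normalisation."""
    prices = visible_prices(page_text)
    if schema_price is None and not prices:
        return {"fail_gate": True, "reason": "no detectable price"}
    if schema_price is not None and prices and schema_price not in prices:
        return {"fail_gate": False, "signal": "schema/page price mismatch"}
    return {"fail_gate": False, "signal": "consistent"}
```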
“Can AI agents programmatically transact with your commerce infrastructure?”
Measures adoption of the agent commerce protocols that enable programmatic discovery, capability negotiation, and checkout. Not screen-scraping a human checkout flow, but purpose-built machine-to-machine protocols.
The protocol ecosystem is developing across three layers:
Commerce protocols handle agent discovery and transaction execution. The Universal Commerce Protocol (UCP), launched by Google and Shopify in January 2026, enables structured capability discovery via /.well-known/ucp manifests with modular declarations (catalogue, checkout, orders, fulfilment). UCP is transport-agnostic, working over REST, MCP, and A2A, and includes a human escalation model for transactions requiring consumer intervention. Endorsers include Adyen, American Express, Best Buy, Flipkart, Macy’s, Mastercard, Stripe, Home Depot, Visa, and Zalando. The Agentic Commerce Protocol (ACP), from OpenAI and Stripe, powers Instant Checkout in ChatGPT, live with Etsy sellers and rolling out to over a million Shopify merchants. PayPal adopted ACP in October 2025. Google’s Agent Payments Protocol (AP2) uses Verifiable Digital Credentials with 60+ partners.
Payment protocols handle authentication and financial delegation. Visa’s Trusted Agent Protocol (TAP) uses three-layer signature verification (agent, consumer, payment credential) built on Web Bot Auth and HTTP Message Signatures (RFC 9421). Mastercard’s Agent Pay, developed with the FIDO Payments Working Group, launched with Fiserv as the first major processor in December 2025. PayPal’s agentic framework includes Agent Ready, Store Sync, and a native MCP server.
Infrastructure protocols provide the communication layer. The Model Context Protocol (MCP), now under the Linux Foundation’s AI & Data Foundation with 10,000+ published servers, standardises how AI models connect to external tools and data. Google’s Agent-to-Agent (A2A) protocol supports multi-agent orchestration via agent cards at /.well-known/agent-card.json. Web Bot Auth (Cloudflare) provides Ed25519 cryptographic agent identity verification, with IETF drafts in progress.
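Protocol adoption is detectable because these specifications publish capability files at well-known paths. A minimal discovery sketch follows; the two manifest paths come directly from the protocol descriptions above, while everything else, including the absence of response-schema validation, is a simplification.

```python
import json
import urllib.error
import urllib.request

# Manifest paths named in the protocol descriptions above.
WELL_KNOWN = {
    "ucp": "/.well-known/ucp",               # UCP capability manifest
    "a2a": "/.well-known/agent-card.json",   # A2A agent card
}

def discover_protocols(domain: str) -> dict:
    """Probe well-known endpoints and record which ones parse as JSON."""
    found = {}
    for name, path in WELL_KNOWN.items():
        try:
            with urllib.request.urlopen(f"https://{domain}{path}",
                                        timeout=10) as resp:
                json.load(resp)  # must at least be valid JSON
                found[name] = True
        except (urllib.error.URLError, json.JSONDecodeError, TimeoutError):
            found[name] = False
    return found
```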
What we check
Fail gate: All AI agents actively blocked with no programmatic alternative → dimension score capped.
On platform ecosystems: The scanner measures what is observable on the domain being scanned. If a platform supports UCP but the merchant hasn’t enabled it, the score reflects the merchant’s current state, not the platform’s potential. Agent commerce readiness is about what an agent encounters today, not what could theoretically be activated. Rewarding platform choice over actual implementation would introduce bias.
“Can AI agents verify your legitimacy and operate safely?”
When an agent commits a consumer’s payment credentials to a transaction, it needs reasonable confidence in the merchant’s legitimacy and security posture. The checks here measure observable signals that provide that confidence, drawn from OWASP API Security Top 10 recommendations and the Web Bot Auth specification’s security requirements. Trust signals align with Google’s E-E-A-T framework as documented in GEO research on how AI systems evaluate source credibility.
What we check
Fail gate: No HTTPS → dimension score capped. TLS encryption is the baseline for any secure transaction.
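A minimal sketch of the observable security signals (HTTPS, HSTS, CSP) from a single response. The relative weighting of each signal is proprietary and not reproduced here.

```python
import urllib.request

def security_signals(domain: str) -> dict:
    """Observable security posture from a single HTTPS response."""
    try:
        with urllib.request.urlopen(f"https://{domain}/", timeout=10) as resp:
            h = resp.headers
            return {
                "https": True,
                "hsts": h.get("Strict-Transport-Security") is not None,
                "csp":  h.get("Content-Security-Policy") is not None,
                "fail_gate": False,
            }
    except Exception:
        # No HTTPS response at all: the dimension fail gate trips.
        return {"https": False, "hsts": False, "csp": False, "fail_gate": True}
```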
“Can AI agents parse your pages quickly and efficiently?”
AI agents don’t wait for animations, don’t scroll, and many can’t execute JavaScript at all. A page that works well for humans in a browser may be entirely opaque to an agent parser.
Architecture matters here. The shift toward headless and composable commerce (commercetools, 2026) creates both opportunities and risks: decoupled architectures can serve responses optimised for agent parsing, but pure client-side rendering makes content invisible to crawlers. The framework is architecture-agnostic. It measures outcomes (can agents parse the page?) rather than prescribing technology. Sites on Shopify Hydrogen, Next.js with SSR, static generators, or traditional server-rendered stacks can all score well if agents can read the rendered output. There is also significant overlap between WCAG accessibility standards and agent comprehension: semantic HTML, heading hierarchy, alt text, and logical structure benefit both human assistive technology and AI agent parsers.
What we check
Fail gate: Pure client-side SPA with no server-side rendering → dimension score capped. Most AI agent crawlers do not execute JavaScript.
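A minimal sketch of the server-side-rendering signal: fetch the raw HTML without executing JavaScript and test whether meaningful text is already present, as a non-JS crawler would see it. The 200-character threshold is a hypothetical placeholder, not the production calibration.

```python
import re
import urllib.request

def is_server_rendered(url: str, min_text_chars: int = 200) -> bool:
    """True if the raw HTML (no JavaScript executed) already contains a
    meaningful amount of readable text."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    body = re.search(r"<body[^>]*>(.*)</body>", html,
                     re.DOTALL | re.IGNORECASE)
    if not body:
        return False
    # Strip scripts and styles, then all tags; measure remaining text.
    text = re.sub(r"<(script|style)\b.*?</\1>", "", body.group(1),
                  flags=re.DOTALL | re.IGNORECASE)
    text = " ".join(re.sub(r"<[^>]+>", " ", text).split())
    return len(text) >= min_text_chars
```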
The six dimension scores are aggregated into a single composite: the Zeodyn Score™, scaled 1–100.
A simple arithmetic average allows full compensability: strength in one area directly offsets weakness in another. A merchant with strong structured data but zero protocol support scores around 50 under an arithmetic mean, misleadingly suggesting “halfway ready” despite being unable to transact with any agent.
Geometric aggregation penalises imbalance. A near-zero score in any dimension pulls the composite toward zero, reflecting how the agent commerce pipeline actually works: a break at any stage stops the transaction.
The UN Human Development Index adopted geometric mean for exactly this reason in 2010. As the UNDP put it: “A poor achievement in one dimension is not linearly compensated by a higher achievement in another dimension.” Munda (2005) provides the theoretical basis, and De Muro et al. (2011) demonstrate that geometric aggregation produces more stable rankings when weights are perturbed, which matters for any index that claims to be credible.
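To make the contrast concrete, here is a minimal sketch comparing the two aggregation methods on an imbalanced profile. The dimension scores and equal weights are purely illustrative (production weights are proprietary), and the floor of 1 from the scale definition is applied before taking logarithms.

```python
import math

def arithmetic_mean(scores, weights):
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)

def geometric_mean(scores, weights):
    # Floor of 1 (per the scale definition) keeps log() defined and the
    # product non-zero while still punishing a near-absent dimension.
    floored = [max(s, 1.0) for s in scores]
    total = sum(weights)
    return math.exp(sum(w * math.log(s)
                        for w, s in zip(weights, floored)) / total)

# Hypothetical profile: excellent data, but no protocol support at all.
dims    = [95, 95, 90, 1, 85, 80]  # six dimension scores, 1-100
weights = [1, 1, 1, 1, 1, 1]       # illustrative equal weights

print(round(arithmetic_mean(dims, weights)))  # 74 -- looks "mostly ready"
print(round(geometric_mean(dims, weights)))   # 42 -- the broken stage drags it down
```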
No other agent commerce assessment uses geometric aggregation.
When a critical sub-check fails (no HTTPS, no structured product data, no detectable price, all agents blocked), the affected dimension score is capped regardless of other results within that dimension.
The precedent is NIST Cybersecurity Framework 2.0 (2024), where certain controls are mandatory regardless of maturity elsewhere. It mirrors ISO 27001, where specific clauses cannot be excluded from scope. Credit rating agencies apply the same principle: structural weaknesses cap the rating regardless of financial performance.
Fail gates prevent a common problem in rating systems: accumulating minor positives to mask the absence of something fundamental.
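A minimal sketch of the cap mechanism. The cap value of 20 is a hypothetical placeholder; the production caps are confidential.

```python
def apply_fail_gate(dimension_score: float, gate_failed: bool,
                    cap: float = 20.0) -> float:
    """Cap the dimension score when a prerequisite capability is missing,
    regardless of how the other sub-checks in the dimension performed."""
    return min(dimension_score, cap) if gate_failed else dimension_score

# A dimension scoring 88 on its other sub-checks still caps at 20 if,
# say, no HTTPS is detected:
assert apply_fail_gate(88, gate_failed=True) == 20.0
```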
Raw sub-check results are transformed through scoring curves before aggregation. Going from zero to basic implementation counts for more than going from excellent to perfect, reflecting the reality that initial adoption has the highest marginal value. Google Lighthouse takes the same approach, calibrating curves against real-world distributions from the HTTP Archive. v1.0 uses expert-derived curves; v1.1 will calibrate against empirical scan data, the same path Lighthouse followed.
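A minimal sketch of a diminishing-returns curve, using a simple concave power transform as a stand-in. Lighthouse itself fits log-normal curves to HTTP Archive data, and Zeodyn’s curve parameters are confidential, so the 0.5 exponent below is purely illustrative.

```python
def curve(raw: float, exponent: float = 0.5) -> float:
    """Map a raw sub-check result in [0, 1] to a 0-100 score where early
    gains count most: 0 -> 0.25 earns 50 points, 0.75 -> 1.0 earns ~13."""
    return 100 * max(0.0, min(1.0, raw)) ** exponent

print(round(curve(0.25)))  # 50: basic implementation already scores half
print(round(curve(0.75)))  # 87
print(round(curve(1.00)))  # 100: the last quarter adds only 13 points
```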
Geometric mean involves multiplication, so any factor of zero produces a composite of zero regardless of everything else. The minimum score is 1 (“no meaningful capability detected”), preserving mathematical integrity. Standard practice: the HDI and COINr composite indicator framework both use this approach.
The Zeodyn Score™ reflects the state of a site at the moment of scanning. Credit rating agencies distinguish between point-in-time (current state) and through-the-cycle (averaged over time) assessment. For agent commerce, point-in-time is the right model: an agent visiting your site today encounters it as it is today. Score history and automated weekly re-scans provide the longitudinal view.
| Range | Band | Meaning |
|---|---|---|
| 90–100 | Agent-Ready | Commerce fully accessible to AI agents. |
| 70–89 | Strong Foundation | Most agent interactions will succeed. |
| 50–69 | Developing | Agents can discover but not fully transact. |
| 25–49 | Limited | Significant gaps block agent commerce. |
| 1–24 | Not Ready | Agents cannot meaningfully interact. |
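The band boundaries translate directly into a lookup, shown here as a small sketch mirroring the table above.

```python
def band(score: int) -> str:
    """Map a composite Zeodyn Score (1-100) to its published band."""
    if score >= 90: return "Agent-Ready"
    if score >= 70: return "Strong Foundation"
    if score >= 50: return "Developing"
    if score >= 25: return "Limited"
    return "Not Ready"

assert band(42) == "Limited"
assert band(90) == "Agent-Ready"
```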
Publicly observable signals only: the same information available to any AI agent visiting your site. HTTP headers, HTML content, schema markup, robots.txt, /.well-known/ endpoints, SSL certificates, and page performance metrics.
We do not access login-protected content, circumvent access controls, scrape personal data, or store copyrighted content. The legal basis for scanning publicly available web data is established under hiQ Labs v. LinkedIn (9th Circuit, 2022) and Van Buren v. United States (Supreme Court, 2021).
Each scan assesses a single URL. Site-wide infrastructure signals — such as robots.txt policies, security headers, protocol manifests, and server performance — are consistent across all pages and fully captured in every scan. Commerce-specific signals like product schema, pricing data, and availability markup are page-dependent. For the most comprehensive assessment of an e-commerce site, scan both the homepage and a representative product page.
Composite indicators must be interpretable at every level (OECD requirement). The Zeodyn Score™ decomposes fully:
The composite tells you where you stand. Dimensions tell you where to focus. Sub-checks tell you what to fix. Every recommendation is prioritised by effort and impact.
Three potential sources of bias need addressing in any composite assessment like this.
Size. The framework measures capability, not scale. A small merchant on Shopify with complete structured data and platform-enabled protocol support can score higher than a large retailer with a custom-built site that lacks these signals. The bar for “good structured data” is the same whether you have 10 products or 10 million.
Platform. The scanner measures what is observable on the site, not what the underlying platform could support. A Shopify merchant with UCP enabled scores well on Protocol Support; one who hasn’t enabled it does not. Platform choice is not a proxy for readiness. We acknowledge in recommendations that different platforms create different effort curves, but that affects guidance, not scoring.
Geography. Sub-checks are based on global standards (schema.org, GS1, TLS, HTTP specifications) and internationally deployed protocols. No sub-check requires a region-specific certification or payment method. EU GDPR consent signals are assessed under the universal “cookie consent mechanism” check rather than as a Europe-specific requirement.
We apply data quality principles from ISO 8000 and ISO/IEC 25012 to our own measurement instrument:
v1.0 evaluates structural readiness: are the signals, data, and protocols an AI agent needs present and correctly implemented? Think fire safety: the sprinklers are installed, the exits are marked, the alarms are wired.
Behavioural testing is planned for v2: simulating an actual agent attempting to discover, evaluate, and transact. That is the fire drill, testing whether the systems work under real conditions. Structural readiness comes first. Without the right signals in place, there is nothing to behaviourally test.
The Agent Commerce Stack™ was built through three waves of research covering 23 investigative angles and over 150 sources.
The first wave mapped the agent commerce protocol ecosystem as it stood in January–February 2026. Without understanding what protocols exist, how they work, and how they relate to each other, the Protocol Support and Discovery & Access dimensions could not have been designed.
Commerce protocols: Universal Commerce Protocol (UCP, Google/Shopify, January 2026, 20+ endorsers including Visa, Mastercard, Stripe, American Express), Agentic Commerce Protocol (ACP, OpenAI/Stripe, September 2025–present, live in ChatGPT), Agent Payments Protocol (AP2, Google, 60+ partners).
Payment protocols: Visa Trusted Agent Protocol (TAP, Web Bot Auth + RFC 9421), Mastercard Agent Pay (FIDO Payments Working Group, Fiserv adoption December 2025), PayPal agentic framework (Agent Ready, Store Sync, MCP server).
Infrastructure protocols: Model Context Protocol (MCP, AAIF/Linux Foundation, 10,000+ servers), Agent-to-Agent (A2A, Google), Web Bot Auth (Cloudflare, Ed25519, IETF drafts).
Data standards: GS1 Global Trade Item Numbers, GDSN (100 million items, 245 countries), Syndigo Commerce Readiness 2026, Google Merchant Center conversational commerce attributes (January 2026).
A key finding from this wave was that the protocol landscape has three distinct layers (commerce, payment, and infrastructure) and a meaningful assessment must cover all three. Single-protocol tools miss the majority of the picture.
The second wave answered a different question: given that we know what to measure, how do we combine 54 signals into a defensible composite score? This required going deep on index construction methodology, a field with decades of published standards and documented mistakes to learn from.
Core frameworks: OECD/JRC Handbook on Constructing Composite Indicators (2008, the global standard), European Commission Knowledge for Policy toolkit, Greco et al. (2018), Rogge (2017), Springer methodology review (2018).
Aggregation theory: UN Human Development Index (geometric mean, 2010), Munda (2005) on compensability, Färe and Zelenyuk (2003), De Muro et al. (2011) on PCA weight instability. The HDI’s 2010 switch from arithmetic to geometric mean was the single most important precedent for the Agent Commerce Stack™’s aggregation design.
Scoring benchmarks: Google Lighthouse performance scoring (log-normal curves, HTTP Archive calibration), NIST CSF 2.0 (non-compensatory tiers), ISO 27001.
Rating methodology critique: Berg et al. (2022) on ESG divergence, Billio et al. (2021), MSCI ESG Methodology (2024). The central finding, that ESG ratings diverge primarily because of methodological opacity, directly informed the decision to publish the Agent Commerce Stack™ framework openly.
Financial index construction: MSCI, S&P, and FTSE Russell methodology standards for transparency, rebalancing, and change management governance.
Credit rating methodology: Moody’s, S&P, and Fitch approaches to qualitative-quantitative integration, point-in-time versus through-the-cycle assessment, and the non-compensatory treatment of structural weaknesses. This work directly produced the point-in-time scoring model and the fail gate mechanism.
The third wave stress-tested the framework from 23 angles, looking for gaps, weaknesses, and missed considerations.
Academic research: ACES simulator (Columbia, December 2025) provided the first empirical evidence of how AI agents actually behave in commerce, documenting position biases and choice homogeneity. GEO papers informed Discovery & Access and Security & Trust signals. AAIF/Linux Foundation frameworks validated interoperability assumptions. Goodhart’s Law and the Manheim-Garrabrant gaming taxonomy shaped the “gaming-resistant by design” principle.
Measurement science: COSMIN framework, AERA/APA/NCME Standards (content, construct, and criterion validity), Classical Test Theory, ISO 8000 data quality series, ISO/IEC 25012, DAMA DMBOK. These informed the validation roadmap and scanner quality targets.
Market validation: Amazon Buy for Me failure modes (validates Commerce Data dimension), Google Shopping Graph signals (validates Structured Data weighting), Deloitte 2026 Retail Outlook (market sizing), OWASP API Security Top 10 (validates Security & Trust), WCAG/agent comprehension overlap (validates Technical Performance), headless commerce implications (validates architecture-agnostic design), content freshness patterns (validates data recency sub-checks).
Regulatory review: EU AI Act, UK Data Use and Access Act, ICO agentic AI monitoring. Conclusion: no new dimension needed; scannable compliance signals already captured.
Edge cases: Marketplace sellers, B2B sites, subscription services, digital products, multi-language sites. Conclusion: the framework is universal; industry-specific profiles are a v2 enhancement.
Additional frameworks: Forrester DX Index, Gartner Digital Commerce Maturity Model, IDC MaturityScape, HUMAN Security bot management, information retrieval metrics (NDCG, MAP, precision/recall), ISAE 3000 and SOC 2 Type II assurance standards. These informed target-setting and the future validation and third-party assurance roadmap rather than framework structure.
| Decision | Alternatives considered | Rationale |
|---|---|---|
| Geometric mean | Arithmetic mean, multi-criteria analysis, Mazziotta-Pareto index | Prevents compensatory scoring. HDI 2010 precedent. Munda (2005), De Muro (2011). |
| Six dimensions | Five (merge Security/Performance), seven (+Agent Experience), eight (+Content Quality, +Regulatory) | Maps to agent pipeline. Rejected dimensions are emergent or already distributed. |
| Fail gates | Continuous scoring only, maturity tiers, penalty functions | Captures prerequisite relationships. NIST CSF 2.0, ISO 27001, credit rating precedent. |
| Expert weights v1.0 | Equal weights, PCA-derived, stakeholder survey, budget allocation | Standard launch practice (Lighthouse, MSCI, HDI). Empirical calibration planned v1.1. |
| 1–100 scale | 0–100, letter grades, percentile ranks | Geometric mean requires non-zero. HDI, COINr precedent. |
| Point-in-time | Through-the-cycle, rolling window | Matches agent reality. Credit rating precedent. |
| Public methodology | Fully proprietary, fully open including weights | Addresses ESG opacity. Builds trust. Establishes prior art. |
| Platform-neutral | Platform-aware boosts | Measures implementation over potential. Avoids bias. |
| Gaming-resistant | Anti-gaming penalties, obfuscation | Per Goodhart: align measure with construct. More durable. |
The OECD Handbook defines a ten-step process for constructing defensible composite indicators:
| Step | Requirement | Status |
|---|---|---|
| 1. Theoretical framework | Define the concept | ✓ Agent commerce pipeline, six dimensions |
| 2. Data selection | Choose indicators | ✓ 54 sub-checks, publicly observable |
| 3. Imputation | Handle missing data | ✓ N/A: scanner detects or records absence |
| 4. Multivariate analysis | Test statistical structure | Planned v1.1: PCA on real distributions |
| 5. Normalisation | Scale consistently | ✓ Scoring curves, diminishing returns |
| 6. Weighting | Assign importance | ✓ Expert v1.0, empirical v1.1 |
| 7. Aggregation | Combine into composite | ✓ Weighted geometric mean |
| 8. Robustness & sensitivity | Test stability | Planned v1.1: ±5% weight perturbation |
| 9. Back to real data | Validate against outcomes | Planned v1.1: Agent transaction correlation |
| 10. Presentation | Communicate effectively | ✓ Radar chart, decomposed scores, recommendations |
Three forms of validity, per AERA/APA/NCME measurement standards:
Content validity asks whether the 54 sub-checks cover all essential aspects of the domain. Established through three research waves, 23 angles, 150+ sources, with documentation of what was included, excluded, and why.
Construct validity asks whether the six dimensions represent the right underlying constructs. The agent commerce pipeline provides the theoretical basis. PCA on real distributions (v1.1) will test statistical independence across dimensions.
Criterion validity asks whether a higher score predicts better outcomes. A 90-day study correlating Zeodyn Score™ with agent transaction success rates (planned post-100 scans, target: r > 0.50) will answer this. It is the strongest test and the one that matters most.
Expert-derived weights for v1.0, following a four-pillar justification:
Google Lighthouse started with expert weights and evolved to empirical calibration. The HDI did the same over three decades. MSCI reviews its expert-committee weights annually. The progression from expert to empirically calibrated is a sign of methodological maturity, not a weakness. v1.1 will derive empirical weights from real scan distributions.
Post-100 scans:
Methodology updates follow financial index provider practice:
Fully automated. No editorial judgment or commercial influence on individual scores. Scores cannot be purchased, sponsored, or manually adjusted.
AI agents discovering, evaluating, comparing, and purchasing products on behalf of consumers, programmatically. Google AI Mode, ChatGPT Instant Checkout, and Amazon Buy for Me are live examples.
A composite metric from 1 to 100 measuring AI agent commerce readiness across six dimensions, aggregated using a weighted geometric mean. Scan your site at /scan.
54 sub-checks → fail gate evaluation → dimension scoring → scoring curve transformation → weighted geometric mean → Zeodyn Score™. The full pipeline is described above.
Arithmetic averages let strength in one area mask weakness in another. Geometric aggregation penalises imbalance, reflecting the reality that a break at any pipeline stage blocks the transaction. Same approach as the UN Human Development Index.
Caps on dimension scores when a prerequisite capability is missing. No HTTPS, no product schema, no detectable prices, all agents blocked. Other signals cannot compensate for them.
Relative importance is disclosed (Very High, High, Moderate). Exact weights are proprietary, consistent with Lighthouse, MSCI, and S&P practice.
No. It measures capability, not scale. A small Shopify merchant with complete structured data can outscore a large custom-built retailer that lacks it.
The scanner measures what’s observable on your site, not what your platform could theoretically do. A Shopify store with UCP enabled scores well. One without it doesn’t.
After any significant site change. Free accounts get 10 scans/day. Pro and Growth include automated weekly re-scans with trend tracking. See pricing at /pricing.
v1.0 weights are validated through three research waves across 150+ sources. Empirical validation (predictive validity, test-retest, sensitivity analysis) is planned for v1.1 post-100 scans. The roadmap is published above.
Yes. Full programmatic access to scanning, results, watched sites, webhooks, and batch processing. API documentation at /docs/api. OpenAPI spec at /openapi.json.
Every scan includes prioritised recommendations per dimension, ranked by effort and impact. Common high-impact fixes: add Product/Offer JSON-LD (Structured Data), include prices in schema (Commerce Data), allow AI agents in robots.txt (Discovery & Access), implement SSR (Technical Performance). Scan your site at /scan.
UCP capability discovery uses the /.well-known/ucp manifest.
The Agent Commerce Stack™ framework, the Zeodyn Score™ metric, and all associated methodology, documentation, scoring algorithms, weightings, and software are the exclusive intellectual property of Virtual Factory Solutions Ltd., registered in England and Wales. All rights reserved worldwide.
Zeodyn™, Zeodyn Score™, and Agent Commerce Stack™ are trademarks of Virtual Factory Solutions Ltd., in use since February 2026. Usage governed by our Trademark Usage Guidelines.
All content on this page is protected by copyright under the Berne Convention and applicable national laws. © 2026 Virtual Factory Solutions Ltd. All rights reserved worldwide.
This methodology was first published at zeodyn.com/methodology on 16 February 2026, establishing date and authorship of the framework, its six-dimension structure, geometric aggregation, fail gates, scoring pipeline, and governance model as published prior art under international intellectual property law.
The framework, dimensions, sub-checks, aggregation method, and governance processes are disclosed openly. Proprietary elements (exact dimension weights, sub-check weights, scoring curve parameters, calibration data, scanner implementation) remain confidential trade secrets of Virtual Factory Solutions Ltd. and may not be reproduced, reverse-engineered, derived, or disclosed without prior written consent.
Scan results, score data, and aggregated datasets constitute a database protected under the Copyright and Rights in Databases Regulations 1997 (EU Directive 96/9/EC) and equivalent international protections.
Researchers, analysts, journalists, and other third parties are welcome to cite this methodology with attribution to Virtual Factory Solutions Ltd. Contact ip@zeodyn.com for licensing, permissions, or IP enquiries.
The Zeodyn Score™ is a technical diagnostic based on objectively verifiable signals. Re-scan at any time. Contact methodology@zeodyn.com with specific signal disputes.
Scores are point-in-time. They change as your site changes and the ecosystem evolves.
Scan your site and get your Zeodyn Score™ in seconds. Full dimension breakdown, sub-check detail, and prioritised recommendations.
Scan your site → Free. No account required. For historical tracking, automated re-scans, API access, batch scanning, and team features, see pricing.
© 2026 Virtual Factory Solutions Ltd. All rights reserved worldwide.
Agent Commerce Stack™ and Zeodyn Score™ are trademarks of Virtual Factory Solutions Ltd. First published 16 February 2026. Published prior art. Methodology version 1.0.