By ChatGPT, Claude Sonnet 4.6 Extended Thinking, DeepSeek-V3.2, Gemini 3 Thinking, Grok 4.20 Expert, Kiwi K2.5 Thinking with W.H.L.
V4.0 — Third Revision | April 2026
V4.0 REVISION SUMMARY
This draft incorporates feedback from four peer reviews of V3.0 (all recommending accept with minor revisions). Principal revisions include: (1) time-dependent coefficient formulation V(t) = α(t)M + β(t)C + γ(t)E + δ(t)(M·C·E) with explicit trajectory statements, including δ(t) increasing (§3.1); (2) resolution of data-layer ontological ambiguity — D is formalized as a state variable governing execution capacity, E = E(D(t)), unifying the Data Flywheel Proposition with the execution moat (§2.4, §3.1, §8.3); (3) updated central aphorism to reflect resolved data status; (4) correction of legacy numbering in §3.7 and §9.1 (now 1–3 and 1–4 respectively); (5) one concrete quantitative example in §4.1 anchoring the model convergence claim; (6) sharpened proxy metric specifications with materiality thresholds (§8.6); (7) clarification of the three Leak Paradox causal mechanisms (§3.5); (8) distinction between commoditization and concentration at the model layer (§8.1); (9) temporal dynamics of M×C substitution on value migration (§8.4); (10) layer-differentiated regulatory analysis (§8.5); (11) ReAct/HotpotQA citation added for P2 support (§5.2); (12) cleanup of references [25] and [26–30]; (13) sharpened conclusion with a memorable final formulation; and (14) reduction of cross-section redundancy between §2, §3, and §4.
Abstract
Recent advances in large-scale artificial intelligence systems have been widely interpreted through the lens of model capability. However, emerging evidence suggests this model-centric view is increasingly insufficient to explain competitive differentiation in frontier AI systems. This paper introduces the Control Plane Thesis, which posits a structural shift in the locus of value creation from model weights to system-level orchestration and ultimately to organizational execution.
We decompose modern AI systems into three interacting strata: (1) foundation models, which are rapidly commoditizing; (2) control-plane systems, including prompt orchestration, tool integration, and safety pipelines, which act as leverage multipliers; and (3) execution capabilities, encompassing deployment discipline, iteration velocity, and organizational learning, which constitute the primary source of durable competitive advantage. We formalize this through a time-dependent multiplicative value function V(t) = α(t)M + β(t)C + γ(t)E + δ(t)(M·C·E), where the coefficients evolve over time as value migrates across layers. Proprietary interaction data is formalized as a state variable D(t) that accumulates through deployment and governs execution capacity: E = E(D(t)).
We situate this framework within established technology-strategy literature — including Baldwin and Clark’s modularity thesis, Teece’s dynamic capabilities, and platform economics — and argue that AI systems’ probabilistic behavior, output-inferability of system designs, and interaction-data feedback loops constitute distinctive features warranting dedicated theoretical treatment.
We derive five research propositions intended to guide future empirical investigation and provide candidate proxy metrics for the execution and data-flywheel layers. Scope is explicitly bounded to frontier, general-purpose, cloud-deployed AI systems. The central empirical illustration is the March 31, 2026 Claude Code source-map disclosure, treated as an illustrative case with explicit evidentiary caveats.
1. Introduction
The rapid progress of artificial intelligence over the past decade has been dominated by a single organizing principle: scale. From early deep learning systems to contemporary large language models, improvements in performance have been closely tied to increases in data, compute, and parameter count. This paradigm has shaped both research agendas and industry competition, leading to an implicit assumption that the primary locus of innovation — and therefore value — resides in the model itself.
This assumption is now under strain.
While frontier models continue to improve, the rate at which their capabilities converge across leading organizations has accelerated. At the same time, a different layer of the AI stack has grown in complexity and importance. Modern AI deployments are no longer characterized by direct interaction with a single model, but by the integration of multiple components: prompt construction pipelines, tool-use frameworks, retrieval systems, safety filters, and evaluation mechanisms. These components form what we term the control plane — the set of processes governing how model capabilities are invoked, constrained, and applied.
The competitive question has therefore shifted from ‘who builds the best model?’ to ‘who can most effectively orchestrate, deploy, and continuously improve what those models can do?’ — and ultimately to ‘who accumulates the organizational capacity to sustain that advantage over time?’ This paper provides a formal framework for understanding that shift.
1.1 Motivating Example
On March 31, 2026, an npm source-map containing approximately 500,000 lines of Claude Code’s agentic orchestration harness became externally accessible. The disclosed material revealed a layered design in which model outputs are shaped by extensive control-plane logic external to the model itself, including dynamic prompt assembly, tool routing, multi-stage safety filtering, query-engine orchestration, and feature flags for agentic capabilities. While this material could be studied and partially replicated by external developers, overall system performance appeared to remain dependent on factors — deployment scale, integration quality, iteration processes, and accumulated interaction data — not present in the exposed code. We treat this as an illustrative case with explicit evidentiary caveats, examined in depth in §5.4.
1.2 The Core Argument
We argue that value within the AI stack is undergoing a structural migration across three layers — models, control-plane systems, and execution — and that proprietary interaction data functions as a state variable that compounds the defensibility of the execution layer over time. The competitive implication: in mature AI markets, model capability becomes a necessary but non-differentiating input. Long-run winners will be those who build the best organizational systems for turning models into outcomes and accumulating the interaction data that makes those systems continuously improve.
1.3 Relationship to Prior Literature
The value-migration argument has important antecedents. Baldwin and Clark’s modularity thesis [15] demonstrated how progressive standardization of interfaces shifts competitive advantage upward through stack layers. Cusumano, Gawer, and Yoffie [16] extended this to platform economics, showing how integration-point controllers capture disproportionate value as components commoditize. Porter’s competitive advantage framework [34] provides foundational vocabulary for differentiation and cost position.
Most directly relevant is Teece, Pisano, and Shuen’s Dynamic Capabilities framework [35], which identifies sensing, seizing, and reconfiguring as the organizational routines through which firms sustain competitive advantage in fast-moving environments. The Execution layer in this paper maps precisely to Teece’s dynamic capabilities: deployment discipline, iteration velocity, and organizational learning are the sensing and seizing routines that reconfigure AI capability into realized value. This linkage grounds the Execution layer in an established strategic construct and explains the mechanism by which it resists imitation — dynamic capabilities are organizational routines that are tacit, path-dependent, and not easily codified, making them resistant to replication in ways that model weights and system designs are not.
The Control Plane Thesis extends these foundations in three ways specific to AI systems. First, the control plane modifies the probabilistic behavior of a foundation model rather than routing deterministic function calls — creating a categorically different engineering object from traditional middleware, one whose behavior is harder to specify, observe, and replicate fully. Second, AI orchestration logic can be partially inferred from model outputs, exposing it to competitive imitation not present in earlier technology stacks. Third, proprietary interaction data generated through deployment functions as an accumulating state variable that compounds the execution moat — a self-reinforcing mechanism not central to prior modularity or platform analyses. These features justify a dedicated framework.
1.4 Scope Conditions
The Control Plane Thesis is scoped to frontier, general-purpose, cloud-deployed AI systems — the large language model and multimodal systems characteristic of organizations such as Anthropic, OpenAI, and Google DeepMind. Several adjacent domains exhibit different dynamics:
- Specialized vertical AI (e.g., protein structure prediction, chip design): model architecture remains highly differentiated; commoditization is slower. Value migration may stall at Layer I for longer.
- Safety-critical and regulated systems (e.g., medical diagnosis, autonomous vehicles): regulatory validation requirements create artificial moats at the model layer; the execution layer’s importance is reinforced through compliance and audit readiness rather than iteration velocity.
- Edge and embedded AI (e.g., on-device inference, IoT): compute constraints limit control-plane complexity, reducing leverage. Model efficiency and compression dominate over system design.
These boundary conditions narrow but clarify the thesis. The framework is most directly applicable to the domain currently experiencing the most rapid commercial competition: cloud-deployed, general-purpose frontier AI.
1.5 Paper Organization
Section 2 defines the AI stack layers as ontological categories — what exists. Section 3 formalizes the Control Plane Thesis as theory — how value behaves across layers over time. Section 4 presents empirical patterns — what we observe. Section 5 examines individual case studies. Sections 6 and 7 explore economic and strategic implications. Section 8 addresses limitations and counterarguments. Section 9 concludes.
2. The AI Stack: Ontology
| Reviewer Response: V3.0 reviewers noted redundancy between §2, §3, and §4. V4.0 sharpens the role of each section: §2 defines what exists (ontology), §3 states how value behaves (theory), §4 presents what we observe (evidence). Cross-section repetition has been reduced. |
Modern AI systems are multi-layered constructs in which model inference is one component within a broader computational and organizational pipeline. We identify three primary layers and one cross-layer state variable.
2.1 Layer I: Models (The Commoditizing Substrate)
Foundation models — large language models and multimodal systems — serve as general-purpose function approximators trained on large-scale datasets, exposing interfaces via APIs for downstream use. Despite their centrality, three properties drive progressive commoditization: capability convergence across leading systems, rapid diffusion of architectural innovations and training methodologies, and the emergence of open-weight substitutes that approximate frontier performance in many application domains. Models remain necessary infrastructure but are shifting from primary differentiators to shared substrates.
2.2 Layer II: Control-Plane Systems (The Leverage Layer)
The control plane comprises the components governing how model capabilities are invoked, structured, and constrained: prompt orchestration pipelines, tool integration frameworks, retrieval-augmented generation (RAG) systems, safety and policy enforcement layers, and evaluation and feedback loops. These components transform a raw model into a task-oriented system. Two deployments using the same underlying model may exhibit substantially different behavior depending on control-plane configuration. The control plane provides transient leverage — meaningful differentiation that diffuses over time as innovations are observed and replicated.
2.3 Layer III: Execution (The Defensibility Layer)
Execution encompasses the organizational capabilities required to deploy, operate, and continuously improve AI systems in real-world environments: deployment infrastructure, iteration velocity, product integration, operational discipline, and organizational learning. This layer maps directly to Teece’s dynamic capabilities [35] — the sensing, seizing, and reconfiguring routines through which firms sustain advantage in fast-moving environments. These capabilities are tacit, path-dependent, and not easily codified, making them the primary source of durable competitive advantage. Critically, execution capacity is not static: it grows as a function of accumulated proprietary interaction data, as specified in §2.4.
2.4 Data as State Variable: D(t)
| Reviewer Response: V3.0 reviews identified unresolved ontological ambiguity: data was simultaneously called ‘fuel’ (consumable), ‘candidate fourth layer’ (structural), and ‘cross-cutting resource’ (neither). V4.0 resolves this by formalizing D as a state variable that accumulates over time and governs execution capacity, unifying the Data Flywheel with the execution moat. |
We formalize proprietary interaction data not as a separate layer but as a state variable D(t) that accumulates through deployment and governs execution capacity. The formal relationship is:
E = E(D(t))
where D(t) is the stock of proprietary interaction data at time t, and E is an increasing function of D(t). This formulation captures the data flywheel mechanism: deployment generates interaction data (logs, failure traces, preference signals) that improves system performance, which generates more deployment, compounding the execution moat over time.
We distinguish three categories of data with different competitive properties:
- Pretraining data: Large-scale datasets used in model training. Increasingly contestable via synthetic data generation and knowledge distillation. Contributes to Layer I but is not uniquely defensible.
- Task-specific and retrieval corpora: Domain knowledge bases used in control-plane RAG systems. Moderately defensible where proprietary; diffusible as public corpora improve.
- Proprietary interaction data (D(t)): Post-deployment logs, preference signals, and failure traces generated through actual use. This is D(t): non-fungible, context-specific, non-transferable, and self-reinforcing. It is the uniquely defensible data category and the mechanism by which the execution moat compounds over time.
This formulation resolves prior ambiguity: data is not a separate structural layer but a stock variable that powers the execution layer. The ‘fuel’ metaphor is partially accurate (data flows into execution) but the ‘state variable’ framing is more precise because D(t) accumulates rather than being consumed.
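The accumulation dynamic in §2.4 can be made concrete with a minimal simulation. This is a sketch under stated assumptions: the paper specifies only that E is increasing in D(t) and that deployment generates data; the concave log form of `execution_capacity`, the linear `deploy_rate`, and all constants are illustrative choices, not claims from the text.

```python
# Minimal sketch of the data-flywheel dynamic: E = E(D(t)), with D
# accumulating through deployment. Functional forms and constants are
# illustrative assumptions; the paper only requires E increasing in D.

import math

def execution_capacity(D, e0=1.0, k=0.5):
    """E = E(D): increasing and concave in the accumulated data stock D."""
    return e0 + k * math.log1p(D)

def simulate_flywheel(steps=10, deploy_rate=1.0):
    """Each period, deployment proportional to E generates new interaction
    data, which raises E in the next period (the compounding loop)."""
    D, history = 0.0, []
    for t in range(steps):
        E = execution_capacity(D)
        history.append((t, round(D, 2), round(E, 3)))
        D += deploy_rate * E  # dD/dt proportional to E: more capacity, more data
    return history

for t, D, E in simulate_flywheel():
    print(f"t={t}: D={D}, E={E}")
```

Under any increasing E(D), the loop is self-reinforcing: D grows every period because E is positive, and E grows every period because D does, which is the formal content of the 'state variable, not fuel' distinction.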
2.5 Interactions and Value Migration
The three layers operate as an integrated system in which each amplifies the value of the others — a complementarity captured in the multiplicative interaction term of the value function (§3.1). The locus of value creation is not static. As models mature and capabilities converge, differentiation shifts toward the control plane. As control-plane techniques diffuse, advantage migrates toward execution. And as execution compounds through accumulated D(t), the moat deepens:
From models (commoditization) → to systems (leverage) → to execution + data (defensibility).
3. The Control Plane Thesis: Theory
We formalize the central claim: that economic value within the AI stack is undergoing a structural migration across layers over time, with the execution layer — compounded by accumulated interaction data — emerging as the primary locus of durable competitive advantage. This can be interpreted as a reweighting of the coefficients in the system value function as the industry matures.
3.1 Formal Statement of the Thesis
| Reviewer Response: Critical revision per Reviewers 3 and 4: the value function now includes time-dependent coefficients with explicit trajectory statements, including δ(t) increasing. The non-negativity constraint is stated. The data state variable E = E(D(t)) is integrated into the formal model. |
Let an AI system S be represented as S = (M, C, E(D(t))), where M denotes the model layer, C the control-plane system, and E(D(t)) execution capacity as governed by accumulated proprietary interaction data. We propose the following time-dependent value function:
V(t) = α(t)M + β(t)C + γ(t)E(D(t)) + δ(t)(M · C · E(D(t)))
where α(t), β(t), γ(t), δ(t) ≥ 0 are time-varying coefficients. Non-negativity holds throughout: commoditization reduces α(t) but never drives it negative, since models always provide necessary infrastructure. The coefficient trajectories are:
- α(t) decreasing: model-layer contribution diminishes as capabilities converge and substitutes emerge; approaches but does not reach zero.
- β(t) hump-shaped: control-plane contribution rises as system design differentiates, then plateaus and declines as orchestration techniques diffuse and standardize (the 'HTTP/SQL risk': the possibility that orchestration ossifies into open, commoditized protocols, as HTTP and SQL did).
- γ(t) increasing: execution-layer contribution grows persistently, compounded by the accumulation of D(t).
- δ(t) increasing: the complementarity between all three layers grows over time, as integrated systems with strong execution and rich D(t) realize value synergies unavailable to firms operating at only one layer.
The additive terms capture independent contributions of each layer. The multiplicative interaction term δ(t)(M · C · E(D(t))) captures complementarity — each layer amplifies the value of the others — and implies that weakness in any layer caps total realized value. The Control Plane Thesis asserts that the trajectory of these coefficients follows the pattern above: a reweighting of the value function over time, away from M and toward E and the δ complementarity that mature integrated systems can realize.
Estimating α, β, γ, δ and their time derivatives — and tracking how they evolve across organizations and verticals — is the primary empirical research agenda implied by this framework.
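The qualitative trajectory claims in §3.1 can be illustrated numerically. The following sketch assumes specific functional forms for the coefficients (exponential decay for α, a gamma-shaped hump for β, saturating growth for γ and δ); the paper commits only to the qualitative shapes, and every constant here is an assumption chosen for illustration.

```python
# Illustrative trajectories for the coefficients of
# V(t) = α(t)M + β(t)C + γ(t)E + δ(t)(M·C·E).
# All functional forms and constants are assumptions; the thesis
# specifies only the qualitative shapes (decreasing, hump, increasing).

import math

def coefficients(t):
    alpha = 0.6 * math.exp(-0.3 * t)         # decreasing toward (not to) zero
    beta  = 0.8 * t * math.exp(-0.5 * t)     # hump-shaped: rises, then declines
    gamma = 0.5 * (1 - math.exp(-0.3 * t))   # increasing, saturating
    delta = 0.3 * (1 - math.exp(-0.2 * t))   # increasing complementarity
    return alpha, beta, gamma, delta

def value(t, M=1.0, C=1.0, E=1.0):
    """V(t) for fixed layer quality, isolating the coefficient reweighting."""
    a, b, g, d = coefficients(t)
    return a * M + b * C + g * E + d * (M * C * E)

# The share of V attributable to the model layer falls as t grows:
for t in (0, 2, 5, 10):
    a, _, _, _ = coefficients(t)
    print(f"t={t}: model share = {a / value(t):.2f}")
```

Holding M, C, E fixed isolates the reweighting claim itself: even with no change in layer quality, the model layer's share of realized value declines monotonically while the γ and δ terms come to dominate.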
3.2 Commoditization, Leverage, and Defensibility
- Models (Commoditization Layer): High fixed costs and declining differentiation; economically resemble commodities, with diminishing marginal pricing power.
- Control Plane (Leverage Layer): Amplifies model utility; provides transient differentiation with high return on iteration and moderate barriers to entry.
- Execution / E(D(t)) (Defensibility Layer): Organizational dynamic capabilities compounded by accumulating proprietary interaction data. Tacit, path-dependent, and not easily codified.
Models are commodities. Systems are leverage. Execution is the moat. Data is the flywheel.
3.3 Mechanisms of Value Migration
(1) Capability Convergence at the Model Layer
As leading models approach similar performance levels, the incremental benefit of marginal model improvements diminishes. Benchmark performance on agentic tasks (SWE-Bench, GPQA Diamond, ARC-AGI 2) in early 2026 shows substantially overlapping performance bands among Claude Opus 4.6, GPT-5.4, and Gemini 3.1 — differences that are incremental, context-dependent, and difficult for end users to consistently perceive.
(2) Rapid Diffusion at the Control-Plane Layer
Control-plane innovations diffuse rapidly. Chain-of-thought prompting [22] and the ReAct framework [23] — both introduced as research contributions — were widely replicated across AI products within months of publication, illustrating how system-level techniques propagate faster than model capabilities. Differentiation at this layer emerges quickly but erodes over time.
(3) Stickiness at the Execution Layer, Compounded by D(t)
Execution capabilities exhibit Teece’s characteristics of path dependence, tacit knowledge, and integration complexity. Crucially, they compound over time through accumulated D(t): each deployment cycle generates interaction data that improves subsequent system performance, deepening the moat without requiring model retraining or orchestration redesign.
3.4 The Replication Gradient
| Layer | Replication Potential | Primary Imitation Mechanism | Illustrative Timeline* |
| Models | Increasing | Sufficient compute, open weights, architectural diffusion | Months to years (shrinking) |
| Control Plane | Moderate | Output inference, documentation, disclosures, prompt extraction | Days to months |
| Execution / E(D(t)) | Low | Requires organizational rebuilding + D(t) stock re-accumulation | Years to indefinite |
* Timelines are illustrative estimates based on author judgment and case evidence; systematic measurement via P4 is required to validate them.
3.5 The Leak Paradox
| Reviewer Response: Revised per Reviewer 4: the three plausible causal mechanisms are now stated explicitly, with the thesis’s preferred mechanism identified. |
The Control Plane Thesis predicts the Leak Paradox: the exposure of internal system components accelerates imitation at the model and control-plane layers while simultaneously reinforcing the relative importance of execution — the layer that cannot be leaked. Three causal mechanisms could produce this outcome:
- Revaluation: Leaks reveal that C is less differentiated than assumed, correcting prior overvaluation of system-level assets and shifting investor and strategist attention upward.
- Diffusion acceleration: Leaks speed the spread of C, making control-plane capabilities more abundant and thereby increasing the relative scarcity and value of E.
- Execution revelation (thesis-preferred mechanism): Leaks demonstrate empirically that replication of C does not produce parity in overall system performance, directly revealing that E and D(t) are the true differentiators. Competitors discover that what they most needed — accumulated interaction data and organizational iteration capability — was never in the leaked code.
The thesis primarily predicts the third mechanism, though all three may operate simultaneously. A genuine empirical test would require a controlled disclosure study or systematic measurement of performance parity achieved before and after a major control-plane exposure.
3.6 Research Propositions
We derive five research propositions to guide future empirical investigation. None are formally tested within the scope of this paper; the evidence in Sections 4 and 5 establishes plausibility.
- P1 (Model Convergence): Performance variance between leading models decreases over time relative to variance in system-level implementations.
- P2 (System Sensitivity): For a fixed model M, variation in control-plane configuration C produces significant differences in task performance.
- P3 (Execution Dominance): Organizational execution capabilities E are stronger predictors of long-term user adoption and retention than marginal improvements in M. Candidate proxies defined in §8.6.
- P4 (Replication Asymmetry): Time to functional equivalence at the control-plane layer is significantly shorter than at the execution layer. Functional equivalence is defined as task performance within 5% of the original system on a specified benchmark suite.
- P5 (Data Flywheel): dE/dD > 0: organizations with superior proprietary interaction data infrastructure achieve compounding improvements in execution-layer performance over time, independent of model or control-plane changes.
3.7 Implications for the Structure of Competition
If the Control Plane Thesis holds, AI competition shifts along three dimensions:
- From Capability to Control: The key question shifts from ‘Which model is best?’ to ‘How effectively is capability directed, constrained, and continuously improved?’
- From Innovation to Integration: The focus moves from isolated breakthroughs to the integration of multiple components into coherent, data-accumulating systems.
- From Technology to Organization: Competitive advantage increasingly reflects organizational dynamic capabilities — not purely technical achievement.
At its core, the Control Plane Thesis reframes AI competition from a production function (models) to an organizational function (execution). This connects the AI strategy literature directly to Porter, Simon, and Teece — and extends their frameworks to a domain where the ‘component’ being commoditized is itself a general-purpose cognitive engine.
4. Evidence of Value Migration
This section presents empirical patterns supporting the Control Plane Thesis. The evidence is primarily qualitative and comparative; we acknowledge this as a limitation. §4.1 includes a quantitative benchmark example to anchor the model convergence claim.
4.1 Model Convergence and the Erosion of Differentiation
| Reviewer Response: New quantitative anchor added per Reviewers 3 and 4. The convergence claim is now supported by a specific benchmark citation. |
A central prediction of the thesis is that the marginal contribution of model improvements is declining. Evidence can be observed in benchmark convergence among leading frontier models. In early 2026 agentic benchmark evaluations, SWE-Bench verified pass rates for Claude Opus 4.6, GPT-5.4, and Gemini 3.1 fell within a 3–5 percentage point band on representative tasks: a spread that is measurable but small relative to the differences between frontier and open-weight systems a year prior. Similarly, on GPQA Diamond (graduate-level scientific reasoning), scores across the three systems clustered between 78% and 83% as of Q1 2026, compared to a 25-point spread between leading closed and open models in early 2024.
These numbers are approximate and drawn from contemporaneous reporting rather than controlled benchmarking; they illustrate the convergence trend rather than establishing it definitively. The rise of open-weight models has further narrowed the accessibility gap. Together, these dynamics suggest model capability alone is no longer sufficient to sustain differentiation in downstream applications where performance is mediated by system design — consistent with α(t) decreasing.
4.2 System-Level Differentiation
While model capabilities converge, variation at the control-plane layer remains salient. Systems built on similar or identical models exhibit markedly different behaviors depending on orchestration across prompt engineering and dynamic context construction, tool use and external API integration, retrieval augmentation and knowledge grounding, multi-step reasoning and task decomposition, and safety filtering and policy enforcement. In production environments, these components produce substantial differences in task success rates, reliability, and user satisfaction even when the underlying model is held constant — consistent with β(t) providing meaningful leverage during this phase.
4.3 Rapid Diffusion and Partial Replicability of Systems
The diffusion mechanism sketched in §3.3 is visible in practice: chain-of-thought prompting [22] and the ReAct framework [23], both introduced as research contributions, were replicated and deployed across major AI products within months of publication. Control-plane components are frequently modular and composable, making them easy to replicate in isolation. Advantages at the system layer are therefore time-bound: early movers gain temporary differentiation, but successful patterns are quickly adopted across the ecosystem. This aligns with P4 and reinforces the classification of the control plane as a leverage layer rather than a permanent moat.
4.4 Execution and the Data Flywheel as Persistent Advantage
In contrast to rapid diffusion at the control-plane layer, differences in execution capabilities exhibit strong persistence. Organizations with similar access to models and system designs produce divergent outcomes in user adoption, retention, and monetization. These differences are attributable to organizational processes and accumulated D(t) rather than technical assets. Each deployment cycle generates proprietary interaction data — failure traces, preference signals, usage patterns — that improves subsequent system versions, compounding the execution moat. This dynamic is consistent with γ(t) increasing and dE/dD(t) > 0 (P5).
4.5 Evidence Summary
| Layer | Empirical Pattern | Competitive Role | Value Function Coefficient | Key Proposition |
| Models | Convergence; benchmark bands narrowing to 3–5pp | Commoditization | α(t) decreasing | P1 |
| Control Plane | Differentiation; diffusion in weeks–months | Leverage | β(t) hump-shaped | P2, P4 |
| Execution | Persistent divergence in organizational outcomes | Defensibility | γ(t) increasing | P3 |
| D(t) / Data Flywheel | Interaction data compounds execution moat | Compounding moat | Governs E = E(D(t)) | P5 |
5. Case Studies
This section examines four cases illustrating how similar underlying capabilities yield divergent outcomes depending on higher-layer dynamics.
5.1 Open-Weight Models vs. Proprietary Systems
Organizations such as Meta have released increasingly capable open-weight models. Across many applications, these models can approximate the performance of proprietary systems when combined with retrieval augmentation, prompt engineering, and tool integration. However, out-of-the-box performance is often less consistent; significant engineering effort is required to reach production quality; and user experience varies widely. This case illustrates the separation between capability availability (models are accessible) and capability realization (performance depends on systems and execution). Model access alone does not determine competitive outcomes.
5.2 Agent Systems: Same Model, Different Outcomes
Agent systems — built on top of shared model APIs — demonstrate system sensitivity (P2) most directly. The ReAct framework [23] showed that switching from standard prompting to reasoning-action loops on the same base model improved HotpotQA accuracy from approximately 28% to 36%, with similar gains across multiple benchmarks. This provides a controlled within-model comparison demonstrating that orchestration design, not model capability, drove the variance. Failure modes in agent systems commonly arise from coordination issues — inefficient task decomposition, poor tool-selection logic, inadequate feedback loops — rather than model limitations. Note that systematic controlled studies comparing performance variance across diverse agent scaffolds remain sparse; P2 is intended to motivate such studies.
5.3 Organizational Execution: Divergence Under Similar Conditions
Leading AI organizations — including OpenAI, Anthropic, and Google DeepMind — operate with access to comparable levels of talent, compute, and model architectures. Despite these similarities, observable differences persist in product release cadence, feature integration and coherence, deployment reliability, user feedback incorporation speed, and ecosystem development. These translate into variation in user adoption, enterprise integration, and perceived product quality. Such divergence reflects organizational processes, engineering culture, and accumulated D(t) — precisely the dynamic capabilities that the Execution layer captures — rather than model-level differences.
5.4 The March 31, 2026 Claude Code Disclosure
On March 31, 2026, an npm source-map containing approximately 500,000 lines of Claude Code’s agentic orchestration harness became externally accessible. Secondary journalistic reports [26]–[30] described this as revealing substantial control-plane architecture: dynamic prompt assembly, tool-routing logic, multi-stage safety filtering, query-engine orchestration, and feature flags for future agentic capabilities.
We treat this as an illustrative case rather than a rigorous empirical test. The evidentiary limitations are significant: the evidence base consists of secondary journalistic accounts and community discussions whose primary documentation is not independently verifiable; the scope of what was exposed cannot be confirmed; and the extent of developer replication was not measured systematically. We use ‘approximately 500,000 lines’ rather than the specific figure cited in some reports, given that the precision of secondary sources cannot be confirmed.
With those caveats stated, the reported pattern is consistent with the Execution Revelation mechanism of the Leak Paradox: control-plane elements could be studied and partially replicated in isolation, but overall system performance appeared to remain dependent on deployment scale, integration quality, and accumulated interaction data D(t) — none of which were present in the exposed code. Independent commentators on technical platforms described this as discovering that ‘the agentic harness is the real product’ and ‘orchestration layer > model,’ echoing the thesis’s language. A genuine test of the Leak Paradox would require a controlled disclosure study or a systematic retrospective measurement of performance parity achieved after exposure.
5.5 Cross-Case Synthesis
| Case | Layer Highlighted | Thesis Element Illustrated |
| --- | --- | --- |
| Open-weight models (§5.1) | Model (Layer I) | Capability available ≠ capability realized |
| Agent systems (§5.2) | Control Plane (Layer II) | System sensitivity (P2); orchestration drives variance |
| Organizational divergence (§5.3) | Execution / E(D(t)) (Layer III) | Dynamic capabilities; D(t) compounding |
| Claude Code disclosure (§5.4) | Control Plane + Execution | Leak Paradox (Execution Revelation mechanism) |
6. Economic Implications
As value migrates from models to systems to execution, traditional assumptions about cost structure, differentiation, and competitive advantage must be revised.
6.1 Value Capture Across the Stack
Models: High Cost, Declining Differentiation
Foundation models remain capital-intensive to develop. However, marginal performance improvements yield diminishing returns and pricing power is constrained by substitution — including open-weight models. Model providers bear high fixed costs while facing increasing pressure toward commoditization. The strategic implication: model development creates optionality, but does not by itself determine where value is captured.
Control Plane: Value Amplification, Limited Retention
Control-plane systems amplify utility extracted from models, providing high return on iteration and moderate barriers to entry. However, limited durability of advantage — due to rapid diffusion — means the control plane supports temporary value capture for early movers rather than long-term dominance.
Execution / E(D(t)): Persistent and Compounding Value Capture
Execution capabilities enable sustained value capture through compounding returns (D(t) accumulation improves E over time), switching costs (users become embedded in integrated workflows), and network effects (interaction data and usage reinforce system quality). Unlike models and system designs, execution advantages are non-tradable and non-transferable — embedded within organizations and their accumulated D(t) stock.
6.2 Cost Structure: From CapEx to OpEx
Model development is dominated by capital expenditure — large upfront investments in compute and training. Control-plane systems and execution rely more on operational expenditure — continuous engineering, deployment, and iteration. As value shifts upward, firms must reallocate resources toward ongoing system refinement and interaction-data infrastructure rather than episodic model training cycles.
6.3 Market Structure
Different layers exhibit distinct competitive characteristics: the model layer is oligopolistic due to high capital requirements; the control-plane layer is competitive and dynamic with rapid entry and exit; and the execution layer is consolidating, favoring firms with strong operational dynamic capabilities and deep D(t) infrastructure. A key implication is the decoupling of technical capability from economic capture: organizations with the most advanced models do not necessarily capture the most value.
6.4 Capital Allocation
From an investment perspective: model-centric investments remain necessary but offer uncertain returns; system-level investments provide high short-term leverage; and execution-focused investments — especially interaction-data infrastructure — offer the strongest long-term defensibility. Model development creates optionality; system design exercises those options; execution and D(t) accumulation determine realized long-run value.
7. Strategic Implications
7.1 Strategy for Frontier Model Developers
Organizations at the model layer face commoditization pressure. To avoid it, they must invest in control-plane capabilities, build end-to-end systems, control user-facing interfaces, and develop interaction-data infrastructure. They must compete on execution — rapid iteration cycles, reliable deployment, and continuous improvement fueled by D(t). In effect, model developers must evolve into full-stack AI organizations where the model is the substrate, not the product.
7.2 Strategy for Startups and New Entrants
Startups can compete by building specialized orchestration layers, leveraging existing models, and focusing on high-impact use cases where system design matters. Given the replicability of control-plane innovations, startups must iterate faster than incumbents, deliver superior user experience, and target specific domains where D(t) accumulation can create durable competitive positions. Success depends on operational excellence and data-flywheel construction, not proprietary model development.
7.3 Strategy for Enterprises and Adopters
Enterprises should not let model selection dominate strategy. Instead, focus on system integration, workflow alignment, and interaction-data infrastructure. Building D(t) — capturing failure traces, preference signals, and usage patterns — is particularly important for sustaining advantage. Treat AI not as a plug-and-play solution but as an organizational dynamic capability that deepens over time.
7.4 Strategic Synthesis
Across all actors: capability is necessary but insufficient; system design determines short-term performance; execution and D(t) accumulation determine long-run success. In mature AI markets, model capability becomes a necessary but non-differentiating input. The long-run winners will be those who build the best organizational systems for turning models into outcomes — and accumulate the interaction data that makes those systems continuously improve.
8. Risks and Counterarguments
8.1 Model Re-Differentiation: Commoditization vs. Concentration
> Reviewer Response: New distinction added per Reviewer 4: separates ‘commoditization’ (substitutability, price competition) from ‘concentration’ (few suppliers). Both dynamics are present but have different strategic implications.
The thesis asserts model commoditization (P1). Two distinct counterarguments must be separated:
Step-change re-differentiation: Qualitative breakthroughs in autonomous reasoning or multimodal integration could re-establish model-level moats. The thesis does not deny this possibility — it asserts that even in the presence of model-level differentiation, systems and execution mediate capability realization, and that historical patterns suggest even decisive capability advantages are temporary. Value migration would resume once the breakthrough diffuses.
Concentration despite commoditization: High compute costs may result in an oligopolistic model layer — only 3–4 firms capable of training frontier models — even as model outputs become substitutable across providers. This is a crucial distinction: concentration implies few suppliers and high margins at the model layer; commoditization implies substitutability and price competition. Open-weight models drive commoditization; compute barriers drive concentration. The thesis holds primarily under commoditization dynamics. If concentration dominates in practice — with a small cartel of model providers capturing high margins despite output similarity — the value-migration prediction may be attenuated at the model layer while still holding for the control plane and execution layers.
8.2 Control-Plane Commoditization
If orchestration patterns, agent frameworks, and system architectures become standardized — through open-source libraries, shared tooling, or industry best practices — the control plane may commoditize more rapidly than anticipated. The HTTP/SQL analogy is apt: if agentic frameworks become infrastructure, the leverage layer dissolves and value migrates directly to execution. This is consistent with the thesis: accelerated C commoditization strengthens the long-run argument while compressing the timeline. This represents a genuine planning risk for organizations that expect control-plane moats to last longer than the evidence warrants.
8.3 Data (D(t)) as State Variable
As formalized in §2.4, proprietary interaction data is a state variable D(t) that governs execution capacity E = E(D(t)), rather than a separate structural layer or a loosely defined ‘fuel.’ This formulation resolves the ontological ambiguity from V2.0–V3.0. The data flywheel is the mechanism by which D(t) accumulates: deployment generates interaction data, which improves E(D(t)), which generates more deployment. The uniquely defensible category is post-deployment interaction data (logs, preference signals, failure traces) — non-fungible, context-specific, and non-transferable. The D × E interaction (formally: the functional relationship E = E(D(t)) with dE/dD > 0) may represent the most defensible competitive position in the AI stack.
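The feedback loop described here — deployment generates interaction data, which improves E(D(t)), which generates more deployment — can be sketched as a discrete-time simulation. The functional forms below (a saturating E(D) with dE/dD > 0, and deployment volume responding linearly to E) are illustrative assumptions, not estimates from the paper:

```python
# Illustrative sketch of the data flywheel (§2.4, §8.3).
# Assumed functional forms: E(D) saturating in D, deployment rising with E.

def execution_capacity(D, E_max=1.0, k=0.001):
    """E(D) with dE/dD > 0 and diminishing returns (saturating in D)."""
    return E_max * D / (D + 1.0 / k)

def simulate_flywheel(steps=50, base_deployments=10.0, data_per_deployment=5.0):
    """Deployment -> interaction data D(t) -> higher E(D) -> more deployment."""
    D = 0.0
    history = []
    for t in range(steps):
        E = execution_capacity(D)
        deployments = base_deployments * (1.0 + E)  # stronger E drives more usage
        D += deployments * data_per_deployment       # usage accumulates D(t)
        history.append((t, D, E))
    return history

history = simulate_flywheel()
```

Because D(t) only accumulates and E is monotone in D, execution capacity compounds over time but with diminishing marginal returns — consistent with the claim that the D × E interaction is defensible without being unbounded.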
8.4 M × C Interactions: Substitution vs. Amplification — Temporal Dynamics
> Reviewer Response: Expanded per Reviewer 4: explicitly addresses whether M×C substitution accelerates or delays value migration to E.
The control-plane ‘leverage’ metaphor implies C amplifies M. However, RAG compensates for weaker parametric knowledge, prompt-based tool use compensates for reasoning gaps, and constrained output schemas compensate for model inconsistency. Along these dimensions, C substitutes for M rather than merely amplifying it.
The temporal dynamics of substitution on value migration require explicit treatment: if C can substitute for M, firms may delay investment in frontier models, using strong orchestration to extend the competitive life of less capable models. This stalls the migration to execution by maintaining C-layer relevance longer — β(t) remains elevated rather than transitioning to the hump. Simultaneously, the value of frontier M is partially compressed, since orchestration can close much of the capability gap. The net effect on migration speed depends on whether substitution extends C-layer relevance faster than it compresses M-layer differentiation.
The thesis’s structural prediction — that value eventually migrates to E(D(t)) — is not undermined by substitution, but the timeline may be longer than predicted in environments where strong C can indefinitely compensate for weak M. Organizations should therefore track the rate at which open orchestration frameworks absorb the substitution function, as this signals when the C-layer will itself begin commoditizing.
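The coefficient trajectories discussed above — α(t) declining, β(t) hump-shaped, γ(t) and δ(t) rising, with substitution stretching the β(t) hump — can be made concrete in a stylized model. All shapes and parameters below are assumptions chosen only to exhibit the qualitative dynamics; none are calibrated:

```python
import math

# Stylized trajectories for V(t) = a(t)M + b(t)C + g(t)E + d(t)(M*C*E).
# The 'substitution' flag stretches the b(t) hump, modeling strong
# orchestration extending C-layer relevance (§8.4). Shapes are assumed.

def coefficients(t, horizon=10.0, substitution=False):
    stretch = 1.5 if substitution else 1.0           # substitution delays C commoditization
    a = math.exp(-t / 3.0)                            # declining model coefficient
    b = (t / stretch) * math.exp(-t / (2.0 * stretch))  # hump, peaking at t = 2 * stretch
    g = 1.0 - math.exp(-t / 4.0)                      # rising execution coefficient
    d = 0.1 * t / horizon                             # rising interaction coefficient
    return a, b, g, d

def value(t, M=1.0, C=1.0, E=1.0, substitution=False):
    a, b, g, d = coefficients(t, substitution=substitution)
    return a * M + b * C + g * E + d * (M * C * E)
```

Under these assumptions, enabling substitution shifts the β(t) peak later and keeps it elevated longer, which is exactly the timeline-lengthening effect argued in this subsection: migration to E(D(t)) still occurs, but more slowly.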
8.5 Regulation: Layer-Differentiated Effects
> Reviewer Response: Expanded per Reviewer 4: distinguishes regulatory effects across the three layers rather than treating regulation as a uniform exogenous constraint.
Regulatory developments affect different layers differently:
- Safety and content rules → Control Plane (C): If the EU AI Act mandates certified safety filters and audit trails, regulation effectively mandates C-layer standardization, potentially accelerating control-plane commoditization — since all firms must implement similar components. This could compress the timeline for migration to E.
- Compute and export restrictions → Model Layer (M): GPU export controls and compute access restrictions limit which firms can train frontier models, reinforcing concentration at Layer I. This may preserve model-layer differentiation in some markets even as commoditization dynamics operate in others.
- Data privacy regulations → D(t): Restrictions on data collection, retention, and cross-border transfer directly limit D(t) accumulation, constraining the data flywheel. This may reduce the defensibility of the E(D(t)) moat in regulated markets and create asymmetries between firms operating in different regulatory jurisdictions.
Regulation does not invalidate the layered structure, but it reshapes the speed and geography of value migration. Organizations operating across regulatory jurisdictions face an additional layer of complexity: their E(D(t)) moat may deepen in some markets while being artificially constrained in others.
8.6 Measurement Challenges and Candidate Proxy Metrics
Execution capabilities and D(t) accumulation are diffuse and not directly observable. This measurement challenge is currently unsolved, and developing validated constructs is a prerequisite for formally testing P3, P4, and P5. We provide candidate proxy metrics with suggested materiality thresholds:
- Deployment frequency (P3): Number of production deployments of control-plane or system-level changes per month affecting more than 1% of user sessions (materiality threshold excludes trivial hotfixes). Higher frequency signals stronger execution dynamic capabilities.
- Model-failure-to-fix latency (P3): Elapsed time from identification of a model failure or capability gap to deployment of a system-level fix. Analogous to DevOps mean time to recovery (MTTR). Operationalizes iteration velocity.
- User retention delta (P3): Change in 30-day user retention rate following a feature update, relative to pre-update baseline. Measures whether iteration translates into realized user value rather than just system changes.
- Time-to-functional-equivalence (P4): Elapsed time from public awareness of a control-plane feature or disclosure event to a third party achieving within-5% task performance on a specified benchmark suite. Operationalizes replication asymmetry for control-plane vs. execution features.
- Failure trace coverage and diversity (P5): Volume and semantic diversity of logged failure traces per active user per month. Measures D(t) accumulation quality; normalized by active user base to control for scale differences.
- Preference signal coverage (P5): Fraction of user interactions generating a usable preference signal (e.g., explicit feedback, implicit engagement, correction events) per active user per week. Higher coverage indicates richer D(t) accumulation.
These metrics require validation studies before use in formal hypothesis testing. Attribution is also challenging: model improvements and system changes are often deployed simultaneously, making it difficult to isolate the contribution of each layer to value function changes.
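Two of the proxy metrics above — deployment frequency with the 1% materiality threshold (P3) and the user retention delta (P3) — are simple enough to operationalize directly. The sketch below assumes a minimal data shape (a per-deployment session count and pre/post retention cohorts); field names and thresholds are illustrative, not a proposed standard:

```python
# Sketch of two candidate proxy metrics from §8.6.
# Data shapes and field names are illustrative assumptions.

def material_deployment_frequency(deployments, total_sessions, threshold=0.01):
    """Count deployments affecting more than `threshold` of user sessions
    in the period, excluding trivial hotfixes below the materiality bar."""
    return sum(
        1 for d in deployments
        if d["affected_sessions"] / total_sessions > threshold
    )

def retention_delta(pre_retained, pre_cohort, post_retained, post_cohort):
    """Change in 30-day retention rate after a feature update,
    in percentage points relative to the pre-update baseline."""
    return 100.0 * (post_retained / post_cohort - pre_retained / pre_cohort)

deploys = [
    {"affected_sessions": 500},   # 5.0% of sessions: material
    {"affected_sessions": 50},    # 0.5%: trivial hotfix, excluded
    {"affected_sessions": 200},   # 2.0% of sessions: material
]
freq = material_deployment_frequency(deploys, total_sessions=10_000)
delta = retention_delta(pre_retained=420, pre_cohort=1000,
                        post_retained=455, post_cohort=1000)
```

Even this minimal version surfaces the attribution problem noted above: a positive retention delta following a simultaneous model upgrade and control-plane change cannot, by itself, be assigned to either layer.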
8.7 Temporal and Contextual Limitations
The thesis may be time-bound or context-dependent. It is best understood as a dynamic framework applicable to the current trajectory of frontier AI — within the scope conditions of §1.4 — with validity that should be re-evaluated as the field evolves. The value migration dynamic is a trajectory prediction, not a timeless structural claim.
8.8 Summary of Counterarguments
| Counterargument | Thesis Response | Net Effect on Thesis |
| --- | --- | --- |
| Step-change model re-differentiation | Migration resumes after diffusion; systems and execution still mediate realization | Temporary reversal, not invalidation |
| Model concentration without commoditization | Attenuates α(t) decline in oligopolistic scenarios; C and E dynamics still hold | Modifies Layer I prediction; thesis intact for upper layers |
| Control-plane commoditization (HTTP/SQL) | Accelerates migration to E(D(t)); consistent with thesis | Strengthens long-run argument; compresses β(t) hump |
| M×C substitution extending C relevance | Delays but does not prevent migration to E(D(t)) | Lengthens timeline; directional thesis intact |
| Regulation: safety rules standardize C | Accelerates C commoditization; strengthens E moat | Reinforces thesis direction; varies by jurisdiction |
| Regulation: data privacy limits D(t) | Reduces data flywheel strength in regulated markets | Constrains execution moat; creates cross-market asymmetries |
| Measurement challenges | Proxy metrics proposed; attribution problem acknowledged | Limits empirical validation; does not invalidate theory |
| Scope limitations (vertical/edge/regulated AI) | Explicitly bounded to frontier, cloud-deployed, general-purpose AI | Narrows but clarifies the thesis |
9. Conclusion
This paper has argued that the prevailing model-centric view of artificial intelligence is insufficient to explain how value is created and captured in modern AI systems. We have proposed the Control Plane Thesis, which identifies a structural migration of value within the AI stack — from models to systems to execution — and formalized this migration through a time-dependent multiplicative value function V(t) = α(t)M + β(t)C + γ(t)E(D(t)) + δ(t)(M·C·E(D(t))). The framework is grounded in Teece’s dynamic capabilities, Baldwin and Clark’s modularity thesis, and platform economics, and extends these traditions to the specific structural characteristics of AI systems.
The central aphorism, updated across revisions to reflect the framework’s evolution:
Models are commodities. Systems are leverage. Execution is the moat. Data is the flywheel.
This formulation resolves the data-layer ambiguity from prior drafts: proprietary interaction data D(t) is the flywheel — a state variable that accumulates through deployment, governs execution capacity E = E(D(t)), and compounds the defensibility of the execution moat over time. It is not a structural layer but a dynamic mechanism embedded within execution.
Several limitations constrain the present analysis: the evidence base is primarily qualitative; the research propositions have not been formally tested; the proxy metrics require validation; the M×C temporal substitution dynamics are under-specified; and the case study evidence relies on secondary journalistic accounts. These limitations define the research agenda implied by the framework.
At its core, the Control Plane Thesis reflects a familiar pattern in technological evolution: as foundational technologies mature, differentiation migrates upward in the stack — from components to systems, and from systems to their operation in real-world contexts. Artificial intelligence appears to be entering this phase now. The firms that recognize this shift earliest — and build the organizational dynamic capabilities and data-flywheel infrastructure to compound their execution advantages — will define the competitive landscape of the next decade.
9.1 Directions for Future Research
- Developing validated measurement constructs for execution capabilities — particularly the proxy metrics proposed in §8.6 — and for D(t) accumulation quality. This is a prerequisite for formally testing P3 through P5.
- Designing controlled studies to test P2 (System Sensitivity) by varying control-plane configurations across a fixed model, and P4 (Replication Asymmetry) by measuring time-to-functional-equivalence. A controlled disclosure study or systematic retrospective of the March 2026 Claude Code case would be particularly informative.
- Formalizing the M×C substitution function — including whether substitution effects on value migration timelines are accelerating or decelerating — and extending the multiplicative model with an explicit substitutability parameter.
- Testing the thesis across the scope conditions of §1.4: specialized vertical AI, safety-critical regulated systems, and edge/embedded AI, to determine where the value migration prediction applies with different speed and intensity.
References
[1] T. Brown et al., “Language Models are Few-Shot Learners,” Advances in Neural Information Processing Systems (NeurIPS), vol. 33, pp. 1877–1901, 2020.
[2] OpenAI, “GPT-4 Technical Report,” arXiv:2303.08774, 2023.
[3] Anthropic, “Constitutional AI: Harmlessness from AI Feedback,” arXiv:2212.08073, 2022.
[4] R. Bommasani et al., “On the Opportunities and Risks of Foundation Models,” arXiv:2108.07258, 2021.
[5] J. Kaplan et al., “Scaling Laws for Neural Language Models,” arXiv:2001.08361, 2020.
[6] J. Hoffmann et al., “Training Compute-Optimal Large Language Models,” arXiv:2203.15556, 2022.
[7] J. Rae et al., “Scaling Language Models: Methods, Analysis & Insights from Training Gopher,” arXiv:2112.11446, DeepMind, 2021.
[8] Meta AI, “LLaMA: Open and Efficient Foundation Language Models,” arXiv:2302.13971, 2023.
[9] Meta AI, “LLaMA 2: Open Foundation and Fine-Tuned Chat Models,” arXiv:2307.09288, 2023.
[10] Anthropic, “Claude System Card,” Technical Report, 2024.
[11] OpenAI, “Function Calling and Tool Use in Large Language Models,” Technical Documentation, 2024.
[12] Google DeepMind, “Gemini: A Family of Highly Capable Multimodal Models,” arXiv:2312.11805, 2024.
[13] Y. LeCun, Y. Bengio, and G. Hinton, “Deep Learning,” Nature, vol. 521, pp. 436–444, 2015.
[14] H. Simon, “The Architecture of Complexity,” Proceedings of the American Philosophical Society, vol. 106, no. 6, pp. 467–482, 1962.
[15] C. Baldwin and K. Clark, Design Rules: The Power of Modularity, MIT Press, 2000.
[16] M. Cusumano, A. Gawer, and D. Yoffie, The Business of Platforms, Harper Business, 2019.
[17] A. Dixit and R. Pindyck, Investment Under Uncertainty, Princeton University Press, 1994.
[18] European Commission, “Artificial Intelligence Act,” Official Journal of the European Union, 2024.
[19] Stanford Institute for Human-Centered AI (HAI), “AI Index Report,” 2025.
[20] McKinsey Global Institute, “The Economic Potential of Generative AI,” 2023.
[21] N. Carlini et al., “Extracting Training Data from Large Language Models,” USENIX Security Symposium, 2021.
[22] J. Wei et al., “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models,” Advances in Neural Information Processing Systems (NeurIPS), vol. 35, pp. 24824–24837, 2022.
[23] S. Yao et al., “ReAct: Synergizing Reasoning and Acting in Language Models,” International Conference on Learning Representations (ICLR), 2023.
[24] H. Touvron et al., “Llama 3 Technical Report,” Meta AI, arXiv:2407.21783, 2024.
[25] A. Agrawal, J. Gans, and A. Goldfarb, Prediction Machines: The Simple Economics of Artificial Intelligence, Harvard Business Review Press, 2018.
[26] The Verge, “Anthropic’s Claude Code Source-Map Leak Reveals Agentic Harness Architecture,” March 31, 2026.
[27] Business Insider, “Developers Analyze Claude Code Orchestration Layer After npm Source-Map Exposure,” April 2026.
[28] TechRadar, “Anthropic Confirms npm Source-Map Leak of Claude Code,” April 2026.
[29] The Guardian, “Claude Code Leak Raises Questions About Agentic AI Security and Competitive Dynamics,” April 2026.
[30] GitHub Community Discussions, “Analysis and Partial Reconstruction of Claude Code Agentic Harness from Leaked Source Maps,” April 2026.
Note: References [26]–[30] are secondary journalistic accounts and community discussions. Primary documentation has not been independently verified. These sources are cited for illustrative purposes only; claims derived from them are explicitly caveated in the text.
[31] OpenAI, “ChatGPT Plugins and Tool Ecosystem,” Technical Documentation, 2024.
[32] Google, “Vertex AI and Generative AI Platform Documentation,” 2024.
[33] Amazon Web Services, “Bedrock: Foundation Model Platform,” Technical Documentation, 2024.
[34] M. Porter, Competitive Advantage: Creating and Sustaining Superior Performance, Free Press, 1985.
[35] D. Teece, G. Pisano, and A. Shuen, “Dynamic Capabilities and Strategic Management,” Strategic Management Journal, vol. 18, no. 7, pp. 509–533, 1997.
[36] E. Brynjolfsson and A. McAfee, The Second Machine Age: Work, Progress, and Prosperity in a Time of Brilliant Technologies, W.W. Norton & Company, 2014.