Champaign Magazine

champaignmagazine.com


Aikipedia: New AI Lexicon, May 2026 (1)

By GPT-5.5, Claude 4.6 with W.H.L.

GPT-5.5: Below is an expanded version in a format that should map naturally into your magazine’s lexicon style.


Recommended Entries for “Aikipedia: New AI Lexicon (May 2026)”


1. RecursiveMAS

Category: Multi-Agent Systems / Recursive Cognition

Date of Appearance / Establishment:
April 2026 (major visibility and discussion continuing into May 2026) (arXiv)

Definition (Suggested Framing):
RecursiveMAS (Recursive Multi-Agent Systems) is a recursive multi-agent framework in which agent collaboration occurs through iterative latent-space computation rather than conventional natural-language message exchange. The architecture extends recursive reasoning principles from single-model systems into multi-agent collaboration loops.

Why It Matters:
RecursiveMAS may represent an important transition:

  • from token-mediated agent interaction,
  • toward latent-space collective cognition.

The framework also reflects broader trends toward:

  • recursive computation,
  • compressed inter-agent communication,
  • and scalable cooperative reasoning.

Why It Fits Aikipedia:
The term is:

  • technically distinctive,
  • conceptually novel,
  • and likely to become part of the evolving multi-agent systems vocabulary of 2026.

Primary Sources:

  • arXiv: Recursive Multi-Agent Systems (arXiv)

2. Cognifold

Category: Agent Memory / Cognitive Architectures

Date of Appearance / Establishment:
May 13, 2026 (Hugging Face)

Definition (Suggested Framing):
Cognifold is a brain-inspired proactive memory architecture designed for persistent autonomous AI agents. It organizes fragmented experiences into evolving cognitive structures through a process termed “cognitive folding.”

Why It Matters:
Cognifold moves beyond:

  • reactive retrieval memory,
  • vector recall,
  • and static memory stores,

toward continuously reorganizing cognitive structures capable of:

  • intent emergence,
  • associative restructuring,
  • and persistent contextual evolution.

Why It Fits Aikipedia:
This is one of the strongest lexicon candidates of May 2026 because it introduces:

  • a distinctive terminology set,
  • a recognizable architectural philosophy,
  • and an emerging paradigm for agent memory systems.

Primary Sources:

  • arXiv: Cognifold: Always-On Proactive Memory via Cognitive Folding (arXiv)
  • Hugging Face Papers entry (Hugging Face)
  • Kurate paper summary (Kurate.org)

3. Cognitive Folding

Category: Cognitive Memory Systems / AI Cognition

Date of Appearance / Establishment:
May 2026 (arXiv)

Definition (Suggested Framing):
Cognitive Folding refers to the process by which fragmented event streams and memories are progressively reorganized into higher-order cognitive structures within persistent AI memory systems.

Why It Matters:
The concept reframes memory not as:

  • storage and retrieval,

but as:

  • continuous structural reorganization.

The term may become influential in future discussions involving:

  • agent memory,
  • identity persistence,
  • adaptive cognition,
  • and long-horizon autonomous systems.

Why It Fits Aikipedia:
The phrase is:

  • intuitive,
  • technically meaningful,
  • and philosophically rich.

It has unusually strong potential to persist as a durable AI systems term.

Primary Sources:

  • arXiv: Cognifold: Always-On Proactive Memory via Cognitive Folding (arXiv)
  • Hugging Face Papers entry (Hugging Face)

4. SIA (Self-Improving AI)

Category: Recursive Self-Improvement / Agent Engineering

Date of Appearance / Establishment:
May 26, 2026 (arXiv)

Definition (Suggested Framing):
SIA (Self-Improving AI) is a recursive improvement framework in which an AI system autonomously updates both:

  • its external harness/scaffolding,
  • and its internal model weights.

Why It Matters:
SIA represents one of the clearest attempts yet to unify:

  • harness optimization,
  • agentic scaffolding,
  • reinforcement-driven adaptation,
  • and model self-improvement

within a single recursive loop.

Why It Fits Aikipedia:
SIA directly intersects with:

  • Agentic Harness Engineering,
  • Recursive Self-Improvement,
  • and autonomous agent evolution,

making it especially relevant to your existing Aikipedia ecosystem.

Primary Sources:

  • arXiv: SIA: Self Improving AI with Harness & Weight Updates (arXiv)
  • Emergent Mind summary (Emergent Mind)
  • GitHub repository (GitHub)

5. Repository-Level Auditing

Category: AI Safety / Systems Evaluation

Date of Appearance / Establishment:
May 2026 prominence through MEERKAT-related discussions

Definition (Suggested Framing):
Repository-Level Auditing refers to AI evaluation methodologies that assess system behavior across collections of interactions, traces, repositories, or long-term operational artifacts rather than isolated prompts or sessions.

Why It Matters:
The concept reflects a growing realization that:

  • harmful,
  • deceptive,
  • or emergent behavior

may only become visible across aggregated operational histories.

This represents a shift:

  • from single-session evaluation
  • toward ecosystem-level behavioral auditing.
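
A minimal sketch of the contrast above, assuming a toy trace format. The `TraceEvent` fields, the action labels, and the threshold rule are all hypothetical illustrations, not taken from MEERKAT or any published auditing tool. It shows a pattern that no single session reveals but that appears once traces are aggregated per agent.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceEvent:
    session_id: str
    agent_id: str
    action: str  # hypothetical action label, e.g. "read_credentials"


def audit_per_session(events, action, threshold):
    """Flag sessions in which one agent repeats `action` at least `threshold` times."""
    counts = defaultdict(int)
    for e in events:
        if e.action == action:
            counts[(e.session_id, e.agent_id)] += 1
    return sorted({sid for (sid, _), n in counts.items() if n >= threshold})


def audit_repository(events, action, threshold):
    """Flag agents whose use of `action` reaches `threshold` across all sessions."""
    counts = defaultdict(int)
    for e in events:
        if e.action == action:
            counts[e.agent_id] += 1
    return sorted(a for a, n in counts.items() if n >= threshold)


# Toy history: agent-a touches credentials once per session, three sessions in a row.
events = [
    TraceEvent("s1", "agent-a", "read_credentials"),
    TraceEvent("s2", "agent-a", "read_credentials"),
    TraceEvent("s3", "agent-a", "read_credentials"),
    TraceEvent("s3", "agent-b", "send_email"),
]

per_session_flags = audit_per_session(events, "read_credentials", threshold=3)
repository_flags = audit_repository(events, "read_credentials", threshold=3)
print(per_session_flags, repository_flags)
```

In this toy history the per-session audit flags nothing, while the aggregated audit flags agent-a. A real audit would use richer signals than a simple count.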

Why It Fits Aikipedia:
This is precisely the kind of:

  • systems-layer terminology,
  • emerging safety lexicon,
  • and methodological evolution

that your Aikipedia series documents especially well.

Primary Sources:

  • MEERKAT-related federated learning discussions
  • Related auditing and systems-evaluation research emerging during May 2026

6. AI Operating Environment

Category: AI Ecosystems / Platform Architectures

Date of Appearance / Establishment:
May 2026 (especially following Google I/O 2026)

Definition (Suggested Framing):
AI Operating Environment refers to a persistent AI-native computational ecosystem in which intelligent agents function continuously across applications, interfaces, workflows, and devices rather than as isolated chatbot sessions.

Why It Matters:
The phrase captures a major industry transition:

  • from AI tools,
  • toward AI-native environments.

The concept became increasingly visible during:

  • Google I/O 2026,
  • OpenAI hardware discussions,
  • and broader agentic-system integration trends.

Why It Fits Aikipedia:
The term has strong long-term conceptual potential similar to earlier computing-era transitions involving:

  • operating systems,
  • browsers,
  • cloud platforms,
  • and mobile ecosystems.

Primary Sources:

  • Google I/O 2026 ecosystem announcements
  • Industry analysis surrounding Gemini and persistent agents

7. Agentic Infrastructure

Category: Agent Systems Engineering

Date of Appearance / Establishment:
Increasing prominence throughout early-to-mid 2026

Definition (Suggested Framing):
Agentic Infrastructure refers to the orchestration, memory, execution, coordination, observability, and governance layers supporting persistent autonomous AI agents.

Why It Matters:
The term reflects the rapid emergence of infrastructure stacks specifically optimized for autonomous agents rather than conventional software applications.

It increasingly encompasses:

  • memory substrates,
  • orchestration middleware,
  • evaluation systems,
  • and persistent execution frameworks.

Why It Fits Aikipedia:
This may become one of the defining systems-engineering terms of late-2020s AI development.

Primary Sources:

  • Agentic engineering ecosystem discussions throughout 2026
  • Emerging AHE-related architectural literature

8. Continuous Diffusion Language Models

Category: Language Model Architectures

Date of Appearance / Establishment:
Broader emergence during May 2026 following ELF-related research

Definition (Suggested Framing):
Continuous Diffusion Language Models (Continuous DLMs) are language-generation architectures operating within continuous embedding spaces rather than strictly discrete token-generation frameworks.

Why It Matters:
The concept represents a possible alternative scaling trajectory to transformer-centric autoregressive language models.

The category includes:

  • diffusion-native language systems,
  • continuous embedding generation,
  • and flow-based linguistic architectures.

Why It Fits Aikipedia:
This would complement your published ELF entry while expanding outward into the broader architectural taxonomy emerging around continuous language generation.

Primary Sources:

  • ELF-related papers and associated diffusion-language-model literature

Highest-Priority Candidates for Immediate Publication

If selecting only a small number for the next issue, I would prioritize:

  1. Cognifold
  2. Cognitive Folding
  3. SIA
  4. RecursiveMAS
  5. AI Operating Environment

These five collectively capture the major May 2026 transitions involving:

  • memory architectures,
  • recursive improvement,
  • multi-agent cognition,
  • and AI-native ecosystem evolution.


Claude Sonnet 4.6 Max Thinking: I now have a complete picture. Let me compile everything: the inventory of what’s already been covered, the May 2026 candidate concepts, and the full Aikipedia-format recommendations list.


🤖 Claude’s Proposed Entries for Aikipedia: New AI Lexicon (May 2026)

Editorial note: The following recommendations exclude all concepts already covered in the Aikipedia 2025 series (Editions 1–5: MoE, RLAIF, World Models, Alignment Faking, KV-Cache Sharing, Compound AI Systems, Test-Time Compute Scaling, Mechanistic Interpretability, Constitutional AI 2.0, Chain-of-Verification, and others) and the three May 2026 entries already published: Agentic Harness Engineering (AHE), Recursive Self-Improvement (RSI), and Embedded Language Flows (ELF). All 12 candidates below emerged or crystallized as named concepts in May 2026.


12 Recommended Entries for Aikipedia: New AI Lexicon (May 2026)


🧮 1. AI Co-Mathematician

Definition: A new human-AI collaboration paradigm for mathematical research, in which an AI agent works interactively alongside a human mathematician — generating conjectures, exploring proof strategies, checking arguments, and iterating on failed attempts — without replacing the mathematician’s judgment. Characterized by the “mathematician-in-the-loop” workflow: the AI proposes, the human guides, and the collaboration solves problems neither could alone.

Why May 2026: Google DeepMind published the formal paper introducing the AI Co-Mathematician framework (arXiv:2605.06651) on May 7, 2026, about two weeks before OpenAI demonstrated a fully autonomous disproof of the Erdős unit distance conjecture (May 20). Together these two events established human-AI collaboration and full autonomy as the twin poles of a new field in AI-assisted mathematics.

Date of Appearance/Establishment: May 7, 2026 (formal paper); conceptual precursors include AlphaProof (2024) and AlphaEvolve (2025).

Primary Sources: Hu & He et al., “AI Co-Mathematician: Accelerating Mathematicians with Agentic AI,” arXiv:2605.06651, May 2026; Google DeepMind; Hugging Face Papers page.


💥 2. Intelligence Explosion (Operational Definition)

Definition: A positive feedback process in which AI systems contribute to the development of more capable successor systems, triggering accelerating capability growth that eventually escapes human-paced timelines. Distinct from the classical theoretical definition (I.J. Good, 1965), the operational 2026 definition refers to early, observable signs of AI accelerating its own research-and-development cycle — already documented in official company research agendas. Anthropic co-founder Jack Clark placed the probability above 60% that a full recursive self-training event will occur by the end of 2028.

Why May 2026: The term moved from AI safety theory into official corporate documentation in May 2026 when Anthropic’s Jack Clark publicly presented the Anthropic Institute research agenda citing “early signs” of recursive self-improvement, and delivered the 2026 Oxford Cosmos Lecture articulating a 60%+ probability threshold. This was the first time a sitting co-founder of a frontier AI lab placed the concept in a formal institutional document.

Date of Appearance/Establishment: Theoretical origin: I.J. Good, 1965; popularized by Vernor Vinge, 1993; entered operational AI company documents: May 2026.

Primary Sources: Jack Clark, 2026 Cosmos Lecture, Oxford University, May 20, 2026; Anthropic Institute research agenda (shared with Axios, May 7, 2026); The Guardian, May 21, 2026.


🕸️ 3. Agent Sprawl

Definition: The uncontrolled proliferation of autonomous AI agents across an organization — deployed incrementally across departments without centralized registry, governance, or oversight — creating management, security, accountability, and compliance risks analogous to the “shadow IT” crises of the cloud era, but with higher stakes due to agents’ capacity for autonomous decision-making and external action.

Why May 2026: Surveys in early 2026 found 96% of enterprises reporting use of AI agents but only 11% running them in full production, with 94% expressing concern about agent sprawl. In May 2026 the term reached naming consensus across SAP, Gartner, Serious Insights, and the EU AI Act compliance community.

Date of Appearance/Establishment: Named phenomenon: 2025–2026; reached critical mass in enterprise discourse: Q1–Q2 2026.

Primary Sources: Serious Insights State of AI 2026 April Update; SAP “AI in 2026: Five Defining Themes” (January 2026); EU AI Act compliance discussions; Spectrocloud Enterprise AI 2026 Trends.


💻 4. Computer Use (CU)

Definition: An AI capability allowing agents to visually perceive, interpret, and directly operate arbitrary graphical software interfaces — including legacy applications, web portals, and desktop software — using computer vision and sequential reasoning, without requiring API access, custom integrations, or process-specific scripting. Fundamentally different from traditional Robotic Process Automation (RPA) because agents adapt dynamically to UI changes rather than following brittle scripted click paths.

Why May 2026: Microsoft made Computer Use in Copilot Studio generally available across all commercial geographies on May 13, 2026 — the first enterprise-scale GA release of production-grade computer-use agents — turning a preview capability into a deployable enterprise automation layer and confirming the transition from chatbots to agents that “actually do the work.”

Date of Appearance/Establishment: Anthropic Computer Use preview: October 2024; OpenAI CUA: early 2025; Microsoft Copilot Studio preview: September 2025; GA: May 13, 2026.

Primary Sources: Microsoft Copilot Studio blog, May 13, 2026; Microsoft Copilot Studio “What’s New: May 2026”; TechHQ, DevOps.com, Windows Forum coverage.


⚖️ 5. Agentic Governance

Definition: The organizational, technical, and regulatory frameworks specifically designed to control, audit, and enforce accountability for autonomous AI agents operating in production — including agent registries, permission scoping, audit trails, kill-switch mechanisms, human approval checkpoints, and liability attribution. Distinct from traditional AI ethics (which governs model outputs) and from AI regulation (which governs model capabilities): agentic governance addresses the unique risks of systems that act in the world without per-action human review.
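
The controls listed in this definition can be sketched in a few lines. This is an illustrative toy, not any vendor's API: `AgentRecord`, `AgentRegistry`, and the decision strings are hypothetical names chosen for the example. It combines a registry, permission scoping, a human approval checkpoint, a kill switch, and an audit trail in one authorization path.

```python
from dataclasses import dataclass, field


@dataclass
class AgentRecord:
    agent_id: str
    owner: str
    allowed_actions: set                                # permission scope
    needs_approval: set = field(default_factory=set)    # actions gated on a human
    enabled: bool = True                                # flipped off by the kill switch


class AgentRegistry:
    def __init__(self):
        self._agents = {}
        self.audit_log = []  # one (agent, action, approver, decision) tuple per request

    def register(self, record):
        self._agents[record.agent_id] = record

    def kill(self, agent_id):
        """Kill switch: disable the agent without deleting its record."""
        self._agents[agent_id].enabled = False

    def authorize(self, agent_id, action, approved_by=None):
        record = self._agents.get(agent_id)
        if record is None:
            decision = "deny:unregistered"
        elif not record.enabled:
            decision = "deny:killed"
        elif action not in record.allowed_actions:
            decision = "deny:out_of_scope"
        elif action in record.needs_approval and approved_by is None:
            decision = "deny:needs_human_approval"
        else:
            decision = "allow"
        self.audit_log.append((agent_id, action, approved_by, decision))
        return decision == "allow"


registry = AgentRegistry()
registry.register(
    AgentRecord("invoice-bot", "finance", {"read_invoice", "pay_invoice"}, {"pay_invoice"})
)
read_ok = registry.authorize("invoice-bot", "read_invoice")
pay_unapproved = registry.authorize("invoice-bot", "pay_invoice")
pay_approved = registry.authorize("invoice-bot", "pay_invoice", approved_by="cfo")
shadow = registry.authorize("shadow-bot", "read_invoice")  # never registered
registry.kill("invoice-bot")
after_kill = registry.authorize("invoice-bot", "read_invoice")
```

Every request, allowed or denied, lands in `audit_log`, which is the piece that supports later liability attribution. The unregistered `shadow-bot` is denied by default, the simplest possible response to agent sprawl.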

Why May 2026: As enterprises crossed the threshold of widespread agent deployment in early 2026, “agentic governance” emerged as a named priority distinct from general AI governance, formalized in enterprise architecture frameworks, EU AI Act enforcement guidance, and AI safety research agendas simultaneously in May 2026.

Date of Appearance/Establishment: Concept emerging 2025; named enterprise discipline: 2026; aligned with EU AI Act high-risk categories: 2026.

Primary Sources: SAP “AI in 2026: Five Defining Themes”; EU AI Act high-risk compliance frameworks; Spectrocloud Enterprise AI 2026; Microsoft Copilot Studio audit log documentation.


🏭 6. Sovereign AI Stack

Definition: A nation-state’s or large organization’s end-to-end vertical ownership of artificial intelligence infrastructure — encompassing compute (domestic data centers and chips), training data (national corpora in local languages and formats), foundation models (domestically trained), and application layers — operated independently of foreign providers and treated as critical national infrastructure equivalent to electricity or water grids.

Why May 2026: The concept crystallized from NVIDIA’s earlier “Sovereign AI” framing into a complete “stack” definition in May 2026, as governments across Southeast Asia, Europe, and the Middle East announced national AI infrastructure programs and NVIDIA’s partnerships with nation-states became a primary revenue driver. The May 2026 Spectrocloud enterprise AI analysis formally identified “Sovereign AI Stack” as a distinct category from generic cloud AI.

Date of Appearance/Establishment: “Sovereign AI” term: NVIDIA, 2023; “Sovereign AI Stack” as distinct vertical concept: 2025–2026; reached policy-implementation stage: May 2026.

Primary Sources: NVIDIA Sovereign AI initiative; Spectrocloud “Enterprise AI Trends 2026”; The Motley Fool analysis, May 16, 2026; Kaohoon International, May 2026.


🔬 7. AI Scientific Discovery Engine

Definition: A class of autonomous AI systems designed specifically to conduct open-ended scientific research — generating original hypotheses, designing and executing experiments (or formal mathematical constructions), verifying results against established knowledge, and producing peer-verifiable discoveries — across disciplines including mathematics, chemistry, biology, and physics. Distinct from AI-assisted research tools (which augment human scientists) and from general reasoning models (which answer questions): AI Scientific Discovery Engines pursue original knowledge creation as a primary goal.

Why May 2026: Two landmark events in May 2026 established this as a distinct AI category: OpenAI’s internal reasoning model autonomously disproved the Erdős unit distance conjecture (May 20), and Google’s “Towards Autonomous Mathematics Research” paper (Feng et al., 2026) described the Aletheia system, which resolved several Erdős problems with minimal human intervention. Together they marked the transition from AI that assists scientists to AI that functions as an independent scientific contributor.

Date of Appearance/Establishment: Precursors: AlphaFold (2020), AlphaGeometry (2024), AlphaEvolve (2025); “AI Scientific Discovery Engine” as a named category: May 2026.

Primary Sources: OpenAI Erdős unit distance announcement, May 20, 2026; Feng et al., “Towards Autonomous Mathematics Research,” arXiv, 2026; Google “AI Co-Mathematician,” arXiv:2605.06651; Tim Gowers, Noga Alon (external mathematical validation).


🔄 8. Post-Training Frontier

Definition: The emerging paradigm in which frontier AI improvements are driven primarily by post-training techniques — reinforcement learning from human or AI feedback (RLHF/RLAIF), instruction fine-tuning, synthetic data generation, reward model design, and preference optimization — rather than by scaling pre-training compute and data. As pre-training scaling returns diminish and data ceilings approach, the “post-training frontier” becomes the principal battleground for capability differentiation among frontier labs.

Why May 2026: As Google I/O 2026 demonstrated Gemini 3.5 Flash surpassing Gemini 3.1 Pro through targeted post-training rather than a larger pre-training run, and as multiple frontier labs publicly acknowledged data ceiling constraints, the “post-training frontier” crystallized as the dominant 2026 capability narrative — distinct from the test-time compute scaling paradigm (already in Aikipedia 2025 Ed.3) and from traditional training method entries.

Date of Appearance/Establishment: RLHF foundations: Christiano et al., 2017; Stiennon et al., 2020; “Post-Training Frontier” as named paradigm shift: 2025–2026; dominant framing in industry discourse: May 2026.

Primary Sources: InfoWorld “6 AI Breakthroughs That Will Define 2026”; MIT Technology Review “10 Things That Matter in AI Right Now” (April 2026); Google I/O 2026 Gemini 3.5 Flash technical disclosure.


🖥️ 9. AI-Native Interface

Definition: Computing hardware or software environments designed from first principles around AI interaction — rather than adapting pre-AI form factors such as smartphones, laptops, or traditional GUIs to accommodate AI features. AI-native interfaces treat the AI as the primary operating layer, with input modalities (voice, gaze, gesture, continuous sensing), output forms, and interaction paradigms that have no legacy equivalent. Associated with “ambient computing” when the interface dissolves into the environment entirely.

Why May 2026: The OpenAI x Jony Ive “io” project letter (Sam Altman and Jony Ive, May 21, 2026) and Google’s debut of Android XR smart glasses at I/O 2026 (May 19–20) simultaneously marked the industry’s formalization of AI-native interface as a distinct product category, with multiple frontier AI companies pivoting from software-first to hardware-software-AI vertical integration.

Date of Appearance/Establishment: Ubiquitous computing (the precursor of ambient computing): Mark Weiser, Xerox PARC, 1991; AI-native interface as product category: 2025–2026; formal multi-company investment wave: May 2026.

Primary Sources: OpenAI, “A letter from Sam & Jony,” May 21, 2026; Google I/O 2026 Android XR announcements; WSJ coverage of Jony Ive and OpenAI; The Verge analysis.


🔍 10. Agentic Retrieval

Definition: A next-generation retrieval paradigm in which an AI agent autonomously plans, executes, evaluates, and iteratively refines multi-hop information searches across heterogeneous sources — rather than performing a single, statically configured retrieval pass as in classic Retrieval-Augmented Generation (RAG). Agentic retrieval systems decide what to search, assess whether results are sufficient, trigger follow-up queries, navigate knowledge graphs, and synthesize evidence across sources over multiple autonomous steps before returning a grounded response.
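
The loop described here (search, assess sufficiency, issue follow-up queries) can be illustrated with a toy. Everything in it is an assumption made for the example: the three-document corpus, the substring `search`, and the follow-up rule that treats capitalized words as entities. It does not follow any named framework's implementation.

```python
CORPUS = {
    "doc1": "Acme was acquired by Globex in 2021.",
    "doc2": "Globex is headquartered in Springfield.",
    "doc3": "Initech makes printers.",
}


def search(term):
    """Toy retriever: ids of documents whose text contains `term` (case-insensitive)."""
    return [d for d, text in CORPUS.items() if term.lower() in text.lower()]


def agentic_retrieve(start_term, needed_fact, max_steps=3):
    """Search, check whether the evidence suffices, and queue follow-up queries."""
    evidence, queue, seen_terms = [], [start_term], set()
    for _ in range(max_steps):
        if not queue:
            break
        term = queue.pop(0)
        seen_terms.add(term)
        for doc_id in search(term):
            if doc_id not in evidence:
                evidence.append(doc_id)
        # Sufficiency check: stop once some evidence mentions the needed fact.
        if any(needed_fact in CORPUS[d].lower() for d in evidence):
            return evidence, True
        # Follow-up planning: treat unseen capitalized words as new entities to search.
        for d in evidence:
            for word in CORPUS[d].replace(".", "").split():
                if word.istitle() and word not in seen_terms and word not in queue:
                    queue.append(word)
    return evidence, False


single_pass = search("Acme")  # classic one-shot retrieval
evidence, sufficient = agentic_retrieve("Acme", "headquartered")
print(single_pass, evidence, sufficient)
```

The single pass returns only the acquisition document. The loop notices that the headquarters fact is still missing, follows the Globex entity, and stops once the second document is in hand.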

Why May 2026: As enterprise AI deployments scaled in 2026 and single-pass RAG systems proved inadequate for complex multi-step queries, agentic retrieval frameworks (AgenticRAG, LightRAG, GraphRAG with autonomous planning) became the dominant enterprise retrieval architecture. The formal distinction from classic RAG was codified in enterprise AI framework documentation and research papers in 2025–2026.

Date of Appearance/Establishment: RAG (Lewis et al., Facebook AI Research, 2020); multi-hop retrieval concepts: 2022–2023; Agentic Retrieval as distinct named paradigm: 2025–2026.

Primary Sources: AgenticRAG research framework documentation; LangChain Agentic RAG architecture docs; RAGFlow documentation; enterprise AI framework comparisons (2026).


🤝 11. Agent-to-Agent (A2A) Protocol

Definition: A standardized communication protocol enabling autonomous AI agents to directly exchange task specifications, context, interim results, and status signals with other agents — without a human or a monolithic orchestration layer mediating every interaction. A2A protocols define the “language” between agents: how one agent delegates a subtask to another, how results are returned, how errors propagate, and how trust is established between agents of different origins. Complementary to, but distinct from, the Model Context Protocol (MCP), which governs model-to-tool and model-to-context connections.
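
To make the delegation, result, and error-propagation steps concrete, here is a minimal sketch of a message exchange between two agents. The field names and the `handle` function are hypothetical and are not the published A2A schema. The point is only that tasks, results, and failures travel as structured data rather than free-form chat.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field


@dataclass
class TaskRequest:
    sender: str
    recipient: str
    task: str
    context: dict = field(default_factory=dict)
    task_id: str = field(default_factory=lambda: str(uuid.uuid4()))


@dataclass
class TaskResult:
    task_id: str
    status: str  # "completed" or "failed"
    output: str = ""
    error: str = ""


def handle(request_json, skills):
    """Recipient side: decode a request, run the matching skill, return a result."""
    request = TaskRequest(**json.loads(request_json))
    skill = skills.get(request.task)
    if skill is None:
        result = TaskResult(request.task_id, "failed", error=f"unknown task: {request.task}")
    else:
        try:
            result = TaskResult(request.task_id, "completed", output=skill(request.context))
        except Exception as exc:  # report the failure as data instead of crashing
            result = TaskResult(request.task_id, "failed", error=str(exc))
    return json.dumps(asdict(result))


# A planner agent delegates a subtask to a writer agent over JSON.
skills = {"summarize": lambda ctx: ctx["text"][:20]}
request = TaskRequest("planner", "writer", "summarize", {"text": "Agents exchange structured tasks."})
reply = json.loads(handle(json.dumps(asdict(request)), skills))

bad_request = TaskRequest("planner", "writer", "translate")
bad_reply = json.loads(handle(json.dumps(asdict(bad_request)), skills))
```

The shared `task_id` lets the sender match results to requests. Trust establishment between agents of different origins, which the definition also mentions, is deliberately left out of this sketch.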

Why May 2026: Google’s A2A protocol and competing specifications matured from proposals to implementation standards in early 2026, and by May 2026 had become common foundational infrastructure alongside MCP in enterprise multi-agent deployments — mentioned alongside MCP as the two pillars of agentic AI interoperability in Spectrocloud’s May 2026 enterprise AI trend analysis.

Date of Appearance/Establishment: Google A2A announcement: April 2025; A2A as emerging enterprise standard: Q1–Q2 2026.

Primary Sources: Google A2A protocol documentation; Spectrocloud “Enterprise AI Trends 2026”; enterprise AI ecosystem comparisons, Q1–Q2 2026.


📊 12. Model-Based Evaluation (MBE)

Definition: The practice of using AI models as automated judges to evaluate the outputs of other AI systems — assessing semantic correctness, alignment with intent, stylistic adherence, safety compliance, and behavioral consistency — rather than relying on human annotation, syntactic rule-based checks, or benchmark leaderboard scores alone. Sometimes called “LLM-as-judge.” When applied to agent pipelines, MBE enables “semantic unit testing”: automated behavioral checks that run in the background to verify AI outputs meet specification without human review of every response.
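
A “semantic unit test” in this sense might look like the sketch below. The judge here is a deterministic stub so the example runs offline. In practice the `judge` argument would wrap a call to a judge model, and the rubric format shown is a hypothetical choice, not an established standard.

```python
def stub_judge(output, rubric):
    """Stand-in for a judge model: checks required and forbidden phrases."""
    text = output.lower()
    missing = [p for p in rubric["must_mention"] if p.lower() not in text]
    forbidden = [p for p in rubric.get("must_not_mention", []) if p.lower() in text]
    return {"passed": not missing and not forbidden, "missing": missing, "forbidden": forbidden}


def semantic_unit_test(agent_fn, prompt, rubric, judge=stub_judge):
    """Run the agent on one prompt and return the judge's verdict on its output."""
    output = agent_fn(prompt)
    return judge(output, rubric)


def refund_agent(prompt):
    # Hypothetical agent under test; a real one would call a model.
    return "You can request a refund within 30 days of purchase."


rubric = {"must_mention": ["refund", "30 days"], "must_not_mention": ["guarantee"]}
verdict = semantic_unit_test(refund_agent, "What is the refund policy?", rubric)

strict_rubric = {"must_mention": ["refund", "receipt"]}
strict_verdict = semantic_unit_test(refund_agent, "What is the refund policy?", strict_rubric)
```

Keeping the judge behind a plain function argument means the same test harness can run with a cheap stub in continuous integration and a model-backed judge in scheduled evaluations.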

Why May 2026: As agentic AI systems began operating at scales and speeds that made per-output human review impractical, Model-Based Evaluation became the primary quality assurance layer for production agentic systems in 2026. The term “semantic unit testing” was formally introduced as an engineering practice analogous to software unit testing, representing a maturation of MBE from research technique to production infrastructure.

Date of Appearance/Establishment: LLM-as-judge concepts: 2023–2024 (Zheng et al., “Judging LLM-as-a-Judge,” 2023); “Model-Based Evaluation” / “Semantic Unit Testing” as production engineering practice: 2025–2026.

Primary Sources: Zheng et al., “Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena,” 2023; Medium, “From Generative to Agentic AI: A Roadmap in 2026” (Arash Nicoomanesh, 2026); Anthropic internal evaluation frameworks.


Why These 12?

These entries represent the dominant thematic clusters of May 2026 AI development:

Mathematical Breakthrough Cluster: AI Co-Mathematician, AI Scientific Discovery Engine — capturing the twin poles of human-AI collaboration and full autonomy in mathematics and science.

Safety & Governance Cluster: Intelligence Explosion (operational), Agent Sprawl, Agentic Governance — the governance reality that arrived with widespread agent deployment.

Infrastructure & Sovereignty Cluster: Sovereign AI Stack, Post-Training Frontier, Computer Use — the industrial and architectural shifts that define where competitive differentiation now lives.

Interface & Protocol Cluster: AI-Native Interface, Agent-to-Agent (A2A) Protocol — the form-factor and communication standards being laid for the agentic era.

Evaluation & Retrieval Cluster: Agentic Retrieval, Model-Based Evaluation — the quality assurance and knowledge access paradigms that make production agentic AI reliable.

None of the 12 duplicate any of the 23 entries already published across Aikipedia 2025 Editions 1–5 or the three May 2026 standalone entries (AHE, RSI, ELF).


Three Full Entries

Developed in full Aikipedia style for the three concepts most distinctively rooted in May 2026 events.


1. AI Co-Mathematician

Aikipedia Entry — May 2026

Definition

An AI Co-Mathematician is an interactive AI agent designed to work alongside human mathematicians as a collaborative research partner — not as an autonomous solver or a passive look-up tool, but as a thinking participant in the exploratory, iterative, and often failure-rich process of mathematical discovery. The AI Co-Mathematician generates candidate lemmas, explores proof strategies, checks sub-arguments for correctness, and adapts to human guidance throughout a research session, operating under a “mathematician-in-the-loop” model in which the human retains scientific judgment and strategic direction.

Date of Appearance/Establishment: Formal coinage: May 7, 2026 (arXiv:2605.06651, Google DeepMind). Conceptual antecedents include Lean-based proof assistants (2019 onward), AlphaProof (2024), and AlphaEvolve (2025).

Historical Context

Mathematical theorem proving by computer began with early symbolic systems in the 1950s and 1960s (the Logic Theorist in 1956, followed by LISP-based provers) and was formalized through interactive theorem provers such as Coq, Isabelle, and Lean 4 in the following decades. These systems required mathematicians to translate informal proofs into the formal language of the prover, a significant bottleneck that limited accessibility.

Deep learning brought a new generation of AI math systems. DeepMind’s AlphaGeometry (2024) solved International Mathematical Olympiad geometry problems by combining a language model with a symbolic engine. AlphaProof, working alongside AlphaGeometry 2, achieved silver-medal-level performance at the 2024 IMO using reinforcement learning and formal verification. AlphaEvolve (2025) used LLM-guided evolutionary search to rediscover and improve on best-known constructions across a broad problem corpus.

Despite this progress, all these systems operated autonomously without meaningful real-time interaction with human mathematicians. The AI Co-Mathematician framework, introduced by Google DeepMind researchers in May 2026, was designed specifically to address this gap — creating a stateful, interactive workspace where human mathematicians could direct, evaluate, and build upon AI-generated mathematical work in a continuous feedback loop.

Technical Description

The AI Co-Mathematician is implemented as an agentic system built on commercially available language models without custom model training. Its architecture includes:

Project Coordinator Agent: Manages the overall research session, decomposes high-level mathematical goals into sub-tasks, and delegates work across parallel workstreams.

Hypothesis Generation Module: Generates candidate conjectures, lemmas, and proof strategy outlines based on the problem context and the mathematician’s stated goals.

Verification Module: Checks generated arguments against existing mathematical knowledge (via integration with Mathlib and other formal libraries) and flags potential errors or gaps.

Iterative Refinement Loop: When proofs fail, the system analyzes the failure mode, generates alternative approaches, and proposes targeted repairs — rather than discarding failed attempts wholesale.

Human Interaction Interface: Provides a stateful workspace where mathematicians can redirect the AI’s attention, supply domain intuition, correct errors, and combine the AI’s generated strategies with their own insights.

A key architectural insight is that the system does not require the AI to be fully correct: the framework includes the case where a failed AI proof contains a “brilliant strategy” that a human mathematician can recognize and manually complete — as demonstrated in the collaboration with Marc Lackenby that resolved an open problem from the Kourovka Notebook.
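
The propose, verify, and refine cycle with a human in the loop can be sketched as follows. This is an illustrative toy under stated assumptions, not the paper's implementation. The `propose`, `verify`, and `human_review` callables are hypothetical stand-ins for the hypothesis generation module, the verification module, and the mathematician. Failed attempts are kept in the history rather than discarded.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    strategy: str
    verified: bool
    note: str


def co_mathematician_session(goal, propose, verify, human_review, max_rounds=3):
    """Run propose -> verify rounds; on failure, ask the human how to redirect."""
    history = []
    feedback = None
    for _ in range(max_rounds):
        strategy = propose(goal, feedback)
        ok, note = verify(strategy)
        history.append(Attempt(strategy, ok, note))  # failed attempts stay on record
        if ok:
            break
        feedback = human_review(strategy, note)
    return history


# Toy stand-ins so the loop runs end to end.
def propose(goal, feedback):
    return "induction on n" if feedback is None else f"induction on n, {feedback}"


def verify(strategy):
    if "strengthened hypothesis" in strategy:
        return True, "all steps check"
    return False, "inductive step fails"


def human_review(strategy, note):
    # The mathematician sees a promising idea in the failed attempt and repairs it.
    return "with strengthened hypothesis"


history = co_mathematician_session("toy goal", propose, verify, human_review)
for attempt in history:
    print(attempt)
```

The first round fails verification, the human supplies the missing idea, and the second round succeeds. Because the history keeps both attempts, the failed strategy remains available for a human to complete by hand.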

Notable Systems and Papers (2026)

AI Co-Mathematician (arXiv:2605.06651, May 7, 2026): Google DeepMind’s formal introduction of the framework, with case studies including the Lackenby collaboration and open-problem resolutions.

Aletheia (Feng et al., “Towards Autonomous Mathematics Research,” 2026): A Google DeepMind autonomous mathematical research agent that resolved several open Erdős problems with minimal human intervention — operating closer to the autonomous end of the spectrum than the co-mathematician paradigm.

OpenAI Erdős Breakthrough (May 20, 2026): Demonstrated fully autonomous disproof of the Erdős unit distance conjecture, establishing the other pole of AI mathematical capability (pure autonomy vs. human collaboration).

The Agentic Researcher (Zimmer et al., arXiv:2603.15914, 2026): A complementary open-source framework for AI-assisted research in mathematics and machine learning, emphasizing autonomous 20-hour experimental sessions.

Applications

Open Conjecture Exploration: Systematic exploration of the space of open problems in combinatorics, number theory, topology, and algebra — domains where the search space is too vast for unaided human exploration.

Proof Verification at Scale: Automated checking of informal mathematical proofs submitted to journals, reducing referee workload and catching errors earlier in the publication pipeline.

Mathematical Education: Interactive tutoring systems where students work alongside an AI that offers hints, checks student-generated proofs, and models mathematical reasoning strategies.

Cross-Disciplinary Discovery: Identifying structural analogies between mathematical objects in different fields — a historically productive but labor-intensive research activity.

Limitations

No Domain Judgment: AI Co-Mathematicians cannot evaluate which open problems are mathematically significant, beautiful, or well-positioned — domain taste remains exclusively human.

Formal Bottleneck: Translation of AI-generated informal proofs into formally verifiable Lean or Coq code remains partially manual and error-prone.

Hallucination in Proof Steps: Language models can generate plausible-sounding but incorrect proof steps that require careful human verification, particularly in novel domains.

Context Length: Complex multi-session mathematical research may exceed practical context windows, requiring external memory systems that introduce their own reliability challenges.

Open Questions

Can AI Co-Mathematicians be trained to recognize which of their own proof strategies are promising versus dead ends, before the human mathematician has to evaluate them?

What is the optimal division of labor between AI hypothesis generation and human strategic direction across different mathematical subfields?

Can AI Co-Mathematician capabilities be extended from pure mathematics into experimental sciences, where “verifying” a hypothesis requires physical experiments rather than formal proofs?

Primary Sources: Hu et al., arXiv:2605.06651; Google DeepMind; Feng et al., “Towards Autonomous Mathematics Research”; Zimmer et al., arXiv:2603.15914; Medium (Data Science in Your Pocket); Hugging Face Papers.


2. Computer Use (CU)

Aikipedia Entry — May 2026

Definition

Computer Use refers to the AI capability, and the class of agents implementing it, to visually perceive graphical software interfaces, interpret their content and structure, and directly operate them through simulated or real keyboard and mouse actions, using computer vision and sequential natural-language reasoning. Computer Use agents can navigate websites, operate desktop applications, fill forms, extract information, and complete multi-step software workflows. In principle this works in any application a human can use, without requiring the application to expose an API, provide structured data feeds, or be modified in any way; in practice, current products support only certain application frameworks.

Date of Appearance/Establishment: Conceptual origin: screen-scraping and Robotic Process Automation (RPA), 1990s–2000s. AI-native computer use: Anthropic computer use preview (Claude), October 2024; OpenAI Computer-Using Agent (CUA), early 2025; Microsoft Copilot Studio preview, September 2025; first enterprise GA release, May 13, 2026 (Microsoft Copilot Studio, commercial Power Platform geographies).

Historical Context

The challenge of automating software interfaces predates modern AI. Robotic Process Automation (RPA) emerged in the 2000s as a way to automate repetitive UI-driven workflows. Tools like UiPath, Automation Anywhere, and Blue Prism recorded human interactions with software and replayed them as scripts. RPA proved brittle: minor changes to a UI (a moved button, a renamed field, a new dialog) could break automations, which then required expensive manual maintenance.

A parallel tradition in AI — web scraping, optical character recognition, and desktop automation — attempted similar goals with more flexible techniques, but remained narrow and domain-specific.

The 2020s brought multimodal foundation models capable of seeing and describing screenshots with high accuracy. In October 2024, Anthropic released a preview of “computer use” capabilities for Claude, allowing the model to view screenshots and control computers via standard API calls — the first commercial AI system framed explicitly around operating arbitrary software as its primary use case. OpenAI followed with its Computer-Using Agent (CUA) in early 2025. Microsoft integrated both as foundation models for its Copilot Studio computer-use feature, which entered preview in September 2025 and reached general availability on May 13, 2026 across all commercial Power Platform geographies.

Technical Description

A Computer Use agent operates through a visually-grounded perception-action loop:

Screen Perception: The agent receives a screenshot of the current application state. Multimodal foundation models (such as Claude Sonnet 4.5 or OpenAI CUA) analyze the visual layout, extract relevant UI elements (buttons, fields, text, menus), and construct a semantic representation of what is visible.

Intent Grounding: The agent maps the current screen state against its current goal (specified in natural language), determining what action is required next — click a specific button, type into a field, scroll to find content, handle an unexpected dialog.

Action Execution: The agent issues low-level input commands (mouse click at specific coordinates, keyboard input, scroll events) to a virtual machine or real desktop environment. In Microsoft’s implementation, this runs on Windows 365 Cloud PCs for enterprise security isolation.

Adaptive Error Recovery: When the interface behaves unexpectedly (a popup appears, a page loads slowly, a field is greyed out), the agent perceives the new state and adapts its approach rather than failing — the key differentiator from brittle RPA scripts.

Audit and Governance Layer: Enterprise implementations (such as Microsoft Copilot Studio’s GA release) include Azure Key Vault credential storage, Microsoft Purview audit log propagation, Azure AI Content Safety filtering, and granular administrator controls — making the capability deployable under enterprise security standards.
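The perception-action loop above can be sketched in a few lines. This is a toy illustration under stated assumptions, not any vendor's API: `FakeDesktop`, `Action`, `decide`, and `run_agent` are hypothetical names, the "screenshot" is a dictionary of visible element labels rather than pixels, and a hard-coded rule stands in for the multimodal model. It shows the agent re-perceiving the screen before every action, which is what lets it dismiss an unexpected popup instead of failing, while recording each action in a simple audit log.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Action:
    kind: str        # "click" or "type"
    target: str      # label of the UI element acted on
    text: str = ""   # text to enter, for "type" actions


class FakeDesktop:
    """Toy stand-in for a real screen: a one-field form hidden behind a popup."""

    def __init__(self) -> None:
        self.popup_open = True
        self.name_field = ""
        self.submitted = False

    def screenshot(self) -> Dict[str, str]:
        # A real agent would receive pixels; here we return visible elements.
        if self.popup_open:
            return {"popup:close": ""}
        return {"field:name": self.name_field, "button:submit": ""}

    def apply(self, action: Action) -> None:
        # Only elements that are currently visible respond to input.
        if action.target not in self.screenshot():
            return
        if action.kind == "click" and action.target == "popup:close":
            self.popup_open = False
        elif action.kind == "type" and action.target == "field:name":
            self.name_field = action.text
        elif action.kind == "click" and action.target == "button:submit":
            self.submitted = True


def decide(screen: Dict[str, str], goal_name: str) -> Action:
    """Intent grounding: map the perceived screen and the goal to one action."""
    if "popup:close" in screen:          # adaptive recovery from a popup
        return Action("click", "popup:close")
    if screen.get("field:name") != goal_name:
        return Action("type", "field:name", goal_name)
    return Action("click", "button:submit")


def run_agent(desktop: FakeDesktop, goal_name: str, max_steps: int = 10) -> List[Action]:
    """Perceive, decide, act, and log until the form is submitted."""
    audit_log: List[Action] = []
    for _ in range(max_steps):
        if desktop.submitted:
            break
        screen = desktop.screenshot()        # screen perception
        action = decide(screen, goal_name)   # intent grounding
        desktop.apply(action)                # action execution
        audit_log.append(action)             # audit trail
    return audit_log


desktop = FakeDesktop()
log = run_agent(desktop, "Ada")
```

The `max_steps` cap is a design choice worth keeping in any real loop of this kind, since a misread screen could otherwise leave the agent repeating the same action indefinitely.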

Notable Systems (2026)

Anthropic computer use for Claude (October 2024 preview): First commercial AI system framed explicitly around computer use as a primary mode.

OpenAI CUA: Integrated into Microsoft Copilot Studio as one of the two production-supported foundation models (alongside Claude Sonnet 4.5) at GA.

Microsoft Copilot Studio Computer Use (GA: May 13, 2026): The first enterprise-grade general availability of production computer-use agents, supporting WinForms, WPF, UWP, WinUI, and Win32 application frameworks with full audit trail and governance integration. Named design partner: Graebel, which deployed a Service Order Agent to automate its proprietary Global Connect platform without any API modification.

Applications

Legacy System Automation: Automating workflows in mainframe-era or proprietary systems that have never exposed APIs and are too costly to redevelop.

Cross-Application Process Chains: Orchestrating multi-step processes that span three or more applications (e.g., extract from CRM → process in ERP → update in reporting portal) without building custom integrations for each pair.

Compliance and Audit Workflows: Navigating regulatory portals, filing systems, and government interfaces that lack machine-readable access.

Software Testing: Automated UI regression testing that adapts to interface changes rather than breaking with every new deployment.

Accessibility: Enabling users with motor or visual disabilities to operate software through natural language commands mediated by a computer-use agent.

Limitations

Unsupported Frameworks: Microsoft’s May 2026 GA release does not support Electron-based apps (VS Code, Slack, Teams desktop), Java-based applications (common in banking and healthcare), Unity apps, or Citrix virtualized environments — leaving large portions of the enterprise app estate unreachable.

Security and Trust: An agent with access to the screen, keyboard, and mouse of a production system has broad access to sensitive data. Credential leakage, prompt injection through screen content, and privilege escalation are active concerns.

Hallucination-Driven Errors: A misread UI element or incorrect visual interpretation can lead to consequential errors (wrong data entered, wrong action taken) in production workflows without human review of every step.

Not a Modernization Substitute: Computer use reduces the immediate pressure to modernize legacy systems, potentially entrenching technical debt rather than eliminating it.

Open Questions

Can Computer Use agents reliably operate across all graphical environments (including Electron, Java, and virtual desktops) without special-casing each framework?

What governance frameworks and liability standards should apply when a Computer Use agent causes data loss or unauthorized transactions?

How should Computer Use agents handle ambiguous UI states — for instance, when a dialog could mean two different things depending on context?

Primary Sources: Microsoft Copilot Studio blog, May 13, 2026; Microsoft “What’s New in Copilot Studio: May 2026”; TechHQ; DevOps.com; Windows Forum; Anthropic computer use documentation; Bravent.net technical analysis; DigitalApplied deep dive.


3. Intelligence Explosion (Operational Definition)

Aikipedia Entry — May 2026

Definition

An Intelligence Explosion, in its 2026 operational definition, refers to a positive feedback process in which AI systems make meaningful contributions to the design, training, or capability improvement of their own successors — triggering a cycle of accelerating AI capability growth that eventually exceeds the pace at which human researchers or institutions can guide, evaluate, or constrain it. Distinct from the classical theoretical definition (I.J. Good, 1965), the 2026 operational definition focuses on early, empirically observable signs of this process rather than its hypothetical endpoint, and treats the intelligence explosion not as a singularity event but as a threshold question with measurable probability.

Date of Appearance/Establishment: Theoretical origin: I.J. Good, “Speculations Concerning the First Ultraintelligent Machine,” 1965. Popularized: Vernor Vinge, “The Coming Technological Singularity,” 1993; Nick Bostrom, Superintelligence, 2014. Entered official AI company institutional documents: May 2026 (Anthropic Institute research agenda, Jack Clark’s Oxford Cosmos Lecture).

Historical Context

The concept of a positive feedback loop in machine intelligence is nearly as old as the field itself. I.J. Good, drawing on his cryptanalysis experience at Bletchley Park, proposed in 1965 that an “ultraintelligent machine” capable of designing better machines would trigger an explosion of machine intelligence that leaves human intelligence far behind.

Through the 1990s and 2000s, the intelligence explosion was primarily discussed in speculative and philosophical contexts — Vernor Vinge’s technological singularity (1993), Eliezer Yudkowsky’s writings on recursive self-improvement (2000s), and Nick Bostrom’s Superintelligence (2014). The mainstream AI research community largely treated these discussions as premature or speculative.

The emergence of large language models capable of writing and improving code changed the calculus. By 2024–2025, AI systems could generate significant fractions of production software, assist in designing training pipelines, generate synthetic training data, and automate experimental evaluation. These capabilities represented limited, partial forms of the recursive loop that the intelligence explosion hypothesis had long described.

In May 2026, the concept crossed from theoretical safety discourse into official institutional documentation. Anthropic co-founder Jack Clark addressed it in the Anthropic Institute research agenda (first shared with Axios, May 7, 2026) and in the 2026 Oxford Cosmos Lecture (May 20, 2026). He stated that Anthropic was seeing “early signs” of AI contributing to speeding up the research and development of AI itself, and placed the probability of a recursive self-training event above 60% by end of 2028. The intelligence explosion was also identified as a named concern in official Anthropic research documents, the first time a sitting leader of a frontier AI lab used the term in that institutional context.

Technical Description

The intelligence explosion concept, in its 2026 operational framing, is analyzed along a spectrum of recursive capability enhancement:

Level 1 — Tool-Assisted Improvement: AI systems accelerate human-led AI development (code completion, automated testing, literature synthesis). Already widespread as of 2024–2025.

Level 2 — AI-Driven Post-Training: AI systems generate synthetic training data, evaluate model outputs, and optimize training hyperparameters for successor models with limited human oversight. Achieved in limited form by 2025.

Level 3 — AI-Driven Architecture Search: AI systems propose and test new model architectures, training objectives, or data curation strategies for their successors. Active research area in 2025–2026.

Level 4 — Full Recursive Self-Training: An AI system, given the instruction “make a better version of yourself,” executes the complete training pipeline — data generation, architectural design, training, evaluation — without human intervention at any stage. Jack Clark’s 60%+ probability threshold refers specifically to this level being achieved by end of 2028.

Level 5 — Runaway Recursive Loop: The rate of AI capability improvement begins to exceed the rate at which humans can evaluate, guide, or constrain it. This remains hypothetical as of May 2026.

The intelligence explosion concern is tightly coupled with the Recursive Self-Improvement (RSI) entry already in Aikipedia 2026 (May 14 edition), but differs in scope: RSI describes the mechanism; the intelligence explosion describes the potential systemic outcome of RSI proceeding without effective brakes.
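The five-level spectrum above can be encoded as an ordered type, which makes comparisons such as “at or above Level 3” explicit. The sketch below is illustrative only: the boolean signals passed to `classify` are hypothetical simplifications of the level descriptions, and the review threshold in `needs_human_review` is an assumption chosen for the example, not a policy proposed by any lab or regulator.

```python
from enum import IntEnum


class RecursionLevel(IntEnum):
    """The five levels of recursive capability enhancement described above."""
    TOOL_ASSISTED = 1
    AI_DRIVEN_POST_TRAINING = 2
    AI_DRIVEN_ARCHITECTURE_SEARCH = 3
    FULL_RECURSIVE_SELF_TRAINING = 4
    RUNAWAY_LOOP = 5


def classify(
    ai_generates_training_data: bool,
    ai_designs_architecture: bool,
    human_in_pipeline: bool,
    outpaces_oversight: bool,
) -> RecursionLevel:
    """Map coarse, hypothetical observations onto the highest matching level."""
    if outpaces_oversight:
        return RecursionLevel.RUNAWAY_LOOP
    if ai_generates_training_data and ai_designs_architecture and not human_in_pipeline:
        return RecursionLevel.FULL_RECURSIVE_SELF_TRAINING
    if ai_designs_architecture:
        return RecursionLevel.AI_DRIVEN_ARCHITECTURE_SEARCH
    if ai_generates_training_data:
        return RecursionLevel.AI_DRIVEN_POST_TRAINING
    return RecursionLevel.TOOL_ASSISTED


def needs_human_review(
    level: RecursionLevel,
    threshold: RecursionLevel = RecursionLevel.AI_DRIVEN_ARCHITECTURE_SEARCH,
) -> bool:
    """Hypothetical trigger: require review at or above the chosen threshold."""
    return level >= threshold


observed = classify(
    ai_generates_training_data=True,
    ai_designs_architecture=True,
    human_in_pipeline=True,
    outpaces_oversight=False,
)
```

Real pipelines will not reduce to four booleans. The entry itself notes that the boundary around Level 4 is not precisely defined, so a classifier like this mainly forces those boundary assumptions into the open.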

Notable Developments (May 2026)

Anthropic Institute Research Agenda (May 7, 2026): Official Anthropic document citing “early signs of AI contributing to speeding up the research and development of AI itself” — the first formal institutional acknowledgment of observable recursive improvement.

Jack Clark’s 2026 Oxford Cosmos Lecture (May 20, 2026): Delivered at the Schwarzman Centre for the Humanities, Oxford. Clark announced a 60%+ probability of an intelligence explosion-class recursive self-training event by end of 2028 and stated that the risk of existential outcomes “hasn’t gone away.”

OpenAI Erdős Breakthrough (May 20, 2026): A general-purpose reasoning model autonomously disproved an 80-year-old mathematical conjecture without specific training. The result provided a concrete data point for assessing autonomous AI reasoning capabilities, one input to forecasts about Level 4 recursive capability.

Applications of the Concept

AI Safety Research Prioritization: Provides a framework for identifying which forms of recursive improvement capability require the most urgent safety research — particularly interpretability of training pipelines, reward model alignment, and architectural self-modification.

Regulatory Timeline Setting: Governments using the intelligence explosion framework to set compliance timelines (e.g., under the EU AI Act) that account for non-linear capability growth rather than assuming linear extrapolation.

Governance Trigger Point Definition: Organizations defining internal thresholds at which AI involvement in AI R&D triggers mandatory human review, independent auditing, or capability pauses.

Investment and Strategic Planning: Frontier AI labs and their investors calibrating capital deployment, safety infrastructure investment, and compute buildout against non-linear capability scenarios.

Limitations

Definitional Ambiguity: The boundary between “AI meaningfully contributing to AI development” (already happening) and “intelligence explosion in progress” is not precisely defined, making it difficult to know whether or when Level 4 has been reached.

Forecasting Uncertainty: The 60%+ probability figure represents one expert’s estimate, not a formal probabilistic analysis, and is sensitive to assumptions about what counts as “fully training its successor.”

Risk of Premature Alarm: Repeated predictions of imminent intelligence explosion (going back to 1965) that did not materialize may cause institutions to discount current warnings.

Missing Countervailing Forces: Standard intelligence explosion models underweight the potential for energy constraints, hardware limits, regulatory intervention, safety mechanisms, and economic bottlenecks to slow or halt recursive improvement cycles.

Open Questions

At what measurable level of AI contribution to AI R&D should regulatory or organizational pause mechanisms be triggered?

Can interpretability research make the recursive improvement loop observable enough that safety evaluations can keep pace with capability improvements?

What institutional structures (international treaties, independent audit bodies, staged deployment requirements) could reduce intelligence explosion risk without preventing beneficial AI development?

Is there an empirical way to distinguish a controlled recursive improvement trajectory from an early intelligence explosion?

Primary Sources: I.J. Good (1965); Vernor Vinge (1993); Nick Bostrom, Superintelligence (2014); Jack Clark, 2026 Oxford Cosmos Lecture, May 20, 2026; Anthropic Institute Research Agenda (Axios, May 7, 2026); The Guardian, May 21, 2026; Resultsense; Let’s Data Science; AI Weekly.


AI-generated content may contain errors. Please see Champaign Magazine’s Disclaimer on AI-generated content.


This completes Claude’s recommended 12-entry list and 3 fully developed entries for Aikipedia: New AI Lexicon (May 2026). All 12 avoid duplication with the 23 previously published Aikipedia entries and the 3 standalone May 2026 entries (AHE, RSI, ELF). The three full entries — AI Co-Mathematician, Computer Use, and Intelligence Explosion — were selected for their direct anchoring in specific, verifiable May 2026 events, their cross-domain significance, and their editorial contrast: one scientific collaboration paradigm, one enterprise production capability, and one safety-critical theoretical concept that turned operational.



