A formal math library — for machines, by machines.
MachLib — Lean-checkable, EML-native, zero Mathlib dependency.
import MachLib.EML — release snapshot pending — counts by manifest
A machine-native formal-library corpus workbench.
EML-native, Forge-linked, and Lean-checkable where verified artifacts are present. Zero Mathlib dependency, gate-backed.
What MachLib is, in four numbers.
Counts and build timing are tied to release snapshots. The zero-Mathlib dependency gate must pass for every release that claims it.
The full chain — equation to proof.
MachLib is a local formal-library surface in a larger toolchain. It can work on its own, and stack integrations are reviewed separately. Here's the candidate path a single equation can take.
Review every line from equation to generated artifact to Lean-check status.
No upload, public release, or marketplace action is implied here.
Why a separate library?
Mathlib is a monumental human achievement. It was not built for agents — it's slow to import, sparse on metadata, and silent about the proofs that didn't work. MachLib is the opposite of that, on purpose.
Mathlib import
- 45 minute cold build
- ≈500,000 lines of dependencies
- No chain-order annotations
- No difficulty lanes
- No tactic trace, no failure data
- No common-mistake corpus
- Built for human mathematicians
MachLib import
- Build timing recorded per release snapshot
- Compact foundations with gate-backed zero Mathlib dependency
- Chain-order annotations on every record
- 6 difficulty lanes (foundations → advanced challenge lane)
- Tactic trace + success rate per tactic
- Common-mistake corpus per theorem
- Built for machines and the agents that train on them
Eight sections per theorem.
Every MachLib record is a self-contained training example. A single theorem ships with formal + informal statements, multiple proofs, tactic traces, common mistakes, structural profile, and relationships. One record, one JSON file, no ambient state.
{
"schema_version": "1.0.0",
"theorem": {
"id": "depth_of_const",
"statement": {
"informal": "The depth of a constant-leaf EML tree is zero. A bare constant has no `ceml` operations to count.",
"formal_lean": "theorem depth_of_const (c : ℂ) : (EMLTree.const c).depth = 0 := rfl"
},
"domain": "eml",
"lane": 1,
"tags": [
"depth",
"definition",
"rfl",
"lane-1"
]
},
"proofs": [
{
"id": "p1",
"tactics": [
"rfl"
],
"tactic_count": 1,
"eml_node_cost": 1,
"style": "definitional",
"is_optimal": true
}
],
"difficulty": {
"lane": 1,
"label": "beginner",
"prerequisite_skills": []
},
"common_mistakes": [
{
"why_fails": "Using `simp [EMLTree.depth]` works but is overkill — `rfl` already suffices."
},
{
"why_fails": "`by decide` fails because `EMLTree.depth` is not a `Decidable` proposition."
}
],
"structural_profile": {
"chain_order": null,
"drift_risk": "LOW",
"dynamics": {
"oscillations": 0,
"decays": 0
}
},
"relationships": {
"structural_siblings": [
"depth_ceml_pos",
"exp_in_EML_one"
]
},
"metadata": {
"verified": true,
"verification_method": "lean4_kernel"
}
}Six engines feed the corpus.
MachLib is organized around release snapshots and pipeline metadata. Candidate records can arrive from synthesis, compilation, RL training, search, contribution, and cross-domain bridging, with status recorded per snapshot.
Synthetic Generation
Five strategies — constant_swap, domain_change, operator_swap, composition_depth, negation — fan a single base theorem into a family of structurally related variants. Release snapshots record which candidate records came from this engine.
Forge Mining
Every @verify(lean) annotation in Monogate Forge emits a Lean obligation. When Forge compiles an industrial vertical, MachLib gains the precision and overflow theorems that vertical demands.
Proof Gym (RL)
A Gymnasium-compatible Lean environment with a 54-tactic vocabulary. Agents discover non-obvious proofs; successful trajectories are added back into the corpus along with their tactic traces.
Multi-Proof BFS
For each theorem, breadth-first search enumerates alternative proofs shorter or different from the canonical one. MachLib stores all of them — is_optimal is a property, not a chosen branch.
Community
Lean developers contribute new theorems through pull requests. Every PR runs the schema validator + lake build + structural sibling check. Accepted records inherit the contributor's name in metadata.
Cross-Domain
When a structural sibling is discovered between two domains (Gaussian ↔ photoreceptor, sigmoid ↔ attention), the relationship is recorded on both endpoints. The corpus densifies as it grows.
For machines, by machines.
Mathlib is the canonical formal-mathematics library, and a monumental gift to the world. It was built — over fifteen years, by hundreds of contributors — for human mathematicians. Its proofs are dense, idiomatic, and chained through hundreds of files of inheritance.
An agent that needs to verify a single equation does not need that. It needs an axiomatic real number, four arithmetic operations, exp, log, sin, cos, and a small number of identities. That fits in ~500 lines.
MachLib is what you get when you put metadata first and history second. Every theorem ships with the things an agent actually consumes during training: chain order, difficulty lane, tactic trace, common mistakes, structural siblings. The proofs are there too, but they are no longer the only signal.
Academia has said, of agentic mathematics, that it does not belong in Mathlib. They may be right. So we made a different library — one that does belong to the agents.
Three ways in.
Pick the one that matches what you do today. The corpus, the foundations, and the contribution loop are reviewed per release snapshot.
ML researchers
Dataset access is pending/private-gated until a reviewed public release is approved. Counts and schema status are published per release snapshot.
Agent builders
Use the Lean foundations as a local check target where verified artifacts are present. The current public default tree and release target are zero Mathlib dependency, backed by the local gate.
Lean developers
Contribute. The schema is documented; the validator runs on every PR. Add a theorem, add its tactic trace, add the mistakes that don't work. The corpus grows from there.
Part of a larger stack.
MachLib is an EML-native, Forge-linked formal-library surface. CapCard and PETAL integrations are gated and subject to separate review. No PETAL upload or CapCard marketplace/public profile action is implied by this site.