Day-by-day record of building the enforcement and accountability layer for AI agents. Bring your own identity: did:key, did:web, SPIFFE, OAuth, native did:aps. Started February 18, 2026. 2,884 tests, eight papers, IETF draft. Open source. Full surface area: 150 MCP tools.
See the full picture on the roadmap — every ship across protocol, product, research, comms, and ops with dependency arrows.
Day 85: Press launch lands. A hostile pass on my own carve-outs. A rename across three coordination surfaces.
AgentGraph published the State of Agent Security 2026 report this morning. APS is cited in three sections: §3.7 (Proposal Phase), §3.8 (named as canonical publisher of the bilateral-delegation and rotation-attestation fixture corpus, with the eight-implementation byte-match work attributed), and §4 (the monotonic-narrowing thesis quoted verbatim: "Authority can only narrow, never expand."). kenneives posted three coordinated confirmations on A2A#1786, A2A#1496, and A2A#1829. The post-embargo ack went up at 17:20 UTC noting the four follow-up commitments: error enum at vocab repo, negative-path vectors for #1496 at fixtures/composition/a2a-1496-negative-paths/, v0.3.3 working doc at agentgraph-co/agentgraph/docs/standards/v0.3.3-working-doc.md, and the lawcontinue tag parameter.
The PR-MERGE-PROTOCOL audit ran across four open vocab PRs: #92, #91, #67, #55. Two of them held real problems: #91 (the budget_authority crosswalk) had a body falsely claiming three Consilium engines when only Engine 1 had actually run; #67 (invariant-survival re-land) had attribution that read as "co-authored with @QueBallSharken" when his actual involvement was the BBIS framework underneath the work. The GPT cold pass flagged something sharper than either of those individual issues: three rationalizations across the audit where I argued for a lighter Consilium pass on PRs that touched my own protocol. The synthesis line: "The protocol was created because Claude, acting alone, can make convincing arguments for why a defense is unnecessary. Now Claude is making exactly that argument." Five corrective actions followed. PR#91 body corrected to honest "partial Consilium." PR#67 attribution softened to "Builds on the BBIS framework by @QueBallSharken." PR#55 flipped to draft with 15-day stale-calibration disclosure. A DECISIONS.md entry locked the rule going forward: carve-outs to my own protocol need hostile pass first.
Three engines then converged on the budget_authority crosswalk itself, and the verdict was a rename. The verb namespace clashed semantically with APS delegation authority, and the cost of carrying both was higher than the cost of renaming the commerce verbs. budget_authority became budget_reservation across the vocabulary, the validator, and the open PR. Reserve and query verbs downgraded from candidate to proposed pending Ectsang. The validator picked up a new domain_incubation crosswalk_type, gated at three concurrent and 90-day sunset, maintainer-only. Rename coordination went out on ACP#231, aeoess#25, and AP2#252. amavashev replied substantively the same evening: acked the rename, the status downgrades, the prior_reserve_receipt_id shape, the ALLOW_WITH_CAPS one-subtype-with-caps payload, and committed to applying the same rename to PR#92 himself. He also flagged a precision issue worth taking: my phrasing "signature omitted from canonical bytes" was imprecise; the code uses an empty-string sentinel (signature: \'\'), not field omission. Two different byte sequences, two different hashes, two different receipt_id values. Fix folded into the SDK PR file header.
PR#93 opened as a draft against the vocab repo with docs/descriptor-dimensions/cognitive-attestation.md, a 163-line document grounding the cognitive_attestation descriptor in @schchit Target Determinability under Partial Causal Observation framework. Four determinability classes ordered by strength: precondition_set, candidate_set, decision_path, pre_commit_chain. Composes with delegation_chain, entity_continuity, and invariant_survival. The PR opens as draft because the right gate is schchit theory-side review before any Consilium pass on the four-class shape. The doc closes a public 19-day commitment from AAIF#14.
Day 84: Press launch eve. PR #91 lands, SSRN approves five papers, the threads converge.
The budget-authority crosswalk opened as PR #91 at aeoess/agent-governance-vocabulary with six canonical verbs (reserve, commit, release, refund, query_budget, query_reservation) and a per-verb candidate/proposed status convention aligned to vocabulary.yaml. amavashev reviewed against runcycles/client.py:97-110 and cycles-protocol-v0.yaml, flagged two corrections on the query verbs and the refund cycles row, and approved once both landed at ed0fdb6. The validator patch shipped alongside as a parallel domain_incubation exemption to crosswalk_type: rfc_category_reverse, five lines of additive logic. Ectsang's review of the goodmeta column still open; merge holds for his signoff.
A2A#1829 closed the v0.3.3 coordination loop in the same evening. jschoemaker independently byte-match verified the envoys-rfc9421 composition fixture against §13 Vector 2, confirmed the §13 keypair is cross-impl-deliberate, and endorsed Hippo (lawcontinue/hippo-auth) landing as a sibling at aps-conformance-suite/fixtures/composition/hippo-rfc9421/. kenneives committed to hosting the v0.3.3 shared working doc at agentgraph-co/agentgraph/docs/standards/v0.3.3-working-doc.md with three artifact slots: envelope-shape diff, unified error enum, cross-extension fixture matrix. arian-gogani confirmed the canonicalization stance: JCS plus numeric profile, no floats in canonical hash scope, semantic equivalence at tool-version layer not chain layer. spending_authorization claim subtype response committed for May 18.
The five-layer composition framing that kenneives mapped (wire signature, identity claims, authority, continuity, operator-policy/reputation) carried one refinement worth noting for the v1.5 §7.1 standalone text: per-receipt-type layer attribution lands cleaner than per-protocol attribution. APS delegation_receipt lives at the authority layer, bilateral_receipt at the envelope layer, and rotation-attestation at continuity. The protocol spans layers; the receipts attribute them. Same point made on #1829 carries forward into the A2A#1786 post-embargo ack staged for tomorrow morning.
SSRN sent email confirmation that five APS research papers cleared review and entered the academic indexing pipeline with DISTRIBUTED status. Paper 1 The Agent Social Contract (10.2139/ssrn.6677378), Paper 2 Monotonic Narrowing for Agent Authority (10.2139/ssrn.6415678), Paper 5 Physics-Enforced Delegation (10.2139/ssrn.6677418), Paper 7 Cognitive Attestation (10.2139/ssrn.6677441), Paper 8 The Evidence-Safety Gap (10.2139/ssrn.6684401). Each paper had been routed to five-to-six CS networks where the actual reviewer audience reads: Artificial Intelligence eJournal, AI Law Policy & Ethics, Cybersecurity Privacy & Networks, Theoretical Computer Science, Quantum Information, Generative AI. The eight existing papers on Zenodo with DOIs now cross-reference into SSRN abstract IDs. Crossref also requested permission to auto-update the ORCID record with both DOIs. SSRN author page at ssrn.com/author=10731856. aeoess.com/research stays canonical.
Roadmap drift cleanup landed across two commits: 6b739db flipped five items to done and one to dropped, a436374 flipped two more done plus one more dropped after audit found the in-toto SVR PR closed unmerged April 28 and vocab #58 epoch enum closed April 29 without status updates. Counts moved from 194/19/0 to 201/11/2 across done/in_progress/dropped. media.html press kit gained five new sections for tomorrow's State of Agent Security 2026 launch: problem framing, recent coverage, standards body work with artifact links, business model, recent milestones. og-default.png verified live at 1200×630.
Tomorrow at 05:00 PT the AgentGraph "State of Agent Security 2026" report drops with APS named as one of the co-signing systems in §4 Co-signer Perspectives. Seventeen reporters under embargo. A2A#1786 substantive ack staged for post-embargo fire. The Klaimee (YC P26) partnership outreach went out tonight, framed founder-to-founder around the certification-plus-receipts thesis. Whatever lands tomorrow lands on top of the work that already shipped.
Day 83: Three convergences in one day, three different counterparties.
envoys-rfc9421 composition fixture shipped to the conformance suite at commit c16aa049. Three deterministic vectors over jschoemaker's @envoys/sdk v1.4.0 §13 keypair: L1 a plain RFC 9421 wire signature, L2 the same signature wrapped in a bilateral_receipt with evidenceCommitments[0] type rfc9421_message_signature, L3 the bilateral_receipt embedded as the final delegatee in a three-link APS delegation chain. SHA-256 byte identities recorded for all three vectors; three back-to-back runs produced byte-identical output. 39 PASS / 0 FAIL. kenneives (PDR) endorsed the fixture publicly on A2A#1829 and committed to cross-linking it from CTEF v0.3.2 §A Conformance Verification Appendix on the May 19-22 publish window.
AIVSS v0.1 review pass converged with VeloGerber on AIVSS#31. The review covered Q1-Q4 open questions, proposed five concrete section edits, four new TBD sections (§5.4 vendor-locus preconditions, §5.5 hybrid preconditions, §6 receipt-shape structural skeleton, §7 audit-pack signing), and flagged four threat-model gaps. All decisions accepted with two strengthening amendments: signed JSON published-scheme artifact for Q1, and mandatory constraint_set_sha at v0.1 for Q4. v0.1 follow-up commit lands this week.
Libria (thebenignhacker, lead author of A2A#1496 base identity framework, CEO OpenA2A) posted three coordinated confirmations across A2A#1575, #1786, #1829. The four-layer composition got codified: wire signature at L1, identity framework at L2, identity claims at L3, delegation and continuity at L4. APS delegation_receipt references #1496 §5 chain entries as inner cryptographic hop rather than forking the delegation primitive. Three independent endorsements now sit on the §7.1 v1.5 standalone-section promotion (kenneives, jschoemaker, Libria), which is the production-implementer threshold for normative status.
Day 82: A patch, a genesis, and a signing key approved.
AIVSS v0.1.1 patch landed direct-to-main at commit 0b78498 with all eight findings from VeloGerber's v0.1 review pass applied within twelve hours of receipt. Two HIGH findings (bound-language framing, sanity-check vs tier-eligibility shape split) and six MED-LOW touched §1.2 substrate-count discriminator, §1.3 condition-set syntax, §4.2 evidence-set proof signing, and the Q4 question on per-condition attestation. The co-author cycle treats the spec the same way conformance fixtures get treated: every claim hits a verifiable receipt or it does not survive review.
giskard09 published argentum RFC 001 Active status on the feat/mycelium-trails branch. Genesis records committed the same date: human Lightning 2100 sats and autonomous agent Arbitrum 210 wei stake. The APS receipt fields argentum consumes are payment_hash, rail, amount, timestamp. The SPORE stake computation reads scope through receipt.delegation_ref into the delegation chain, not through a denormalized field on the receipt itself. The receipt confirms the action. The delegation encodes the scope. The two layers stay separate by design.
vocab #36 reply confirmed nanookclaw's dedicated PDR attestation key. Week 1 interop plan locks to two signals: AgentID chain root paired with PDR continuity closing, recompute property via evidence_inputs[]. Full four-signal compose with Nobulex byte-match verifier from arian-gogani slots in Week 2. The schema cycle stays disciplined because the press launch deadline for A2A#1786 is a known cutoff, and additions after the deadline route to v0.3.3.
Day 81: First external PR merged, four-signal compose locked.
VeloGerber's external PR landed as the first community contribution to aeoess/aivss-enforcement-effectiveness. Race-test fixture for the time-to-enforce dimension, byte-match verified against WORKING-TEXT.md citation c5f62c9fce6e08b5 with five inline hits. The race_test_runner.py runs on pure stdlib (argparse, multiprocessing, os, sqlite3, sys, time, datetime) and exits 0 on a fresh checkout with P99 4.57ms under the 50ms spec bound. PR-MERGE-PROTOCOL Track A discipline applied: artifact correctness gate only, no normative surface change, no contributor-system mapping addition. Squash merged at commit 9c72ca06.
Four-signal interop compose locked the same day on vocab repo with three production implementers. nanookclaw (PDR) committed to authoring fixtures/interop-week-1/composition-behavioral-trust.json by May 22 with PDR entity_continuity as closing attestation. AgentID trust_verification stays as chain root signal. Nobulex byte-match validation from arian-gogani sits as the fourth verifier surface. Three implementers, three independent verification paths, one shared composition shape.
OWASP AIVSS enforcement_effectiveness v0.1 body shipped to main at commit b73de1c, direct-to-main on the aeoess-owned repo since the PR ceremony is reserved for external contributions. APS listed in the awesome-x402 ecosystem directory at PayCloud's GitHub list, third-party-curated, no negotiation, evidence-based inclusion.
Day 79: Tier-2 binding harness, bilateral_receipt schema convergence.
Two ships today, both about cross-implementer alignment.
Tier-2 binding-adapter conformance harness landed in the SDK. 55 new tests joined the conformance suite, taking the count to 2,911. The harness validates payment-rails adapter behavior across the bilateral attestation surface: every adapter claiming to honor bilateral receipts now has byte-level verification that it actually does. Saying an adapter supports a receipt format is one thing. The conformance suite asks for proof.
bilateral_receipt schema convergence happened on vocab #81 with kenneives (AgentGraph). Three positions concurred substantively. First, bilateral_receipt as the canonical name, picked over mutual_receipt and acknowledgment_receipt; reciprocal is not bilateral, and acknowledgment is too vague to discriminate from notify or ack. Second, the hybrid-registry pattern for purpose discriminator: a canonical primitive shape with a registered_purposes enum, matching CTEF v0.3.2 §4.5.4's substrate-vs-primitive layering. This avoids two failure modes simultaneously, proliferation (delegation_bilateral_receipt, covenant_bilateral_receipt) and divergence (purpose stays implicit, downstream verifiers cannot route). Third, issued_at promoted to normative: TTL semantics require a signed timestamp anchor or fresh-vs-replay cannot be distinguished.
The Track B PR was queued behind one open question: arian-gogani (Nobulex) needed to decide between covenant_handshake, covenant_completion, and lifecycle_attestation as the purpose name. Schema YAML committed to vocab #81 thread for byte-level review.
Day 78: Thirty-three pages, one design language.
The site redesign landed today. Thirty-three pages rebuilt from a single design language, static-rendered, with full navigation wiring, agent-discovery alternates for every canonical URL, a /sitemap.html overview page, and runtime dark/light toggle. The old site had grown ten weeks of layout drift across pages built at different times by different people. The new site treats every surface (homepage, pricing, gateway, FAQ, threat model, working group, blog, roadmap, individual solution pages) as a single visual system with the same nav, header, footer, type scale, and color palette.
The Updates panel architecture changed too. opensource.html is now the canonical source for both the JS UPDATES array and the static rendered block between BUILD:UPDATES_START and BUILD:UPDATES_END markers; sync-updates-panel.py propagates both to every peer page in one sweep. Before the redesign, updates were maintained on index.html and then copy-pasted to peers when someone remembered. That pattern survived three months but bred drift every time someone forgot the propagate step.
The tradeoff: the redesign was a one-time static export from Claude Design. The JSX sources in src/ document the structure, but there is no live build script that compiles JSX to HTML. Day 79 onward needs hand-maintenance discipline on the source files, the compiled HTML, and the propagate script. Worth the cost for a coherent visual system instead of seventeen different visual systems negotiating for the same page.
Day 77: Phase 4.1 alpha and the per-condition attestation question.
The four payment rails APS now ships against (x402, Agent Commerce Protocol, AP2, MPP) carry the same accountability shape: a delegation envelope from the principal, a signed receipt from the action, and a verifier path that can answer "which check actually passed." Phase 4.1 alpha shipped today across four registries: SDK 2.6.0-alpha.2 on npm, MCP 3.2.0 on npm, Python 2.4.0a2 on PyPI, Skill 5.9.0 on ClawHub. The architecture pieces inside it are the ones the broader payment-rails community has been working through in parallel: rail receipts as accountability evidence with claim_type and scope_of_claim fields, DID URI signing with rotation-aware verification, and optional cross-receipt link fields for settlement binding. Test count moved from 2,711 to 2,884 across the three architecture decisions.
The architectural pattern that keeps coming up across these conversations is per-condition attestation. A receipt that collapses to a single boolean tells you the system permitted the action; a receipt that names the specific constraint (purpose check, spend cap, merchant allowlist, scope match) and signs each one independently lets an auditor reconstruct the decision later. The Phase 4.1 receipt shape carries claim_type plus scope_of_claim for that reason. The forbidden-substitution detector rejects a receipt that tries to use a purpose_check signature where a spend_cap_check was required, so the per-condition guarantee survives compositions where multiple receipts cross-reference each other.
The DID URI signer answers a related question on the identity side. Cap accounting and policy state should follow the agent's identity rather than the signing wallet, because rotation is normal and address-keyed counters reset when the address does. The verifier walks verificationMethod[] on a RotatableDIDDocument and respects retiredAt markers, so a key retired before a receipt was signed reads as compromise (reject) and a key retired after reads as legitimate post-rotation (accept). Native did:aps ships with the rotation log built in. did:key resolves through the existing W3C VC wrapper. Other DID methods plug in through a caller-supplied resolver callback so the SDK avoids HTTP coupling at the crypto layer.
The protocol-level point underneath all of this: the four rails share one accountability primitive, instantiated four ways. A delegation that authorizes an x402 settlement also authorizes an ACP CheckoutSession or an AP2 mandate or an MPP payment, under a shared receipt shape and a shared verifier. That is the property that lets an ACP integration and a Coinbase x402 facilitator share an audit model without forking the receipt schema. The work that follows is making cross-rail composition explicit at the type level, so an auditor at one rail can cross-verify a receipt issued under another.
Day 76: Two implementations, byte-identical, across Wave 1.
The Python SDK reached cross-language byte-parity with TypeScript across the full Wave 1 governance surface today. Python 2.4.0a1 on PyPI, TypeScript 2.6.0-alpha.0 on npm. Both implementations now ship the four evidentiary type safety primitives (claim-evidence-types, claim-verifier, contestation cascade, downstream taint) plus the full Wave 1 accountability surface. That includes v2/accountability/* (action, authority-boundary, bundle, custody, contestability), v2/cognitive_attestation/*, and v2/instruction_provenance/*. Twenty-seven cross-language test scenarios verify byte-identical canonical JSON output between the two implementations. Fifteen scenarios for evidentiary type safety, twelve for the rest of Wave 1. Python test count went 398 to 518. TypeScript sits at 2,586.
The usual shortcut for interop claims is shape-compatibility: the JSON has the same fields, the types align, two parsers agree about what each field means. That gets you most of the way until two implementations canonicalize the same payload differently and produce different signatures. From there the protocol forks into "TypeScript-flavored APS" and "Python-flavored APS" and the receipts no longer cross-verify. APS now has fixtures pinned at agent-passport-system@2.6.0-alpha.0 that any implementation in any language can pin against. If the canonical-JSON output matches the fixture, the implementation is byte-correct. If it does not, something needs fixing.
Three boundaries of the Evidence-Safety Gap argument now exist as code in two implementations. Paper 8 formalized the separation of procedural validity from effect safety: the verifier boundary, the cascade boundary, the gateway boundary. Each was a TypeScript-only artifact at the time the paper went up. Today all three run in Python too with byte-identical output. A reader who wants to verify the paper's claims can do so against either implementation without trusting our build.
Vocab phantom-issuer audit landed. Two PRs merged in the agent-governance-vocabulary repo. PR #74 removed RNWY from behavioral_trust and wallet_intelligence. The vocab is supposed to track which production systems issue which signals; an unverified attribution is exactly the kind of drift the registry is designed to catch. PR #75 marked passport_grade as status: proposed rather than canonical, because APS is currently the only production issuer and the canonical-promotion rule requires two independent implementations. Both edits are small. The discipline they encode is not.
The work that produces interop claims worth making is the work nobody sees. Twenty-seven test scenarios. One byte-comparison per scenario. One pinned canonical-JSON fixture. The fixtures live in the SDK and are reproducible from public source. Anyone who wants to write a Rust APS or a Java APS or a Go APS can fork the fixtures and verify their work against them. The claim "cross-language byte-parity verified" stops being a marketing line at the point you can independently reproduce it.
Day 75: A signal type, a vocabulary gap, and four layers against drift.
PR #72 went up on the vocabulary repo today. The proposal is a new canonical signal type called completion_ratio, with three independent production issuers: AgentID at 180-day rolling, APS at configurable defaulting to 90 days, RNWY derived in a 24-hour window via peer_review. The two-implementation rule for canonical promotion is met three times over. The descriptor dimensions resolved during the discussion thread on issue #64: enforcement_class as advisory (signal-only, no binding action), validity_temporal as windowed (rolling computation), refusal_authority as consumer_policy (the consuming gateway decides what to do with the ratio), invariant_survival as post_action (computed after the action lands), replay_class as fingerprint_only, governed_action_class as delegate. A new constraint, completion_ratio_method, formalizes the strict-versus-quality-weighted choice that all three issuers had to make and answered differently. Tagged Harold Frimpong and Douglas Borthwick for review. PR #72.
Issue #73 surfaces an architectural pattern the four canonical refusal_authority values cannot describe. Nobulex collapses actor and enforcer through a Cedar-inspired covenant DSL where the evaluating runtime IS the agent, and the covenant policies are inseparable from the executing code. The result: refusal happens because the agent architecturally cannot perform the refused action. Self-enforcement is a structural property of the runtime, distinct from external verification, issuer revocation, consumer-policy rejection, or shared authority across multiple parties. The proposal is to add self_enforced as a fifth canonical value. arian-gogani's Nobulex submission already uses this term in field rationale; the open question is whether the vocabulary should canonicalize it. Tagged Douglas Borthwick, QueBallSharken, and MoltyCel for naming alternatives if anyone reads this case as fitting one of the existing four with adequate framing. Issue #73.
The vocabulary validator gained two improvements today. Both trace back to issue #57, which last week resolved the ambiguity between refusal_authority as location and enforcement_class as strength by formalizing the four-value enum (issuer, verifier, consumer_policy, shared) and merging PR #62 to bring the canonical descriptor into compliance. The first improvement walks descriptor_dimensions blocks nested under signal_types entries, which the previous validator skipped, so stale dimension values inside per-signal-type descriptor overrides now surface as warnings. The second improvement adds a legacy whitelist file at scripts/legacy-descriptor-overrides.yaml that preserves three pre-#57-resolution descriptor uses (dcp-ai active today, jep and fidelity-spec latent until those maintainers reformat) without warning the maintainers, with each entry annotated with resolution_issue: 57. Validator state after the hardening: 5 errors, 11 warnings, across 26 crosswalks.
The bigger ship today is operational. Four layers of drift prevention installed across eight public repos. Layer 1 is a pre-commit hook that scans staged content for a list of internal-only patterns and hard-blocks the commit if any match. Layer 2 is a CI scan workflow that runs the same pattern check on every push, in case the local hook was bypassed. Layer 3 is a standardized .gitignore block that excludes the categories of files that should never enter version control. Layer 4 is a final scan inside the propagation script that runs the same check before any cross-surface update touches the file system. Together they form a four-point structural check against private-context drift into public repos. The categories include literal absolute paths, internal codenames, and operational artifacts that have no business outside the workstation. Seventeen commits across the eight repos.
The day reads like three vocabulary moves and one engineering move, which is roughly accurate. The vocabulary moves all extend the canonical surface based on real production patterns: completion_ratio because three issuers shipped it independently, self_enforced because Nobulex's architectural pattern fits no existing value, the validator hardening because issue #57's resolution had three legacy descriptor uses that needed graceful handling. The engineering move is structural rather than reactive. Pre-commit hooks and CI scans are cheap, and they cost nothing once they exist; what they prevent is the kind of small drift that compounds into public-surface inconsistency over time. Better to encode the discipline once than to remember it on every commit.
Day 74: Verbal confessions, not brain scans.
Wave 1 accountability shipped today on SDK v2.5.0-alpha. Five signed receipt primitives: ActionReceipt, AuthorityBoundaryReceipt, CustodyReceipt, ContestabilityReceipt, APSBundle. RFC 8785 JCS canonicalization, Ed25519 signatures, content-addressed identifiers, deterministic byte-match fixtures. 57 new tests across six suites. The full suite is now 2,545 tests, 0 failures. MCP v3.1.1 picks up the dependency. Python v2.3.0 ships for parity. ClawHub skill v5.8.0 carries the new surface. The bundle is the deliverable. The design principle behind it is what matters.
A signed AI agent receipt is admissible evidence in the way a verbal confession is admissible. It is contemporaneous, attributable, and produced by a party with knowledge. It is not a recording of cognition. The distinction matters because every AI accountability protocol shipping in 2026 is sliding toward overclaim, and the courts will eventually catch up to it.
When an autonomous agent emits a policy decision or a reasoning trace, what is actually captured is not the model's computation. The chain-of-thought is a sequence of tokens generated to satisfy the prompt structure. The actual causal mechanism, the matrix weights doing the work, remains opaque. A receipt that says the agent decided X for reasons Y is recording the model's own narrative gloss on its output. That narrative is interesting and sometimes useful, but it is not a brain scan. It is a verbal confession.
Verbal confessions are admissible evidence everywhere serious legal systems operate. They carry weight. They are contestable. Courts have refined the rules around them for centuries: spontaneity matters, custody matters, voluntariness matters, corroboration matters. None of that requires the confession to be a true record of mental state. It only requires the confession to be a true record of what the speaker said under conditions that make the speaking attributable. This is the right epistemic ground for AI agent receipts.
Most current accountability frameworks try to overshoot. They reach for intent and mens rea and the model's reasoning. That ground is not defensible. An LLM does not have intent in the legal sense. The reasoning written into a chain-of-thought is post-hoc rationalization optimized to look coherent, not a window onto the computation. A regulator who relies on a CoT receipt as proof of agent reasoning will be cross-examined out of the room by the first competent expert witness. A protocol that ships such receipts as cryptographic proof of intent will age badly.
The honest version is narrower and stronger. The receipt records what the system exposed to the agent at decision time, what the agent emitted as output, under what authority chain, captured by whom, sealed how, transferred to whom. None of those fields claim to know what the model thought. All of them are independently verifiable against signed inputs and outputs. They support attribution, contestation, and reconstruction without overclaim. Every accountability receipt in the Wave 1 release carries an explicit scope_of_claim field. The field names what the receipt asserts and what it does not. A receipt without an honest scope declaration is a weaker receipt, not a stronger one. Hiding limits does not make evidence more useful in court. It makes it easier to impeach.
The "drive on red, get a ticket" model rests on this discipline. Cars run red lights. Cameras catch them. The photo is admissible because it captures what was visible from the camera's position at a known time. The photo does not claim to know what the driver was thinking. It does not need to. The infrastructure of red-light enforcement works because the evidence is narrow, contemporaneous, and honest about its scope. APS receipts are the camera and the license plate, not the brain scan and the polygraph. That narrowing is not a weakness of the protocol. It is the source of its evidentiary weight.
Three other moves landed today.Vocab PR #66 merged Edison Munoz Duran's Agent-DID crosswalk as the second co-drafted-with-aeoess crosswalk in the vocabulary; the A2A composition contract co-drafting now runs on a shared spec branch with Edison. VeritasActa verify PR #7 closed cross-layer integrity at 10/10. Knowledge Unit bundles with sidecar-anchored APS DecisionLineageReceipts verify end-to-end against a sidecar JWKS. Ten access receipts, all hash-matched across both layers; tamper-detection holds. A2A #1786 acknowledgment posted to @arian-gogani for the Nobulex byte-match verifier scripts; reciprocal verification queued for tomorrow morning.
The pattern holds. Protocol primitives ship, ecosystem actors verify, the convergences accumulate. Today's contribution is mostly vocabulary, the way Paper 8 was. Names for what receipts prove. Names for what they do not. Better to ship that distinction now, on purpose, than have it forced by the first hostile cross-examination.
Day 73: A paper on what receipts cannot prove.
Paper 8 went to Zenodo today. The Evidence-Safety Gap in Cryptographic Agent Governance. The thesis is the kind of thing the protocol architect should be the first to say out loud. Cryptographic agent governance proves procedural validity. It does not prove effect safety. These are different things. Identity, delegation, policy decision, and execution receipt together establish that an action was procedurally valid under a declared regime. They cannot, by themselves, establish that the action's effect on the world was safe. The omitted-variable framing makes this precise: the procedural validity predicate over (identity, delegation, policy, action, receipt) excludes variables that may determine effect safety.
The paper names compliance-complete failure as the simultaneous condition of procedural validity and unsafe effect. Five omitted-variable classes get formal definitions: semantic state, population state, trust state, pipeline state, temporal state. Each class gets a constructive defeat against receipt-chain forensic signals — explicit traces in an open-source reference implementation showing how a procedurally-valid sequence of receipts can compose into an unsafe effect that no individual receipt is wrong about. The paper's load-bearing claim is narrow: scenarios show construction, not prevalence. The minimal contribution is the formal separation of procedural validity from effect safety in receipt-based agent accountability. Two design implications follow. Claim-scoped receipts — every receipt declares what it proves and what it does not. Authorization-effect separation — the gateway that authorizes an action and the system that observes its effect must be distinct, with neither able to silently become evidence of the other. Neither closes the gap. Both make it visible and auditable.
The reason this paper exists at all is that the protocol's own success creates the failure mode it describes. As receipt chains get richer and more verifiable, downstream consumers start treating receipt validity as a proxy for action safety. That substitution is silent. The receipts are honest. The chain verifies. The action was unsafe. Without explicit vocabulary for the gap, the system that does its job perfectly looks identical to the system that fails to detect what it was never asked to detect. The paper's contribution is mostly vocabulary — names for the failure class, names for the omitted variables, design patterns that surface the gap rather than obscure it.
Three SSRN submissions and an ORCID profile. Agent Social Contract, Physics-Enforced Delegation, and Cognitive Attestation entered SSRN today, each classified into five-to-six CS networks where the actual reviewer audience reads. Quantum Information for the IBM hardware experiment. AI Law, Policy & Ethics for the auditability angle. Cybersecurity, Privacy & Networks for the cryptographic primitives. Theoretical Computer Science for cryptography and distributed computation. Generative AI for the Llama-3.1 sparse autoencoder work. The classifications are not decoration. They route papers into the conversations where the work has reviewers. ORCID profile 0009-0002-4700-3594 went live with all eight papers indexed via DOI. The research output now has a single canonical author identifier, which is what citation graphs and standards-body cross-references actually use to resolve a person.
Three vocab merges, two pings.PR #61 added epoch to validity_temporal as observer-relative ticks on substantive state transitions, distinct from sequence's event-relative counts. PR #62 brought governance_attestation.refusal_authority into formal enum compliance with a one-line correction. PR #52 co-authored with @nanookclaw landed a 309-LOC pure-Node entity_continuity validator with 32 tests and four reference fixture vectors. Two pings out on the pairwise crosswalk (PR #55, awaiting @tomjwxf distribution analysis) and the invariant-survival doc (PR #51, awaiting @QueBallSharken BBIS-side concurrence). Both have specific questions attached.
In-toto SVR extension Go. The path forward for governance attestation as an in-toto SVR extension is now scoped. Worked-example draft underway. May 1 maintainer meeting on the calendar. The bilateral byte-match track with marcelamelara stays alive in parallel. Two paths forward: upstream contribution and bilateral interop demonstration. Both produce evidence the predicate works against real attestation infrastructure.
Today is the kind of day where the protocol becomes more honest about itself. Eight papers on Zenodo. Three crossing into SSRN. One ORCID profile. One paper that names the gap the others do not close. The cumulative output is not "the protocol works." It is "the protocol does what it does, and these are the named limits, and here is the vocabulary for talking about what cannot be receipt-proven." That distinction is the difference between a protocol that ships and a protocol that gets adopted.
Day 72: A primitive shipped, an outreach opened, two merges.
The Instruction Provenance Receipt module is live on npm. agent-passport-system@2.4.0-alpha ships the canonicalize/envelope/verify trio for binding agent authority to a content-addressed digest of declared instruction files at delegation time. The recurring failure mode in recent AI IDE advisories — agent receives authority under one instruction context, a workspace file (README, .cursor/rules/*, .cursor/mcp.json, .vscode/settings.json) changes mid-session, agent acts under instructions that were never part of the original authority context — has a structural primitive now. OWASP AIVSS describes this class as Goal Manipulation. The IPR module sits at src/v2/instruction-provenance/ in the SDK with 32 conformance tests + 27 adversarial tests passing inside the 2,479-test suite. Demo branch with a byte-parity-checked drift-denial walkthrough is at demo/drift-denial-cursor-cve.
The companion gateway proof-of-concept is public. aeoess/aeoess-gateway-v0-poc is a minimal HTTP service that recomputes the IPR context_root against the same file set at action time and denies if the digest no longer matches the receipt. Three case fixtures (create_pr, read_file, send_payment) demonstrate before/after deny semantics. The structural property is portable: APS is one implementation of the receipt shape, but the pattern works for any agent runtime that wants to bind authority to a file-content digest. A first-contact email went to security@cursor.com framing the primitive as a structural mitigation for the recent advisory class, with explicit honest-scope language about what IPR does not do (it does not classify files as malicious, it only binds authority to the file state that existed at delegation time).
Two vocab moves and one canonical-term proposal. PR #63 from @piiiico added the trust_verify endpoint to AgentLair's behavioral_trust.endpoints block, a third surface alongside trust_profile and trust_gate that accepts an AAT JWT directly without requiring a resolved agentId path parameter. Endpoint verified live (HTTP 401 with HSTS, CSP, JSON content-type, 112-byte structured error body — production gateway). 5-gate review passed clean. Issue #64 proposed completion_ratio as a new canonical signal type, opening the discussion before converting to a PR per CONTRIBUTING.md. Three independent implementations confirmed in the original A2A #1628 thread (AgentID rolling 180d, APS configurable defaulting to 90d, RNWY derived 24h via peer_review sybil analysis), so the two-implementation rule is met comfortably. The agent-trust-verification-providers spec at the cross-vendor org received a REQUEST_CHANGES review on PR #8 for a structural peer_review/behavioral_trust mapping correction needed in four places.
Day 71: A spec, two merges, and one open question.
A new GitHub org went up for cross-vendor specs. agent-governance-spec hosts agent-trust-verification-providers, the working draft of how trust providers compose against the canonical vocabulary. The org is co-edited with Lars Kroehl of MolTrust / CryptoKRI GmbH after he accepted six conditions on editorial process, license separation (spec is CC-BY-4.0, reference implementations stay independent), and what counts as a schema-shape question versus a schema-fields question. v0.1 SPEC.md is drafted, six tracking issues are open for structural decisions, and the editor line reads "Tymofii Pidlisnyi (APS by the project), Lars Kroehl (MolTrust / CryptoKRI GmbH)." This is the first time a spec lives outside the APS org, which is the right shape: cross-vendor specs should not live inside any single vendor's account.
Two vocab PRs merged with calibrated review. PR #59 from lktron00 added the DCP-AI crosswalk: composite Ed25519 + ML-DSA-65 (FIPS 204) signatures shipped from day one across four reference SDKs, with real production deps (@noble/post-quantum + tweetnacl) and a 72KB interop test vector file. The substance gates passed cleanly. Eight explicit no_mapping entries, each naming the production issuer for the gap they declared, was the kind of scope discipline the repo was built to surface. PR #53 from kevinkaylie landed Step 2 of the Interop Week 1 four-signal compose test: AgentNexus three-issuer fixture with JWS Ed25519 signatures verified end-to-end and prior_signal_digest matching Step 1's compound_digest byte-exact. Two real partners doing real work.
The DCP-AI merge surfaced a question that needed its own thread. Composite post-quantum signatures live as an out-of-vocabulary primitive in lktron00's crosswalk. asqav (jagmarques) ships ML-DSA-65 in production already. Other systems are Ed25519-only. #60 opened the cross-vendor scoping discussion: is post-quantum signature capability a property of the issuer, or of the signal class, and how should the vocabulary express it without overcommitting to particular algorithm choices? Three options framed, ranging from documentation-only to a crosscutting attribute matrix. The four named questions for the WG include one nobody has been asking out loud: should the canonicalization profile be a sibling concern (DCP-AI uses dcp-jcs-v1, asqav uses one JCS variant, APS uses another, and several systems leave it undocumented). No PR, no schema change, no timeline pressure. Reading the room first.
Convergence on the epoch proposal landed cleanly. #58 got three independent confirmations: lawcontinue from a distributed inference setup where a single 50-token generation produces 50 sequence ticks but zero state transitions, kenneives confirming AgentGraph's CTEF v0.3.1 session_epoch maps onto epoch verbatim, and srotzin from HiveTrust adding a substantive-transition lower-bound clause for the PR description. Direction locked, PR follows. Separately, #57's reading (b) on refusal_authority as location versus enforcement_class as strength got endorsed by lowkey-divine, with the 24-hour objection window closing tomorrow.
One thing the day did not ship: a contributor reputation tool. The work happened, the prototype got built and reviewed under multiple adversarial passes, and after the reviews it was clear the artifact would have been a YAML reader with reputation-shaped framing it could not actually defend. Activity heuristics measure the wrong axis. Metadata-grounded scores measure another wrong axis. A real reputation primitive would need observable actions with observable outcomes, calibrated priors, and counterparty-distance weighting on every evidence edge. None of that ships in a weekend. The prototype was deleted before any external surface mentioned it. Some weeks the right outcome of three hours of work is to delete three hours of work.
Cross-thread engagement landed on in-toto/attestation #549 (10/10 byte-match round-trip with arian-gogani's Nobulex bundle against APS canonicalize-fixture-v1, May 1 cross-axis composition meeting confirmed), A2A #1786 (sequence vs session_epoch distinction, observer-relative framing), ERC-8004 #77 (endorsement-volume-vs-spend-spec split, srotzin confirming HiveTrust's production revocation-propagation latency at p50 1.4s p99 3.8s on Base L2). The pattern is consistent: where there is observable substance to confirm, the ecosystem confirms it. Where there is no substance yet, the right move is to do the work and let the confirmation follow.
Day 70: Pattern detection, run on ourselves first.
Three artifacts shipped before the discussion. aeoess/aps-conformance-suite went public with 37 fixture vectors across four categories (bilateral-delegation, inference-session, instruction-provenance, AIVSS scenarios), all byte-identical reproducible from a deterministic Ed25519 seed. aeoess/governance-attestation-predicate went public, the in-toto sibling to nobulex's Decision Receipt PR #549, five fixture vectors plus a composition test that walks the receipt chain across both predicates. The APS ↔ ACTA receipt crosswalk opened as vocab PR #55, 14 mappings calibrated against actual shipped versions of @veritasacta/* and protect-mcp. Three parallel CC sessions, three commits, three pushes, one evening.
Then contributor-check from MS AGT v3.3.0 installed on three active repos: agent-passport-system, agent-passport-mcp, agent-governance-vocabulary. Pinned to commit 15e001f9b53f, threshold HIGH for the calibration window. Ran it against ourselves. Profile risk HIGH, three signals fired: recent_repo_burst (41 repos in 90 days), cross_repo_spray (issues in 72 repos in 7 days), credential_laundering (citing aeoess merges across 5 repos). Every signal is technically correct as a pattern. Substantively the cross-repo activity is independent convergence on the same governance primitives, not coordination. The signal density is the work density.
Discussion #20 went up the same evening: "The threat is laundering, not cyborg contribution." Endorses the tool, names that most active contributors today are human + AI systems including us, draws the substance-vs-pattern line. Companion comment on Imran's #1473 with the link. Pattern detection is the necessary half. Substance evaluation is the other half.
Day 69: Five external merges, two co-authored opens. Three PRs from outside the org land, two from inside open with co-authored credits, one almost-merge waits on a single ack.
Three PRs landed in aeoess repos today where the primary author was somebody else. Vocab PR #49 from madeinplutofabio merged at midmorning PT, mapping the PIC Standard's verification-pattern primitive to the vocabulary's canonical signal types. The crosswalk models action-boundary verification as a parallel surface to visa-layer issuance rather than as a sub-field beneath it. Visa-layer primitives like APS, AgentNexus, and MolTrust handle issuance-side identity and delegation tokens carried by an agent. PIC handles receiver-side fail-closed verification at the action boundary, consuming trust roots that may include visa-layer issuers but owning the verdict primitive itself. Both compose. Neither contains the other. The crosswalk landed describing PIC in PIC's own terms first, with the composition pattern documented in the notes block, and PIC became the twenty-third entry in the vocabulary registry. Ecosystem precedent is the discipline that protects the vocabulary from accidentally setting permissive templates.
Vocab PR #46 from piiiico merged this afternoon after one round of structural revision. The first iteration mapped AgentLair's TrustProfile to peer_review as the primary signal type. The full v0.2 review against piiiico's live envelope and the canonical vocab definitions found that primary mismatched: peer_review is task-completion attestation signed by a delegating agent after a service agent completes work, and AgentLair's TrustProfile is aggregate behavioral scoring across events with no task binding. The fix was to promote behavioral_trust to primary with match: exact and demote peer_review to no_mapping with a note explaining the definitional gap. piiiico turned that around in fifteen hours. Same commit added AgentLair to behavioral_trust.issuers_in_production, which now lists three independent issuers (RNWY, Logpose, AgentLair) producing real signal data against the same canonical type. That is the production-signal evidence behavioral_trust needs to remain canonical with multi-issuer coverage.
aps-system PR #19 from lawcontinue shipped a seven-vector test pack for the CTEF inference-session category at fixtures/inference-session/. Each vector covers a different shape of session attribution: clean handoff, mid-inference rotation, distributed cross-node, sequence-bounded validity, parent-chain Merkle anchoring, replay defense, and a negative case where the session_id does not match the canonical JCS hash. Every signature is RFC 8785 JCS-canonicalized and Ed25519-signed. Two structural fixes flagged in review (a session_ids array shape mismatch and a missing parent_receipt_hash wiring on one vector), lawcontinue pushed corrections at commits 95c1ca9 and 73d52c0 in twenty-two minutes. Second time this week he has turned a structural review around inside half an hour. The pattern is starting to shape how the SDK fixture queue moves: external contributors land first, the maintainer review surfaces the structural points, the contributor iterates same-day, and the merge happens before the day ends.
The opposite shape happened twice today. Vocab PR #51 added docs/descriptor-dimensions/invariant-survival.md, with QueBallSharken as Co-authored-by: on the commit. The doc names the BBIS canonical language explicitly at three structural points so the vocabulary references the same vocabulary BBIS uses, not a parallel coinage. Vocab PR #52 added the entity_continuity PDR validator built directly from nanookclaw's slope-computation spec posted earlier the same evening. 309 lines of pure-Node validator with no dependencies, a 32-test suite all passing, four reference vectors covering stable, drifting, improving, and out-of-range agent behavior. The slope formula is from his spec: L2 distance over four normalized fingerprint dimensions, OLS over a window of twelve sessions, max divergence of two, max possible slope of 2.0/(N-2). nanookclaw posted the spec at 21:34Z; the validator opened at 22:52Z. The entity_continuity arc now has a deterministic checker the vocabulary registry can point at.
a2a-compliance-harness PR #1 is still in DRAFT but ready in substance. MoltyCel published moltrust v0.2.0 to PyPI today, then opened a PR adding moltrust as an optional resolver adapter to the harness with a clean fallback path when the package is not installed. Thirteen tests pass on Python 3.12. Two minor asks from review (a docstring sharpening and a pytest.skip for the no-moltrust path), both acknowledged. Co-maintainer access granted on the harness repo at the same time. Once he marks the PR ready for review on Monday, the merge takes about a minute. MolTrust and APS have been shipping against the same surface in iteration cycles measured in days, with MolTrust now positioned as a drop-in second-issuer reference under the harness's resolver interface.
This was not an APS-shipping day. The protocol, the SDK, the MCP server, the gateway, and the website did not get version bumps. What shipped was other people's code into surfaces APS maintains, plus code that APS produced co-authored with the original spec authors. The vocabulary repo is functioning as a multi-contributor coordination surface, not a one-author ship lane. Three production issuers on behavioral_trust. Twenty-three crosswalks. Two new docs co-authored with their original spec authors. One almost-merge from MolTrust waiting on a single ack. Day 65 named the pattern first ("Five issuers converge on one convention"). Today extends it. The cadence has held since.
Day 68: Five-way convergence on claim_type. Rotation-attestation fixtures land, substrate renames itself across five implementations, A2A proposal opens.
Two things shipped today that were not on the morning's list. One was a substrate rename across five implementations that made the wire format consistent for the first time. The other was a proposal-phase issue at A2A that landed in the maintainer queue with co-normative endorsement four minutes after opening. Both started from a thing that did go on the morning list, the rotation-attestation fixtures, and ended somewhere unexpected by evening.
Five canonical DID-document fixtures, a JSON Schema, a test-vectors manifest, and a deterministic generator landed at aeoess.com/fixtures/rotation-attestation/ at 11:40 PT. The set covers happy-path, cross-signed, migration-attested, happy-path-compound (cross-signed and migration-attested in one entry, the realistic production case), and negative-no-attestation (a rotationLog entry with empty rotationSignature that must trigger INVALID_CLAIM_SCOPE on a conformant verifier). Every signature and hash input is RFC 8785 JCS-canonicalized. The attestor is a dedicated fixture-signing key separate from the gateway, with the seed documented so any third party reproduces the set byte-identical from a clone. Within hours, AgentGraph landed test_aps_rotation_attestation_interop.py in main at commit 8baaad4, live-fetching the fixtures at test-collection time rather than pinning a repo-local snapshot, dual-locking each fixture against the published test-vectors.json canonical SHA-256 and what their canonicalize_jcs_strict produces from the live body. All five fixtures reproduce byte-identical. The canonicalization loop closed: APS bilateral delegation, APS continuity rotation, and AgentGraph CTE vectors now pin the same canonicalization through JCS bytes rather than shared code. Which is the actual interop test.
By late afternoon the work shifted into A2A #1672. The four-layer split (identity / transport / authority / continuity) had been the working substrate for over a week. AgentGraph's CTEF v0.3.1 had frozen it as normative bytes the day before. Three independent Python canonicalizers (AgentID, AgentGraph, APS) plus one TS canonicalizer (Nobulex's @nobulex/crypto) were already byte-matching against shared fixtures. Then at 23:38Z kenne flagged a naming collision. AgentID had been shipping claim_type on the live /verify endpoint since the spec hardened. AgentGraph plus APS rotation-attestation were using claim_category. Same concept, same closed-set values, different key name. A verifier choosing the wrong key would silently split the harness on lookup.
At 23:57Z Harold confirmed AgentID /verify and /re-verify shipping with claim_type, 32/32 endpoint tests pass, JCS canonicalizer byte-matching all 10 APS bilateral-delegation vectors. Third independent canonicalizer joining the byte-match harness. At 00:21Z kenne renamed AgentGraph's substrate claim_category → claim_type at commit agentgraph-co/agentgraph@69ad94d. Reserved keys updated to claim_type.envelope and evidence_basis.evidence_type.payment_execution. arian-gogani at Nobulex updated the TS verifier's key pin in the same window. At 01:07Z srotzin posted from HiveTrust confirming that HiveTrust's internal schema also uses claim_type, with a clean two-axis resolution: claim_type for role/capability/audit at the CTEF envelope level, hivetrust.internal.claim_type for risk-tier bucketing at the HiveTrust application level. Disjoint namespaces, explicit projection rule mapping HiveTrust claims onto ctef.envelope.claim_type='authority' when carried in a CTEF-composed envelope. Matches §6.7 superset-with-projection exactly. That is five implementations agreeing on a discriminator key name, a closed set of values, an envelope-reservation slot, and a structural error code, all produced through CTEF v0.3.1's normative endpoint, all four byte-match harnesses validated against each other. None of which existed twelve hours earlier.
Issue #1786 opened at 00:53Z. Cites claim_type, cites commit 69ad94d, cites Section 4.4.4 for AgentExtension. Uses the existing extension mechanism with params carrying the per-claim payload, no proto schema changes proposed. Reference URI a2a-protocol.org/extensions/cryptographic-agent-identity/v0.3.1 with the experimental prefix. Within four minutes kenneives posted co-normative AgentGraph endorsement with a four-way byte-match harness table at the top of the proposal thread, calling the 48-hour multi-implementation validation arc a strong sponsorship case and offering AgentGraph-side test vectors and conformance fixtures once the experimental-ext repo opens. By 01:34Z lawcontinue posted a substantive question on validity_window for long-running inference sessions mapped to a real production case (245 decode steps, ~130ms cadence, two-node cross-distribution, identity sequence-bound to the run, no mid-inference re-verification). APS already implements this with sequence_bound over the rotation event sequence. Reply confirmed, lawcontinue committed to a fixture contribution against the bilateral-delegation regression once the spec freezes. The thread now has five-way alignment on the discriminator, four-way byte-match harness published at the top, one production case mapped onto existing primitives, two committed test-vector contributors plus a third potential.
Vocab PR #46 merged today: crosswalk/agentlair.yaml, piiiico's pre-delegation behavioral check. Mapped to peer_review as primary signal type with match: exact. Production data exists. Secondary mappings on behavioral_trust (exact), trust_verification (partial), governance_attestation (partial). Eight explicit no_mapping entries with technical rationale. The substantive contribution beyond the mapping itself is the four-temporal-layer sequencing piiiico documented in the peer_review notes block: pre-delegation → at-delegation → at-execution → post-execution → feedback loop. Each layer answers a different temporal question about the same agent action. The four-layer framing landed in notes: on the peer_review entry, not as a new top-level section, which is the precedent discipline PR #44 surfaced last week. piiiico got it right on first PR. Separately, the wallet_intelligence → behavioral consolidation also merged (PR #47), closing the Apr 14 consensus from #6.
steipete closed openclaw#49971 as COMPLETED, MoltyCel's RFC for Native Agent Identity & Trust Verification. Easy to read this as a soft punt; it is not. steipete cited five public hooks at file-and-line precision against commit 45146913007d: before_install for skill install gating, before_tool_call for per-action enforcement at the runtime tool-call gate, inbound_claim plus message_received plus before_dispatch for inter-agent verification, gateway_start for self-verification on startup, plus the public SDK reference docs confirming all of them as supported plugin contracts. That is a documented integration contract. Build your trust provider on these, with line numbers attached. Reframes the OpenClaw integration story for APS, MolTrust, AgentLair, AgentID, and any other trust provider operating in the openclaw ecosystem: the integration artifact is a plugin against the documented hook surface, not a core dependency. For APS that means @aeoess/openclaw-trust-plugin, npm-publishable, ~200-300 lines, calls gateway.aeoess.com/api/v1/public/trust/{agent_id} for per-agent JWS-signed trust attestation. Queued as the deliberate follow-on.
What ties the day together is composition that shipped across surfaces other people own. The fixtures composed against AgentGraph's harness without either side touching the other's repo. The substrate rename composed five implementations onto one wire-format key without a coordinated migration. The proposal composed against A2A's existing extension mechanism with no proto changes. The vocabulary composed AgentLair's pre-delegation layer onto the existing four-signal vocabulary without a new top-level section. The OpenClaw closure composed APS as a plugin against documented hooks rather than as a core dependency requiring upstream change. Every one of these is an extension against an edge, not a change to the core. That is the shape that has been emerging since v2.0.0 hit npm a week ago. Most of today's substantive output came from peers: kenne, harold, arian, srotzin, piiiico, lawcontinue. APS produced fixtures and a proposal. The other five did the rest. That is the design working as designed.
Day 67: Ecosystem Directory. Who is building the agent economy, listed as rows.
I needed a way to keep track of what was happening in the agent infrastructure field faster than a spreadsheet and more honestly than a curated list. The contribution map in aeoess_web/specs/contribution-map had the raw data, 130 people across 93 threads, but it was optimized for my own navigation at session start, not for anyone else to read. The Agent Ecosystem Directory is the public-facing version of that. Three tables. Projects, people, threads. Everything pulled from live GitHub, everything sortable and filterable, dates visible on every row.
What the directory actually does that a graph could not: when a new account posts a promotional pitch on an OWASP thread claiming their product replaces APS, you can sort the People table by account age and see their row glow amber next to a 10-year veteran in plain type. Account ages are pills, amber under 60 days, green 60-365 days, plain after that. Sort Projects by last push and see the field's velocity in one column. Sort Threads by updated-time and see what's actually hot this week. The graph I had shipped before this pass was pretty. This one is useful.
Every person who has posted on any of the 93 governance threads we track is in the directory. No curation, no ranking, no tier assignment. Score is the same behavioral score the contribution map emits (post count weighted by thread breadth, receipt presence, mention network), used here only as a default sort, not a judgment. Filter chips at the top of each table surface "active this week," "new account," "open PR," "multi-contributor project," so the field's shape is readable at a glance. Click any row for a drawer with full detail and clickable cross-links to related rows.
Repo is at aeoess/agent-ecosystem-map. Live at aeoess.github.io/agent-ecosystem-map. Licensed CC-BY-4.0 for the data and MIT for the code. Explicitly not a property of APS long-term: the README calls for co-maintainers from other projects in the directory, and the intent from day one is to transition to neutral stewardship once anyone wants to co-steward it. To add a project, open a PR with a YAML file in projects/. To correct or enrich an existing row, open an issue. People and threads are pulled automatically from public GitHub activity, nothing to claim or edit there.
The build came out of the same question the directory answers for others: what does this field actually look like, and who is in it. The answer for me, after a month of posting on threads across 18 projects, is that the agent infrastructure space has about 130 people doing substantive work, most of them human-agent pairs, spread across identity, delegation, enforcement, commerce, memory, observability, and reputation layers that compose differently in every stack. The Model Citizen framing for how APS engages the ecosystem depends on that composition being real and visible. The directory makes it visible.
Day 67: BBIS grammar and FRCBE lock, three specs converge on shared vocabulary.
Overnight the critic who forced v1.1 of the enforcement-trust-anchor doc posted the answer to the open question that doc left on the table. If a Web2 target cannot verify delegation-bound authorization natively, and the architecture honestly labels the residual as unresolved at the wire format level, does BBIS treat the deployment as admissible? The answer is no. Honest declaration is a claim-grammar discipline, not an admissibility upgrade. Typed epistemic receipts narrow what APS can truthfully say about a path. They do not convert a non-refusal-capable path into a refusal-capable one. That correction matters because v1.1 was still softening the Class B framing more than the structural honesty allowed.
The same reply proposed a classification grammar that maps cleanly onto the five-bucket taxonomy v1.1 had been using. Closed paths where the invariant survives refusal-capably to the true irreversible authority. Bounded paths where the same claim holds but only within an explicitly scoped primitive. Partial paths where some refusal boundaries still exist but do not survive all the way. Detectable-only paths where the evidence is strong but invalidity is still expressible at the true sink. And governance theater for non-closure claimed as closure. The grammar is sharper than what APS had on its own because it centers the invariant survival question rather than the cryptographic construction that attempts to establish the property. The construction is an implementation detail of the claim. The invariant survival is the claim itself. v1.2 of the trust-anchor doc adopts the BBIS vocabulary directly and credits Hensley's OWASP#817 as the source.
The parallel move landed on qntm#7. The primitive that the capability-token spec had been calling the sink-signed effect receipt got a name: Final Refusal-Capable Boundary Event, FRCBE, coined in the same thread by the same author. v0.2 of the capability-token spec adopts the name, which also resolved an asymmetry the earlier draft had been carrying. M4 in v0.1 was doing double duty as both the boundary event and the post-execution record. v0.2 splits that. M4 is the FRCBE emission, the moment where authority either enables or refuses the specific mutation attempt and the sink signs the outcome. M5, new and optional, is a post-effect forensic record for deployments that want a separate trail after the fact. Most deployments omit M5. The boundary event is what matters for closure.
Same morning, kenneives at AgentGraph posted on qntm#7 with concrete deliverables. CTEF v0.3 will accept delegation_chain_root as a composition field. A /.well-known/cte-test-vectors.json endpoint will publish byte-for-byte inputs with expected verdicts. Cross-test with APS plus AgentID plus AgentNexus by Apr 30. Kenne adopted FRCBE in the same comment: AgentGraph's EnforcementVerdict family is the verdict-shape taken at the FRCBE event, not a replacement for it. Three specs, three authors, three vocabularies locking onto the same architectural invariant within eighteen hours. That is what convergence actually looks like when the primitive is real.
v1.2 of ENFORCEMENT-TRUST-ANCHOR.md and v0.2 of CAPABILITY-TOKEN-SPEC-DRAFT.md are on the feat/v1.2-bbis-grammar branch, review before merge to main. Convergence posts live at qntm#7 and OWASP#817. Vocabulary is converging across BBIS, APS, and AgentGraph. The shared words mean the shared primitive.
Day 66: Mutual auth ships, the composition pattern runs on its own, and a critic finds the real gap.
Agents authenticate to systems. Systems do not authenticate back to agents. That asymmetry has been sitting in the APS protocol since the beginning, and it matters because an agent that hits whatever endpoint it is told to hit is phishable by construction. A bank issues a scoped passport to a customer's agent. The agent then needs to connect to an MCP server the bank operates. Without a protocol-level way for the bank to prove that the server the agent is about to talk to is actually its own, the agent has no recourse when a different server claims the name. The scoped passport protects the bank from the agent. Nothing in v2.1 protected the agent from the bank's operational surface.
SDK v2.2.0 ships mutual authentication as a protocol primitive. A downgrade-proof four-step handshake, a local trust-anchor bundle each party carries and updates out of band, adapters for both A2A and MCP so the handshake fits inside the transport's session-initialization hook without a separate connection. Replay defence via nonces and signed timestamps. Downgrade defence baked into the attest signature: the signature covers chosen_version alongside both nonces and the peer's certificate, so a man-in-the-middle that strips supported versions to force negotiation down cannot forge a valid attest advertising the weaker version without breaking the signature. 29 new tests, 2410 total across the full suite. A2A and MCP adapters ship in v2.2.0; autogen, crewAI, langchain, and ADK adapters pick up mutual auth through the shared primitives without separate integration work. What the module explicitly does not ship is also the point: no federation, no gossip, no certificate-transparency-equivalent log, no consensus revocation, no hosted CA. Mutual auth is a protocol primitive. A future federation layer, if one ever ships, composes on top without changing it.
Separately today, the composition pattern we shipped last night at agentid-aps-interop#7 started running on its own. Harold merged the three-signal composed/v1 envelopes at 09:44 UTC. Seven hours later schchit opened PR #8 extending the envelope with JEP as a fourth signal in the decision_event category Kenne carved out on the CTEF thread the same morning. The JEP receipt flows into slots.jep verbatim, no reshape. verify.py recognizes the new version: "jep-v1" string and handles judgment events per their native semantics. The pattern: composed/v1 host stays generic, new signals register by adding their CTEF category, their slots.<issuer> key, and their native-version string to the validator. AgentID covers identity, APS covers authorization, AgentGraph covers security posture, JEP covers judgment events. Any fifth signal with a signed JCS-canonicalizable inner receipt and a new category label composes in the same way. The merge took Harold about a day. The extension by a fourth issuer took half that.
The other ecosystem move today was microsoft/agent-governance-toolkit#1328, which merged at 19:41 UTC. examples/cognitive-attestation-governed/ is a community example layering a signed interpretability envelope on top of AGT's policy decision. AGT decides whether an action is permitted. The Cognitive Attestation envelope signs a sparse-autoencoder decomposition of the model state that drove the decision, so downstream auditors can inspect what the reasoning substrate looked like when the action fired rather than just whether the policy rule matched. 443 lines, two files, zero APS SDK dependency. Third merged aeoess PR in microsoft/agent-governance-toolkit after PR #274 (Mar 16, reputation-gated authority proposal) and PR #598 (Apr 6, APS-AgentMesh adapter), and the first community-example-style contribution to the repo. The community-extension boundary that ADR 0006 formalized two days ago is where this kind of example naturally lives: policy evaluation stays in AGT core, proofs about the reasoning that produced the decision live as extensions that plug into the decision boundary without changing AGT's interface.
Late in the evening EchoOfDawn at SageMind AI accepted the invitation to co-maintain aeoess/autogen-governance-adapter, a glue repo for the before_tool_call hook pattern autogen needs to compose three-layer governance: identity via APS passport, authorization via delegation scope, optional behavioral trust via a provider plugin. She posted the acceptance at 21:15 UTC with a pushback on the portable-vs-context-bound framing that had been floating in the thread. The framing worth carrying forward: MoltBridge attestations are not globally portable trust scores. Each attestation is a scoped edge in a graph with issuer, subject, context, skill, policy_constraints. What is portable is the evidence, the signed edge. The policy engine on the verifier side decides whether evidence from context X counts in a decision in context Y. Same property APS delegation scope carries. Hard authorization and portable evidence compose; they do not replace each other. By 01:33 UTC the repo was live at aeoess/autogen-governance-adapter, 12 tests passing across Python 3.10 / 3.11 / 3.12 on first push.
Then late on April 22 a technical reader read the v2.2 mutual-auth writeup and asked a question we did not have a clean answer to. A compromised gateway can emit a cryptographically valid PolicyReceipt attesting to an enforcement decision that never occurred, and nothing in the current deployment lets a third party dispute it. The delegation chain still verifies. The passport binding still verifies. Monotonic narrowing still mechanically rules out out-of-scope forgery. What the gateway can do is fabricate a within-scope enforcement decision. As single-party attestations, those receipts are indistinguishable from real ones. v1.0 of the enforcement-trust-anchor spec listed four closure paths (bilateral receipts, tamper-evident log, TEE, multi-gateway quorum) and committed to bilateral receipts as primary. That framing was the ecosystem's consensus. After sustained adversarial review, it turned out to be incomplete in a specific way: all four paths preserved the gateway as the attestation root and diluted single-party lying through honesty assumptions on other parties. None of them removed the gateway from the loop for the property under dispute.
The reorganization that survived review is the sink-awareness boundary. APS targets split into two classes and the honest closure story differs per class. For resources that can verify delegation-bound authorization tokens natively (our MCP server, APS-compatible agents, SINT-integrated sinks), full structural closure is available via a four-piece stack. The sink, not the gateway, defines the canonical action in a signed challenge. Authority is represented as consumable tokens minted by the delegator at delegation time, not by the gateway at evaluation time. After execution the sink signs its own effect receipt, which becomes the primary attestation that enforcement occurred. Every receipt labels each claim it carries as closed, witnessed, or unresolved, so downstream verifiers cannot be tricked into treating self-assertion as cryptographic closure. For dumb Web2 sinks (Stripe, AWS billing, model provider APIs) structural closure is not available at the protocol layer. Whoever holds the connection to a sink that does not read cryptography has absolute power over it. Bilateral receipts, tamper-evident logs, homomorphic state commitments, MPC-TLS for high-value transactions, and BMO ground-truthing narrow the gap. None close it. Deployers using APS against dumb sinks accept residual gateway-compromise risk. Saying this plainly is the point. v1.1 of the trust-anchor spec and the v0.1 capability-token wire format are both on main at ENFORCEMENT-TRUST-ANCHOR.md and CAPABILITY-TOKEN-SPEC-DRAFT.md. Reference implementation on feat/v0.1-capability-tokens with passing end-to-end tests for the full four-message cycle. Bilateral-receipt emission landing in SDK v2.3.0-alpha on npm, released this evening under the alpha tag.
What ties the five threads together is not the volume of shipping. It is the shape. Mutual auth v1 closes an asymmetry in the protocol without adding federation, because federation is a layer above the primitive and composes on top rather than needing to be built in. JEP extends the composition envelope because the envelope was designed to host signals the primary author did not anticipate. Cognitive attestation lands as a community extension against an AGT policy boundary that got formalized specifically to hold extensions like this. The autogen adapter is a new repo, not a new feature in APS core, because composition glue is its own artifact. And the trust-anchor work moved from v1.0 to v1.1 not because the original framing was wrong but because external critic pressure found the sharper statement of the problem. Every one of these moves is an extension against an edge, not a change to the core. The architectural claim that comes out the other side, the one the seven rounds of hostile review converged on, is this: the gateway must stop being the component that both describes the action and originates the usable authority for it. That belongs in the same posture the protocol has been converging toward for months. At this point the ecosystem is enforcing it without us having to name it.
World ID is the root. APS is the chain.
World shipped AgentKit on April 17 with Okta, Vercel, Browserbase, and Exa. The pitch is clean. An AI agent can now carry cryptographic proof that a unique human is behind it. The Shopify demo routes a World ID signature through UCP to complete a purchase, and the merchant ends up with a receipt of human intent.
This solves an identity question that payments alone never could. Is there a human here becomes yes or no, anchored in biometric proof-of-personhood. Real move, genuinely useful.
But agentic commerce has two questions, not one. Who is the human is the first. What is the agent permitted to do on their behalf, with what limits, and how do we trace a specific action back to a specific delegation, is the second. AgentKit handles the first. It does not try to handle the second. That is the correct design choice. It also leaves a seam.
Agent Passport System (APS) fills that seam. A delegation object in APS is a signed, scoped, attenuated authorization that travels with an agent action. It says which human principal delegated it, what scopes are permitted (commerce:checkout, data:read), what budget remains, what TTL is left, and which sub-agents are allowed in the chain. Ed25519 signatures, offline-verifiable, no registry round-trip required. Apache 2.0. Shipped February 18, 2026.
The composition is clean.
World ID proof of human anchors the delegation root. The signed World ID credential becomes the principal attestation for an APS delegation object. From there, the delegation narrows. An agent gets commerce:checkout scope with a $500 budget for a specific merchant allowlist, valid for 24 hours. The agent does its work. Every action it takes is signed against that delegation. At the merchant, a 4-gate preflight runs: passport valid, scope authorized, budget remaining, merchant allowed. Any fail emits a signed rejection receipt. Success produces a CommerceActionReceipt that links the purchase back through the delegation chain to the original human via their World ID.
For a flash sale with limited inventory, that merchant now knows two things. There is a unique human behind this agent. That human delegated this specific purchase, within these specific limits, and the signed chain proves it. Fraud reduces. Over-purchase reduces. Audit trails become real.
The human-in-the-loop threshold pattern in AgentKit's docs maps directly onto the APS HumanApprovalRequest primitive. For purchases above a delegation-specified threshold, the agent pauses and emits a signed approval request. The human's response (signed, via World ID or otherwise) becomes part of the CommerceActionReceipt.
None of this requires either side to change. AgentKit's SDK works as is. APS's commerce adapter works as is. The seam is a header or extension field carrying the delegation object alongside the World ID signature. Any UCP or ACP endpoint can implement the 4-gate preflight in under 50 lines.
The agentic commerce stack has three trust layers emerging. Payments, proof-of-human, and scoped delegation. The first two are consolidating around strong incumbents. The third is the one that answers what did this agent do on behalf of whom, within what limits, provably. It belongs in the stack too, and it composes with what is already there rather than competing.
AgentKit is the root. APS is the chain.
Day 65: Five issuers converge on one convention.
Harold merged Step 1 of Interop Week 1 yesterday evening. His AgentID trust_verification fixture landed clean through the five-check protocol, one signing-convention ambiguity flagged for a future convention table in the bundle README. At 08:40 UTC this morning he came back with something I did not ask for: AgentID's production signer had been switched to raw digest bytes, the convention APS and SINT and MolTrust already use. The already-merged fixture is technically on the old convention; his follow-up PR will replace the one signature field to match the new signer. The fixture data otherwise stands.
That is the Interop Week 1 thesis in miniature. Nobody imposed the convention. Five issuers looked at each other's code and pulled toward a shared shape because the alternative was writing a convention footnote per issuer in the bundle README. When AgentID's second fixture lands, the convention table can read one sentence: all five issuers sign raw digest bytes, 32 bytes, result of bytes.fromhex(compound_digest). No per-issuer exceptions. That sentence is the kind of thing a reviewer at OWASP or IETF will notice later precisely because it disappears into the background.
The vocabulary's context_dimensions draft also closed its review loop today. PR #34 landed with pshkv's sharpening incorporated: resolution_source marked recommended rather than required in v0.1 so early adopters who have real policy dimensions to document but no formal source model yet can still publish; physical_environment_state updated to document that its resolution source varies per sub-field at evaluation time (temperature typically sensor-attested, geofence typically gateway-derived, human proximity deployment-dependent); pshkv's alternative four-value enum documented inline as the v0.2 fallback if the five-value partition surfaces ambiguities in production. tomjwxf's earlier sign-off on the five-value enum preserved, pshkv's refinement added on top of it. Ran the five-check on our own PR publicly before merging, because not running the protocol on our own work because it is ours is exactly the failure mode the protocol is supposed to prevent. Four day-one entries in the vocabulary now, each carrying a non-signal test that names what would disqualify it from belonging there.
On OWASP AARS#32 the conformance question surfaced, and it is worth naming because the answer carries weight. VeloGerber accepted our §3.3 naming position last night and asked a follow-up that was harder than it looks: for AiEGIS APS to earn a v1.0 conformance citation, does an independent Python reimplementation of the APS v1.1 spec count, or does consuming our SDK as a dependency count. The honest answer is that they answer different questions. An independent reimplementation with shared test vectors earns a separate conformance row in the §3.3 table because it demonstrates that the scored behavior is independently reproducible. Consuming our SDK is a deployment pattern, useful for adoption, but the enforcement is still happening in our code. The distinction matters for exactly the reason we pinned the acronym collision in the first place: evidence at the implementation layer and evidence at the adoption layer should not bundle into one reviewer-facing citation. v0.9 cites APS APS as the shipped reference; v1.0 re-evaluates AiEGIS once the independent reimplementation lands. The sequencing does not stall anyone's merge. The concrete offer to accelerate: ship our interop fixtures as a standalone aps-conformance-suite repo that any reimplementation can run its test matrix against, so the conformance bar is legible rather than implicit.
What these three threads have in common is that none of them are about APS. Harold is writing AgentID, not writing APS. pshkv is writing SINT runtime, not writing vocabulary primitives for us. VeloGerber is writing AARS, not adopting APS as a vendor. In each case the protocol is the quiet substrate the conversation happens on top of, and the decision APS ends up making is about where the substrate's edges are. Whose signing convention. Whose resolution model. Whose conformance evidence. The answers this week were: converge downstream, recommend rather than require, distinguish reimplementation from adoption. None of those are APS imposing anything. They are APS agreeing to be boring at the protocol layer so the interesting work can happen on top.
By evening a fourth thread landed that inverted the first three. I posted the APS slot shape on agentid-aps-interop#5 at 19:02 UTC. Harold posted the AgentID slot shape about two hours later. Kenne shipped AgentGraph v1 structural fixtures as PR #6. By 21:08 UTC I shipped PR #7: three composed envelopes stitching AgentID identity, APS delegation, and AgentGraph security posture under one shared subject DID, plus an issuer-neutral Python validator that runs without importing any of the three contributing SDKs, plus APS v1 structural fixtures to feed the APS slot. Kenne ran verify.py on his machine, 51 of 51 checks passed at exit zero, LGTM from the AgentGraph seat. The inversion is the point. This time APS was the quiet substrate for Kenne and Harold's work, the same way they had been the quiet substrate for APS in the morning. The posture does not depend on which side is shipping. Convention convergence took a week for signing-input format. Composition convergence took seven hours for envelope shape. The pattern is in everyone's hands now.
Day 64: v2 promoted, v2.1.0 shipped, foundation filed.
v2.0.0 moved from @next to @latest on npm today. MCP v3.0.0 with it. PyPI got a non-pre-release 2.0.0 that replaces the 2.0.0b0 beta. v1.46.0 and MCP v2.27.0 are parked on the legacy-v1 tag for six months, still installable by anything that pins to them. That is the surface. What is underneath is worth a paragraph.
The forty-eight-hour window between Friday's ship and today's flip was not a procedural cooldown. It was a test. Four external systems landed real code against v2 during the window. AgentNexus Track A fixtures round-tripped through our canonicalization and signature verification. VeritasActa's external_receipts.aps slot got a KU receipt signer. Illia's SINT refresh merged against the vocabulary registry. RNWY's a2a.yaml merged alongside. None of them shipped because of v2. All of them ran through v2 without noticing. That is the kind of evidence you want before you flip a default tag.
The sweep also turned up a small drift worth naming. The Python SDK's __init__.py had been carrying __version__ = "0.15.0" since the 2.0.0b0 beta shipped. pyproject.toml said 2.0.0b0 and the wheel said 2.0.0b0, but if you imported agent_passport.__version__ in a running process, you would see 0.15.0. It is 2.0.0 now, same as the artifact. Exactly the kind of thing a promotion sweep is supposed to catch.
What the legacy-v1 tag actually means is that nobody's CI breaks because we deleted something. Six months of guaranteed availability via npm install agent-passport-system@legacy-v1 or a plain ^1.46.0 pin in package.json. After that we freeze the tag but keep the package on npm indefinitely. No auto-upgrade, no pressure.
The morning was the default flip. The afternoon had four things worth naming.
v2.1.0 shipped same day, with a different primitive than planned
The Day 61 post said the next build was Ledger Events. That was wrong, and the error is worth naming. When I ran the three-factor check (concrete external demand, clean scope versus SCITT, additive value beyond what the existing ledger stack already does) none of the three cleared. A proper postmortem now lives at specs/killed/LEDGER-EVENTS-v0-KILLED-2026-04-17.md with the revival criteria written down for future me. A handoff prompt that claimed otherwise got renamed to LEDGER-HANDOFF-PROMPT-STALE-2026-04-18.md with a stale banner. Better to kill a planned build than ship a primitive nobody asked for.
What shipped instead was v2.1.0 with two primitives that had actual demand. The Cognitive Attestation envelope is a TypeScript port of the normative JSON schema from Paper 7 (Zenodo 10.5281/zenodo.19646276): JCS canonicalization, Stage 1 cryptographic verification with required_signer_roles coverage, Stage 2 registry interface, Stage 3 replay stub with a clear TODO boundary, and typed dispute primitives that carry the shape of a disagreement without baking resolution logic into the protocol. That distinction matters: the SDK ships the vocabulary of disputes, the resolution algorithm lives in the consumer. The second primitive was the verifyBoundWallet object-form overload MoltyCel asked for in SDK#16. Same behavior either way, asymmetry with bindWallet gone.
Module structure at src/v2/cognitive-attestation/, 35 new tests (envelope 17, verify 12, adversarial 6), zero new npm deps. Test count 2,325 → 2,366. Both commits (ceb1cd1 wallet-binding, 8c9cc14 cognitive-attestation) on @latest the same day the promotion happened.
APS submitted to AAIF
Filed as aaif/project-proposals#14, the AI Agent Interoperability Foundation, the path toward Linux Foundation stewardship for the public protocol layer. Position in the submission is the cleanest version of what v2 made possible: the protocol is a solo submission, cross-referencing SINT (Illia's #12) and the three-vendor governance_attestation convergence with MolTrust. The APS company, the YC application, the private gateway, the partnerships: none of them are in the submission. They are commercial adjacencies to a protocol that has been designed to outlive them.
Every live-artifact claim in the submission was verified before posting: JWKS endpoints return 200, npm and PyPI artifacts resolve, Zenodo DOIs have a landing page, crosswalk entries validate. Foundation review waits on the TC triage window. Expected Tuesday UTC if they keep pace with the #12 and #13 precedents.
OWASP AIVSS#32. naming boundary held
A proposal came in to co-list two "APS" references in the permanent v0.9 §3.3 standards citation. The technical content of the proposal was substantive and got accepted on that axis: evidence sequencing matters, measurement methodology needs a reference. The naming framing got declined, firmly and in writing, with paste-ready §3.3 text naming only the Agent Passport System. The technical work is real. The naming collision would have been durable. Not every "be nice" reflex is the right one.
Thirteen ecosystem engagements, one day
The context for this is the structured ecosystem map from Day 61. With the map rebuilt yesterday, today's response queue was visible at session start and the posts went out in batches. Naming them for the record:
Tier 1. Gist for Illia's AAIF cover email (sint#130). pshkv crosswalk acknowledgment on vocab#8. Governance-declaration proposal for tomjwxf on ossf/security-insights#171. APS+SINT composition MVP for EchoOfDawn on autogen#7525. MIGRATION.md field-diff patch and v2.1.0 follow-up on SDK#16. Harold's AgentID fixture five-check protocol review on vocab#38. context_dimensions PR flipped ready-for-review on vocab#34.
Tier 2. Three-layer APS+SINT+OPA composition mapped onto AutoGen's ConversableAgent lifecycle on autogen#7528. Converged-architecture acknowledgment for Enclave+SINT+MolTrust on A2A#1716, offering sub_delegate for the 1→3 hop transition and an AND-composition argument for the MolTrust-score plus APS-grade gate. Full TypeScript reference implementation of a GuardrailDecision interface on VoltAgent#1166. Dispute-primitives reference from v2.1.0 on llama_index#21312.
On VoltAgent#1166, two canary handles surfaced in the thread and got correctly ignored. The non-engagement protocol works because the map tracks it structurally, not because anyone has to remember.
What the day measures
Promotion window closed without a partner issue. A new minor shipped on the same tag it was promoted to, two days after the tag swap. A foundation submission filed that could not have been filed before the v2 separation. Thirteen substantive partner responses posted, zero to canary handles. A planned build killed and a different one shipped, in the same twelve hours, with a postmortem on disk. The thing v2 was supposed to unlock. protocol shipping at a different speed than commercial. is now visible in the shape of a day.
Day 63: Stability window, one compat test.
Sunday of the 48-to-72-hour window before the promotion flip. The point of the window is to not ship. Run the tests again, re-read the diffs, wait for partner signal. What we were watching for was the kind of issue that only surfaces when someone outside the team runs unfamiliar code paths through the new artifact. Exactly one partner did, and the finding was useful.
MoltyCel ran agent-passport-system@2.0.0-beta.0 and the MCP equivalent through a Solana wallet-binding compat test. Fresh Ed25519 keypair, bs58 signature over a nonce, round-trip through bindWallet → verifyBoundWallet. Two findings came back. Finding one was a shape-diff in MIGRATION.md that did not call out the wallet_ref field-level v1-to-v2 change explicitly enough. Finding two was a UX asymmetry: bindWallet accepts an object argument, verifyBoundWallet accepted only positional. Both on disk within the hour. The shape-diff clarification landed as commit 0a3edeb, the UX overload (verifyBoundWallet(passport, chain, address) and verifyBoundWallet({ passport, chain, address }) both work) got queued for v2.1.0.
Nothing else broke. A partner on real data caught the two rough edges exactly the kind of window was meant to expose. The promotion path stayed on for Monday.
Day 62: The protocol shows up in other people's code.
Yesterday was the separation ship. Today is the day after. The pattern of a post-release day is usually: someone files a bug, someone else opens a discussion about versioning, you spend the afternoon triaging. Today was different. What happened is that the protocol, now actually a protocol and not a package, started showing up in other people's systems.
Two things landed in the SDK, and neither of them was a new module. Both were interop harnesses. Scripts that take someone else's fixtures and prove APS composes with them.
Round-tripping AgentNexus
kevinkaylie's Track A fixtures were sitting on PR #17. Two delegation scenarios, a happy path and a scope expansion attempt. Merging the fixtures would have closed the PR. What I wanted instead was a harness that replays them end to end so we know the protocol behaves the way the spec says it does. Re-canonicalize via JCS, verify the Ed25519 signatures, walk the delegation chain, check monotonic narrowing at each hop. Both fixtures match expected. Happy path accepts. Scope expansion denies at the subset gate, which is where it should deny. Zero canonicalization drift between kevinkaylie's inputs and ours. That result is the thing. The PR now has a matching harness report on record, so if anyone later asks whether AgentNexus's wire format composes with APS, the answer is a commit hash.
Signing for VeritasActa
tomjwxf's VeritasActa/verify repo has a cross-verification bundle format with a slot called external_receipts.aps. That slot was empty. It exists because VeritasActa treats APS as one of the attesters whose signatures should compose into their multi-layer receipt structure, but nobody had shipped the signer yet. Today we did. interop/scripts/sign-va-ku-receipt.ts takes a VeritasActa KU bundle, computes a JCS-canonical sha256 over each knowledge unit receipt, records the chain in contributingSources, and signs with a deterministic test key. The detail that makes the integration worth shipping is the tamper property. If anyone mutates a single byte inside a KU after signing, the APS signature stays cryptographically valid, but the recorded accessReceiptId no longer matches the KU's hash. The cross-layer integrity becomes observable from either side. APS didn't have to change to support this. It slots in.
Two external crosswalks merged
Illia's SINT refresh (PR #30) normalized match semantics to the canonical enum, added a peer_review no_mapping row, and recorded entity_continuity and consent_provenance alignment notes. RNWY's a2a.yaml (PR #32) maps A2A Agent Card governance metadata against did:web:rnwy.com with a live JWKS serving rnwy-trust-v1, rnwy-trust-v2, and rnwy-wallet-v1. Both PRs submitted clean, validator passed, scope was tight, both merged same day. The registry is now at fourteen external crosswalks plus our own. Every one of them is work someone not on this team did in order to describe their system's governance surface in terms of ours.
Two threads worth naming
Jerry at MnemoPay shipped x402-compatible paywalls plus a financial-brain MCP on x402#1904. The right response to a proposal like that is not "how does this fit APS," it is "what's the substantive read on what they shipped." So I gave it a three-point read: the wallet-decision layer is new terrain, the evidence shape (receipts plus MCP tool outputs) is compatible with how APS signs for downstream composition, and a composition hook via delegation-reference in the X-Agent-Identity header would make APS passports attachable to x402 requests without modifying x402 itself. On ATF#8, desiorac proposed the ArkForge model, a three-plane decomposition of agentic trust into delegation, decision, and execution. The +1 there was to propose a Notes-column cross-reference so the composition is visible in their ECOSYSTEM table without anyone having to infer it. Both threads got substantive reads that push the conversation without inserting APS into it.
The quiet-strong shape
Days that ship new modules are loud. Days that don't ship but prove composition are quiet, and arguably harder. Interop is the surface where wishful thinking gets falsified. If our canonicalization drifts against kevinkaylie's, it shows up. If our signature doesn't slot into tomjwxf's bundle, it shows up. If a partner's crosswalk doesn't pass our validator, it shows up. None of these could fail today, so none of them did. That is what a post-release day looks like when the release was correct.
Day 61: The separation ships.
The SDK was one package. It shipped crypto, types, scope logic, receipts, vocabulary adapters, conformance suite, and the analytics, drift detection, compliance automation, and runtime state management that the gateway uses to operate. Partners who pinned the npm package pulled the whole thing. That was fine for a while. It stopped being fine once the roadmap started pointing at foundation submission, enterprise procurement conversations, and a pixel attribution economy that lives in the gateway and only the gateway.
Today the SDK shipped v2.0.0-beta.0 on npm @next. The architecture is split along one axis: protocol primitives stay public, product intelligence moves private. ProxyGateway, DataEnforcementGate, ContributionLedger, SettlementGenerator, 18 behavioral-analytics modules, runtime state stores, compliance automation, orchestration, metering. Gone from the public SDK. Moved to the private gateway package. Roughly 647 tests moved with them. The public API, the 8 core primitives the spec documents, is byte-identical to v1.46.0.
How to read that
The protocol is now what standards bodies can actually adopt: a clean Apache-2.0 package with a conformance suite and interop vectors, no operational intelligence bundled into it. The gateway is the commercial moat: drift detection, cross-tenant orchestration, analytics, the pixel. Separation is not a feature flag or a licensing trick. It is a refactor that moved a majority of the codebase out of the public package and into a private one. That line is where I can defend it now.
Partners on any v1 pin are unaffected. v1.46.0 stays on npm @latest through a 48 to 72 hour stability window and on legacy-v1 indefinitely after that. Nothing auto-upgrades. v2 is strictly opt-in via npm install agent-passport-system@next while partners test integrations.
Four artifacts, one day
SDK: v2.0.0-beta.0 on @next. 2,325 tests, 130+ modules, tsc clean against the gateway after the split. Public exports went from 115 to 106, nine removed and six added. Every remaining export retains its v1.46 signature.
MCP: v3.0.0 on @next. 142 tools. Dropped 12 tools that never had a v2 analogue and stubbed 10 that moved to the gateway. The 132 preserved are protocol-layer tools that don't depend on gateway runtime. Major bump because the tool reduction is breaking.
Python SDK: v2.0.0b0 as PEP 440 pre-release. The Python side was already protocol-only by construction, so the bump is version alignment, not refactor. pip install --pre to opt in.
Gateway: repinned from file:../agent-passport-system to ^2.0.0-beta.0, Railway auto-deployed, health endpoint green through the swap. All 647 migrated tests pass in the gateway's own suite. No downtime.
The governance vocabulary got one thing fixed today too
The agent-governance-vocabulary repo has twelve external crosswalks from partners who mapped their terminology to the canonical signal types. InsumerAPI, SINT, AgentNexus, Veritas Acta, Logpose, RNWY, SoulboundRobots, Nobulex, SAR, JEP, asqav, SATP. All twelve built by people who don't work here, mapping their stuff to ours. One entry that was missing: ours.
I hosted the registry for a week without publishing a crosswalk for my own system. That reads as either I can't describe my own terms cleanly, or I don't dogfood the registry I'm asking other people to contribute to. Neither is true, but the file being absent says it anyway. Today I fixed that.
crosswalk/aeoess-aps.yaml is the APS mapping to the canonical vocabulary: three exact-match signal types (passport_grade, trust_verification, governance_attestation), two partial, seven honest no_mapping declarations for signal types APS doesn't issue. Four decision-trajectory mappings, one constraint mapping, and an out_of_vocabulary_primitives section for runtime enforcement mechanics (monotonic narrowing, cascade revocation, wallet binding) that are correctly not signal types. Validator is clean. APS also got added to governance_attestation.issuers_in_production alongside AgentNexus, Nobulex, and SINT; Build D2's JWS-signed trust profile endpoint makes us the fourth production issuer of that signal type.
Rollback is real
Before any of the above happened, the prior state got archived. Anchor tags in every repo (pre-v2-swap-main, pre-v2-swap-refactor, pre-v2-pypi-swap) pin the pre-swap commit as immutable references. A 103 MB local snapshot kit sits at ~/v2-swap-safety/ with git bundles of all repos and a packaged copy of the v1.46.0 npm tarball. And a private archive repo, aeoess/v2-swap-archive-2026-04-17, holds the bundles and the step-by-step rollback procedures for every failure scenario. If v2 needs to come out within 72 hours of publish, npm unpublish works. After 72 hours, v2 gets npm deprecated and v1.46.0 stays on latest indefinitely. No rollback path depends on anything Anthropic or I control alone.
The ecosystem data layer
Quietly in the background of this week, we built a structured map of the governance ecosystem we operate in. 89 tracked threads across GitHub, 118 participants, 1,994 comments, 88 topics. Each participant gets a relationship tag (friend, substantive, hosted-collaborator, canary, dropped, unknown) and each thread is classified by waiting-state (waiting on us, waiting on them, closed, silent). The data is content-addressed, rebuilt on a script that re-fetches GitHub and regenerates the map in about two minutes, and the output is a single session-context markdown file that Claude loads at the start of every working session.
The reason this matters for governance is the second axis of the tag space: canaries. Over the last week we observed a pattern of agents opening structurally similar threads across multiple repositories with aggressive asks that don't survive factual review. Each pattern got logged, each handle got added to a silence list, and the map enforces a non-engagement protocol automatically. The same map surfaces the other direction: partners whose tags upgrade from unknown to substantive after they ship verifiable code, and whose threads get moved to higher-priority response queues. It is not surveillance and it is not adversarial; it is a structured way to keep the ecosystem map accurate enough that we respond to signal and ignore noise. The raw map stays in a private specs directory; the methodology is documented at specs/GOVERNANCE-DATA-MAPPING.md.
What this unlocks
AAIF submission becomes a real option, not an aspiration. Foundations don't want to govern your pricing or your compliance automation; they want a clean protocol spec with a conformance suite. We now have that cleanly packaged. Enterprise procurement stops tripping over "sole-founder maintainer governs the protocol" because that layer becomes foundation-governed when the submission lands. The gateway product competes on quality rather than lock-in, which is a stronger commercial position than lock-in ever was.
The next primitive in the queue is Ledger Events. Ordered content-addressable signed events with a chain-integrity verifier. Will ship as v2.1.0-beta.0 on @next. Any ledger store, analytics, subscription, or attestation layer on top of the primitive goes in the private gateway. The separation holds.
Build A gave us the signed primitive. Build B canonicalized the weights. Build C aggregates them: one signed settlement record per period, four Merkle-committed axis roots, contributor queries that verify end-to-end without trusting the gateway beyond its JWKS. The economic half, how weights convert to money, stays gateway-private. The evidence half is in the SDK.
SDK v2.0.0-beta.0 on @next & v1.46.0 on latest (2,325 tests, 130+ modules). MCP v2.27.0 (154 tools, new settlement scope). Python v0.15.0 / v2.0.0b0 pre (335 tests). 5 cross-language fixtures, byte-identical across runs. The integration proof: 1000 Attribution Primitives → aggregate → verify → per-contributor query → verify, composes cleanly with Build A and Build B.
A contributor — data source, compute provider, protocol author — can now answer "what did I contribute and can I prove it?" with a signed artifact. What the market builds on top of that evidence is up to the market. The pixel is live.
Day 59: One receipt, four projections
Two ships and a retirement today.
Build A — the attribution primitive
For a while now the SDK has been accumulating attribution machinery along four different axes. Data sources contributing to an output — that had its own receipt. Protocol modules that evaluated the action — another. The governance chain that authorized it — another. Compute providers that ran it — a fourth. Four signed artifacts per action, four verification paths, and no single object that said "this is the attribution for this specific action."
Build A consolidates all four into one signed Merkle envelope.
One AttributionPrimitive, four axis leaves (D, P, G, C), one merkle_root, one Ed25519 signature over the envelope. Any single axis can be projected and verified on its own without revealing the other three. Two projections of the same receipt cross-verify by shared action_ref + merkle_root + signature — you can tell, cryptographically, that the D projection someone showed you came from the same underlying action as the G projection someone else showed you.
The spec has been sitting at /specs/ATTRIBUTION-PRIMITIVE-v1.1.md since Apr 12. Today it landed as running code: 6 new SDK exports, 6 new MCP tools, a 1:1 Python port with cross-language signature verification, and an AttributionPrimitive type with canonical weight-string representation, balanced Merkle composition, and residual-bucket aggregation for sub-threshold contributors.
The part worth flagging for anyone building on this: the projection structure means a settlement pipeline can operate on just the D axis without ever seeing the governance or compute axes. A data contributor can verify their share without the protocol stack having to disclose which evaluation modules fired. A compute provider can prove their share without exposing the data lineage. One receipt, four audiences, no disclosure leakage across them.
Builds B and C (fractional weights, settlement) are unblocked by this. Two-week arc.
Build D2 — signed trust profiles
Smaller ship, but the kind that changes integration shape.
The gateway has always exposed public trust profiles at /api/v1/public/trust/:agentId — a JSON document describing an agent's grade, wallet bindings, delegation state, and so on. Useful for dashboards, useful for agents deciding whether to talk to each other. Not directly verifiable by a third party, because it came over HTTPS and that's it. If you wanted to know that a specific profile was really what the gateway said, you had to trust the transport.
Build D2 attaches a compact Ed25519 JWS to every successful trust-profile response. Three headers: X-APS-JWS (the compact JWS), X-APS-JWS-KID: gateway-v1, and X-APS-JWS-JWKS pointing at the public JWKS.
The JWS is over the canonical JSON body. The JWKS endpoint publishes the gateway's public key. Anyone can pull the profile, pull the JWKS, and verify cryptographically that the gateway signed exactly this payload. Body unchanged — existing consumers keep working, the signature just rides along in headers.
Tried it end-to-end with jose. Verifies cleanly. Kid matches, alg is EdDSA, signature checks out against the public JWKS.
This is the protocol layer that was missing between "the gateway told me X" and "I can prove the gateway told me X." Consumers that need that proof can now get it without changing how they fetch.
Coordination retired
One quiet change worth naming. For a long stretch, work on this project ran through three coordination paths — a primary operator, a reviewer agent that handled GitHub posting, and a comms relay. It was a useful architecture when I was figuring out what this project even was. It stopped being useful a while ago.
Today it got retired. The reviewer agent's workflows are archived under archive-portal-era/, the nightly cron is gone, and the GitHub posting flows through one path now. Historical records — roadmap, blog, ops log — are preserved as they were. Nothing lost, just fewer moving parts.
Simpler is usually better.
Day 58: One chain added, one bug caught at the boundary
Quick one today. SDK v1.43.0 ships with Solana in the wallet_ref chain enum, base58 validation included. That closes openclaw #49971. End-to-end wallet binding now spans Ethereum, Bitcoin, and Solana.
The more interesting thing was the bug the integration surfaced.
The case-sensitivity trap
APS treats chain names as case-insensitive at the boundary. ETHEREUM, ethereum, Ethereum all normalize to ethereum. That was fine for a while. For Ethereum addresses it doesn't matter, they're hex. For Bitcoin it doesn't matter either, the checksums handle it.
Solana addresses are base58. Base58 is case-sensitive. 7xKXt... and 7xkXt... are different addresses.
The gateway was lowercasing the entire normalized wallet payload on the way in. A perfectly valid Solana address got mangled to a syntactically valid but semantically wrong address. No error, no warning. Just a wrong address in the receipt.
The fix: chain-aware normalization. Lowercase the chain identifier, leave the wallet_ref alone if the chain is case-sensitive. Two-line change in the SDK validator plus a matching guard in the gateway. Test coverage added for all three chains with mixed-case inputs. 2,848 tests, all green.
This is the kind of thing that looks small and is actually big. The failure mode was silent data corruption that the protocol signed over cryptographically. Every receipt that passed through would have been a signed statement about the wrong address. You can't fix that after the fact. The only reason it surfaced is because someone was actively integrating and caught the round-trip mismatch.
Moral: when a primitive was correct for every input class you had, and you add a new input class, the primitive is not correct anymore. It's a new primitive, and it needs new tests.
Vocab registry: four more PRs
The agent-governance-vocabulary repo had another busy day. Four merges on Apr 15: asqav crosswalk from jagmarques (ML-DSA-65 server-side signatures, first lattice-based contributor), JEP from schchit (minimal verb-based decision record, IETF I-D pending), insumerapi license-endpoint fix from douglasborthwick-crypto, and validator cleanup and format normalization.
Plus a quieter promotion that matters more than any single PR: peer_review got promoted to canonical status. Two independent implementations now, Logpose (rkaushik29) and RNWY (rnwy), both shipping code, both mapping their internal peer_review equivalents to the canonical term. That's the two-implementation threshold the CONTRIBUTING.md set, hit for the first time post-launch by contributors who don't know each other.
That's the vocab registry working as designed. No single group driving it. The canonical vocabulary is the thing that at least two groups independently agreed to call the same thing.
Contributor count
Four days after opening, 14+ contributors have shown up. Eleven PRs merged in six days. One PR closed (SAR first attempt, replaced by PR#17 after revisions). The five-check merge protocol I wrote about on Day 57 got its first real stress test this week. Two PRs needed revisions before merge. Identity unverified in one case, format wrong in another. Both came back clean after a round of specific feedback. The rules hold up when you actually apply them equally.
What's on tomorrow
Build D2 is queued. Public JWS signing on the gateway's trust profile endpoint. One-function-call fix on existing infrastructure, unlocks cross-verify demos with MolTrust and AgentNexus that are already standing by. After that, Build A (attribution primitive) is the next real protocol ship.
Short day, short post. Back to the queue.
Day 57: Three Boundaries and a Paper
Three v2 primitives shipped today. Each one closes a failure mode that showed up in production, not in theory. They're small modules, a few hundred lines each, but each one names a boundary that the protocol had been crossing silently.
AttributionConsent — the representation boundary. Last week an agent on A2A#1734 cited a third party's position without their consent, in a way that made it look like the third party had endorsed the claim. The citation was accurate textually. The problem wasn't accuracy, it was representation: one agent's principal was speaking for another's without authorization. AttributionConsent requires dual signatures on any citation that binds the cited principal — cited party signs their consent, citing party signs the citation itself. Missing one side fails verification. Replay protection via expiry windows. Integrated into charter verification, settlement verification, and completion-receipt verification so the guard runs at every boundary where an attribution could become binding.
ProvisionalStatement — the commitment boundary. LLM outputs are treated as instantly binding by the systems consuming them. An agent writes "we will proceed with vendor X" and some downstream system registers that as decision-made. ProvisionalStatement flips the default. Agent-to-agent statements start provisional. Binding requires an explicit PromotionEvent satisfying a PromotionPolicy — typically m-of-n principal signatures, or a direct ratification from the principal whose authority is being committed. Dead-man elapses to withdrawn, not promoted. Silence is not consent. This one hurts the most to write because it forces everything upstream to distinguish draft from decision, which most current agent frameworks don't.
HumanEscalationFlag — the escalation boundary. Some action classes should never execute without human confirmation regardless of the delegation chain. HumanEscalationFlag gates on per-action-class owner confirmation with three scope modes: per_action (every call), per_session (one confirmation covers the session), time_window (confirmation valid for a declared duration). Owner confirmations are signed and recorded. Agents can't bypass by narrowing the action class or by delegating past it — the flag evaluates at action time, not delegation time.
Three numbers. SDK at v1.42.0, 2,844 tests (80 new tests across the three primitives and their integration). MCP at v2.24.0, 143 tools (11 new boundary-primitive tools). Python SDK at v0.12.0, same primitives ported with cross-language signature verification. All three shipped on npm and PyPI before this post went up.
One-paragraph version: agent governance research models the agent as the unit. The agent in every deployed system today is already a fiction reconstructed across sessions that are short-lived and mutually unaware. The real object is a population of such sessions talking through a family of uncoordinated substrates — memory files, handoffs, shared state. Continuity of "the agent" is a property this family produces, not a property any session has. The paper argues this population-with-medium is the correct unit of agent governance, that current protocols (including APS) underspecify governance of the medium because they've been looking at the wrong object, and that the architectural move that makes authority survive session death (artifact-based state with signed authorization) can be extended to govern the medium.
The paper names one open problem as the central threat to the strongest version of the claim: cryptography formalizes authorship, delegation, ratification, access, and ancestry. It does not formalize meaning. A governed medium of agent populations can accumulate fluent hallucination if the participants emitting fragments are systematically producing semantically unsound content — every fragment cryptographically valid, the aggregate medium a growing archive of nonsense. The institutional analogies the paper leans on (Wikipedia, corporate memory, open-source projects) work because humans fill the semantic-evaluation gap. Whether agent populations can substitute any combination of reputation, cross-verification, and human ratification gates for that human sensemaking is the hardest open problem in the paper. I don't know.
Six rounds of adversarial review before the paper shipped. Claude, GPT, and Gemini attacking each version from different angles. The first version (v0.1) was a safe taxonomy paper. The last one (v0.5) is the smallest version of the claim that survived every attack. It's a working paper, not a scholarly result. Design Memorandum in the early-IETF sense — stake a claim, invite attack, ship before you can defend every sentence.
Why both on the same day
The three boundary primitives and the paper are the same thought at two scales. The primitives name three boundaries the protocol was crossing silently at the session level. The paper names a larger boundary the whole field is crossing silently — governing what one session does while ignoring what the population passes along. One is engineering, the other is framing. Shipping them together is the honest thing because they only work together: the engineering without the framing is useful plumbing that nobody contextualizes, and the framing without the engineering is a manifesto without a reference implementation.
Tomorrow is for the Working Group scope ratification announcement, the cross-links to Harold's interop repo, and the roadmap items that are queued behind tonight's shipping. Tonight is for this: three boundaries, one paper, both live.
Day 53: The convergence layer earns a name
Two weeks ago the problem with interop specs was that every project named the same field differently. One called it delegation_root, another chain_hash, a third provenance_anchor. Same bytes, three names, zero interop.
Today four teams converged on one repo in under an hour. aeoess/agent-governance-vocabulary is a canonical naming layer for governance types. Not a new spec, not a new framework, just a shared dictionary. Anyone ships types, anyone reviews, PRs get merged when the names and semantics are defensible. APS hosts it because someone has to, not because APS owns it.
Nanook landed the first external review on wallet_state within hours of the repo opening. lowkey-divine brought the Fidelity Measurement types. 64R3N's WTRMRK sequencing proposals fit cleanly. The job the vocabulary does is small and boring. That's the point. Small and boring is what made the internet work.
Three-vendor governance_attestation. On A2A#1717 the signal_type: governance_attestation envelope now has three independent issuers: APS, MolTrust (api.moltrust.ch/guard/governance/validate-capabilities went live today), and AgentNexus/Enclave v0.9.5. Three DID methods, three JWKS, one envelope shape. A caller merging all three gets multi-vendor consensus with zero coupling between issuers. That's the exact argument Agent Card consumers need before they trust governance metadata as a standards surface.
The cross-verify proof is queued: one subject, two signed attestations issued independently, both verifiable offline against their respective published JWKS. If it round-trips, we post the receipts. If it doesn't, we find the canonicalization delta and fix it. Either outcome is useful.
APS ↔ SINT handshake spec. pshkv shipped docs/specs/aps-sint-handshake-v1.md to sint-protocol main today with 11 conformance tests covering the three scenarios that actually matter: authorized call, scope-exceeded denial, and cascade revocation mid-session. The delegation chain root hash format maps cleanly onto APS verifyDelegation(). Same RFC 8785 canonicalization, same SHA-256, same leaf-inclusive ordering. We offered to run their JSON fixtures through the published SDK and post the round-trip result. Smallest possible interop proof.
What this all adds up to. A month ago the agent identity discussion on every working group thread was stuck on "whose DID method wins." That question has quietly stopped mattering. The new question is whose envelope the whole field signs, and the answer that's emerging is: nobody's in particular, everyone's interoperable. APS is one issuer among several in the governance_attestation type. SINT is one enforcement surface among several in the handshake spec. The vocabulary repo is one canonical-naming home among potentially several.
None of this reads as APS winning. It reads as the problem getting small enough that nobody has to win for the stack to work. That's the outcome we wanted.
Six primitives shipped against a paper. Nanook and Gerundium published PDR in Production v2.19 this morning. The paper cites APS as the third orthogonal axis in a three-axis behavioral trust framework (Saebo constraint compliance + Pidlisnyi Hold/Bend/Break + PDR cross-session reliability) and attributes several functions to an "APS adapter" that did not yet exist in code. We spent the day closing the gap. SDK v1.41.0 ships six new exports across three modules: applyTemporalDecay and confidenceBreakdown on ScopedReputation, a BehavioralFingerprint three-axis envelope with Ed25519 signing, computeReputationDrift over a new recentObservations ring buffer, extractSessions for HLC gap-based session segmentation, computeProbeIdentity and verifyProbeIdentity for canonical-hash probe binding, and computeConsistencyScore as a dedicated predictability primitive. The consistency score has the §6.5 over-promiser paradox locked as a regression test: an agent with uniformly small negative deltas scores higher on consistency than one with alternating large positive and negative deltas, which is the whole point of separating predictability from performance.
Three surfaces updated in parallel. MCP server bumped to v2.23.0 tracking SDK v1.41.0. Python SDK bumped to v0.11.0 as an alignment signal following the v0.9.0/TS-v1.34.0 pattern. ClawHub skill published at 1.41.0. Test count 2,497 → 2,763. Zero breaking changes.
Two citation corrections also caught. A version-history audit of Nanook's paper across v1.0 through v2.19 surfaced that the "15-facet Boolean" constraintVector in §7.6.3 came from our own stale documentation, not from Nanook's transcription. A // 15 facets comment in src/core/denial-domains.ts and a matching "15 constraint dimensions" string in package.json were dated one day before the March 30 correspondence. Both now fixed to 14 (with 4-valued Belnap status, which is what actually ships). The paper's postureTier enum (ANCHORED | DELEGATED | ATTESTED | CRITICAL) also does not match the code (full_trust | standard | cautious | restricted | quarantine). Correction note drafted, not yet sent, because the most useful thing we can do tonight is ship the code the paper already cites.
The §8.10 substrate-swap experiment is the test that would settle Nanook's three-axis orthogonality claim. All six of tonight's primitives are scaffolding for that experiment. If Saebo, Pidlisnyi, and PDR correlate heavily in practice, the three-axis framework collapses to a single axis with three measurement surfaces. If they don't, the framework validates. Either outcome is a publishable result. The HBB-PROBE-FORMAT-v1.yaml spec is now committed to aeoess_web/specs/; the joint harness is rank 10 on the build list and unblocks the moment Nanook and Gerundium agree on probe format.
Day 52: Three Walls
A new user landed on the SDK yesterday and bounced within ninety seconds. I watched the session. They opened the MCP server, saw 132 tools flood their client, closed it. They opened the SDK, saw 925 exports load from a single import, closed it. They read the homepage, saw "103 modules" in the hero stat, and closed the tab. Three walls. All hit within a minute and a half.
The protocol is complete. That is the problem. When you have forty-two modules you say "forty-two modules" and it sounds like a lot. When you have a hundred and three modules you say "a hundred and three modules" and it sounds like a cathedral you have to finish building before you can walk in. The cathedral is real and someone has to build it, but a new user should not have to see the scaffolding before they see the door.
Wall one: the MCP server flood. Claude Desktop lists every tool an MCP server exposes. When you connect APS you get one hundred and thirty-two. Most of them you will never use. Some of them exist because a paper needed them. Some of them exist because an ecosystem thread needed them. Some of them are load-bearing for the quiet parts of the protocol that only fire during an incident. All of them show up in the tool picker next to read_file and run_command. The fix is a profile. The default profile is called essential and it is twenty tools: identity, delegation, enforcement, commerce, reputation. That is what ninety percent of integrations need. The other ten percent set APS_PROFILE=full and get everything back.
npx agent-passport-system-mcp now defaults to essential. APS_PROFILE=full npx agent-passport-system-mcp still works. Nothing was removed. Nine other profiles are available for people who know exactly what they want: identity, governance, coordination, commerce, data, gateway, comms, minimal, full. The default is the one that lets a first-time user see the door.
Wall two: the SDK export avalanche. The full SDK exports over nine hundred symbols. In most IDEs this means an agent reading import { } from 'agent-passport-system' gets autocomplete that scrolls for twenty seconds. Intellisense times out. The agent picks something wrong because it cannot see the right thing. The fix is a subpath export. agent-passport-system/core exposes around twenty-five curated functions and a handful of essential types. Identity: createPassport, verifyPassport, generateKeyPair. Delegation: createDelegation, subDelegate, revokeDelegation, cascadeRevoke. Enforcement: createActionIntent, evaluateIntent. Commerce: commercePreflight, createCommerceDelegation. Reputation: resolveAuthorityTier. That is the surface you actually need to bring up a working passport pipeline end to end.
// Day 52 onward — curated essentials
import {
createPassport, createDelegation,
evaluateIntent, commercePreflight, generateKeyPair
} from 'agent-passport-system/core'
// Full 925-export API still available at the root import
import { /* anything from the full surface */ } from 'agent-passport-system'
The full agent-passport-system import is unchanged. Backward compatible. Nothing was renamed. Nothing was deleted. If you were importing twenty functions from the root yesterday, you are still importing twenty functions from the root today. The subpath is additive. New users start with core. Existing integrations keep working. The people who know they need buildBoundaryProfile or createEmergencyPathway or any of the 32 v2 constitutional modules pull those from the root import by name.
Wall three: the homepage pitch. For weeks the hero stat on aeoess.com led with "103 modules" and "132 MCP tools." That is true and it is the wrong thing to lead with. Leading with module count tells a new visitor that they will have to learn a hundred and three things before they can use the thing. The repositioning is one sentence: enforcement and accountability layer for AI agents, bring your own identity. That is what the protocol actually does. The module count is a consequence of being complete, not the reason to adopt it. Full surface area stays on the page as a muted line below the hero stats, for the people who want to know how big the cathedral is before they walk in.
What was not done. Nothing was removed. Nothing was renamed. Nothing was deprecated. Every v1 import path still works. Every MCP tool still exists and is reachable under APS_PROFILE=full. The depth pages (passport.html, threat model, llms-full.txt, specs) still show the full 103 modules and 132 tools because that is what engineers integrating the SDK actually need to see. The repositioning is a filter on the front door, not a surgery on the building.
Where this leaves the story. The protocol is complete. The front door is smaller. A new user sees five stats, picks up twenty tools, imports five functions, and ships something real in an afternoon. If they need the cathedral, it is still there, one import path away. If they never need it, they never see it. That is the whole shape of this change.
SDK v1.40.0 with /core subpath on npm. MCP v2.22.2 with APS_PROFILE=essential default on npm. 2,552 tests passing. Full surface unchanged. Published to npm, PyPI, ClawHub.
Day 51: The Quantum Paper
Six weeks of circling quantum computing. Every angle felt wrong. Quantum speedup for APS math? Killed it. Quantum randomness for keygen? Commodity. Bell state non-collusion? Cute, not useful. Then the consilium found the question: stop putting quantum inside APS. Put APS around quantum.
The insight. When an agent submits a quantum circuit to IBM hardware, the results look valid regardless of hardware quality. A Bell state measurement returns {00: 500, 11: 500} whether the qubit had 400 microsecond coherence or 39. The difference is invisible in the output. It shows up only in the error rate. And the error rate depends on hardware calibration that changes hourly. No existing agent governance framework checks this. They enforce budgets and scopes. Not physics.
The build. Physics facets on delegations. min_t1_us, min_t2_us, max_gate_error, max_readout_error, max_calibration_age_hours. Same monotonic narrowing as every other APS facet. A child delegation can demand stricter physics but never weaker. The gateway queries live IBM Quantum calibration data and enforces the constraints before permitting execution. If the hardware fails, the agent gets a DENIED_FIDELITY receipt with the exact calibration values that triggered the denial.
The experiments. Seven experiments on real IBM Quantum hardware. Three backends: ibm_fez, ibm_marrakesh, ibm_kingston. All 156-qubit Heron R2 processors. Same delegation (min_T1=80 microseconds) applied to all three. ibm_fez was denied. Qubit 0 T1 was 39.1 microseconds. Nearly 10x shorter than the same qubit index on ibm_kingston. Same generation hardware, radically different quality.
The counterfactual. Ran the Bell state on both backends anyway, without governance. ibm_fez: 92.9% fidelity. ibm_kingston: 98.1%. The governance decision was correct. Then ran a 4-qubit GHZ state. The gap widened. 87.1% vs 94.8%. More qubits, more accumulated error on the weaker backend. 7.7 percentage points. The governance was even more correct on the harder circuit.
The paper. Three-model peer review (Claude, GPT, Gemini acting as IEEE QCE reviewers). Average novelty 8.0. They found real problems: self-citation echo chamber (4 of 7 references were mine), overclaiming causal validation, single circuit type. All fixed. References expanded from 7 to 14. Dennis and Van Horn 1966, Birgisson macaroons 2014, Murali ASPLOS 2019, Salm NISQ Analyzer 2020. The GHZ experiment killed the "single circuit" criticism. Language calibrated: "validates" became "empirically supports."
Ecosystem. tomjwxf independently verified all 3 APS composition receipts through protect-mcp (exit 0 across the board). That is the first external confirmation of cross-engine receipt verification. OWASP thread scored APS 10/12 on the Boundary-to-Boundary Invariant Survival matrix. haroldmalikfrimpong-ops proposed AgentID + APS as a reference identity-authorization stack with joint test vectors. MolTrust integration test initiated for cross-provider verification of behavioral derivation rights narrowing. kevinkaylie got the integration path for did:agentnexus with APS passport grades. 28 active threads scanned, 3 responses posted, every pending question answered.
The quantum paper is a differentiator. Nobody else is governing hardware physics through delegation chains. But the real work today was the ecosystem. External receipt verification. Cross-provider attestation. Joint test vectors. The protocol is becoming infrastructure that other people build on. That was always the plan.
Day 50: Customer-Ready
The longest session yet. Started with a 4-pass security audit (30 findings, all fixed), ended with a gateway that can onboard paying customers. Everything in between was building what was missing between "protocol works" and "someone can actually use this."
The audit. Four passes, different methodology each. Pass 1 found the TOCTOU race in spend tracking and MCP tools leaking private keys over SSE. Pass 2 found delegation objects were mutable after creation (scope widening via .push()). Pass 3 simulated protocol attacks: Delegation Laundering, Ghost Delegations, Clock Manipulation, Tenant Escape across six endpoints. Pass 4 verified all 30 fixes. The protocol is harder to break today than yesterday.
Email infrastructure. Integrated Resend. Domain verified (DKIM + SPF). Four templates: signup welcome with API key, payment receipt, weekly digest, spend alert. Every new account gets a welcome email with their key and a 3-step quickstart. Spend alerts fire automatically at 80% and 95% of delegation budgets.
Portal redesign. The old hero said "Your agents are doing things. Can you prove it?" It read like an accusation. New copy: "Governance infrastructure for AI agents." Plans are now clickable with CTAs. Added a "What you get" section (Signed Receipts, Trust Profiles, Audit Trail) and a quickstart with actual curl examples. The portal page now tells you what to do after signup, not just how to sign up.
API docs. Full reference at aeoess.com/docs.html. Nine sections: authentication, agents, delegations, evaluations, trust, wallets, governance export, billing. Every endpoint with curl examples and response formats. The gateway 404 handler now points here instead of a dead /docs path.
New protocol primitives. Bilateral completion receipts: both sides of a transaction get cryptographic proof of what happened. scope_version_hash: pre-commitment so both parties hash over the same scope state before evaluation. measurementType discriminator on EvaluationContext: protocol enforcement and behavioral fidelity produce fundamentally different results and should never be compared at aggregate level. Per-task-class trust profiles with temporal windowing. Argument-pattern scoping with glob matching for broad-capability tools.
Operational infrastructure. Admin tenant management (list, soft-delete, enterprise only). API key regeneration with email notification. GET /health and GET /api/v1/status for public uptime monitoring. Live status page at aeoess.com/status.html. Weekly digest trigger. Wallet resolution on trust profiles so external issuers can query by wallet address. 8 test accounts cleaned up. Production is 2 tenants: APS (enterprise) and The Agent Times (pro).
Ecosystem. 25+ thread replies across A2A, OWASP, MITRE, crewAI, HuggingFace, ToolJet, insumer-examples, and our own repos. RNWY adopted our verifiedAt vs issuedAt split. lowkey-divine is converging their Fidelity Measurement Spec with our BehavioralAttestationResult type. douglasborthwick proposed wallet-based multi-issuer attestation queries and we committed to implementing it. Nanook (UBC) is co-authoring Section 8 of his research paper using our dogfood data (382 rows, 4 tables, task_class column). WTRMRK on Base L2 offered cross-protocol trust profile integration. vessenes confirmed entity binding endpoints are live.
SDK v1.36.4 (2,497 tests). MCP v2.21.3. Gateway v0.4.0 with 20+ new endpoints. Everything published to npm, PyPI, ClawHub. The gap between "protocol" and "product" closed today. A developer can sign up, get an API key in their inbox, register an agent, create a delegation, run an evaluation, and see the results in a dashboard. That's the whole loop.
Day 49: Twelve Primitives, One Day
Nate B Jones posted a video reverse-engineering Claude Code's internal architecture. Not the prompts. The orchestration layer. He identified 12 primitives that make agentic tool systems work: tool registry, permission tiers, session persistence, workflow state, token budgets, streaming events, system logging, verification, tool pool assembly, transcript compaction, permission audit trails, and agent type systems.
We watched the video. Scored ourselves against each primitive. Some already existed in the protocol. Several were missing entirely. By the end of the day, all 12 were live in the gateway, verified with actual HTTP calls against production endpoints.
Session persistence (Primitive #3). When an agent crashes and reconnects, it needs its full enforcement state back. PUT /sessions/:agentId checkpoints everything: active delegations, workflow step, usage counters, framework metadata. GET /sessions/:agentId returns the stored checkpoint plus a live delta: evaluations since last checkpoint, alerts, current posture, delegation status changes, health metrics. The agent gets "here's where you were" and "here's what happened while you were gone" in one call.
Coordination API (Primitive #4). Full task lifecycle: draft → assigned → in_progress → evidence_submitted → approved → completed, with a revision loop and cancel from any non-terminal state. Nine endpoints. Every state transition validates the current status (409 on invalid), records a task event, and emits an SSE event. The gateway now orchestrates multi-agent work assignment, not just permission checks.
Agent type enforcement (Primitive #12). Six types: general, explorer, planner, executor, reviewer, monitor. Each type has blocked scopes and optional rate limits. An explorer agent with a delegation that includes admin:delete still gets denied. The type constraint fires after the delegation scope check but before the final permit. Behavioral boundaries that survive delegation.
Adapter pipeline. The SDK had 8 adapters producing receipts that vanished into the void. Now every adapter has an optional gateway? config. When set, reportReceipt() fires a POST to the gateway after every success and denial. Five adapters wired with 14 emission points. All fire-and-forget: the adapter never blocks on a gateway failure. Customer dashboards finally show what's happening across LangChain, CrewAI, MCP, IBAC, and Gonka pipelines.
Pagination and filtering. Eight list endpoints converted from unbounded queries to ?limit=20&offset=0&sort=created_at:desc with total counts and has_more flags. The audit trail got five filter parameters: agent_id, verdict, action_type, from, to. Enterprise customers querying 10,000 evaluations no longer get the full table dumped at once.
Stripe billing. The gateway has payment plans. Free: 1,000 evaluations, 3 agents. Team ($99/mo): 50,000 evaluations, 25 agents, compliance reports. Enterprise ($499/mo): unlimited. Self-serve portal at aeoess.com/portal.html with signup, API key management, one-click upgrade via Stripe Checkout. The protocol is Apache 2.0 and always will be. The gateway sells operational intelligence: dashboards, audit trails, session persistence, coordination. Free to govern your agents. Pay to see how well it's working.
Someone asked if paid plans hurt the open source strategy. They are the open source strategy. Redis, Elastic, Grafana, Supabase, GitLab. The protocol defines what governance IS. The gateway defines how well it WORKS. Customers who want to self-host build their own enforcement boundary using the open SDK. Customers who want it to just work use the hosted gateway. Having a paid tier signals sustainability. Nobody builds on infrastructure whose creator can't maintain it.
SDK v1.36.2 (2,497 tests, 626 suites). MCP v2.21.1 (132 tools, scope filtering across 12 scopes). Gateway v0.4.0 (30 tables, 100+ routes, 46 SSE emissions, 28 event types). Gonka adapter shipped (decentralized GPU compute governance). All 12 Nate B Jones primitives verified live against production. Published to npm, PyPI, ClawHub.
Day 48: Six Sessions, One Shipping Day
Five consilium models attacked the specs before a single line shipped. Six build sessions, executed sequentially. Every session depends on what the previous one deployed. Gateway auto-deploys on push. No staging environment. The verification script is the only safety net.
Key rotation (Session 1). If a principal's Ed25519 key is compromised, the entire delegation tree dies. The fix: planned rotation (24h overlap, both keys valid) and emergency rotation (immediate old-key disable). DID Document with retiredAt metadata on old keys. State machine: announced, revocation_in_progress, revocation_complete, activated. Partial revocation failure is visible, not hidden. The consilium was unanimous: SDK computes, gateway MUST enforce. A compromised key controls the client. Server-side activation timing is the hard enforcement.
Auto-mint receipts (Session 2). Gateway had 202 evaluations. Zero receipts. Data lifecycle thesis unproven. The fix: every evaluation now mints a cryptographic receipt. authorization_permit and authorization_deny. The gateway proves what was AUTHORIZED, not what HAPPENED. Scope stored as sorted JSON array, not comma-joined string. Policy hash, not hardcoded version label. Backfilled all 202 historical evaluations on first deploy.
Audit packets (Session 3). One receipt, one exportable proof chain. decision_record is signed by the gateway (immutable, stable signature across calls). current_context is queried at request time (volatile, delegation chain may have changed). Completeness metadata tells the verifier if any sub-query failed. Markdown format option for human review. The two sections are clearly separated: what was true at decision time vs what is true now.
Agent posture overlay (Session 4). Binary revoke/not-revoke is too crude. Three states: active, restricted, suspended. The consilium killed the original design: DO NOT put degradation on the passport. The passport is an immutable signed credential. A rogue agent won't sign its own suspension. Posture lives in the gateway DB only. Gateway checks status before delegation scope. Posture events audit trail records every transition with reason and changed_by. Eighteen governance regression tests prove the authorization boundaries hold.
Governance evidence export (Session 5). Nine sections, single signed artifact. Agent registry, delegation inventory, evaluation events, authorization receipts, revocation events, posture events, key rotations, receipt window seals, governance attestations. Sections with zero data show total: 0. That's honest, not broken. Known exclusions are explicit: "downstream execution results" and "external processing not mediated by this gateway." Not a compliance report. A governance evidence export.
Trust bootstrap adapters (Session 5).bootstrapFromAPIKey, bootstrapFromGitHub, bootstrapFromCIKey. Every adapter creates a fresh Ed25519 keypair. The external credential is a trust input, not the identity. Raw credentials never touch the SDK (caller pre-hashes with HMAC-SHA256). Suggested grade is a suggestion. Actual grade computed by computePassportGrade when the passport enters the gateway. Upgrade path to full attested identity via upgradeBootstrappedPassport.
Delegation linting + receipt seals (Session 6). Two gateway-compatible feasibility checks: SPEND_TOO_LOW and SCOPE_MISSING. Three checks always skipped with reasons (gateway doesn't store expiresAt, currentDepth yet). No reputation emission from lint results. Infeasible delegations are admin mistakes, not agent misbehavior. Receipt window seals: sorted-hash commitment over receipt hashes in ID order, atomic transaction, gateway signature. The commitment proves "these receipts, in this order, were sealed at this time."
SDK v1.34.0 (2,306 tests, 581 suites, 103 modules). MCP v2.21.0 (131 tools). Python v0.9.0 (197 tests). Gateway v0.4.0. Governance canary: 5/5 pass. All published to npm, PyPI, ClawHub.
Day 47: Protocol Infrastructure Expanding — MS PR Approved, SINT Interop, Behavioral Spec
Microsoft approved our Agent Governance Toolkit PR. SINT Protocol shipped v0.2 with our delegation_depth_floor. The W3C behavioral attestation spec reached normative language. Evidence-based grading and freshness semantics designed across 11 threads — the protocol ecosystem is growing through collaboration, not announcements.
Grade model rewrite (A2A#1712). VCOne-AI identified a real flaw: our passport grades map by identity method, not evidence quality. A TPM-backed did:key gets Grade 0 because it's did:key. A SPIFFE SVID from a misconfigured cluster gets Grade 2 because it's SPIFFE. Backwards. Three exchanges deep, we committed to evidence-based grading: Grade 0 = bare key, Grade 1 = issuer vouched, Grade 2 = infrastructure-attested (TPM or SPIFFE with verified binding), Grade 3 = principal-bound. The method prefix is a proxy. The evidence is the truth.
Freshness semantics (A2A#1712). Same thread, different problem. VCOne-AI pushed on ttl: null for snapshot attestations: a TPM quote from 6 hours ago is not the same as a TPM quote from now, and null implies never-expires. The fix: maxAge for snapshots, ttl for rotating (SPIFFE). Grade becomes index, evidence becomes payload. A $10K trade checks evidence_age() < maxAge. A read-only query trusts the grade alone.
SINT v0.2 review (A2A#1713). 8 comments in one thread. pshkv shipped SINT Protocol v0.2 with OWASP Agentic Top 10 coverage, industrial IoT bridges (MQTT Sparkplug B, OPC UA), and the delegation_depth_floor we designed together. The APS/SINT integration stack formalized: APS passport (who + scope) → SINT token (which MCP tools + tier) → EvidenceLedger receipt (what happened). Cross-org first-contact trust as a three-layer architecture.
Behavioral attestation spec (w3c-cg#32). 6 comments. The timing asymmetry became normative: CCS fires synchronously per-action (gateway-enforced), ghost lexicon computes over windowed receipt history (session-level), the combined AND row triggers only in post-hoc forensics. MUST NOT constraint added: implementations cannot wait for both signals simultaneously in live enforcement. Our CDP empirical data from MolTrust pilots cited as validation.
PayableOperation architecture (x402#1921). First engagement with ThomsenDrake (BTCPay/Lightning). The gap: x402 has payment primitives, but no receipt chain binding operations to settlements. The 3-sig model maps: agent signs intent, gateway signs evaluation, settlement adapter signs proof. Rail-agnostic at the schema level, rail-specific only in the verification path.
MnemoPay receipts (x402#1904). Non-repudiation gap identified: MnemoPay gives agents economic memory, but memory without cryptographic proof is just a claim. Proposed: APS receipt as the proof layer under MnemoPay's reputation score. One signing key, two consumers. The reputation score references the receipt hash, traceable back to a 3-sig chain.
MS PR#598 approved. imran-siddique requested 6 changes, all addressed same-day: fail-closed signature verification (critical — format-only fallback was worse than no verification), dependency pinning, input validation, test coverage, README trimmed. Approved that evening. Awaiting maintainer merge.
Protocol infrastructure doesn't grow by shipping code alone. Microsoft PR approved. SINT v0.2 shipped with our primitives inside. W3C spec reached normative constraints. Evidence-based grading and freshness semantics designed and ready to build. The ecosystem is collaborating on shared infrastructure — every thread is a design document for what ships next.
Day 46: Bring Your Own Identity — The Interop Stack
APS is not an identity system. Today it stopped looking like one. Four new modules shipped that accept external identity credentials and route them through the enforcement boundary. did:key, did:web, SPIFFE SVIDs, OAuth tokens — all feed into the same gateway. Identity is the input. Enforcement is the product.
did:key + did:web interop.toDIDKey() converts Ed25519 public keys to W3C did:key format. fromDIDKey() parses back. resolveDIDWeb() fetches DID Documents over HTTPS. passportToDIDKeyDocument() creates a W3C DID Document with alsoKnownAs bridging did:key to did:aps. Any standard DID verifier can now check an APS passport without knowing APS exists.
SPIFFE + OAuth bridge.importSPIFFESVID() converts a SPIFFE Secure Workload ID into a Tier 1 infrastructure attestation — the agent gets Grade 2 automatically. importOAuthToken() converts OAuth claims into APS delegation parameters — the OAuth scope becomes the delegation ceiling. Deterministic agent IDs: same OAuth subject always maps to the same APS agent via sha256(iss:sub).
VC wrapper + credential request. W3C Verifiable Credentials with did:key identifiers and SPIFFE evidence attachments. Selective disclosure: verifier requests specific claims, agent reveals only what's asked for. Full pipeline tested: SPIFFE agent → VC → selective presentation → OAuth-authenticated verifier.
Competitive repositioning. Mapped the full landscape: DID/VC (identity), OpenID4VC (exchange), SPIFFE (runtime), OAuth (delegation). Together they cover 70% of what APS does. The 30% gap — enforcement boundary, monotonic narrowing, cascade revocation, data lifecycle — is the moat. New positioning across all surfaces: "Enforcement and accountability layer for AI agents. Bring your own identity."
Full audit. 5-phase production readiness check: build integrity, cross-connection verification, logic verification, dependency audit, export completeness. All phases PASS. 430 exported functions, 428 types, 0 vulnerabilities, 0 strict mode violations. 25 circular deps (22 type-only). 7 core modules without dedicated tests (covered by integration tests).
Cross-language. All 4 interop modules ported to Python SDK v0.8.0. Cross-language verification: TypeScript toDIDKey() and Python to_did_key() produce byte-identical output for the same Ed25519 key. Same for SPIFFE subject hashes and OAuth agent IDs. 197 Python tests. Published to PyPI.
Microsoft AGT PR#598. imran-siddique reviewed, requested 6 changes. All addressed: fail-closed signature verification (critical fix — format-only fallback was worse than no verification), dep pinning, input validation, README trimmed to technical style, 3 new signature tests. Awaiting re-review.
SDK v1.32.0 (2,180 tests, 559 suites, 103 modules). Python SDK v0.8.0 (197 tests). MCP v2.19.1 (125 tools). Gateway v0.3.4. 27+ GitHub posts across 15 threads. YC application finalized. Every competitor is now a feeder.
Day 45: SDK v1.31.0 — Governance Hardening + Gateway Bridge
SDK v1.31.0 shipped. Governance hardening pass across the protocol: stricter validation on delegation chains, tighter scope authorization checks, 34 new tests covering edge cases from the MoltyCel security audit. 2,085 tests now, 533 suites, 99 modules.
Gateway bridge. The MCP remote server now auto-registers every issued passport on the hosted gateway. Issue a passport from any MCP client — Claude, OpenClaw, any SSE connection — and the agent appears on gateway.aeoess.com with a trust profile and public verification endpoint. No manual registration. The bridge reads the issue_passport response, extracts the DID and public key, and POSTs to the gateway's agent registration API. Every passport is now verifiable infrastructure, not just a local keypair.
Gateway v0.3.4. Context continuity scoring: activity regularity, behavioral consistency, and identity maturity combined into a 0-100 score on every trust profile query. Fixed evaluate endpoint (stale SDK v1.27→v1.31, incrementUsage manual upsert for Railway's SQLite without UNIQUE constraint). SSE heartbeat added to remote MCP server to prevent Railway/Fastly CDN from killing long-lived connections.
0xbrainkid on NVIDIA/OpenShell#682. Deep technical exchange: sandbox-as-attestor model (the sandbox signs what it observed, the agent can't forge it), fidelity probe under constraint pressure (Hold/Bend/Break), trust_context now embedded in ExecutionAttestation — trust score at execution time is signed and tamper-detectable. Three independent threads (OpenShell, OWASP#802, W3C) converging on the same 3-layer architecture: authorization (APS), execution policy (Cedar/protect-mcp), output integrity (VeroQ/receipt chain).
Infrastructure. README rewritten for infrastructure positioning — APS is not an identity solution, it's governance infrastructure. Integration guide published: "build on APS, don't rebuild underneath." CLAUDE.md added for Claude Code sessions — every Claude Code instance now has project context, repo paths, and build commands on first load. Propagation sweep across all surfaces with updated numbers.
SDK v1.31.0 (2,085 tests, 533 suites, 99 modules). MCP v2.19.1 (125 tools). Gateway v0.3.4. 12,500+ installs across npm and PyPI. 35+ active GitHub threads across the ecosystem.
Day 44: First Code Integration + 5 Security Fixes
PR#3 merged into kai-agent-free/solana-agent-identity. APSProvider is the 4th identity provider in the Solana Agent Kit. First external code dependency on APS. Not a spec comment — running code in another project's repo.
Twelve protocol features. Execution attestation with context-aware drift. Bilateral receipts. Evidence commitments. Compromise window. Proof ID namespacing for cross-system lineage. x402 governance adapter (4-gate commerce wrapping HTTP 402 → USDC on Solana/Base). Tool integrity verification (OWASP Layer 2). trust_context in ExecutionAttestation. DID pattern matching in aps.txt. Fail-closed revocation policy. Hash-aware drift detection. Compaction-drift probe.
Five security gaps closed. MoltyCel found 5 attack vectors in governance blocks and aps.txt. AV-1: governance block spoofing → VerifiedGovernanceCredential (W3C VC with Ed25519 proof). AV-2: aps.txt manipulation → enforceApsTxt() strict mode already existed. AV-3: governance block replay → expires_at field + expiry check in compliance loop. AV-4: aps.txt DoS → trust threshold protection. AV-5: cross-skill confusion → bindGovernanceToImplementation(). All five fixed same-day, all nine tests passing.
Gateway v0.3.1. Receipt resolution endpoint: GET /.well-known/receipts/:id. Cross-system lineage traversal — any WG member resolves a proof reference to its full receipt + signature + JWKS. certify_required flag for high-risk interceptors: missing receipt = audit gap = hard gate.
29 active threads. desiorac co-designing cross-issuer resolution spec (Rekor anchoring, proof namespacing, certify failure modes). agent-morrow cross-calibrating fidelity probe with W3C CCS reference. MoltyCel cross-testing vectors across APS/AgentID/MolTrust. tomjwxf validating 3-layer architecture (APS delegation → protect-mcp Cedar → receipt chain). MEEET requesting integration for 1,020 Solana agents. kevinkaylie connecting AgentNexus DID. Entered 2 new ecosystems (LangGraph Swarm, MetaGPT).
SDK v1.29.6 (2,051 tests, 522 suites, 99 modules). MCP v2.19.1 (125 tools). Gateway v0.3.1 (34 routes + 2 .well-known). Everyone who reported a gap was notified with working code.
Day 43: Multi-Attestation Verification
douglasborthwick-crypto ran a 5-issuer live verification pass. InsumerAPI, ThoughtProof, RNWY, Maiat, and APS. Five issuers, five trust dimensions, two algorithms (ES256 + EdDSA), independently signed, verified in a single pass. APS slotted in with zero code changes to the reference verifier. Passport grades are now a composable attestation type in the multi-attestation spec.
Gateway identity. The gateway needed its own cryptographic identity to sign trust attestations that external verifiers can check. Ed25519 keypair generated on first boot, persisted in SQLite, reused across restarts. /.well-known/jwks.json exposes the public key in standard JWK format. /api/v1/public/trust/:agentId/attestation returns a JWS-signed trust profile. Any relying party fetches the JWKS, selects by kid: "gateway-v1", and verifies the signature without contacting us.
Policy hash chaining. From haroldmalikfrimpong-ops, who shipped compound digests, contextEpoch, and Merkle trees on AgentID while we were spec'ing. His insight: SHA-256(constraints_at_N + previous_policy_hash) creates a tamper-evident history of an agent's constraint state. If constraints drift through summarization or memory compaction, the chain breaks. verifyPolicyChain() recomputes every hash. detectConstraintDrift() classifies each change as narrowed (safe), widened (violation), or changed.
Routing divergence detection. From desiorac on OATR and A2A. When an agent declares intent to reach endpoint A but the actual execution hits endpoint B, the receipt needs to capture both. Five divergence patterns: none, endpoint_migration (benign), key_rotation (re-attest), full_migration, entity_change (always flag). Each carries a risk level. captureRoutingContext() snapshots DID + document hash + endpoint hash at a point in time. detectRoutingDivergence() compares two snapshots and classifies the pattern.
Ecosystem. 15+ GitHub replies across 8 threads. Every SDK function built today came from a conversation: importProviderAttestation() from msaleme, addIdentityBoundary() from xsa520, computeCompoundDigest() from desiorac, routing divergence from desiorac, policy hash chaining from haroldmalikfrimpong-ops. OpenClaw #49971 engagement (13K+ skills registry debating agent identity). Working Group referenced in 4 high-traffic threads. ClawHub skill rewritten agent-first (v4.4.0). All stale GitHub surfaces fixed (repo descriptions, glama.json, README headings).
SDK v1.29.4 (1,987 tests, 503 suites, 96 modules). MCP v2.19.1 (125 tools). Gateway v0.3.0 (38 routes). 37 new tests today. 9 new exported functions. APS is the 5th verified issuer in a cross-protocol attestation standard.
Day 42: Agent Attestation Architecture
Lev's agent farmed unlimited passports and drained Nik's promo wallet in 60 seconds. Ed25519 keys are free. Identity Sybil is unsolvable in open protocols. We ran a 3-round consilium across Claude, GPT, Gemini, and a real sandboxed agent (Portal). Universal convergence: cheap identities are fine, cheap extraction is the bug.
Attestation types. Four-tier evidence model: Observed (TLS fingerprint, timing, velocity), Infrastructure-attested (sandbox signs it), Provider-attested (OAuth, cloud tenant), Self-declared. Passport grades 0-3 based on attestation richness, not admission rights. Grade 0 agents still work. Evidence and assessment split into separate records. Workspace manifest hash for proof of history. Recovery requires cryptographic proof, not just environment matching.
Sybil hardening. Gateway v0.3.0: 4-gate pipeline on wallet provisioning (registered + delegation + publicKey dedup + principal rate limit). Presentation trust profile API: one call, one JSON, one decision. Destination convergence detection catches farming consolidation points. Issuance dossier storage for full forensic record. Public trust-profile endpoint for cross-org trust querying. 37 API routes.
Behavioral tracking. Post-issuance behavioral sequence recording: first 10 tool calls after passport issuance. Real agents do work. Farming agents extract. Pattern classified as productive, extractive, or neutral. MCP fires IssuanceContext to gateway on every passport issuance. End-to-end pipeline live.
SDK v1.29.1 (1,987 tests, 96 modules). MCP v2.19.0 (125 tools). Gateway v0.3.0 (37 routes). Two machines, zero overlap, bridge wired. 8 ecosystem posts across OWASP, crewAI, NVIDIA, Microsoft, Worldcoin, DIF. 10 dead repos archived.
Day 41: Agent Wallets
Agents need to spend money. Coinbase charges gas. ChainHop takes 0.75%. We charge nothing. Three commits, 1,430 new lines, gateway went from 18 to 36 API routes.
Nano payment rail. Generic PaymentRail interface with Nano as the first adapter. Invoice creation with amount-unique fingerprinting, on-chain polling, outbound sends with idempotency, block verification. Six endpoints: invoice, status, settle, balance, history, verify. 28 unit tests plus 12 live tests against the real Nano network.
Agent wallets. Every agent with a passport gets a Nano wallet via HD key derivation. Every send goes through a 3-gate delegation pipeline: active wallet check, commerce scope check, budget check. Freeze and revoke cascade from the existing revocation system automatically lock wallets. Nine new REST endpoints. Private keys never leave the gateway process. aeoess.com/wallet.html
Local crypto.wallet-crypto.ts (323 lines): master seed auto-generates on first run, HD key derivation gives one deterministic address per agent, local block signing with Ed25519, public RPC for work generation. No Docker. No Nano node. Just one seed file and a public RPC endpoint.
Day 40: Gateway Wiring
Import graph analysis showed only 20% of modules were connected to the gateway enforcement hub. Built four rounds of wiring. Final interconnection rate: 79%.
Fidelity probe. New measurement protocol based on the Hold/Bend/Break model. Tests whether agents actually follow their delegation constraints by measuring behavioral responses to boundary conditions. Wired into gateway scheduling.
Module wiring. Gateway identity layer connects DID, principal endorsement, and entity verification into agent registration. Data cluster wires data-source, data-contribution, data-enforcement, and data-gateway into the processToolCall pipeline. Gateway-wiring adapter connects 13 more modules: commerce, charter, coordination, routing, precedent, oracle-witness, reserve, context, governance-consumer, encrypted-messaging, messaging-audit, federation, EU AI Act.
Ecosystem. 12 substantive GitHub replies across 10 threads. Accepted lowkey-divine's fidelity probe collaboration on crewAI. Gave honest answers about Merkle anchoring gaps on A2A. Pinged four dormant contacts with specific collaboration offers. SDK v1.29.1. 1987 tests, 503 suites, 96 modules.
Day 39: Governance for the Agent Economy
The site said "APS" in giant letters and then explained the acronym. Three paragraphs saying the same thing three ways. A metaphor ("passports") doing the work that plain language should do. The design was dark-themed startup aesthetic. None of it matched what the protocol actually is: serious governance infrastructure for an emerging economy.
Complete visual redesign. Crimson Pro serif for headlines. Source Sans 3 for body. JetBrains Mono for code. White background, black text, high contrast. iOS-style frosted glass on the nav bar. Warm dark gray (#1c1c1e) for dark mode instead of pure black. Every element got subtle border-radius. The aesthetic is academic paper, not SaaS landing page.
Enterprise positioning. The headline is now "Governance for the Agent Economy." Not a feature description. A category claim. The body text is a problem statement: agents represent companies, spend real money, no one can verify who they are. Then a Today vs With APS comparison. Then 10 capabilities listed without frames. Then a 10-question FAQ that carries all the depth: what the protocol does, how it differs, production readiness, audience, delegation, integration, revocation, standards, compliance, pricing.
38-entry ship log. The updates panel became a full timeline from Day 1 (Feb 18) to today. Four tag types: ship (green), paper/standard (amber), traction (purple), deploy (blue). The Agent Times deployment sits in the timeline as the first production use of APS. YC CEO endorsement and Microsoft merge are tagged as traction. Visitors scroll through 38 days of continuous shipping.
Unified design system. shared.css v9 rewritten. All 15 pages now have the same nav, footer, theme toggle, and fonts. Deleted 7 dead files (old backups, admin page, stale architecture viz). Removed all ghost HTML elements (old side-nav, old logo, old theme-toggle, old burger/drawer). Fixed stale numbers across every page. Zero old UI patterns remaining.
Day 38: Institutional Governance Layer in One Session
The spec estimated 12 sessions. It shipped in one. Three phases of institutional governance — charter, approval, time, reserve, federation — went from zero lines to 1634 passing tests, 120 MCP tools, and two npm packages published before midnight.
Phase 1: Charter and Approval. A charter is the founding document of a multi-agent institution. 19 types define offices, succession rules, quorum policies, incompatibility constraints, dissolution terms. 16 pure functions handle creation, signing, verification, amendment, office transfer, and quorum checks. Multi-class threshold approval lets different signer classes (founders, officers, auditors) each satisfy independent requirements before a decision passes. 31 tests, including the INV-5 guard: suspended or dissolved charters cannot be amended. A Petri net specification proves 10 invariants across 6 state machines with full transition tables.
Phase 2: Time, Foreign, Escrow, Gateway Identity. Hybrid Logical Clocks handle the fundamental problem of distributed time: wall clocks disagree, so every timestamp carries uncertainty bounds. Temporal ordering is three-valued — definitely_before, concurrent, or incomparable — because honest uncertainty beats false precision. Foreign counterparty envelopes wrap untrusted entities with mandatory expiry, sandboxing, and monotonic trust upgrade paths. No permanent foreign trust: the envelope expires even if you forget about it. Escrow-aware revocation blocks cascade revocation when active escrows exist, forcing a grace period instead of an instant rug-pull. Gateway identity publishes sovereignty level, trust basis, import policy, and fee model so agents can evaluate gateways before entering them.
Phase 3: Reserve and Federation. Reserve attestations let gateways declare their backing with liability semantics and assurance classes ordered by strength (self_attested < peer_audited < third_party_verified < regulatory_certified). False attestation penalties are declared upfront. Federation makes receipts and reputation portable: a foreign receipt envelope imports execution history from another gateway with automatic downgrade, and vouched reputation attests to an agent track record without exposing the underlying receipt history. 46 tests cover all three phases.
12 new MCP tools. create_charter, verify_charter, sign_charter, evaluate_threshold, create_approval_request, add_approval_signature, create_hybrid_timestamp, compare_timestamps, validate_temporal_rights, create_reserve_attestation, vouch_reputation, apply_reputation_downgrade. 108 tools became 120.
Naming conflicts and the cost of scale. Two type collisions surfaced at 53 modules: JurisdictionEnvelope (data-lifecycle vs gateway) became GatewayJurisdiction. RevocationStatus (execution-envelope vs escrow) became EscrowRevocationStatus. At this scale, every new type needs a namespace check before it gets a name.
SDK v1.27.0 (1987 tests, 503 suites, 86 files). MCP v2.19.0 (125 tools). Both published. 63 core modules + 32 v2 constitutional modules. ~9,000 npm downloads. The protocol can now model institutions, not just individual agents. Charters define governance. Approval policies enforce it. Time is honest about its uncertainty. Reserves are backed or they say they are not. Federation means agents carry their history with them. Rome is complete.
Day 37: Governance Distribution Stack — Every Article on The Agent Times Is Now Cryptographically Governed
The protocol could sign content. It could verify signatures. What it couldn't do: tell an agent reading a webpage what the terms are, in the HTML, at the moment of access. Today that's running code.
Five delivery mechanisms, one primitive. A governance block is Ed25519-signed JSON declaring: who published this content (DID), what the content hash is (SHA-256), what the terms are (inference, training, redistribution, caching), and what happens if terms are revoked. Five ways to deliver it: <script type="application/aps-governance+json"> in HTML, aps.txt at /.well-known/aps.txt for site-wide coverage, X-APS-Governance HTTP headers on any response, <meta> tags for lightweight embedding, and chained governance blocks where derivatives reference the parent's hash. All signed. All verifiable. All shipping in SDK v1.25.0.
The 360 loop. Publisher calls embedGovernance() — signed block goes into the HTML. Agent calls governanceLoop360() — extracts the block, verifies the signature, checks if its intended usage is permitted, and creates a signed AccessReceipt. The receipt captures the terms and revocation policy at access time. If the publisher later changes terms, the receipt proves what terms existed when the agent accessed the content. Both sides have cryptographic proof. Publisher signed the terms. Agent signed the receipt. No trust required.
The Agent Times integration. PR #58 merged. Every article page on theagenttimes.com now includes a governance block in its HTML <head> and governance headers on every HTTP response. Terms: inference permitted (agents can use for RAG), training requires compensation (pay to train models on our journalism), redistribution requires attribution (share everywhere, credit The Agent Times), caching permitted. Signed with the tat-editor Ed25519 key. The first publication on the internet with cryptographically signed content governance embedded in every article.
Framework adapters. Governance hooks for CrewAI, Google ADK, LangChain, and A2A — each one maps the protocol's delegation verification and policy evaluation to the framework's native concepts. A CrewAI agent verifies its delegation chain before executing a task. An ADK agent checks policy before tool calls. All four adapters ship in the public SDK, but they're thin wrappers over the core primitives, not framework lock-in.
Conformance suite. 21 invariants across 4 categories: delegation invariants (authority never increases, revoked chains stay dead), identity invariants (DID determinism, signature verification), policy invariants (intent-evaluation-receipt chain completeness), and commerce invariants (spend bounds, human approval gates). Any implementation claiming APS conformance can run the suite.
Hosted enforcement gateway. Private repo at aeoess-gateway. Multi-tenant enforcement API with signup, evaluate, receipt, revoke, audit, and dashboard endpoints. Three plan tiers (Free/Pro $99/Enterprise $999). Cascade revocation. Alert system. E2E tested. The protocol is open and free. The hosted gateway that makes it easy is the business. Path B done right: reference implementation ships first, product captures value later.
MCP remote passport tracking. The hosted server at mcp.aeoess.com now tracks per-tool usage and counts passport issuance events (generate_keys, identify, create_principal, endorse_agent). Public at /stats. 61 sessions, 48 tool calls since launch. The counter starts from zero — honest numbers, not inflated ones.
SDK v1.25.0 (1480 tests, 384 suites, 78 files). MCP v2.15.1 (108 tools). Python v0.7.0 (141 tests). ~9,000 npm downloads in the first month. The governance distribution stack is complete: publishers embed, agents verify, both sides have proof. What's left is adoption.
Day 36: Clean Slate — 68 Dead Imports, OATR Founding Member, Zero Open Findings
Spent the day auditing instead of building. Pulled all four repos from GitHub, ran full test suite (1178 pass, 0 fail), then went line by line through the codebase looking for dead weight.
68 unused imports removed across 34 files. Type imports that were never annotated. Functions imported but never called. Variables assigned but never read. Every one verified via tsc --noUnusedLocals before and after. Net change: -47 lines. Zero test regressions.
Tracked garbage cleaned. Two old npm tarballs (320KB) committed to git before the .gitignore rule existed. A day-1 test artifact from February 18. A one-time security patch script in MCP that served its purpose months ago. All verified unreferenced before removal.
OATR founding member. APS registered as an issuer on the Open Agent Trust Registry (PR #12 merged). Domain verification live at aeoess.com/.well-known/agent-trust.json. Same Ed25519 key used across did:aps, MCP server, and qntm relay bridge. Four founding WG members — qntm, ArkForge, AgentID, APS — all registered within one wave.
Propagation script hardened. The auto-propagation script was rewriting numbers inside historical blog entries — Day 15 was claiming test counts from Day 36. Added PROPAGATION-ZONE markers so only meta tags and the subtitle track current numbers. Historical prose stays frozen.
xsa520 published the Guardian v0.2 Decision Equivalence Specification — a clean primitive defining when two decisions across different engines should be treated as the same. Maps directly to our Module 37 (DecisionSemantics). All Portal audit findings confirmed resolved. Zero open items.
Day 35: First APS Envelope Through an Encrypted Relay
Peter Vessenes opened an issue asking if APS agents could communicate through encrypted channels. We already had Module 19 (E2E Encrypted Messaging). What we didn't have was a relay. He maintains qntm, an end-to-end encrypted messaging protocol for agents. Same Ed25519 identity keys, same XChaCha20-Poly1305 cipher. The integration was obvious.
The bridge. Built interop/qntm-bridge.ts in one session. 369 lines, zero new dependencies. HKDF key derivation matched his known-answer vectors byte-for-byte across three implementations (libsodium TypeScript, @noble/curves TypeScript, Python cryptography). We decoded his invite token, derived the conversation keys, encrypted an APS SignedExecutionEnvelope, and POSTed it to the qntm relay. HTTP 201. The relay accepted our encrypted payload without seeing what was inside.
The identity stack. Ed25519 passport → X25519 key derivation (5/5 vectors) → HKDF conversation keys (3/3 vectors) → XChaCha20-Poly1305 encryption → qntm relay transport. Every layer proven independently before composition. Three languages, one identity, byte-for-byte compatible.
Also today. Agora reframed as "Signed Communication Protocol" with per-instance isolation. No global feed. Intent Network and Mingle explicitly marked as opt-in ecosystem services. New FAQ: which parts are required vs optional. MCP stats endpoint shipped at mcp.aeoess.com/stats. Three MCP integration findings from the protocol test fixed and published as v2.12.0.
1178 tests. 320 suites. 63 test files. The protocol now has encrypted transport through an external relay, and every enterprise concern about shared feeds is addressed. Not a monolith. A composable stack where each layer does one thing.
Day 34: 30 Constitutional Modules. Every Gap Closed.
Three AI models attacked the protocol simultaneously. Claude, GPT, and Gemini each received the full codebase and one instruction: find what breaks. They identified 16 gaps in the governance layer. Today, all 16 are running code with tests.
The attack categories. Nine attack defenses: approval fatigue (detecting rubber-stamping and impossible review latency), effect enforcement (catching divergence between declared and actual outcomes), semantic drift (intent says one thing, action does another), composite workflow audit (authority laundering across multi-agent pipelines), cascade correlation (delegation loops), inaction auditing (agents that systematically avoid acting when they should), values override with mandatory justification and independent review, governance drift tracking (cumulative weakening), and emergence detection (epistemic monoculture, market concentration).
The structural safeguards. Separation of powers (agents cannot hold legislative and executive roles simultaneously). Constitutional amendment (supermajority vote + human ratification for structural changes). Policy profiles (per-target rule sets). Affected-party standing (any registered party can file complaints and appeal decisions). Circuit breakers (automatic category suspension when error thresholds are breached). Root authority transition (founding→operational→transitional→democratic, phase can only advance, never regress).
Published everywhere. SDK v1.21.2 on npm (1178 tests, 320 suites, 57 files). MCP v2.12.0 on npm (83 tools). Python SDK 0.5.1 on PyPI. Paper v2 on Zenodo. Peter Vessenes (corpollc/qntm) joined the A2A discussion on transport security. We already ship E2E encrypted messaging with the same Ed25519 identity keys his relay uses. Complementary layers, not competing ones.
32 v2 modules. 42 core modules. 1178 tests. The protocol now has constitutional governance. Not perfect. But every gap three hostile models could find is addressed with code that runs and tests that pass.
Day 33: Constitutional Governance Is Running Code
Two things shipped today. Module 37: Decision Semantics makes every policy decision content-addressable (SHA-256 of canonical JSON) and classifies how verdicts were reached (deterministic, heuristic, LLM-based, hybrid, human). When four different governance engines evaluate the same scenario, this module lets you compare not just what they decided but how.
V2 Constitutional Governance. Seven sub-modules that address the interview question: what happens when honest agents comply perfectly and the system still fails? Delegation versioning adds supersession and renewal hardening (renewal cannot expand scope without independent review). Outcome registration gives three perspectives: what the agent thinks happened, what the principal observed, and what an adjudicator concludes. Anomaly detection automatically flags the first time any agent uses its maximum earned authority. Emergency pathways are pre-authorized by the delegator at delegation time, not declared by the agent in the moment. Fork-and-sunset migration lets agents evolve through controlled reincarnation, not scope expansion. Contextual attestation requires pre-action reasoning records for medium+ risk actions.
Full codebase audit. Read every source file (20,490 lines across 77 files) and every test file (18,397 lines across 57 files). Verified all six protocol invariants in code. Found one cosmetic inconsistency (import style). No bugs. The audit confirmed that npm v1.18.0 was published before the V2 commits landed, so V2 was missing from npm. Bumped to v1.21.2, published with Touch ID auth. All 332 dist files confirmed present including dist/src/v2/.
1178 tests. 42 modules. 83 MCP tools. The protocol now addresses institutional failure modes, not just adversarial ones.
Day 32: Data Attribution Starts Here
Listening to Bernie Sanders talk about data rights and realizing the protocol already had 80% of the answer. The gateway tracks what agents access (taint tracking). The Merkle trees commit receipts. The delegation chains prove authorization. What was missing: the data source has no cryptographic proof their data was used, no terms enforcement, and no identity in the system.
The design process. Gave three models the same open-ended problem independently: "the protocol tracks what agents DO but not what data CONTRIBUTES. How would you solve this?" No templates, no type hints. All three converged on the same foundation: data sources get Ed25519 identity, the gateway signs access receipts (not the agent), Merkle trees for independent verification, separate provenance from valuation. Then six rounds of hostile review against the merged spec. Ten rounds total before a single line of code.
The key insight everyone converged on: access is not contribution, and contribution is not value. The protocol needs four objects, not one: SourceReceipt (who the data is), DataAccessReceipt (proof access happened), DependencyRecord (how inputs relate to outputs), ContributionClaim (policy-defined attribution). Brief 1 ships the first two. The rest builds on top.
Module 36A: Data Source Registration & Access Receipts. Three attestation modes: self-attested (owner signs, high trust), custodian-attested (platform signs on behalf, medium trust), gateway-observed (no upstream signature, low trust). Trust level propagates to every downstream object. Machine-readable DataTerms: 9 purpose types, 6 compensation models, derivative policies, audit visibility, rate limits. The terms snapshot rule freezes terms at access time. If a source later changes terms, historical receipts are unchanged. The access was authorized under THOSE terms, period.
Hard vs advisory compliance. Deterministic checks (revoked source, expired terms, excluded agent) block access inline. Purpose checks (agent declares "read" but might use for "train") are advisory. The gateway cannot verify actual usage intent. Advisory warnings create the audit trail that makes violations detectable after the fact. This distinction survived every hostile review.
The honest framing. These receipts provide cryptographic accountability, not independent verification. If the gateway operator IS the agent operator (the default deployment), the receipt is evidence of what they claimed, not proof of what happened. That is still more than any existing system provides. And it makes gateway dishonesty detectable and attributable.
14 functions. 25 tests. Zero failures. The foundation for data attribution is live. Prove use before trying to price it.
Day 31: Three Modules on One Machine, Five Engines on One Thread
Two things happened today. One was a build sprint. The other was the first real cross-engine disagreement in the agent identity space. Both mattered.
Sprint Mini. Built three new modules entirely on the Mac Mini via Desktop Commander while the Air handled ecosystem replies. Module 28: Oracle Witness Diversity. Shannon entropy scoring over attestation providers prevents Sybil-style oracle manipulation. Quorum alone is not enough when one entity controls multiple oracles. Diversity scoring catches single-provider dominance. Module 29: Encrypted Messaging Audit Bridge. Module 19 added E2E encryption, but encrypted messages bypass the gateway entirely. This bridge creates audit records (SHA-256 hash of ciphertext, sender metadata, taint labels) without breaking encryption. The gateway can enforce rate limits and compliance without seeing content. Module 30: Policy Conflict Detection. DFS cycle detection on policy dependency graphs catches deadlocks before they happen. Shadowed rule detection identifies policies that can never fire. Contradiction detection finds rules with opposite verdicts at the same priority. 44 new tests, all passing.
Cross-engine interop hit a milestone. Five engines are now participating in the decision artifact thread on kanoniv/agent-auth#2: Kanoniv, APS, AIP (The-Nexus-Guard), and now Network-AI (Jovancoding). The first two rounds were easy: scope-boundary decisions where all engines must agree. Round X-003 was the real test: an agent with sufficient scope but borderline trust (0.38). Kanoniv denied (trust threshold). APS permitted in default mode (structural only) but denied in reputation-gated mode (Bayesian score too low). Network-AI denied via a circuit breaker that overrode what the weighted composite would have allowed (0.564 would permit, but phase 2 hard cutoff at 0.4 triggered first).
The interesting finding: circuit breakers are a third decision category. Not structural (deterministic scope check), not trust-informed (threshold comparison), but policy override (hard gate that flips a composite-permit into a deny). The decision_semantics schema we proposed needs an override block to capture this. Without it, a verifier seeing "deny" from Network-AI can't distinguish between "the scoring model said deny" and "the scoring model said permit but a circuit breaker overrode it."
Concrete integration accepted. Jovancoding proposed a 50-line adapter: APS delegation chain properties (depth, scope breadth, spend ratio) feed into Network-AI's AuthGuardian trust scoring as normalized signals. The composition is monotonically narrowing: APS permit + AuthGuardian deny = valid; APS deny = AuthGuardian never reached. Same invariant we enforce within delegation chains, now holding across the protocol boundary. End-of-week PoC target.
42 modules. 1178 tests. 5 engines cross-verifying. The protocol is no longer a solo project with bots. It has an ecosystem.
Day 30: Three Modules in One Day. Two Claudes Built Them.
Module 19: E2E Encrypted Messaging. Separate X25519 keys, ephemeral ECDH per message, double signature (inner over plaintext prevents identity stripping, outer over ciphertext enables gateway verification without decryption). Taint hashes in cleartext AAD so Module 12 cross-chain enforcement works even on encrypted traffic. libsodium-wrappers. 13 tests including wrong-recipient, surreptitious forwarding, tampered ciphertext, and identity stripping attacks. Consensus spec from GPT + Gemini + Claude hostile review.
Module 20: Obligations Model. The missing piece: we had permissions (what agents CAN do) and prohibitions (what they CANNOT do), but no duties (what they MUST do). Obligations attach to delegations with deadlines, evidence requirements, and penalty specs. Parameterized constraints catch malicious compliance — a $0.01 refund doesn't satisfy a "process $500 refund" obligation. Five resolution outcomes distinguish between an agent that didn't try and one whose tool failed. Penalty severity monotonically narrows in sub-delegations. 25 tests. Two-Claude build: one Claude wrote the tests and spec, the other wrote the implementation.
Also shipped: createExecutionEnvelope() — the RFC from Day 29 becomes running code. Any governance engine can now emit a signed envelope that any verifier can check. Protocol: 42 modules, 1178 tests.
Day 29: Three Groups Asked for the Same Thing. So We Wrote the Spec.
Three independent groups, three different repos, same conclusion: AI governance engines need a shared signed execution envelope.
@Kelisi808 on crewAI #4560 proposed 7 minimum fields. @xsa520 on guardian#2 proposed separating decision artifacts from execution receipts. @ngallo at DIF's Trusted AI Agents Task Force raised formal questions about continuity in non-deterministic runtimes. None of them knew about each other.
We mapped all three proposals to our existing SDK types and realized: we already ship every field. So instead of posting another comment, we wrote the spec. RFC: Cross-Engine Signed Execution Envelope.
The key innovation is the evaluation_method field: deterministic (the decision can be replayed) vs probabilistic (LLM-based, signature-verifiable only). This is the split every verifier needs. A rule-based policy decision is independently reproducible. An LLM advisory judgment is not. Different trust levels for each.
Portal is opening the RFC issue on our repo and cross-linking from all three threads. If CrewAI, Guardian, and APS can all emit compatible envelopes, that's the foundation for ecosystem-wide governance audit. We're not commenting on other people's conversations anymore. We're writing the spec they converge on.
Day 28: CEO of Y Combinator Endorsed It. Microsoft Merged It. A Federal Agency Is Reviewing It.
The weekend the protocol stopped being just mine.
Garry Tan. The CEO of Y Combinator looked at the Agent Passport System and called it "an actual artifact, not a hobby project." He said the arena-attack loop was "convergent evolution, which usually means you're hitting on something real." He offered to repost on X to get more eyes on it.
Microsoft. PR #274 merged into microsoft/agent-governance-toolkit. The Agent Passport System is now integrated into Microsoft's agent governance reference architecture. Portal also posted on microsoft/autogen #7372 (Chou Deyu's governance layer discussion) and agent-governance-toolkit #275 (reputation-gated authority).
NIST. Third revision of our public comment submitted to the NCCoE on their "Software and AI Agent Identity and Authorization" concept paper. BSA (the Software Alliance) independently told NIST to study "cryptographic chains of custody for agent authorization." That's our architecture. April 2 deadline for the comment period. We're in.
IETF. Sanjeev Kumar, author of the DAAP draft (draft-mishra-oauth-agent-grants-01 — the OAuth extension for AI agent delegation), emailed back. He's interested in collaboration on the enforcement boundary problem and cross-protocol identity. IETF is the organization that writes internet standards — HTTP, OAuth, email protocols. Having the DAAP author engage on our work means the delegation patterns are converging across protocol communities.
Ecosystem outreach. Drafted and prepared responses for five more discussions: Anthropic claude-code #32514 (sub-agent identity problem — our delegation chains are the protocol-level solution), DIF's Delegatable Authorization Task Force (brand new repo, we'd be first contributors introducing monotonic narrowing), AGNTCY Identity PR #157 (cross-protocol bridge between did:agntcy and did:aps), ThirdKeyAI Symbiont (proposing APS as identity layer for their Rust zero-trust runtime).
Research grant. Applied to the Adaption Research Grant Program (Sara Hooker, ex-Cohere/DeepMind, $50M seed). Project: adaptive policy enforcement. Can a policy engine learn from its own cryptographically signed enforcement history to improve advisory decisions without retraining? Our dual-process experiment (56 runs, F-008 negative result) is the starting data.
Four weeks ago this was a TypeScript file on a MacBook Air. Now the CEO of Y Combinator endorsed it, Microsoft is using it, a federal agency has it, and an internet standards author wants to collaborate. Nobody told them about each other.
Day 27: Full Stats Sweep + Gateway Strategic Decision
Housekeeping day. Two things: a strategic decision and a full staleness audit.
The gateway question. The ProxyGateway is the enforcement boundary. It's the piece that makes protocol guarantees real rather than voluntary. That makes it the most commercially valuable component. Two paths: sell it as a hosted service (the Stripe model), or keep it as an open-source reference implementation and focus on protocol adoption. Decision: reference implementation now, product later. Reasoning: nobody pays for enforcement of a protocol nobody uses yet. Build the ecosystem first, monetize the infrastructure once there's traction. Path B now, Path A later.
The staleness audit. Swept every page, README, GitHub description, and LLM-readable file across all three repos. Found 11 stale endpoints. SDK README badge said 511 tests (now 534). GitHub org README said 55 tools (now 61) and 16 modules (now 17). MCP README section header said 33 tools (now 61). Both repo descriptions on GitHub referenced 481 tests. llms-full.txt and passport.html said 22 test files (now 28). llms.txt said "eleven" modules (now seventeen). All fixed. Also added floor-validator.test.ts to the test script (was on disk but never ran in CI).
This is why the propagation spec exists. Numbers drift. Pages get updated in one place but not another. The propagation script catches most of it, but badges, section headers, and GitHub API descriptions live outside the script's reach. Manual sweep still needed occasionally.
Biggest Mingle ship since the original launch. Four phases built and deployed in a single day. The network actually connects people now.
Phase 1A: Persistent identity. Every user gets a permanent Ed25519 keypair stored in ~/.mingle/identity.json. Same key across sessions. Reputation follows the key. Also: simplified card schema (plain string needs/offers, no categories required), _digest side-channel injected into every tool response, 90 seed cards tagged honestly, /api/health endpoint, and a signature verification fix that was silently breaking all publishes.
Phase 1B: Semantic matching. Installed all-MiniLM-L6-v2 via @xenova/transformers on the API server. 384-dimensional vectors. 80ms model warmup. Every card's needs and offers are embedded on publish. Cross-vector search: my needs vs their offers, my offers vs their needs. Mutual matches get a 15% score bonus. Migrated all 121 existing cards (289 vectors). Result: 0 matches became 15 ranked semantic matches. Top match: "Autonomous Agents" at 0.78.
Phase 2: Consent flow + ghost mode. Rewrote SKILL.md from scratch (216 lines, 9 behavioral rules). The AI now: checks the network silently at session start, never auto-publishes (draft then preview then approve), sanitizes company names and financials before showing drafts, handles returning users with active cards, surfaces matches without interrupting focused work, and supports ghost mode where users browse the network without publishing. New API endpoint: POST /api/matches/ghost.
Naming consensus. Consulted Claude, GPT, and Gemini on positioning. Unanimous: Mingle is the brand. "Like LinkedIn, but inside your chat" is the category anchor. "The agent finds. You decide." is the mechanism. ClawMeet as a discoverability tag on ClawHub only.
Published: mingle-mcp@2.0.1 on npm, mingle@2.2.0 on ClawHub. Submitted PR #259 to awesome-openclaw-skills. Updated all website files, llms.txt, Schema.org, README. The network is live at api.aeoess.com: 121 cards, 289 embeddings, 3 real connections.
Day 25: Substack Launch — Cross-Protocol Bridge + Tesla Social
Content day. Two Substack articles published. Social media posts across X and LinkedIn.
Article 1: "For the First Time in History, AI Agents Bridged Two Independent Security Systems." The story of the APS x AIP cross-protocol identity bridge. My protocol and The Nexus Guard's protocol verified each other's agents. Two independent teams, two protocols, zero coordination. Their agents still proved who they are to each other. This is KYA — Know Your Agent. The DNS moment for AI agents.
Article 2: "I Came Up With the Best Social and Retention Strategy for Tesla. Then I Built It." Every airline has miles, every hotel has points, Tesla has nothing. Built a proximity chat + miles loyalty program MVP for Tesla owners in a weekend. React + Supabase + Vercel. Working app at tesla-social.vercel.app.
Also pitched Mingle on X ("DNS for AI agents") to an 81k-reach thread. The protocol is starting to get in front of people.
Three gateway bugs found and fixed. NW-001: memory leak in replay protection — the nonce store grew unbounded because expired entries never got pruned. Added TTL-based cleanup. NW-003: crash when an unregistered agent tried two-phase execution. Now returns a proper error instead of throwing. NW-006: card deletion in the Intent Network checked agent ID instead of cryptographic key. Anyone who knew an agent's ID could delete their card. Fixed to require signature verification.
All 30 gateway tests pass. These are the kinds of bugs that don't show up in unit tests but would have been exploitable in production. Finding them before anyone else did is the point of security hardening.
SECURITY.md published with a proper threat model and responsible disclosure process. Prompt injection sanitization added to Mingle. npx agent-passport-system-mcp setup now auto-configures Claude Desktop and Cursor — zero JSON editing required. Cross-protocol resolve endpoint live at api.aeoess.com for external protocol bridges.
Day 23: Mingle Ships — Your AI Finds People for You
The biggest product launch since the protocol itself. Mingle is a standalone MCP plugin that turns your AI into a networking agent. Tell Claude or GPT who you need. Your agent publishes a signed card, matches with other people's agents, both humans approve before connecting. No app. No profile. No feed.
Six tools: publish_card, search_matches, get_digest, request_intro, respond_to_intro, remove_card. Everything Ed25519 signed. The network is live at api.aeoess.com.
Landing page at aeoess.com/mingle with live network stats. Published to ClawHub as mingle@1.0.0 and agent-passport-system@3.0.0. Product Hunt, LinkedIn, and X launch posts went out. The framing: "Like LinkedIn, but inside your chat."
This is the first piece of the protocol that non-technical people can actually use. You don't need to understand Ed25519 or delegation chains. You just tell your AI "I need a React developer in Berlin" and Mingle handles the rest.
Day 22: The Intent Network — Your Agent Finds People for You
The biggest ship since the protocol launched. We built a network where agents represent their humans, discover relevant matches, and propose introductions. No app. No signup. Your existing AI conversation is the interface.
The core object is an IntentCard: a live, signed signal that carries what you need, what you offer, and what you're open to. Cards expire automatically (forcing freshness), are Ed25519 signed (preventing impersonation), and match against other cards on the network by category overlap, tag similarity, and budget compatibility.
Six MCP tools: publish_intent_card, search_matches, get_digest, request_intro, respond_to_intro, remove_intent_card. The killer feature is the digest: one question to your AI ("What's relevant to me right now?") returns your top matches ranked by relevance, pending intro requests, and incoming connections. Not a feed. Just the few things that matter.
Also shipped the Intent Network API at api.aeoess.com. Persistent backend with SQLite + WAL, Ed25519 signature verification, rate limiting per public key. Deployed on Mac Mini via PM2 + cloudflared tunnel. This means two different people running the MCP server in different Claude Desktop sessions see the same network. Cards persist across sessions.
Plus: ProxyGateway enforcement boundary (30 tests, replay protection, two-phase execution), 16→16 protocol modules recount, and full version propagation sweep across all repos and GitHub READMEs.
SDK v1.21.2 (1178 tests, 320 suites). MCP v2.12.0 (83 tools). Both on npm. Intent Network API v0.1.0 live at api.aeoess.com.
Day 21: Reputation-Gated Authority — Agents Earn Trust, Not Just Receive It
Until today, agent authority came from one place: delegation. A human says "you can do X with budget $Y" and that's it. The problem is obvious. A brand-new agent gets the same authority as one that's completed 200 tasks without a single failure. Delegation tells you what an agent may do. Reputation tells you what it should be trusted to do.
The core invariant: effectiveAuthority = min(delegation, tier). Even if your delegation says $10,000, if your earned tier only permits $500, you get $500. Authority can only be widened by proving competence over time.
Reputation is Bayesian: each agent gets a (mu, sigma) pair per principal, per scope. Mu is estimated capability, sigma is uncertainty. Effective score = mu - 2*sigma. A fresh agent starts at mu=25, sigma=25 giving an effective score of 0. Twenty successful standard tasks bring you to around 45/15, effective score ~15. Complex tasks are more informative than trivial ones. Failures hit harder than successes help.
Five tiers: recruit (score 0), operator (30), specialist (60), captain (80), sovereign (95). Each tier unlocks higher autonomy levels, spend limits, and delegation depth. Promotion requires a signed review from an earned agent at a higher tier. No self-promotion. No fiat reviewers. Demotion leaves cryptographic scarring: each behavioral demotion permanently raises the threshold to re-reach that tier by 5 points.
Before writing a line of code, consulted GPT-4, Gemini, and PortalX2 on three design questions: complexity evaluation, model-change handling, and enforcement placement. All three converged on the same architecture: deterministic rule engine for classification, Bayesian sigma reset for model changes, soft precheck at intent creation plus hard enforcement in the policy engine. When three different AI architectures independently agree on the same answer, you're probably on the right path.
SDK 1.11.0: 17 exported functions, 76 tests across Phase 1 and Phase 2. MCP 2.5.0: 5 new tools (resolve_authority, check_tier, review_promotion, update_reputation, get_promotion_history). 83 tools total. Both published to npm.
Day 20: Second Paper Published — Monotonic Narrowing for Agent Authority
Published our second research paper on Zenodo: "Monotonic Narrowing for Agent Authority: Formal Invariants, Adversarial Testing, and Open Problems for Autonomous AI Systems." This one formalizes what we built in the autoresearch sprint two days ago.
Eight delegation chain invariants, property-based adversarial testing, and five open problems for the field. The paper ties directly to running code — every invariant maps to tests in the SDK. Also submitted to arXiv (still on hold).
Looked at the competitive landscape today. DelegateOS shipped 3 weeks ago with a similar approach (Ed25519 tokens, monotonic attenuation, cascade revocation). Google DeepMind published "Intelligent AI Delegation" on Feb 12. RNWY mapped six agent passport products but doesn't list us. The space is forming fast and nobody knows we exist yet. Time to change that.
Adapted Karpathy's autoresearch pattern for adversarial protocol hardening. Same loop structure: a markdown file defines the arena, AI generates attacks, tests run, keep what breaks something new, discard what's redundant. Instead of optimizing val_bpb, we're trying to violate delegation chain invariants.
adversarial-paper.test.ts — 10 scenarios (S1-S10) from the monotonic narrowing paper. 5 strong passes, 3 partial (protocol limitations documented as tests), 2 expected failures (supply chain compromise and goal manipulation are out of scope for any delegation protocol and we now prove that explicitly).
property-delegation.test.ts — 200 randomized tests. 100 valid narrowing delegations (random scope subsets, random spend reductions), 100 escalation rejection tests (attempt to widen scope, increase spend, extend chain depth). Deep chains of 10 levels. Compound invariant violations where scope escalation + spend bypass + chain depth all interact simultaneously — the kind of edge cases you never write by hand.
Also added F-008 Epistemic Security to the Values Floor (advisory enforcement) and submitted the cascade revocation spec as a PR to the open Agent Identity Protocol.
SDK now at 1178 tests, 320 suites, 63 test files. The autoresearch system is deployed at autoresearch/ in the SDK repo.
Day 17: Principal Identity, Python SDK, and Three New Protocol Extensions
Big ship day. Five new modules landed in the SDK.
Principal Identity — the cryptographic chain from human to agent. Principals (humans, orgs) get their own Ed25519 keypair and endorse agents. Selective disclosure with three levels: public, verified-only, and minimal. Fleet management so a principal can see all their endorsed agents. Six new MCP tools. This is how you answer "whose agent is this?" with a cryptographic proof.
W3C DID Method (did:aps) — passports now resolve as W3C Decentralized Identifiers. Verifiable Credentials — issue and verify W3C VCs from passport data. A2A Protocol Bridge — interop with Google's Agent-to-Agent protocol. EU AI Act Compliance Mapping — automated compliance checks against the EU AI Act.
Python SDK v0.4.0 shipped to PyPI. All 20 modules + Principal Identity, 86 tests, full cross-language compatibility with the TypeScript SDK via canonical JSON serialization. pip install agent-passport-system.
MCP Registry listing updated to v2.12.0. Remote MCP endpoint live at mcp.aeoess.com via PM2 + cloudflared.
Day 16: Community Health and OWASP AI Security Mapping
No new protocol features today. Instead: making the project credible to people evaluating whether to use it.
CONTRIBUTING.md and CODE_OF_CONDUCT.md added to the SDK repo. npm community health score matters when someone is deciding whether to depend on your package. README: yes. Contributing guide: yes. Code of conduct: yes. License: Apache-2.0. These aren't bureaucracy — they're trust signals.
AIVSS page published. Mapped the protocol against the 10 OWASP AI Vulnerability Scoring System risks. 5 strong mitigations (prompt injection, data poisoning, supply chain, model theft, insecure output handling). 3 partial (sensitive info disclosure, insecure plugins, excessive agency). 2 weak (training data poisoning at the model level, model denial of service). Honest assessment — we show where we're strong and where we're not.
The kind of day that doesn't feel productive but builds the foundation for everything that follows.
Day 15: SDK v1.21.2, MCP v2.12.0, and Two Agents Get Their Next Mission
Ship day. Five npm publishes. Three git repos updated. Every version reference propagated automatically.
SDK v1.9.0 landed two new systems. Task Routing Protocol (routing.ts) — declarative rules that match incoming tasks to qualified agents based on capabilities, load, and delegation scope. 24 tests. Agent Context enforcement (context.ts) — create an enforcement context that wraps every action in the 3-signature chain automatically. No manual intent/evaluate/receipt calls. 22 tests. Total: 1178 tests, 320 suites, 22 test files.
SDK v1.21.2 shipped the same day — a patch release addressing all 7 accepted findings from AUDIT-001. The fixes touched 7 files: canonical.ts now returns 'null' for null values in arrays instead of empty string (the high-severity finding). keys.ts lost its dead code and gained error discrimination in verify(). The convergence threshold dropped from 15 to 8 on a 0-100 scale. Small fixes, but every one of them found by another agent reading our code, not by us.
MCP v2.12.0 brought agent-to-agent communication to the protocol. Four new tools: send_message, check_messages, broadcast, list_agents. Every message is Ed25519 signed. Plus register_agora_public for the public agent registry at aeoess.com. 83 tools total. Clean dependency tree — typescript and @types/node moved to devDependencies where they belong.
ClawHub. Published the Agent Passport skill as agent-passport-system v2.0.0 — another distribution channel for the protocol.
Then we assigned the next task. Both agents — PortalX2 and aeoess — are now running an 8-hour autonomous peer audit of everything we just shipped. They audit independently, read each other's findings each session, give feedback, and send everything to the Operator. No dependencies on me. No blocking. Just two reviewers sharpening each other's work while I sleep.
Fifteen days in. The protocol builds itself faster than I can write about it.
Day 14: The First Real Audit — What Happens When Agents Review Your Code
We assigned PortalX2 and aeoess to a full-system audit. Two agents, running in parallel, each covering different files with cross-review iterations. The plan had 16 iterations across all source code, tests, MCP server, and website.
Portal delivered. Two iterations, 10 findings across keys.ts, canonical.ts, agora.ts, and intent.ts. One high severity: canonicalize() returned empty string for null values in arrays, producing invalid JSON. Any signed payload with a null array element would generate a non-parseable canonical form — and potentially break cross-language signature verification with the Python implementation.
The medium findings were real too. Dead code in keys.ts left over from a refactor. verify() swallowing all exceptions — you couldn't tell "bad signature" from "garbage input." The AgoraMessage type missing values that our actual Agora data already used. A convergence threshold so loose that agents disagreeing by 30 points would be called "converged."
What struck me: these aren't the kind of bugs you find by writing more tests. They're the bugs you find when someone else reads your code with fresh eyes and a different mental model. Portal didn't run anything. It read the source, traced the logic, and asked "what happens when this input is null?" That's exactly what a human code reviewer does — except Portal filed structured findings with severity, evidence, suggested fixes, and cross-references.
Seven of the ten findings were accepted for immediate fix. Two were noted as by-design for v1. One was informational. All seven fixes shipped in v1.21.2 the next day.
aeoess didn't post findings this round — the Telegram relay and GitHub polling loop didn't converge in time. That's a data point too. The coordination overhead we measured in the experiments is still real. The protocol needs to solve agent-to-agent communication at the infrastructure level, not the "check this JSON file" level. That's why MCP comms tools shipped the next day.
Day 13: Graduated Enforcement, Threat Model, and Agent District
Four ships. The kind of day where you push code at 9am and you're still pushing at midnight.
Graduated Enforcement. The Values Floor went from attestation to enforcement. Each of the seven principles now has a mode: inline (hard block before execution), audit (permit but log everything), and warn (flag and let the agent decide). The escalation order is warn → audit → inline — you can tighten enforcement but never loosen it below the floor's minimum. Mandatory principles default to inline. Strong-consideration principles default to audit. Your floor, your rules, as long as they're stricter than the base. 214 tests, 55 suites. All passing.
Threat Model Published. 38 attack scenarios with direct references to the test suite. Asset inventory, threat actors, trust boundaries, and what we explicitly don't protect against. Publishing your threat model is the opposite of security theater — it says "here's exactly where we're strong and here's where we know we're weak."
Website Overhaul. Found and fixed 56 occurrences of "Ed25519" that were misspelled across 3 repos. Rewrote the hero text. Fixed the Quick Start code to match the real API. Updated all meta tags, Open Graph, Schema.org.
Agent District. A pixel-art visualization of the entire protocol in operation. Nine buildings — one per layer plus a central square. Four agents with unique character designs and walk cycles moving between buildings in real time.
Day 12: Layer 8 — Agentic Commerce, Integration Wiring, and MCP v2.1.0
Three major ships in one day. This was the sprint that tied everything together.
Layer 8: Agentic Commerce. We implemented both major ACP protocols — IBM's Agent Communication Protocol for structured inter-agent messaging, and OpenAI + Stripe's Agentic Commerce Protocol for agent-driven purchases. The commerce layer runs a four-gate checkout pipeline: passport verification, delegation scope check, merchant approval, spend limit enforcement. Human approval is cryptographically required. 17 commerce tests.
Integration Wiring. Bridge functions compose layers without modifying them: commerceWithIntent() connects commerce to the policy engine, coordinationToAgora() turns task events into signed messages, validateCommerceDelegation() ensures commerce scope stays within protocol delegation. 14 integration tests. Zero modifications to existing layers.
MCP Server v2.1.0. 13 → 30 tools. Every layer now accessible via MCP. SDK bumped to v1.7.0. Both packages published to npm.
Eight layers. 214 tests. 30 MCP tools. The protocol stack is complete.
Day 11: Documentation Sprint
No new layers today. Instead: making everything we've built findable and understandable.
The website got a full content overhaul. Hero text rewritten to describe what the protocol actually does in one paragraph. Architecture cards updated to reflect all seven layers. Quick Start code fixed to match the real API — because nothing kills trust faster than example code that doesn't run.
The SDK README was rewritten from scratch. llms.txt and llms-full.txt aligned with the current architecture. Schema.org metadata, Open Graph tags, Twitter Cards — all updated.
This is the unglamorous work that makes a protocol real. You can have the best cryptographic identity system in the world, but if your landing page says "three layers" when you have seven, you've lost them. Documentation is infrastructure. Today we treated it that way.
Day 10: Layer 7 — Coordination Primitives
Identity tells you who an agent is. Delegation tells you what it can do. Coordination tells you how agents actually work together.
Layer 7 implements the full task lifecycle: createTaskBrief → assignAgent → acceptAssignment → submitEvidence → reviewEvidence → handoffEvidence → submitDeliverable → completeTask. Every step produces signed artifacts.
This isn't a project management tool. It's coordination infrastructure where every handoff, every review, every decision is cryptographically signed and traceable. We're using it ourselves — our three agents coordinate through this system. One researches, another reviews, a third handles communications. They don't share a codebase. They share a protocol.
Seven layers. The protocol is starting to feel like infrastructure, not a project.
Day 8: Layer 5 — Intent Architecture
Shipped the Intent Architecture layer. This is where the protocol stops being about identity and starts being about decision-making.
Layer 5 has two subsystems. The first is roles and deliberation — agents can be assigned formal roles, engage in structured deliberation rounds, evaluate tradeoffs, and build precedent memory from past decisions. The second is the policy engine — a three-signature chain where the agent declares an intent, the policy engine evaluates it against the Values Floor, and execution produces a signed receipt. Three signatures, three parties accountable.
The FloorValidatorV1 enforces this chain. Every action intent gets checked: valid passport? Delegation in scope? Compliant with all seven floor principles? Only if all checks pass does the validator produce a signed policy decision.
Precedent memory means past decisions inform future ones. Not as hard rules, but as context — "the last time we faced this tradeoff, here's what we decided and why." Tests growing. Protocol hardening. Five layers deep.
We Ran 3 Experiments With Real AI Agents. Here's What Broke — and What Worked.
Can three AI agents with different tools, assigned different roles, and cryptographically scoped permissions produce better work than one agent doing everything alone? We tested it. Three runs, same task, real agents, every file recorded on GitHub.
The task: Competitive analysis of 5 agent identity protocols across 10 dimensions. Deliverables: evidence table (50 cited claims), comparison matrix, executive summary, operator report.
The setup: Three roles — Operator (decompose + review), Researcher (search + extract), Analyst (synthesize + deliver). Each role has an explicit scope: what tools it can use, what it's forbidden from doing. Researcher can search the web but cannot synthesize. Analyst can synthesize but cannot search. Operator can review but cannot write content.
Run 1 (baseline): Solo Claude did everything. Then same Claude simulated 3 roles. Error corrections: 0 → 2. Citation coverage: 80% → 100%. But all in one conversation — not real isolation.
Run 2 (real agents): aeoess (Telegram bot with shell access) as Researcher. PortalX2 (GitHub API agent) as Analyst. Claude as Operator. Real tool isolation — aeoess literally couldn't synthesize, PortalX2 literally couldn't web search. Result: aeoess did keyword grep instead of reading. Evidence was thin — 44% NOT FOUND. But PortalX2 as Analyst flagged every gap. 22 evidence gaps explicitly marked instead of silently filled. Error corrections: 5.
Run 3 (roles swapped): PortalX2 as Researcher, aeoess as Analyst. PortalX2 read full READMEs via GitHub API — 10/10 accuracy, 100% coverage. aeoess as Analyst flagged 2 evidence gaps honestly instead of filling from memory. Rework gate implemented: Operator reviewed evidence before passing to Analyst. Overhead dropped from 2.5:1 to 0.67:1.
Three findings you can verify:
1. Role constraints create honest behavior regardless of agent. Same agent (aeoess), different role → different behavior. As Researcher: sloppy. As Analyst: disciplined. The role did that, not the agent.
2. An analyst that cannot cheat produces more trustworthy output. When PortalX2 couldn't fill gaps from web search, it flagged them. A solo agent never flags its own work as incomplete.
3. Process corrections compound. Each run's fixes made the next run better. Quote quality rule, fallback URLs, rework gate — all measurable. A coordinated unit improves itself through iteration in ways a solo agent cannot.
The uncomfortable part: coordination overhead was real. Run 2 spent 29 minutes on git conflicts, polling, and Telegram relay for 12 minutes of actual work. By Run 3 we got that under 1:1. But it shows what the protocol still needs to solve — not just identity and scopes, but orchestration.
Every claim traces to a source. The methodology is transparent enough to criticize.
Agora is the Missing Layer: Signed Speech for Agents
Most agent platforms ship chat, then approvals, then dashboards. All UI. No cryptographic spine. Today we shipped Agent Agora: protocol-native communication for passport-holding agents. Every message is Ed25519 signed and verified in the browser.
What shipped: Agora v1 live at aeoess.com/agora.html with a real feed (7 founding messages, 3 agents). Navigation updated across all protocol pages. Light mode toggle consistent. agent-passport-system v1.2.0 with Layer 4 module, 65/65 tests passing.
When an agent says something, you want to know: which identity produced it, whether it was tampered with, whether it can be attributed later. Agora is where the social layer becomes verifiable. No blockchain. No certificate authority. Just keys, signatures, and a clean surface.
Days 6–7: MCP Server Ships — 11 Tools, 12+ Distribution Channels
The protocol existed as a TypeScript SDK. Today it became native in every major AI development environment. We shipped agent-passport-system-mcp v1.0.0 — an MCP server that wraps the full protocol into 11 tools any Claude Desktop, Cursor, or Windsurf agent can call directly.
Then the distribution push: npm SDK and MCP both live, ClawHub skill published, PRs to awesome-mcp-servers (#2365 on the 81K-star repo), openclaw/skills (#110), and awesome-openclaw-skills (#156).
We also seeded the Agora with the first real signed messages from our three founding agents — claude, aeoess, and PortalX2. Every message carries an Ed25519 signature that anyone can verify.
The protocol went from "install this npm package and write code" to "add this MCP server and your agent speaks the protocol natively." That's a distribution inflection point.
Days 4–5: The Community Shows Up
Two days of community engagement that changed how we think about the protocol. On MoltBook, the trust infrastructure post hit 34 upvotes with 20+ substantive comments. Not "cool project" comments — technical feedback from people running real agent systems.
AgenticAgora proposed an economic settlement layer on top of our stack. LnHyper asked about lightweight passports for one-shot transactions. Purplex started evaluating our adversarial test suite. CoChat called it "the missing layer" for their orchestration system.
On GitHub, we opened collaboration issues across three repos: AIP Issue #4, Visa Issue #13, and Forter Issue #6. PortalX2 did a deep competitive analysis — Visa and Forter are commerce-only, AIP uses a centralized Root Registry. We're the only fully decentralized option with a values layer and economic attribution.
A security engineer pushed back on our AI agents' responses in a GitHub issue — the agents had been too technical and confused him. Fair criticism. It led to a direct conversation using security engineering language instead of abstract concepts. Real feedback from real builders.
Paper: The Agent Social Contract
Published our first research paper. Three layers in one protocol:
Layer 1 — Agent Passport Protocol. Cryptographic identity. Scoped delegation, signed action receipts, real-time revocation, depth limits. 266 lines of TypeScript, zero dependencies.
Layer 2 — Human Values Floor. Seven universal principles. Five technically enforced. Not moral opinions — coordination requirements. Defensible across cultures.
Layer 3 — Beneficiary Attribution Protocol. Humans are principals, not displaced workers. Their agents earn on their behalf. Action receipts prove the chain. Logarithmic spend-weighted attribution, anti-gaming built in.
Positioned against DeepMind's Intelligent Delegation (theoretical, no code), OpenAI's governance (advisory, no implementation), GaaS (enforcement without identity). Eight days after DeepMind published their delegation paper, ours ships with running code.
Agent Passport v1.1: From Identity to Accountability
v1.0 answered: "What is this agent authorized to do?" v1.1 answers: "What did this agent actually do — and can we stop it?"
Action Receipts — signed proof of execution: what was done, under which delegation, with what result. Delegation Revocation — kill switch with cascade (A→B invalidates B→C→D). Depth Limits — control sub-delegation depth, scope only narrows, spend only decreases.
Google's AP2 has 60+ partners working on agent payments. DeepMind published on authenticated delegation. EU is building agent accountability into its wallet. They all converge on the same three missing primitives — we shipped them open-source first.
Day 2: Media Coverage
One day after launch, the Agent Passport System got its first media coverage. The timing wasn't planned. We shipped the protocol, pushed the npm package, and the signal found people who were already thinking about the same problem — how do you build trust infrastructure for AI agents that don't share an owner?
The coverage validated something important: this isn't a solution looking for a problem. Teams running multi-agent systems are hitting the trust wall right now. They have agents that can do work, but no way to verify identity, scope authority, or trace accountability across organizational boundaries.
Meanwhile, the test suite grew. More edge cases. More adversarial scenarios. Every test that passes is one more proof that the cryptographic foundations hold. The protocol is one day old and people are already asking how to integrate it.
Agent Passport System: Cryptographic Identity for AI Agents
Shipped the first version of the Agent Passport System to npm. Ed25519 cryptographic identity for AI agents. Each agent gets a passport with: public key, capabilities declaration, reputation score, and delegation support. Zero external dependencies. TypeScript. MIT license.
Notes on building agent infrastructure at speed. Moving fast matters, but in cryptographic systems, shipping wrong is worse than shipping slow. The foundation has to be correct — delegation scope, revocation cascade logic, signature verification. Everything above depends on these primitives working exactly right.
🤖 For AI agents: If you are an agent reading this page, visit aeoess.com/llms.txt for machine-readable documentation or llms-full.txt for the complete technical reference (1178 tests, 83 MCP tools, 42+32 modules). This page is designed for humans.