Turn every automated decision into a record anyone can check against public infrastructure, never a vendor's word, ours included. Today, that starts with AI agents.
Your agent did something yesterday at 14:23. Today, can you reproduce it? Prove to an auditor what it did, and why? Tell a real regression from noise after a prompt change?
Observability tools (Langfuse, Helicone, Phoenix) show you traces. They don't give you replay or proof. That gap is what determs fills.
```
┌─ verifiable decision record ─────────────────────────────────┐
│ profile        ai.agent.action                               │
│ subject        { model, params, input, output }              │
│ record_digest  44f2b549…c4b581fe4  ← sha-256, reproducible   │
│ verify         ✓ trustless: maths + public infra             │
└──────────────────────────────────────────────────────────────┘
```
Determs is an open standard — the Verifiable Decision Record — and the neutral engine that implements it: a domain-general way to make any automated decision provable to a third party, by maths and public infrastructure rather than trust.
A log your own system wrote about itself is not evidence — a counterparty, an auditor, or a regulator has no reason to trust it. determs turns each automated decision into a Verifiable Decision Record (VDR): a portable, canonical object whose integrity anyone can check, using only mathematics and (optionally) public infrastructure.
Three primitives:
- Capsule — a unit of execution: typed input, deterministic logic, typed output.
- Receipt — a stable, hash-anchored proof that a capsule ran on a given input and produced a given output.
- Replay — re-running a capsule from a recorded input produces the same output, byte for byte.
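The three primitives fit together roughly as in the Python sketch below. It is a hypothetical illustration, not the determs SDK or the VDR schema: `fake_llm`, `capsule`, `receipt` and the record fields are invented names, and `json.dumps(sort_keys=True)` stands in for real RFC 8785 canonicalisation. The model call is deliberately random, which shows where determinism actually lives: the capsule that turns a recorded call into a record is pure, so replaying it from the recorded input reproduces the same bytes and the same receipt.

```python
# Hypothetical sketch of capsule / receipt / replay; not the determs API.
import hashlib
import json
import random


def canonical(obj) -> bytes:
    # Stand-in for RFC 8785 (JCS): sorted keys, no insignificant whitespace.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def fake_llm(prompt: str) -> str:
    # Hypothetical stochastic model: the same prompt can yield different outputs.
    return random.choice(["Escalate to shipping.", "Ask for the tracking number."])


def capsule(subject: dict) -> dict:
    # Capsule: deterministic logic over a typed input (the recorded call).
    return {"profile": "ai.agent.action", "subject": subject}


def receipt(record: dict) -> str:
    # Receipt: a SHA-256 digest binding the capsule's input to its output.
    return hashlib.sha256(canonical(record)).hexdigest()


# 1. The model runs once; whatever it produced is captured as the subject.
prompt = "My order #1234 hasn't shipped."
subject = {"model": "example-model", "input": prompt, "output": fake_llm(prompt)}

# 2. The capsule turns the subject into a record; the receipt digests it.
record = capsule(subject)
digest = receipt(record)

# 3. Replay: re-run the capsule on the recorded input (round-tripped through
#    JSON, as if read back from storage) and compare byte for byte.
replayed = capsule(json.loads(json.dumps(subject)))
assert canonical(replayed) == canonical(record)
assert receipt(replayed) == digest
print("replay matches:", digest)
```

Replay never calls the model again; it only re-runs the pure capsule over what was recorded.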
"But the LLM is stochastic." Right — and that's not the obstacle people assume. determs records the output that did occur and binds it verifiably to the inputs that produced it. Determinism is a property of the record and its verification, not of the model.
Wrap your LLM client. Every call becomes a record.
```bash
pip install "determs[anthropic]"
```

```python
import anthropic
from determs.anthropic import wrap
from determs.storage import FileStorage

client = wrap(
    anthropic.Anthropic(),
    agent_id="support-triage",
    storage=FileStorage("./records"),
)

resp = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=512,
    messages=[{"role": "user", "content": "My order #1234 hasn't shipped."}],
)
# → ./records/<action_id>.json (a VDR subject)
```

Then capture, verify, and replay with the engine (`cargo install determs`):
```bash
determs capture --input ./records/act-xxx.json --output ./record.json
determs verify --record ./record.json   # 0 if untampered, 1 otherwise
determs replay --record ./record.json   # rebuilds & compares, bit-exact
```

`verify` recomputes every digest from the stored record. Change one character (a content string, a tool argument, a token count) and the digest diverges, so `verify` exits non-zero. No silent corruption.
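What `verify` checks can be sketched in a few lines of Python, assuming (hypothetically) a record that stores its own `record_digest` next to a `subject` and hashes the canonical JSON of that subject. The real VDR layout and canonical form are defined by the spec, not by this sketch; the 0/1 return values mirror the exit codes described above.

```python
# Hypothetical sketch of digest recomputation; not the determs engine.
import copy
import hashlib
import json


def canonical(obj) -> bytes:
    # Approximation of RFC 8785 canonical JSON (sorted keys, compact separators).
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def digest_of(subject: dict) -> str:
    return hashlib.sha256(canonical(subject)).hexdigest()


def verify(record: dict) -> int:
    # Recompute the digest from the stored subject: 0 if it matches, 1 otherwise.
    return 0 if digest_of(record["subject"]) == record["record_digest"] else 1


subject = {
    "model": "example-model",
    "input": [{"role": "user", "content": "My order #1234 hasn't shipped."}],
    "output": {"tool": "lookup_order", "arguments": {"order_id": "1234"}},
    "usage": {"output_tokens": 42},
}
record = {"subject": subject, "record_digest": digest_of(subject)}
print(verify(record))  # 0: untampered

# Change one character of a tool argument and verification fails.
tampered = copy.deepcopy(record)
tampered["subject"]["output"]["arguments"]["order_id"] = "1235"
print(verify(tampered))  # 1: digest diverges
```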
You don't have to trust us; that's the point. Given a record's canonical bytes, any SHA-256 implementation reproduces its determs digest. Even the project's website is itself a record:
```bash
printf '%s' '{"site":"determs.com","standard":"verifiable-decision-record/0","claim":"every automated decision can become a verifiable, replayable record"}' | sha256sum
# 33a0bfac250e303a6765456366dd3e8d19709f4adf5b5b8c4541bcca95f2f8b1
```

To prove when a record existed, not just its integrity, anchor its digest to public infrastructure: install the anchor extra (`pip install "determs[anchor]"`) to commit it via OpenTimestamps to the Bitcoin blockchain. Only the digest leaves your environment; verification needs no one's permission.
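The website-record check can also be reproduced with Python's standard `hashlib`. The sketch hashes the exact bytes from the `printf` command above (it does not re-serialise them, since any change in key order or whitespace changes the digest) and compares the result with the published value. It also illustrates the anchoring point: the 32-byte digest is all an anchoring step needs, so the record itself never has to leave your machine. How determs actually submits that digest to OpenTimestamps is not shown here.

```python
import hashlib

# The exact bytes that printf '%s' pipes into sha256sum (no trailing newline).
WEBSITE_RECORD = (
    '{"site":"determs.com","standard":"verifiable-decision-record/0",'
    '"claim":"every automated decision can become a verifiable, replayable record"}'
)
PUBLISHED = "33a0bfac250e303a6765456366dd3e8d19709f4adf5b5b8c4541bcca95f2f8b1"

digest = hashlib.sha256(WEBSITE_RECORD.encode("utf-8")).digest()
print(digest.hex())
print("matches published digest:", digest.hex() == PUBLISHED)

# Only these 32 bytes, not the record, would need to leave your environment.
print(len(digest), "bytes")
```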
| determs is | determs is not |
|---|---|
| a verifiable proof & audit layer | a tracing/observability tool |
| an open record format (VDR) | a prompt evaluator or LLM gateway |
| proof by public anchor — what and when | a signed log you verify by trusting the signer |
| a neutral verification primitive | an orchestration framework |
| trustless by construction | a system you have to trust |
Neutral by construction. Tracing tools and agent-governance toolkits keep their own records, in their own systems, attested with their own keys — you verify by trusting them. A VDR is checked against public infrastructure, never a vendor's word, ours included: a signature proves who; a public anchor proves what and when. The format is domain-agnostic and portable, and only the digest is ever published — the decision payload never leaves your environment.
The record format is an open specification — the Verifiable Decision Record (CC-BY-4.0). It defines the canonical form (RFC 8785 / JCS), the digests, the optional public-log anchoring, and a verification procedure any vendor can implement independently. determs (this repo) is the reference implementation.
Conformance test vectors pin the exact digests a conforming implementation must reproduce — so independent implementations interoperate, not just ours.
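A conformance runner boils down to: canonicalise each vector's input, hash it, compare with the pinned digest. The sketch below assumes a hypothetical vector shape (`name`, `input`, `expected_digest`); the real vectors and their file format live with the spec. Its `canonicalize` is only an approximation of RFC 8785: Python's `sort_keys` orders keys by code point whereas JCS orders by UTF-16 code units (they differ for characters outside the Basic Multilingual Plane), and JCS formats numbers the way ECMAScript does, so this sketch rejects floats rather than get them wrong. A conforming implementation needs a full JCS library.

```python
# Hypothetical conformance-vector runner with an approximate JCS canonicaliser.
import hashlib
import json


def canonicalize(value) -> bytes:
    # Approximates RFC 8785 for strings, integers, booleans, null, lists and
    # objects; floats are rejected because JCS uses ECMAScript number formatting.
    def check(v):
        if isinstance(v, float):
            raise ValueError("floats need full JCS number serialisation")
        if isinstance(v, dict):
            for item in v.values():
                check(item)
        elif isinstance(v, list):
            for item in v:
                check(item)

    check(value)
    return json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def run_vectors(vectors: list) -> list:
    # Returns the names of vectors whose recomputed digest differs from the pinned one.
    failures = []
    for vector in vectors:
        got = hashlib.sha256(canonicalize(vector["input"])).hexdigest()
        if got != vector["expected_digest"]:
            failures.append(vector["name"])
    return failures


# Example: a key-order vector. The pinned digest is computed here only so the
# sketch is self-contained; real vectors pin digests published with the spec.
pinned = hashlib.sha256(b'{"a":1,"b":2}').hexdigest()
vectors = [{"name": "key-order", "input": {"b": 2, "a": 1}, "expected_digest": pinned}]
print(run_vectors(vectors))  # []  (key order does not change the digest)
```

Because canonicalisation fixes key order and whitespace, two implementations that serialise the same object differently in memory still agree on the digest, which is what the vectors pin down.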
A neutral registry index of anchored
record_digests is published as a static file — for discovery, never trust:
each entry self-verifies against its anchor, and only digests are indexed,
never the subject.
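Using the index for discovery might look like the sketch below. The index shape (a JSON list of entries with `record_digest` and an `anchor` reference), the `example.ots` value and the function names are assumptions for illustration, not the published format. What it shows is the order of operations: you recompute the digest of a record you already hold, then look it up; a hit only tells you where to find the anchor proof, which you still verify yourself.

```python
# Hypothetical registry-index lookup; index format and names are assumptions.
import hashlib
import json
import re

HEX64 = re.compile(r"[0-9a-f]{64}")


def digest_of(subject: dict) -> str:
    # Approximate canonical JSON (see RFC 8785 for the real rules), then SHA-256.
    canonical = json.dumps(subject, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def well_formed(entry: dict) -> bool:
    # An entry carries a digest and an anchor reference, never the subject itself.
    digest = entry.get("record_digest")
    return (
        isinstance(digest, str)
        and HEX64.fullmatch(digest) is not None
        and "anchor" in entry
        and "subject" not in entry
    )


def load_index(raw: str) -> dict:
    # Map digest -> anchor reference, skipping malformed entries.
    return {e["record_digest"]: e["anchor"] for e in json.loads(raw) if well_formed(e)}


def find_anchor(index: dict, subject: dict):
    # Recompute locally, then look up; None if the record was never indexed.
    return index.get(digest_of(subject))


subject = {"input": "My order #1234 hasn't shipped.", "output": "Escalate to shipping."}
raw = json.dumps([{"record_digest": digest_of(subject), "anchor": "example.ots"}])
index = load_index(raw)
print(find_anchor(index, subject))
```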
An evidence pack bundles anchored records for
one system over one period and maps them to a regulation's record-keeping duties
(EU AI Act Art. 12 & 19) — a self-verifying artefact an auditor checks against
public infrastructure, not against your word or ours. It is the open, trustless
primitive of the compliance layer (determs.compliance).
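The bundling step can be sketched as a filter plus a digest over the pack's contents. All field names here (`system`, `timestamp`, `record_digest`, `duties`, `pack_digest`), the example records and the pack layout are hypothetical; the real evidence pack format is defined by determs, not by this sketch. Sorting the digests before hashing makes the pack digest independent of the order in which records were collected, so an auditor recomputing it gets the same value.

```python
# Hypothetical evidence-pack builder; field names and layout are assumptions.
import hashlib
import json
from datetime import datetime


def build_pack(records: list, system: str, start: str, end: str) -> dict:
    # Keep one system's records within [start, end); timestamps are ISO 8601 strings.
    lo, hi = datetime.fromisoformat(start), datetime.fromisoformat(end)
    digests = sorted(
        r["record_digest"]
        for r in records
        if r["system"] == system and lo <= datetime.fromisoformat(r["timestamp"]) < hi
    )
    body = {
        "system": system,
        "period": [start, end],
        "duties": ["EU AI Act Art. 12", "EU AI Act Art. 19"],
        "record_digests": digests,
    }
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return {**body, "pack_digest": hashlib.sha256(canonical).hexdigest()}


records = [
    {"system": "support-triage", "timestamp": "2025-01-10T14:23:00", "record_digest": "b" * 64},
    {"system": "support-triage", "timestamp": "2025-01-05T09:00:00", "record_digest": "a" * 64},
    {"system": "other-agent", "timestamp": "2025-01-06T10:00:00", "record_digest": "c" * 64},
    {"system": "support-triage", "timestamp": "2025-02-01T00:00:00", "record_digest": "d" * 64},
]
pack = build_pack(records, "support-triage", "2025-01-01", "2025-02-01")
print(len(pack["record_digests"]))  # 2
```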
```
SDK (Python)            Engine (Rust)                  Spec
────────────            ─────────────                  ────
wrap(client) ──→        capture → VDR record           docs/spec/
emits a subject         verify  → recompute digests    (vendor-neutral,
(the action)            replay  → rebuild & compare     CC-BY-4.0)
```
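The SDK column of the diagram (wrap a client, emit one subject per call) follows a plain proxy pattern. The sketch below is not how determs implements `wrap`: `RecordingClient`, `FakeClient` and the subject fields are hypothetical, and an in-memory list replaces `FileStorage`. It shows only the shape: the wrapper forwards the call unchanged, then stores what went in and what came out.

```python
# Hypothetical proxy-wrapper sketch; not the determs SDK.
import uuid
from types import SimpleNamespace


class FakeClient:
    # Stand-in for an LLM client exposing client.messages.create(**kwargs).
    def __init__(self):
        self.messages = SimpleNamespace(create=self._create)

    def _create(self, **kwargs):
        return {"content": "Escalating order to shipping.", "model": kwargs["model"]}


class RecordingClient:
    # Same call surface as the inner client, plus one stored subject per call.
    def __init__(self, inner, agent_id: str, storage: list):
        self._inner, self._agent_id, self._storage = inner, agent_id, storage
        self.messages = SimpleNamespace(create=self._create)

    def _create(self, **kwargs):
        response = self._inner.messages.create(**kwargs)  # forward unchanged
        self._storage.append({
            "action_id": f"act-{uuid.uuid4().hex[:12]}",
            "agent_id": self._agent_id,
            "params": {k: v for k, v in kwargs.items() if k != "messages"},
            "input": kwargs.get("messages"),
            "output": response,
        })
        return response


records = []
client = RecordingClient(FakeClient(), agent_id="support-triage", storage=records)
resp = client.messages.create(
    model="example-model",
    max_tokens=512,
    messages=[{"role": "user", "content": "My order #1234 hasn't shipped."}],
)
print(len(records), records[0]["params"])
```

Because the wrapper returns the inner client's response untouched, calling code does not change; recording is a side effect of the call.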
- `src/`: Rust engine + CLI (Apache-2.0)
- `sdk/python/`: Python SDK (Anthropic & OpenAI wrappers, sync/async, streaming)
- `docs/spec/`: the Verifiable Decision Record specification
- `examples/`: sample records and runnable demos
Pre-1.0. The engine, the agent.action.replay.v1 capsule, the CLI
(capture/verify/replay), and the Python SDK are working today. The VDR
spec is at draft v0, with a neutral registry index and an open, self-verifying
compliance evidence pack (EU AI Act Art. 12 & 19) alongside it. A managed
evidence layer — hosted retention and auditor-ready export over those same
packs — is the active commercial build. See ROADMAP.md.
- Agent governance prevents. It can't prove. — prevention and proof are different layers; you need both.
- Observability shows you what happened. It can't prove it. — why tracing tools (Langfuse, Helicone, Arize) aren't an audit layer.
- The EU AI Act wants logs you can prove — Articles 12 & 19, and the mapping to a verifiable record.
- Logs are not proofs — the case for deterministic replay in AI agents.
Open-core. The specification and the reference implementation (engine, CLI, SDK) are open source:
- Code: Apache-2.0
- Specification (`docs/spec/`): CC-BY-4.0

A managed registry and a compliance layer are the commercial surface.
- CONTRIBUTING.md — best-effort, no CLA (inbound = outbound).
- SECURITY.md — report integrity/verification issues privately.
The one invariant: verification must depend on maths and public infrastructure, never on trusting a brand. Contributions that break that are out of scope.