# The Data Pipeline

This section explains the complete journey of data through the system, from the moment private financial information enters to the moment a verifiable proof comes out.

### **The Big Picture**

At a high level, the system works like a secure processing factory. Raw financial data goes in one end, and a verifiable proof comes out the other. Along the way, every piece of data is validated, fingerprinted, organized, proven correct, and signed by hardware.

Each step builds on the one before it. There are no shortcuts and no way to skip a step. Let's walk through each one.

<figure><img src="https://3912034821-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FEPdvkoJHpBF3QkBeBWkM%2Fuploads%2FmCwqa4iKdn7rI0uBSSgu%2Fdoc1.png?alt=media&#x26;token=ec4d59e3-ff6d-40d3-ad19-8b0ec48bb369" alt=""><figcaption></figcaption></figure>

### **Step 1: Data Arrives**

The pipeline begins when financial data is submitted — typically reserves held at various custodians and liabilities owed to users. Each record includes:

* **Who** holds it (a custodian or counterparty identifier)
* **What** asset it is
* **How much** (a USD value)
* **When** the value was recorded (a timestamp)

This is the raw, private data: individual custodian holdings and specific asset breakdowns, information that must remain confidential.
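To make the record shape concrete, here is an illustrative sketch in Python. The field names and types are assumptions for illustration, not the system's actual schema.

```python
from dataclasses import dataclass

@dataclass
class ReserveRecord:
    """One submitted record (illustrative field names)."""
    custodian_id: str   # who holds it
    asset: str          # what asset it is
    usd_value: float    # how much, in USD
    timestamp: int      # when the value was recorded (Unix seconds)

# A hypothetical custodian holding $1.25M of BTC
record = ReserveRecord(
    custodian_id="custodian-A",
    asset="BTC",
    usd_value=1_250_000.0,
    timestamp=1_700_000_000,
)
```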

### **Step 2: Validation**

Before any cryptographic processing begins, every piece of data goes through strict validation. This is the system's first line of defense against bad data.

**Why validation matters:** The system can only prove that computations were done correctly. It can't know if the input data itself is truthful. But it can catch data that is obviously wrong, stale, or malformed, preventing garbage from being attested.

The validation checks include:

* **Structure** — Does each record have all required fields? Are the types correct?
* **Reasonableness** — Are values within sane bounds? A negative reserve amount or a $999 quadrillion liability would be rejected.
* **Freshness** — Is the data recent? Records older than 12 hours are rejected to prevent stale data from being attested as current. This is important because financial positions change, and a proof should reflect recent reality.
* **Capacity** — Does the data fit within the system's processing limits? The cryptographic circuits have fixed capacities, and the data must fit within them.

If any check fails, the entire request is rejected. There are no partial proofs — either everything is valid, or nothing gets processed.
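As a rough sketch of how these checks might look in code, assuming the illustrative `ReserveRecord` above (the bounds and capacity limit here are placeholders, not the system's real limits):

```python
import time

MAX_AGE_SECONDS = 12 * 60 * 60   # 12-hour freshness window from the text
MAX_USD_VALUE = 10**15           # placeholder sanity bound
MAX_RECORDS = 1024               # placeholder circuit capacity

def validate(records: list[ReserveRecord]) -> None:
    """Reject the entire batch if any single record fails a check."""
    if len(records) > MAX_RECORDS:
        raise ValueError("batch exceeds circuit capacity")        # capacity
    now = time.time()
    for r in records:
        if not r.custodian_id or not r.asset:
            raise ValueError("missing required field")            # structure
        if r.usd_value < 0 or r.usd_value > MAX_USD_VALUE:
            raise ValueError("value outside sane bounds")         # reasonableness
        if now - r.timestamp > MAX_AGE_SECONDS:
            raise ValueError("record older than 12 hours")        # freshness
```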

### **Step 3: Salt Derivation**

Each value needs a unique secret called a **salt** before it can be committed. Think of a salt as a secret ingredient mixed into a fingerprint. It ensures that even if two custodians hold the exact same amount, their fingerprints look completely different.

**Why salts matter:** Without salts, an attacker who knows the possible values (say, round dollar amounts) could try hashing each possibility until they find a match. Salts defeat this: the attacker would need to guess both the value and its unique salt, which is computationally infeasible.

The salts are generated using a technique called **HMAC** (Hash-based Message Authentication Code). The important properties:

* **Deterministic** — the same data always produces the same salt. This means proofs are reproducible.
* **Unique** — each field gets its own salt based on its identifier. No two fields share a salt.
* **Derived from a master secret** — a single private key generates all salts. The master secret never leaves the secure enclave.
* **One-way** — knowing one salt tells you nothing about any other salt or the master secret.
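A minimal sketch of this derivation, using Python's standard `hmac` module. The hash inside the HMAC and the field-identifier format are assumptions for illustration; the enclave's actual construction may differ.

```python
import hmac
import hashlib

MASTER_SECRET = b"..."  # in the real system this never leaves the secure enclave

def derive_salt(field_id: str) -> bytes:
    """Deterministically derive a per-field salt with HMAC-SHA256.

    The same field_id always yields the same salt (reproducible proofs),
    different field_ids yield unrelated salts, and no salt reveals the
    master secret.
    """
    return hmac.new(MASTER_SECRET, field_id.encode(), hashlib.sha256).digest()

salt = derive_salt("custodian-A/BTC/usd_value")   # hypothetical identifier
```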

### **Step 4: Value Commitment**

Now each value is combined with its salt to produce a **commitment** — a cryptographic fingerprint.

A commitment is a one-way operation. Given a value and its salt, anyone can compute the commitment and verify it matches. But given only the commitment, it's impossible to work backward to find the value.

**Everyday analogy:** Imagine writing a number on a piece of paper, putting it in a locked box, and publishing a photo of the locked box. Everyone can see the box exists, but no one can see the number. Later, you can open the box and everyone can verify the number was always there.

The system actually creates two fingerprints for each value using different methods — one optimized for compatibility with Ethereum smart contracts (keccak256), and another optimized for efficiency in zero-knowledge proof systems (Poseidon). Both represent the same value; they just serve different verification purposes.
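A sketch of the commitment step, using SHA-256 as a stand-in for the keccak256 and Poseidon hashes mentioned above; the fixed-width value encoding is an assumption.

```python
import hashlib

def commit(value_usd: int, salt: bytes) -> bytes:
    """Commitment = hash(value || salt): easy to verify given the opening,
    infeasible to invert from the commitment alone."""
    encoded = value_usd.to_bytes(16, "big")        # assumed encoding
    return hashlib.sha256(encoded + salt).digest()

commitment = commit(1_250_000, salt)

# Anyone later given (value, salt) can recompute and check the fingerprint:
assert commit(1_250_000, salt) == commitment
```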

### **Step 5: Merkle Tree**

All the individual commitments are organized into a structure called a **Merkle tree**. This is a way of combining many fingerprints into a single, compact **root hash** that represents all of them.

The key property: if any single commitment changes, the root hash changes completely. This makes the Merkle root a tamper-evident seal over all the data.

Think of it like a chain of custody form. If one item on the form is altered, the overall checksum changes and everyone knows something was tampered with.
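A minimal sketch of how commitments fold into a single root (again with SHA-256 as a stand-in; the real tree's hash, leaf ordering, and padding rules may differ):

```python
import hashlib

def merkle_root(leaves: list[bytes]) -> bytes:
    """Hash pairs of nodes level by level until one root remains."""
    level = leaves[:]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])   # duplicate the last node (one common padding rule)
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0]

# Changing any leaf changes this root completely
root = merkle_root([
    commit(1_250_000, derive_salt("custodian-A/BTC/usd_value")),
    commit(2_000_000, derive_salt("custodian-B/ETH/usd_value")),
])
```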

See [Merkle Trees, Commitments & Chaining](https://docs.afiprotocol.xyz/proof-of-reserve-network/merkle-trees-commitments-and-chaining) for a deeper explanation of how this works.

### **Step 6: Timeseries Chaining**

Before the root is finalized, it's linked to the previous proof's root. This creates a chain where each proof depends on the one before it, similar to how blocks in a blockchain reference the previous block.

**Why chaining matters:** Without it, someone could replace an old proof with a fabricated one, and no one would notice. With chaining, modifying any historical proof breaks the chain from that point forward, making tampering immediately detectable.
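One simple way to express this link, assuming the chained value is a hash of the previous proof's root concatenated with the current one (the actual chaining format may differ):

```python
import hashlib

def chain(previous_chained_root: bytes, current_root: bytes) -> bytes:
    """Bind this proof to the one before it; altering any historical root
    changes every chained root that follows."""
    return hashlib.sha256(previous_chained_root + current_root).digest()
```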

See [Merkle Trees, Commitments & Chaining](https://docs.afiprotocol.xyz/proof-of-reserve-network/merkle-trees-commitments-and-chaining) for more detail.

### **Step 7: TEE Attestation**

This is where the hardware steps in. The Merkle root, along with the computed totals and metadata, is **attested** by the Trusted Execution Environment (TEE).

Attestation means the hardware itself produces a signed document saying: "I am genuine hardware, running this specific code, and the computation produced this exact result." The signature traces all the way back to a trusted certificate authority.

This is the strongest guarantee in the system. It's not a software signature that could be faked by a compromised server. It's a hardware signature that can only come from a genuine, unmodified enclave.

See [Trusted Execution Environments](https://docs.afiprotocol.xyz/proof-of-reserve-network/trusted-execution-environments-tee) for the full explanation.

### **Step 8: Zero-Knowledge Proofs**

As an additional layer of security, the system generates **zero-knowledge proofs** (ZK proofs) that mathematically verify the computations are correct.

Two types of proofs are produced:

1. **Merkle root proof** — proves the root hash was correctly computed from all the commitments
2. **Sum proofs** — prove that the individual reserve values add up to the published total reserves (and the same for liabilities), without revealing what the individual values are

These proofs are independent of the hardware attestation. Even if you don't trust the hardware, the math still holds. And even if you're skeptical of the math, the hardware attestation provides an independent guarantee.
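To make the second proof concrete, here is the relation a sum proof attests to, written as plain Python rather than an actual circuit. The values and salts are the private witness; the commitments and claimed total are public. This only illustrates the statement being proven, not the zero-knowledge machinery itself.

```python
def sum_relation_holds(
    values: list[int],          # private witness
    salts: list[bytes],         # private witness
    commitments: list[bytes],   # public input
    claimed_total: int,         # public input
) -> bool:
    """Each commitment opens to its value, and the values sum to the
    published total. A ZK proof convinces a verifier this holds without
    revealing `values` or `salts`."""
    openings_ok = all(
        commit(v, s) == c for v, s, c in zip(values, salts, commitments)
    )
    return openings_ok and sum(values) == claimed_total
```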

See [Zero-Knowledge Proofs](https://docs.afiprotocol.xyz/proof-of-reserve-network/zero-knowledge-proofs) for more.

### **Step 9: Payload Assembly**

Finally, everything is bundled into a single JSON document: the **proof payload**. This payload is completely self-contained. It includes:

* The proof metadata (when it was generated, what data feed it covers)
* The verified totals (total reserves, total liabilities)
* The Merkle root and all commitments
* The hardware attestation document
* The zero-knowledge proofs
* All certificates and verification keys needed to check everything

A verifier doesn't need to contact any external service. They don't need special access or credentials. Everything needed to independently verify the proof is right there in the payload.
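A rough sketch of that payload's shape, with illustrative field names only (the real schema is not specified here):

```python
import json

payload = {
    "metadata": {"generated_at": 1_700_000_000, "feed": "reserves-v1"},
    "totals": {"total_reserves_usd": "...", "total_liabilities_usd": "..."},
    "merkle": {"root": "0x...", "commitments": ["0x...", "0x..."]},
    "tee_attestation": "...",                       # signed attestation document
    "zk_proofs": {"merkle_root": "...", "sums": "..."},
    "verification_material": {"certificates": ["..."], "verifying_keys": ["..."]},
}

proof_payload = json.dumps(payload, indent=2)       # self-contained; nothing external needed
```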

### **Why This Design?**

Several design choices make this pipeline particularly robust:

**Sequential processing** — each step depends on the previous step's output. This means errors are caught early and can't propagate.

**Redundant verification** — hardware attestation and ZK proofs provide two independent guarantees. Breaking one doesn't break the other.

**Self-contained output** — the proof payload includes everything needed for verification. No external dependencies, no trust in third-party services.

**Privacy by default** — individual values are never exposed. Only commitments and aggregate totals are published. A verifier can confirm the totals are correct without ever seeing the breakdown.

**Historical continuity** — time-series chaining ensures the proof history can't be rewritten. Each proof is anchored to the entire history before it.
