stacked_drg
Stacked DRG PoRep
This section describes Stacked DRG PoRep (SDR), the specific Proof-of-Replication (PoRep) used in Filecoin. In this construction, the prover encodes the original data into a replica and commits to it. An offline PoRep proves that the commitment to the replica is a valid commitment of the encoded original data.
SDR has been presented by Ben Fisch at EUROCRYPT19.
Introduction
Background on Proof-of-Replication
Proof-of-Replication enables a prover P to convince a verifier V that P is storing a replica R, a physically independent copy of some data D, unique to P. The scheme is defined by a tuple of polynomial time algorithms (Setup, Replication, Prove, Verify). The assumption is that generation of a replica after Replicate must be difficult (if not impossible) to generate.
Setup: On setup, the public parameters of the proving systems are set.
Replicate: On replication, either a party or both (depending on the scheme, in our case the prover only!) generate a unique permutation of the original data D, which we call replica R.
Prove: On receiving a challenge, the prover must generate a proof that it is in possession of the replica and that it was derived from data D. The prover must only be able to respond to the challenge successfully if it is in possession of the replica, since would be difficult (if not impossible) to generate a replica that can be used to generate the proof at this stage
Verify: On receiving the proof, the verifier checks the validity of the proof and accepts or rejects the proof.
{
} sequenceDiagram Note right of Prover: CommD Prover-->>Prover: R, CommR ← Replicate(D) Prover->>Verifier: CommR Verifier-->>Verifier: Generate random challenge Verifier->>Prover: challenge Prover-->>Prover: proof ← Prove(D, R, challenge) Prover->>Verifier: proof {
}
Time-bounded Proof-of-Replication
Timing assumption. Time-bounded Proof-of-Replication are constructions of PoRep with timing assumptions. The assumption is that generation of the replica (hence the Replication) takes some time t that is substantially larger than the time it takes to produce a proof (hence time(Prove)) and the round-trip time (RTT) for sending a challenge and receiving a proof.
Distinguishing Malicious provers. A malicious prover that does not have R, must obtain it (or generate it), before the Prove step. A verifier can distinguish an honest prover from a malicious prover, since the malicious one will take too long to answer the challenge. A verifier will reject if receiving the proof from the prover takes longer than a timeout (bounded between proving time and replication time).
Background on Stacked DRG PoRep
Stacked DRG PoRep (SDR) is a specific Proof-of-Replication construction that we use in Filecoin. SDR has been designed by Ben Fisch at EUROCRYPT19. At a high level, SDR ensures that the Replicate step is a slow non-parallelizable sequential process by using a special type of graph called Depth Robust Graphs (DRG).
Encoding using DRGs. A key is generated by sequentially labeling nodes in the graph such that each label depends on the labels of its parents. The depth robustness property of these graphs ensure that the sequential labeling steps are not parallelizable. The final labels are used as a key to encode the original data.
TODO: This probably needs a more thorough rewrite.
Stacked DRGs. SDR builds on the above by stacking DRG graphs into LAYERS
layers. Each layer is connected to the previous by a Bipartite Expander Graph. The combination of DRGs and expander graphs guarantee the security property of PoRep. As before, the key produced by the final layer is used to encode the original data, yielding the replica.
Generating SDR proofs. Given the following public parameters:
ReplicaId
is a unique replica identifier (see the Filecoin Proofs spec for details).CommD
is the Merkle tree root hash of the input data to the first layer.CommC
is the Merkle tree root hash of the SDR column commitments.CommRLast
is the Merkle tree root hash of the replica.CommR
is the on-chain commitment to the replica, dervied as the hash of the concatenation ofCommC
andCommRLast
.
An SDR proof proves that some data whose committment is CommD
has been used to run a Replicate
algorithm and generated some data. CommR
is the on-chain commitment to both the replicated data and to intermediate stages required to prove Replicate
was performed correctly.
An SDR proof consists of a set of challenged DRG nodes for each layer, a set of parent nodes for each challenged node and a Merkle tree inclusion proof for each node provided. The verifier can then verify the correct labeling of each node and that the nodes given were consistent with the prover's commitments.
Making proofs succinct with SNARKs: The proof size in SDR is too large for blockchain usage (~100MB TODO: check this), mostly due to the large amount of Merkle tree inclusion proofs required to achieve security. We use SNARKs to generate a proof of knowledge of a correct SDR proof. In other words, we implement the SDR proof verification algorithm in an arithmetic circuit and use SNARKs to prove that it was evaluated correctly.
The SNARK circuit proves that given Merkle roots CommD
, and CommR
, the prover correctly derived the labels at each layer and correctly performed the final encoding.
PoRep in Filecoin
Proof-of-Replication proves that a Storage Miner is dedicating unique storage for each sector. Filecoin Storage Miners collect new clients' data in a sector, run a slow encoding process (called Seal
) and generate a proof (SealProof
) that the encoding was generated correctly.
In Filecoin, PoRep provides two guarantees: (1) space-hardness: Storage Miners cannot lie about the amount of space they are dedicating to Filecoin in order to gain more power in the consensus; (2) replication: Storage Miners are dedicating unique storage for each copy of their clients data.
Glossary:
sector: a fixed-size block of data of
SECTOR_SIZE
bytes which generally contains clients' data.unsealed sector: a concrete representation (on disk or in memory) of a sector's that follows the "Storage Format" described in Client Data Processing (currently
paddedfr32v1
is the required default).sealed sector: a concrete representation (on disk or in memory) of the unique replica generated by
Seal
from an unsealed sector. A sector contains one or more pieces.piece: a block of data of at most
SECTOR_SIZE
bytes which is generally a client's file or part of.
Stacked DRG Construction
Public Parameters
The following public parameters are used in the Stacked DRG Replication and Proof Generation algorithms:
TODO: The Appendix should explain why we picked those values
Just interpolate a table of the Orient parameters and reconcile naming.
name | type | description | value |
|
| Number of nodes in the DRG in bytes |
|
|
| Number of Depth Robust Graph stacked layers. |
|
|
| In-Degree of each Depth Robust Graph. |
|
|
| Degree of each Bipartite Expander Graph to extend dependencies between layers. |
|
|
| Seed used for random number generation in |
|
|
| Size of each node in bytes. |
|
The following constants are computed from the public parameters:
name | type | description | computation | value |
|
| Total number of parent nodes |
|
|
|
| Number of nodes in the graph |
|
|
|
| Height of the Merkle Tree of a sector |
|
|
The following additional public parameters are required:
TAPER
:Float
: Fraction of each layer's challenges by which to reduce next-lowest layer's challenge count.TAPER_LAYERS
:uint
: Number of layers which should be tapered. FIXME: update for current tapering.Data
is a byte array initialized to the content of unsealed sector and will be mutated in-place by the replication process.
Hash Functions
We have describe three hash functions:
name | description | size of input | size of output | construction |
| Hash function used as a KDF to derive the key used to label a single node. | TODO |
|
|
| Hash function used to hash the labeled leaves of each layer (see SDR Column Commitments). | TODO |
|
|
| Collision Resistant Hash function used for the Merkle tree. | 2 x |
|
|
| Balanced binary Merkle tree based used to generate commitments to sealed sectors, unsealed sectors, piece commitments, and intermediate parts of the Proof-of-Replication. | TODO |
| Uses |
RepHash
RepHash
is a vector commitment used to generate commitments to sealed sectors, unsealed sectors, piece commitments and intermediate stepds of the Proof-of-Replication. Filecoin uses a balanced binary Merkle tree for RepHash
. The leaves of the Merkle tree are pairs of adjacent nodes.
RepHash
inputs MUST respect a valid Storage Format. (TODO: What does this mean?)
Stacked DRG Graph
The slow sequential encoding required is enforced by the depth robustness property of the SDR graph.
Encoding with SDR: The data from a sector (of size SECTOR_SIZE
) is divided in NODE_SIZE
nodes (for a total of GRAPH_SIZE
nodes) and arranged in a directed acyclic graph. The structure of the graph is used to label the nodes sequentially to generate a key with which to encode the original data: in order to label a node, its parents must be labeled (see the "Layer Labeling" section below). We repeat this process for LAYERS
layers, where the input to a next layer is the output of the previous one.
Generating the SDR graph: The SDR graph is divided in LAYERS
layers. Each layer is a directed acyclic graph and it combines a Depth Robust Graph (DRG) and a Bipartite Expander graph. TODO:
this isn't quite right.
We provide an algorithm (SDR
) which computes the parents of a node. In high level, the parents of a node are computed by combining two algorithms: some parents (BASE_DEGREE
of them) are computed via the BucketSample
algorithm extended with a direct ordering of nodes, others (EXPANSION_DEGREE
of them) are computed via the Chung
algorithm.
SDRGraph
: SDR Graph algorithm
SDRGraph
: SDR Graph algorithmOverview: Compute the DRG and Bipartite Expander parents using respectively BucketSample
and ChungExpander
.
Inputs
name | description | Type |
| The node for which the parents are being computed |
|
| The layer of the SDR graph |
|
Outputs
name | description | Type |
| The ordered parents of node |
|
Algorithm
If
layer = 1
:Compute
drgParents = BucketSample(node)
Set
parents
to bedrgParents
.
If
layer > 1
:Compute
drgParents = BucketSample(node)
Compute
expanderParents = ChungExpander(node)
Set
parents
to be the concatenation ofdrgParents
andexpanderParents
We provide below a more succinct representation of the algorithm:
TODO: Reference to code in filproofs/algorithms.go — or restructure all this.
Time-space tradeoff
Computing the parents using both BucketSample
and ChungExpander
for every layer can be an expensive operation, however, this can be avoided by caching the parents.
BucketSample
: Depth Robust Graphs algorithm
BucketSample
: Depth Robust Graphs algorithmThis section describes how to compute the "base parents" of the SDR graph, which is the equivalent of computing the parents of a Depth Robust Graph.
The properties of DRG graphs guarantee that a sector has been encoded with a slow, non-parallelizable process. We use the BucketSample
algorithm that is based on DRSample (ABH17) and described in FBGB18 and generates a directed acyclic graph of in-degree BASE_DEGREE
.
BucketSample
DRG graphs are random graphs that can be deterministically generated from a seed; different seeds lead with high probability to different graphs. In SDR, we use the same seed GRAPH_SEED
for each layer of the SDR graph such that they are all based on the same underlying DRG graph.
The parents of any node can be locally computed without computing the entire graph. We call the parents of a node calculated in this way base parents.
BucketSample
extends BucketSampleInner
to include the node's 'immediate predecessor'. Each node except the first in a DRG generated by BucketSample
has the node whose index is one less than its own as a parent. This ensures that visiting nodes whose indexes are sequential will result in a graph traversal in topological order.
ChungExpander
: Bipartite Expander Graphs
ChungExpander
: Bipartite Expander GraphsTODO: explain why we link nodes in the current layer
Each node in layers other than the first has EXPANSION_DEGREE
parents generated via the ChungExpander
algorithm. Note that the indexes returned refer to labels from the previous layer. TODO: Make this all clearer with explicit notation.
TODO: link to relevant filproofs/algorithms.go
Time-Space tradeoff
Computing these parents can be expensive (especially due to the hashing required by the Feistel algorithm). A miner can trade this computation by storing the expansion parents.
Feistel construction
We use three rounds of Feistel
to generate a permutation to compute the parents of the Bipartite Expander graph.
TODO: link to filproofs/feistel
Replication
The Replication phase turns an unsealed sector into a sealed sector by first generating a key, then using the key to encode the orignal data.
Before running the Replicate
algorithm, the prover must ensure that the sector is correctly formatted with a valid "Storage Format" described in Filecoin Client Data Processing (currently paddedfr32v1
is the required default).
TODO: inputs are missing
The Replication Algorithm proceeds as follows:
Calculate
ReplicaID
usingHash
(SHA256):
ReplicaID
is a 32-byte array constructed by hashing the concatenation of the following values:
ProverId
is a 32-byte array uniquely identifying a prover.SectorNumber
is an unsigned 64-bit integer in little-endian encoding represented as an 8-byte array.RandomSeed
is a 32-byte array of randomness extracted from the chain.CommD
is the Merkle root obtained by performingRepHash
on the original data represented inpaddedfr32v1
.
Perform
RepHash
onData
to yieldCommD
andTreeD
:
Layer Labeling
TODO: Define Graph
. We need to decide if this is an object we'll explicitly define or if its properties (e.g., GRAPH_SIZE
) are just part of the replication parameters and all the functions just refer to the same graphs being manipulated across the entire replication process. (At the moment I've avoided defining a Graph
structure as in other specs I didn't see any object methods, just standalone functions.)
TODO: link to filproofs/algorithms
Proof Generation
Overview:
Challenge Derivation
Proof Generation
Circuit Proof Generation
TODO: write a single algorithm which includes the spec below
Challenge Generation
TODO: Link to filproofs/algorithms
Calculate LAYER_CHALLENGES : [LAYERS]uint
: Number of challenges per layer. (This will be passed to the SDR circuit proof.)
Derive challenges for each layer (call DeriveChallenges()
).
Witness Generation
TODO: Link to filproofs/algorithms
Layer Challenge Counts
TODO: we should just list current parameters and show this as a calculation for correctness, this should not mandatory to implement.
Last updated