SNARK

In the previous lecture, we have discussed interactive proofs (IP) in general. Now, we will mostly be talking about non-interactive proofs, in particular SNARKs.

A SNARK stands for a succinct proof that a certain statement is true. Succinct here is meaning that the proof is "short". For example, I have a statement:

I know an such that .

In SNARK, the proof should be short and fast to verify. A trivial proof of the above statement is to simply send to the verifier. However, that proof is not short; it is as big as . Verification is also not fast, as the verifier has to hash the entire message to actually see the proof.

A SNARK can have a proof size of few KBs and verification should take at most seconds.

zk-SNARK

In the case of a zk-SNARK, the proof reveals nothing about . zk-SNARKs have many applications:

  • Private transactions: Tornado Cash, ZCash, IronFish, Aleo (private dApps).
  • Compliance: private proofs of solvency & compliance, zero-knowledge taxes
  • Scalability: Rollup systems with validity proofs
  • and a lot more commercial interest…

Why is there so much commercial interest? Well, things go back to a paper [Babai-Fortnow-Levin-Szegedy'91] where they show that a single reliable PC can monitor the operation of a herd of supercomputers running unrealiable software.

This single reliable PC is a slow and expensive computer. An example of such a machine from today is actually a L1 Blockchain!

Blockchain Applications

SNARKs and zk-SNARKs can be used in many ways within a blockchain.

  • Outsourcing Computation: Even without the need of zero-knowledge, an L1-chain can quickly verify that some off-chain service has done some work correctly.
  • Scaling with Proof-based Rollups (zkRollup): An off-chain service processes batch of transactions, and the L1-chain verifies a succinct proof that transactions were processed correctly.
  • Bridging Blockchains (zkBridge): A source chain locks up some asset, so that it can be used in another destination chain. In doing so, it proves that indeed the asset is locked.
  • Private Transactions: A ZKP that a private transaction is valid. Many examples: TornadoCash, ZCash, Ironfish, Aleo.
  • Compliance: A proof that a private transaction is compliant with banking laws (e.g. Espresso), or a proof that an exchange is solvent without showing the assets (e.g. Raposa).

Non-Blockchain Applications

Blockchain is really spearheading the development in these areas, but there are many non-blockchain applications of SNARKs too. Here is one example: proving that a photo is taken at some time and at some place.

The initial attempts to this was made via C2PA, a standard of content provenance. With C2PA, the camera that captures the picture will sign its metadata with a secret key (that is assumed not to be extractable by a user). The signed metadata will ensure that the location and timestamp is valid.

However, newspapers and press that display a picture often have to apply some post-processing, such as rescaling, cropping and gray-scaling. There is actually a list of allowed operations by Associated Press. Doing any of these will break the signature, as it can be though of as tampering.

Here is the solution by [Kang-Hashimoto-Stoica-Sun'22] to this problem using zk-SNARKs: suppose that the machine that is doing the post-processing has the photo and some list of allowed operations . Denote the original photo as and as the signature. The editing software will attach a proof of claim: "I know a pair " such that:

  • is a valid C2PA signature on
  • is the result of applying to
  • Metadata of is equal to metadata of

Et viola, you can now be sure that the processed photo is original! The proof size is less than 1KB, with verification time less than 10ms in browser. For images around 6000x4000, just a few minutes is enough to generate a proof, which is awesome.


Arithmetic Circuits

Definition: Fix a finite field for some prime . A finite field is just a set of numbers where we can do addition and multiplication in modulo . An arithmetic circuit is a DAG (directed acyclic graph) where internal nodes are labeled and inputs are labeled . The circuit defines an -variate polynomial with an evaluation recipe.

Here is an example:

flowchart LR
	1((1)) --> -
	x2((x2)) --> -
	1((1)) --> +
	x1((x1)) --> +
	x2((x2)) --> +
	+ --> x
	- --> x
	x1 --> x
	x --> r(( ))

This circuit defines the operation .

For convenience, the size of the circuit refers to the number of gates, and is denoted as . In the example above, .

For example:

  • You can implement a circuit that does . This outputs 0 if is the preimage of using SHA256, and something other than 0 otherwise. This circuit uses around 20K gates, which is not bad!
  • You can have a that outputs 0 if is a valid ECDSA signature on with respect to .
  • A theorem states that all polynomial time algorithms can be captured by polynomial sized arithmetic circuits!

Structured vs. Unstructured

There are two types of arithmetic circuits in terms of structure:

  • Unstructured Circuit is a circuit with arbitrary wires.
  • Structured Circuit is a circuit with repeating structure within, denoted as , and for some input and output the flow of this arithmetic circuit looks like the following:

is often called a virtual machine (VM), and every step of execution in this structure can be thought of as a single step of some processor, or like a clock cycle.

Some SNARK techniques only apply to structured circuits.

NARK: Non-interactive Argument of Knowledge

Suppose you have some public arithmetic circuit where,

  • is a public statement
  • is a secret witness

Denote as a preprocessing step for this circuit, which will output a pair as in public parameters prover and verifier respectively.

sequenceDiagram
	actor P as Prover P(pp, x, w)
	actor V as Verifier V(vp, x)
	P ->> V: proof π that C(x,w) = 0
	note over V: accept or reject

Notice that the Verifier does not talk back to Prover, i.e. it does not interact with it! It just reads the generated proof and that's all, making the entire thing non-interactive.

More formally, a NARK is a triple :

  • is the preprocessing setup, generating public parameters for prover and verifier
  • is the prover function, generating the proof given the public prover parameters, public inputs and the secret inputs (witness).
  • is the verification function, either accepting or rejecting a given proof , along with the circuit's public verifier parameters and public inputs.

A technical point to be made here is that, all of these algorithms and the adversary are assumed to have an access to a random oracle. This is most likely due to Fiat-Shamir Paradigm we have learned in the previous lecture, but we will get to more details of this later.

Requirements of a NARK

There are 2 requirements of a NARK, and an optional 3rd requirement of zero-knowledgeness:

  • Completeness: If the prover does indeed knows the argued knowledge, verifier should definitely accept the proof.

  • Soundness: If the verifier accepts a proof, the prover should indeed know the argued knowledge. "Knowing" something is rather interesting to capture formally, but for now let's say there is an extractor algorithm that can extract a valid from the prover.

  • Zero-knowledge (optional): The view of this interaction, consisting of "reveals nothing new" of .

Trivial NARK

We can easily think of a trivial NARK, that is not zero-knowledge but has the other two properties. Simply, the proof . Yeah, just send the witness to the verifier! All the Verifier has to do next is check if , since both and were public anyways.

SNARK: Succinct NARK

We will introduce some constraints over the proof size and verification time, giving us two types of NARKs:

  • succint preprocessing NARK
  • strongly succinct preprocessing NARK

Let us see the first one.

A succinct preprocessing NARK is a triple :

  • is the preprocessing step, generating public parameters for prover and verifier
  • is the prover function, where . So, the proof length must be less than linear in the size of witness.
  • is the verification function, where . Note that the verification will have to read the public inputs, so it is allowed to run in time, but it must run sub-linear in the size of circuit .

In practice, we are even more greedy than this, so we have a much better and efficient type of NARK.

A strongly succinct preprocessing NARK is a triple :

  • is the preprocessing step, generating public parameters for prover and verifier
  • is the prover function, where . The proof length must be logarithmic to the size of circuit, making the proof very tiny compared to the circuit!
  • is the verification function, where . Again, the verification will have to read the public inputs, so it is allowed to run in time, but it will not have time to read the entire circuit, which is quite magical. This is actually what public parameter is for, it is capturing a summary of the circuit for the verifier so that is enough to run the verification.

A zk-SNARK is simply a SNARK proof that reveals nothing about the witness .

Trivial SNARK?

Let us again come back to the trivial proof, where .

  • Prover sends to the verifier.
  • Verifier checks if

Why can't there be a trivial SNARK? Well, there may be several reasons:

  • If is long, the proof size will be too large.
  • If is taking lots of time, the verifier time will be too long.
  • Naturally, we might want to keep secret.

Preprocessing Setup

We said that a preprocessing setup is done for a circuit . Things are actually a bit more detailed than this, there are 3 types of setups:

  1. Trusted Setup per Circuit: is a randomized algorithm. The random is calculated per circuit, and must be kept secret from the prover; if a prover can learn then they can prove false statements!
  2. Trusted Setup & Universal (Updatable): a random is only chosen once and is independent of the circuit. The setup phase is split in two parts: .
    1. is a one-time setup, done in a trusted environment. must be kept secret!
    2. is done for each circuit, and nothing here is secret! Furthermore, is a deterministic algorithm.
  3. Transparent: does not use any secret data, meaning that a trusted setup is not required.

SNARKs in Practice

Notice that we had no constraints on the proof generation time. In the recent years, prover time has been reduced to be in linear time with the size of circuit , and this kind of enabled the SNARK revolution we are seeing these past few years.

We will go into details of 4 categories of SNARKs throughout the lecture, and again all of these have provers that run in linear time .

Size of proof Verifier timeSetupPost-Quantum?
Groth16~200 B ~1.5 ms Truster per CircuitNo
Plonk / Marlin~400 B ~3 ms Universal trusted setupNo
Bulletproofs~1.5 KB ~3 sec TransparentNo
STARK~100 KB ~10 ms TransparentYes

The approximations here are made for a circuit that is size gates. There are many more SNARKs out there, but these four are the ones we will go into detail of.

Also note that some of these SNARKs have constant sized proofs and constant time verifiers, which is kind of awesome considering that no matter what your circuit is, the proof size and verifier time will be approximately the same!

Knowledge Soundness

While describing the properties of a NARK, we mentioned soundness:

Well, what does it mean to "know" here? Informally, knows if this can be somehow extracted from the prover . The way we do that is kind of torturing the until it spits out . Let us give the formal definition now.

Formally, an argument system is (adaptively) knowledge-sound for some circuit , if for every polynomial time adversary such that:

  • for some non-negligible

there is an efficient extractor that uses as a black box (oracle) such that:

  • for some negligible and non-negligible .

In other words, the probability that a prover can convince the verifier for some witness must be at most negligibly different than the probability that this witness is a valid witness for the circuit . In doing so, this witness must be extractable by the efficient extractor .


Building a SNARK

There are various paradigms on building SNARKs, but the general paradigm is made up of two steps:

  1. A functional commitment scheme, which is a cryptographic object
  2. A suitable interactive oracle proof (IOP), which is an information theoretic object

(1) Functional Commitment Scheme

Well, first, what is a commitment scheme? A cryptographic commitment is like a physical-world envelope. For instance, Bob can put a data in an envelope, and when Alice receives this envelope she can be sure that Bob has committed to whatever value is in it. Alice can later reveal that value.

The commitment scheme has two algorithms:

  • for some randomly chosen

The scheme must have the following properties:

  • Binding: cannot produce two valid openings for
  • Hiding: reveals nothing about the committed data

There is a standard construction using hash functions. Fix a hash function where

Committing to a function

Choose a family of functions . The function can be an arithmetic circuit, a C program, whatever; but what does it really mean to commit to a function? Well, consider the following interaction:

sequenceDiagram
	actor P as Prover
	actor V as Verifier

	note over P: f ∈ F
	P ->> V: com_f := commit(f, r)
	V ->> P: x ∈ X
	P ->> V: y ∈ Y, proof π
	note over V: Accept or Reject

Here, the proof is to show that and . Also, is a commitment to function , but we may also use the notation to indicate a commitment (think of it like in an envelope).

Formally, a functional commitment scheme for is defined by the following:

  • outputs public parameters .
  • is commitment to with .
    • this should be a binding scheme
    • optionally, it can be hiding, which is good for a zk-SNARK
  • with a prover and verifier , for a given and :
    • (a short proof!),
    • Basically, the system is a SNARK proof wof the relations: and and .

Four Important Functional Commitments

There are 4 very important functional commitment types:

  • Polynomial Commitments: Committing to a univariate polynomial where that fancy notation stands for the set of all univariate polynomials of degree at most .
  • Multilinear Commitments: Committing to a multilinear polynomial in which is the set of all the multilinear polynomials in at most variables, each variable with degree at most 1. Here is an example multilinear polynomial: .
  • Vector Commitments: Committing to a vector which is a vector of elements. With our commitment, we would like to be able to open any cell at a later time, such that . An example vector commitment scheme is Merkle Trees, which you may have heard of!
  • Inner-Product Commitments: Committing to a vector . Opening an inner product is done as . These are also known as Linear Product Arguments.

It turns out that for these 4 functional commitments, you can obtain any of these from any other.

Polynomial Commitment Scheme (PCS)

A PCS is a functional commitment for the family .

  • The prover commits to a univariate polynomial , later, they can prove that:
    • for some public
    • . As this is a SNARK, the proof size and verifier time should be .
  • The verifier has access to .

There are some example PCSs with different mechanisms:

  • Using basic elliptic curves: Bulletproofs (short proof, but verifier time is )
  • Using bilinear groups: KZG'10 (trusted setup), Dory'20 (transparent)
  • Using groups of unknown order: Dark'20
  • Using hash functions only: based on FRI (long eval proofs)

Trivial Commitment is bad!

What would be a trivial commitment scheme for a polynomial? Well, first realize that a polynomial is just . Then, our commitment will be:

  • as in simply hashing the coefficients and some random variable.
  • will be done as follows:
    • prover will send to the verifier
    • verifier will construct from the coefficients, and check if and .

This is problematic because the proof size and verification time are linear in , but we wanted a lot smaller values, such as .

Zero Test: A Useful Observation

We will now make a really useful observation, which is an essential part of SNARKs and is really what makes SNARKs possible!

Consider some non-zero polynomial with degree at most , shown as .

  • for it holds that

We know that has at most roots. is chosen at random from a size , do the probability that "hits" a root value is easy to see that .

Now suppose that and . Then, is negligible! So it is really unlikely that a randomly chosen field element will be the root for .

With this in mind, if you do get for then is identically zero with very high probability. This gives you a simple zero test for a committed polynomial!

[Schwartz-Zippel-DeMillo-Lipton] lemma states that this observation holds for multivariate polynomials too, where is treated as the total degree of . The total degree is calculated as the sum of degrees of all variables, for example has degree .

Equality Test: A Following Observation

Following the zero-test observation, we can make another observation that allows to check if two polynomials are equal.

Let . For , if then with very high probability! This comes from the observation above, where if then with very high probability.

Here is an interactive protocol that makes use of this equality test, where a prover can commit to polynomials and show that they are equal:

sequenceDiagram
	actor P as Prover(f, com_f, g, com_g)
	actor V as Verifier(com_f, com_g)

	note over V: r ← R (random value)
  V ->> P: r
	note over P: y ← f(r)
	note over P: π_f := proof that y = f(r)
  note over P: y' ← g(r)
	note over P: π_g := proof that y' = g(r)
	P ->> V: y, π_f, y', π_g
	note over V: accept if π_f, π_g are valid and y=y'
	note over V: reject otherwise

That's cool and all, but wait, we talked about non-interactiveness the entire lecture; why are we making an interactive protocol right now? Well, here is the only interaction that a verifier makes to the prover. It is a "public coin", just some coin-toss (or a series of tosses) given by the verifier to the prover for everyone to see.

Thanks to Fiat-Shamir Transform, we can transform interactive protocols of this nature into non-interactive proofs! More specifically, Fiat-Shamir Transform can take a "public-coin interactive protocol" which means all verifier randomness is public, and transform it into a non-interactive protocol.

To be technical, Fiat-Shamir Transform isn't safe to transform ALL interactive proofs of this nature, but it is good enough for our needs right now.

Let be a hash function. For the example above, the prover will generate and this will be used as the random challenge. Since the verifier also has access to they can generate the same during verification. That is how the interactiveness is removed!

If is modeled as a random-oracle, and is negligible (as we discussed in zero-test), then this is a SNARK. In practice, SHA256 is used for the hash function.

Is this a zk-SNARK? No, because the verifier learns the result of evaluating the polynomials at point .

(2) - Interactive Oracle Proof

The goal of an -IOP is to boost a commitment to function to become a SNARK for general circuits. For example, you could have a polynomial function family and with -IOP you can turn that into a SNARK for any circuit with size .

Definition: Let be some arithmetic circuit. Let . An -IOP is then a proof system that proves . In other words, some prover knows a witness .

  • Setup: where which are oracles of functions in the function family. These oracles can be though of as function commitments, where the verifier can ask to reveal a function result at some given value, equivalent to making an oracle request. Remember that we have seen setup procedures back when we discussed SNARKs!
  • Interactive Proof: Proving that .
sequenceDiagram
	actor P as Prover P(pp, x, w)
	actor V as Verifier V(vp, x)

	note over P, V: claim: C(x, w) = 0
	loop i = 1, 2, ..., (t-1)
	P ->> V: oracle [f_i ∈ F]
	note over V: r_i ← F_p
	V ->> P: r_i
	end

	P ->> V: oracle [f_t ∈ F]
	note over V: verify^(f_{-s}, ..., f_t)(x, r_1, ..., r_(t-1))

Let's digest what is happening in this interactive proof:

  1. The prover starts by sending an oracle for function . In practice, this is a commitment to function which we may show as .
  2. The verifier samples a uniformly random field element , and sends it back to the prover.
  3. Steps 1 and 2 happen for rounds.
  4. Finally, the prover sends one last oracle , an oracle for function .
  5. The verifier starts the verification process. This process has access to all oracles given by the prover, as well as all the generated randomness, and public inputs as well.

The IOP must have the following properties:

  • Completeness: If there exists a valid witness known by the prover, the verifier should definitely accept the proof.

  • Soundness: The second property is knowledge soundness (unconditional), meaning that a malicious prover can not convince a verifier that they know a witness such that . The way to prove that is using an extractor: this extractor is given the statement and functions in clear! Why in clear? Because, the commitments to those functions were SNARKs too and the extractor can extract the functions themselves from there too. The extractor must extract the witness from this process.
  • Zero-knowledge (optional): The view of this IOP "reveals nothing new" of .

Example: Polynomial IOP for claim

We will go over an example where the public input is a set, and secret witness is a set that contains or is equal to . Furthermore, is a subset of or equal to the finite field for prime . Suppose we capture this relation with a circuit such that the claim becomes:

sequenceDiagram
	actor P as Prover P(pp, X, W)
	actor V as Verifier V(vp, X)

	note over P: compute f(Z), g(Z)
	note over V: compute g(Z)
	note over P: q(Z) := f / g
	P ->> V: oracles [f], [q]
	note over V: r ← F_p
	V ->> P: r
	note over V: query w ← f(r)
	note over V: query q' ← q(r)
	note over V: compute x ← g(r)
	note over V: accept if x * q' = w

I will explain how and are computed here, let's dig in.

  1. Prover computes two polynomials and , a polynomial with roots in and respectively. It does that by the following:
  2. Verifier computes the same way, because is public.
  3. Prover computes a quotient polynomial , which is a polynomial that is the result of dividing by . This is only a polynomial if has all the roots that has, and that implies . That is a key point to understand in this proof. Let me give an example: and . Then, 1. 2. 3. is a valid polynomial in the finite field!
  4. Prover sends oracles of and to the verifier. In practice, it uses a polynomial commitment scheme and sends commitments to these functions, and .
  5. Verifier samples a uniform random variable from the finite field. It sends this to the prover, but the prover makes no use of it. There is a reason why it sends anyways: it is to make public! If the verifier didn't send , we would not be able to say that it is publicly known.
  6. Verifier queries the value of and at point , and also computes . Denote these as respectively. Note that this querying happens via the polynomial commitment scheme in practice, so behind the scenes verifier sends to the prover, the prover evaluates it and sends back the result along with a proof that it is evaluated correctly and so on, check the previous section for this.
  7. Verifier checks if . This is only possible if indeed, think of it like checking in this example.

Replacing the oracles with commitments, and oracle queries with commitment interactions and so on is often called "compilation step", where this Poly-IOP is "compiled" into a SNARK by adding in the PCS (poly-commitment scheme) steps.

The IOP Zoo

There are many SNARKs for general circuits.

IOPCommitment SchemeExamples
Poly-IOPPoly-CommitSonic, Marlin, Plonk
Multilinear-IOPMultilinear-CommitSpartan, Clover, Hyperplonk
Vector-IOPVector-Commit (e.g Merkle)STARK, Breakdown, Orion

You construct the IOP, and use the relevant commitment scheme to do the commitments and queries; et viola, you have a SNARK. However, the examples we have were interactive so far, but a SNARK is non-interactive. To do that final touch, you use the Fiat-Shamir Transform (using hash functions) to make the entire thing non-interactive.

SNARKs in Practice

flowchart LR
	D[DSL] -- compiler --> S[SNARK-friendly format]
	S -- pp, vp --> B[SNARK backend prover]
	X[x, witness] --> B
	B --> proof

In practice, you wouldn't want to write the entire circuit yourself. We use DSLs (domain-specific languages) to do that for us.

Some DSL examples are:

The DSL compiles the circuits for you, and outputs a SNARK friendly format. There are several formats:

  • Circuit
  • R1CS (Rank-1 Constraint System)
  • EVM Bytecode (yea, that is possible!)

Finally, with the public parameters , the public input and witness a SNARK backend prover generates a proof.

That is the end of this lecture!