Welcome to my Cryptography Notes.

⬩

To learn is to teach - to teach is to learn, and to take notes.

⬩

Hopefully you can learn something from these notes, and please feel free to create an issue or a pull-request if you have suggestions.

⬩

Take care anon, and remember not to trust, but verify.

— erhant

Berkeley ZKP MOOC 2023 Spring

Summary of ZKP MOOC Spring 2023 lectures, which are a series of lectures given by Dr. Dan Boneh, Dr. Shafi Goldwasser, Dr. Dawn Song, Dr. Justin Thaler and Dr. Yupeng Zhang. This course covers fundamental techniques to build ZKP protocols, tools to implement ZKP for different computations, and different applications of ZKP in blockchain and other areas

There may be some repetitions among the lectures, but that is good and especially useful when you come back to skim over the notes.

Take a moment and look at that roster, it's like champions league up there. Make sure you watch the course videos on YouTube, and visit the course website.

Introduction & History: In the first lecture, we take a look at the starting point of all this What separates classical proof from an interactive proof, and how powerful are interactive proofs? What is zero-knowledge and how did it came to be? An excellent primer by Prof. Shafi Goldwasser.
Overview of Modern SNARKs: While the previous lecture was more about interactive proofs (IPs), this one is all about SNARKs. We learn what they are, and how to build them.
Programming ZKPs: Suppose you have an idea, and you write a program about it and you have to reduce it to some constraint system so that zero-knowledge cryptography can do it's thing. How can we do it? See this lecture for the answer.
SNARKs via IPs: This lecture is also about building SNARKs, with various methods! We will look at Merkle trees, Schwartz-Zippel lemma, Sum-check protocol and much more.
The PLONK SNARK: In this lecture we will first see KZG polynomial commitments, and then talk about useful proof-gadgets on committed polynomials. Finally, we learn PLONK.
Polynomial Commitments using Pairings & Discrete-log: We first look at some group theory background, then dive into KZG and Bulletproofs. A summary of pairing & d-log based polynomial commitments is also given in the end.
Polynomial Commitments using Error-correcting Codes: We look at error-correcting codes, Reed-Solomon Code in particular. Then, we describe a polynomial commitment scheme based on linear codes.

Interactive Proofs

When we think of proofs made by Euclid, Gauss, Euler and such, we think of proofs where we are trying to show that given some axioms and declarations, you can show that some claim is true. These are classical proofs.

There is a much more powerful proof system, called interactive proofs, we must think of proofs more interactively such as:

sequenceDiagram
	actor P as Prover
	actor V as Verifier

	note over P, V: claim
	P ->> V: proof
	note over V: accept/reject

Efficiently Verifiable Proofs (NP-Proofs)

For efficiently verifiable proofs (aka NP-Proofs) we have the following setting:

sequenceDiagram
	actor P as Prover
	actor V as Verifier

	note over P, V: claim = x
	note over P: works hard to generate proof w
  note over P: unbounded time
	P ->> V: short proof w s.t |w| = poly in |x|
	note over V: find V(x, w), takes poly time in |x|
	note over V: accept if it is 1, reject otherwise

Let us check some example claims.

Example: $N$ is a product of 2 large primes

A trivial proof of this would be the following:

Prover sends $p, q$ as a proof.
Verifier would check for $N = pq$ , and if true it will accept. In doing so, the Verifier learned about $p, q$ , which is not good for us!

Example: $y$ is a quadratic residue mod $N$

What this means is that, $\exists x \in Z_{N}^{*}$ such that $y \equiv x^{2} (mod N)$ . It turns out that finding such an $x$ is a very hard problem, as hard as factoring problem actually.

Even so, our prover is unbounded and can find such an $x$ in the end. A trivial proof for this would be:

Prover sends $x$ as a proof.
Verifier calculates $x^{2}$ and check if $y \equiv x^{2} (mod N)$ , accepts if true and rejects otherwise.

Example: Two graphs are isomorphic

Consider two graphs, as a set of edges $E_{1} = {(i_{1}, j_{1}), (i_{2}, j_{2}), \dots}$ and $E_{2} = {(p_{1}, q_{1}), (p_{2}, q_{2}), \dots}$ . We say that if there is an isomorphism $π$ that maps an edge $e_{1} \in E_{1}$ to $e_{2} \in E_{2}$ such that $\forall (i, j) : (i, j) \in E_{1} ⟺ (π (i), π (j)) \in E_{2}$ .

In layman terms, these two graphs are the same graph, but they are drawn a bit differently. Again, a trivial proof here would be to simply send the isomorphism itself to the Verifier. However, finding the proof itself is a very hard problem, although checking if it is correct is not!

NP-Languages

Definition: $L$ is an NP-language (or NP-decision problem) if there is a $p o l y (∣ x ∣)$ time verifier $V$ where:

Completeness: true claims have short proofs, meaning that if $x \in L$ then there is a $p o l y (∣ x ∣)$ sized witness $w \in {0, 1}^{*}$ such that $V (x, w) = 1$ .
Soundness: false theorems have no proofs, meaning that if $x \neq \in L$ then there is no witness, so $\forall w \in {0, 1}^{*}$ we have $V (x, w) = 0$ .

The main question of this course is not focused on this subject though. Instead, we are trying to see if there is another way to convince a verifier, for example considering the questions given above.

Zero-Knowledge Interactive Proofs

The main idea of ZKP, although very informally, is that a Prover "will prove that they can indeed prove the claim if they wanted to", but they are not doing that so that information is kept private.

sequenceDiagram
	actor P as Prover
	actor V as Verifier

	note over P: proves "I could prove it if I felt like it"
  P ->> V: sends a ZKP
	note over V: wow, ok I believe that

We need to extend our proof model to work with ZK. We will need two more ingredients:

Interaction: rather than passively reading a proof, the verifier engages in a non-trivial interaction with the prover.
Randomness: verifier is randomized, i.e. coin tossing is allowed as a primitive operation. Furthermore, the verifier can have an error in accepting/rejecting but with a small negligible probability.

With this, our proof model becomes as follows:

sequenceDiagram
	actor P as Prover
	actor V as Verifier

	note over P, V: claim x
	loop i = 1, 2, ...
		P ->> V: a_i
		V ->> P: q_i
	end
	note over V: accept/reject

Here, the Prover is computationally unbounded but the Verifier must be efficient, i.e. runs in probabilistic polynomial-time (PPT).

Example: Two Colors

Consider a string x="🔴🔵". We claim that there are two colors in this string. However, our verifier is color-blind! How can we prove that indeed this string has two colors?

sequenceDiagram
	actor P as Prover
	actor V as Verifier

	note over P, V: claim: x="🔴🔵" has 🔴 and 🔵
	loop i = 1, 2, ...
		P ->> V: x
		note over V: b ← {0, 1}
		alt b = 1
			note over V: x' := flip(x)
		else
			note over V: x' := x
		end
		V ->> P: x'
		alt x' = flip(x)
			note over P: b' := 1
		else
			note over P: b' := 0
		end
		P ->> V: b'
		note over V: if [b != b'] then reject
	end
	note over V: accept

Let us describe what happens in this interactive proof:

The prover sends x to the verifier, say $x$ = "🔴🔵".
The verifier tosses a coin, and flips $x$ if it is heads $(b = 1)$ meaning $x^{'}$ = "🔵🔴". Otherwise, $x$ stays the same, meaning $x^{'}$ = "🔴🔵". Verifier sends this $x^{'}$ to the prover.
Prover will then compare $x$ to $x^{'}$ , and since he can see colors, he will be able to tell whether $x^{'}$ is flipped or not. Based on this, he will say $b^{'}$ = heads or tails, depending on the flip. Prover sends $b^{'}$ to the verifier.
Verifier looks at $b^{'}$ , and says "wow this guy can actually guess whether I got heads or tails, he must be seeing colors then!". If $b^{'}$ and $b$ does not match, she rejects.
This interaction is repeated polynomially many times, until finally the Verifier accepts.

Let us analyze this formally. If there are 2 colors, then verifier will accept. If there is a single color only, for all provers $Pr [V accepts] \leq 1/2$ for a single interaction. Repeating this interaction $k$ times would mean that $Pr [V accepts] \leq 1/ 2^{k}$ which becomes a tiny probability for large $k$ . So, it is very unlikely that a prover can keep faking it for that many interactions.

Example: $y$ is a quadratic residue mod $N$

Here, the claim is the language ${(N, y) : \exists x s.t. y \equiv x^{2} (mod N)}$ .

sequenceDiagram
	actor P as Prover
	actor V as Verifier

	note over P, V: claim: ∃x s.t. y ≡ x^2 (mod N)

	loop i = 1, 2, ...
		note over P: choose r where 1 ≤ r ≤ N and gcd(r,N)=1
		note over P: s := r^2 mod N
		P ->> V: s

		note over V: b ← {0, 1}
    V ->> P: b

	  alt b = 1
			note over P: z := r * sqrt(y) mod N
		else
			note over P: z := r
		end
		note over V: if [z^2 != s * y^c mod N] then reject
	end

	note over V: accept

So basically what the prover is doing is,

generates a random quadratic residue $s$
asks for the verifier to make a coin toss
- if heads $(b = 1)$ , it sends $z := sy$
- if tails $(b = 0)$ , it sends $z := s$
the verifier checks if $z^{2} \equiv s \times y^{c} (mod N)$ . if not, it rejects
after many interactions, if the verifier hasn't rejected so far, it accepts

Similar to the previous example, if the prover had not known $x$ , he would not be able to win many interactions.

So what made this whole thing possible? Let's recap:

The statement to be proven has many possible proofs of which the prover chooses one at random.
Each such proof is made up of exactly 2 parts: seeing either part on its own gives the verifier no knowledge; seeing both parts imply 100% correctness.
Verifier chooses at random which of the two parts of the proof he wants the prover to give him. The ability of the prover to provide either part, convinces the verifier.

Interactive Proofs for a Language $L$

Definition: $(P, V)$ is an interactive proof for $L$ , if $V$ is probability polynomial ( $∣ x ∣$ ) time, and:

Completeness: if $x \in L$ then $V$ always accepts. Formally: $Pr [(P, V) (x) = accept] \geq c$ .
Soundness: if $x \neq \in L$ then for all cheating prover strategies, $V$ will not accept except with negligible probability. Formally: for such cheater provers $P^{*}$ , it holds that $Pr [(P^{*}, V) (x) = accept] \leq s$
In a good scenario, we would except $c = 1$ and $s = n e g l (∣ x ∣)$ where $n e g l (λ)$ is a negligible function. However, we might also show that $c - s \geq n e g l (∣ x ∣)$ or equivalently: $c - s \geq 1/ p o l y (∣ x ∣)$ .

Definition: Class of languages IP = ${L for which there is an interactive proof}$ .

What is Zero-Knowledge?

For true statements, we want the following to be true: what the verifier could have computed before the interaction IS EQUAL TO what the verifier can compute after the interaction.

So basically, these interactions had no effect whatsoever on the computational power of the verifier! That is what zero-knowledge means. Now, let's be more formal.

sequenceDiagram
	actor P as Prover
	actor V as Verifier

	note over P, V: theorem T
	loop i = 1, 2, ...
		P ->> V: a_i
		V ->> P: q_i
	end
	note over V: accept/reject

After an interactive proof, the verifier has learned:

that the claim/theorem is true
a view of interactions, i.e. the transcript of answers and queries, and the coins that $V$ has tossed.

A view is formally defined as a random variable from the probability distribution over coins of $V$ and interactions with $P$ :

$view_{V} (P, V) [x] = {(q_{1}, a_{1}, q_{2}, a_{2}, \dots, coin tosses of V}$

The Simulation Paradigm

$V$ 's view gives him nothing new, if he could have simulated that view on it's own. In other words, the simulated view and real view are computationally indistinguishable!

In cryptography, we have the notion of computational indistinguishability, where there are two distributions $D_{1}$ and $D_{2}$ , and a distinguisher that tries to distinguish these two. In doing so, he will sample both distributions at random, and will try to guess from which distribution that sample is taken from. If the probability of distinguishing this sample is at most $1/2 + negl$ then $D_{1}$ and $D_{2}$ are indistinguishable!

That is a rather heavy paragraph, so let me give an example. A vendor is selling watches, and they have a set of real Rolex watches, and a set of fake Rolex watches. You go to the shop, and ask for a sample. The vendor secretly flips a coin,

if heads, vendor gives you a real Rolex watch
if tails, vendor gives you a fake Rolex watch

You try to guess whether the flip was heads or tails. If the probability that you guess correctly is at most $1/2 + negl$ then the set of fake Rolex watches are indistinguishable from the set of real Rolex watches! Good for the vendor 🙂

Definition: An interactive protocol $(P, V)$ is honest-verifier zero-knowledge for a language $L$ if there exists a PPT algorithm $Sim$ (a simulator) such that $\forall x \in L$ the following two probability distributions are poly-time indistinguishable:

$view_{V} (P, V) [x]$
$Sim (x, 1^{λ})$ , here $λ$ is for technical reasons and for large $x$ you can just ignore it.

A caveat about $Sim$ is that we allow it to run in expected-poly-time. Meaning that there may be some very unlucky cases where the algorithm takes a long time, but in expectation it is poly-time.

Also notice that this is for honest verifiers, and we would like the proof to hold for all verifiers. So, our final definition should cover all verifiers.

Definition: An interactive protocol $(P, V)$ is zero-knowledge for a language $L$ if for every PPT $V^{*}$ , there exists a PPT algorithm $Sim$ (a simulator) such that $\forall x \in L$ :

$view_{V} (P, V) [x] \approx Sim (x, 1^{λ})$

Flavors of Zero-Knowledge

Again, consider the real distribution $view_{V} (P, V)$ and simulated distribution $Sim (x, 1^{λ})$ . Based on these, there are flavors of ZK:

Computationally Indistinguishable Distributions (CZK), where the two distributions are computationally indistinguishable.
Perfectly Identical Distributions (PZK), where the two distributions are the same.
Statistically Close Distributions (SZK), where the two distributions are statistically close.

A Simulation for Quadratic-Residue (QR) Proof

Consider the QR proof from above, where we were able to prove with zero-knowledge that we know there exists some $x$ such that $y \equiv x^{2} (mod N)$ . Check that interactive proof again, what is the $view$ here? We see that the verifier learns about $s$ , it also generates some coin toss value $c$ and finally it learns $z$ .

So, $view_{V} (P, V) : (s, b, z)$ . A simulator should output the same view on its own! Can we do it? Yes, in fact, we can do it perfectly!

Simulator $S$ works as follows:

First, pick a random bit $b \in {0, 1}$ .
Then, pick random $z \in Z_{N}^{*}$ .
Compute $s = z^{2} / y^{c}$ .
Output $(s, b, z)$ .

This is identical to what the honest verifier had viewed in the actual interactive proof. Notice that the simulator did not have to know about $x$ , but thanks to that division on 3rd step, it could produce a valid $s$ for the view.

What about an adversarial verifier, for example, one that does not pick $b$ at random? Well, we can still construct a simulator:

First, pick a random bit $b \in {0, 1}$ .
Then, pick random $z \in Z_{N}^{*}$ .
Compute $s = z^{2} / y^{c}$ .
If $V^{*} (N, y, s) = b$ then output $(s, b, z)$ , otherwise go to step 1 and repeat!

With this setting, even if we don't know what $V^{*}$ is doing behind the scenes, we should expect that our randomly picked $b$ should match their fixed and adversarially computed $b$ . In fact, in expectation this takes 2 attempts, since it is just a coin flip!

ZK Proof of Knowledge (ZKPOK)

Notice that prover has prove not only that the claim is correct, but also that they know a square root modulo $N$ . This is more than just a proof of the claim! So, consider the language $L_{R} = {x : \exists w s.t. R (x, w) = accept}$ for poly-time relation $R$ .

Definition: $(P, V)$ is a proof of knowledge (POK) for $L_{R}$ if $\exists$ PPT knowledge extractor algorithm $E$ such that $\forall x \in L$ , in expected poly-time $E^{P} (X)$ outputs $w$ such that $V (x, w) = accept$ .

Here, $E^{P} (x)$ means that $E$ may run $P$ repeatedly on the same randomness, possibly asking different questions in every execution. This is called the Rewinding Technique.

A ZKPOK for Quadratic-Residue (QR) Proof

We have seen the ZKP for a prover that knows $x$ such that $y \equiv x^{2} (mod N)$ . Let's construct the Extractor for this proof.

sequenceDiagram
	actor P as Prover
	actor E as Extractor

	note over P, E: claim: ∃x s.t. y ≡ x^2 (mod N)
	note over P: choose r where 1 ≤ r ≤ N and gcd(r,N)=1
	note over P: s := r^2 mod N
	P ->> E: s
	E ->> P: 1 (heads)
  P ->> E: z = r
	note over P, E: rewind
	P ->> E: s
	E ->> P: 0 (tails)
	P ->> E: z = rx
	note over E: output rx/r = x mod N

Notice the rewind there, hence the Rewinding Technique. What this trick does is that it allows us to access $s$ again! In the interactive proof, we would normally have a new random $s$ every time the prover sent us something, but here we can have access to the same $s$ again and again. As such, we were able to obtain $r$ and $r x$ , and thus find $x$ via division.

Example: Graph Isomorphism

Consider the graph isomorphism problem from the examples at the beginning. The proof is similar to how it works for quadratic residues. So, the claim is that there is an isomorphism $σ$ between $G_{0}$ and $G_{1}$ .

The prover produces a random graph $H$ and sends it to verifier.
The prover finds an isomorphism $γ_{0}$ from $G_{0}$ to $H$ .
The prover also finds an isomorphism $γ_{1}$ from $G_{1}$ to $H$ .
The verifier sends a coin toss (bit) $b$ to prover, and prover returns $γ_{b}$ .
Verifier checks if $H = γ_{b} (G_{b})$ .
These steps happen repetitively, until some poly-time later if the verifier did not reject so far, it accepts.

Notice that since $H = γ_{0} (G_{0}) = γ_{1} (G_{1})$ then $G_{1} = γ_{1}^{- 1} (γ_{0} (G_{0}))$ . So, the prover indeed knows $σ = γ_{1}^{- 1} γ_{0}$ . However, since both isomorphisms are never given to the verifier at the same time, the verifier can't find that isomorphism!

This ZKP is actually of flavor PZK, so the two distributions of real view and simulator view are identical! Furthermore, there is a ZKPOK that prover knowns an isomorphism from $G_{0}$ to $G_{1}$ . Formal proofs left to the reader.

Graph 3-Colorability

Do all NP Languages have ZK Interactive Proofs? Proven in [Goldwasser-Micali-Widgerson'86], the answer is YES! They show that if one-way functions exist, then every $L$ in NP has computational zero-knowledge interactive proofs. The proof will come from an NP-complete problem known as Graph 3-Colorability, hence the title of this section.

The ideas of the proof are as follows:

Show that an NP-complete problem has a ZK interactive proof. [GMW87] showed a ZK interactive proof for $G3-COLOR$ (graph 3-colorability problem). For any other $L$ in NP, $L <_{p} G3-COLOR$ due to NPC reducibility. Every instance $x$ can be reduced to a graph $G_{x}$ such that if $x \in L$ then $G_{x}$ is 3-colorable, and if $x \neq \in L$ then $G_{x}$ is not 3-colorable.
Existence of one way functions imply a hiding & binding bit commitment protocol.

What is a Commitment Scheme?

In a commitment scheme, there are two functions:

$Commit (m)$ takes an input $m$ , let us think of it as a bit in ${0, 1}$ , and produces a commitment $c$ . Once committed, the commitment does not reveal whether $m$ is 0 or 1.
$Decommit (c)$ allows a receiver of a commitment $c$ to open it, and reveal what is in there. In this case, some bit $m$ that is either 0 or 1.

For a commitment scheme,

Binding means that if you have committed to some $m$ , then the decommit procedure on that commitment may only reveal $m$ , not something else.
Hiding means that from outside, you can't guess the bit $m$ with more than 1/2 probability.

An example commitment scheme would be to use a semantically secure probabilistic encryption scheme.

$Commit (b)$ where sender chooses some random $r$ and sends $c := E n c (b; r)$ .
$Decommit (c)$ where sender send the same random $r$ and $b$ , and receiver rejects unless $c = E n c (b; r)$ .

Proving a graph $G$ is $G3-COLORABLE$

Suppose there is a graph $G = (V, E)$ as in the set of vertices and set of edges, and the prover knows some coloring $π : V \to {0, 1, 2}$ which is a mapping that maps a vertex to a color.

Prover picks a random permutation of colors $σ : {0, 1, 2} \to ϕ$ and the colors the graph with this permutation. It still is valid because its just different colors for the same graph. We show this as $ϕ (v) := σ (π (v))$ for $v \in V$ . Then, the prover commits to each newly colored vertex $v$ by running $Commit (ϕ (v))$ protocol.
Verifier selects a random edge $e = (a, b)$ and sends it to prover.
Prover runs $Decommit$ on the colors of edge points $a, b$ and reveals $ϕ (a)$ and $ϕ (b)$ .
Verifier rejects if $ϕ (a) = ϕ (b)$ , otherwise it repeats steps 1-3 and accepts after $k$ iterations.

Now, let's look at the properties of this interactive proof:

Completeness: if $G$ is 3-colorable, then the honest prover uses a proper 3-coloring & the verifier always accepts.
Soundness: if $G$ is not 3-colorable, then for all $P^{*}$ it holds for $k = ∣ E ∣^{2}$ (meaning that we have iterations as many as the square of number of edges) that,

$Pr [V accepts] < (1 - \frac{1}{∣ E ∣})^{k} < \frac{1}{e ^{∣} E ∣}$

Zero-Knowledge: Easy to see informally, messy to prove formally.

Honest-Verifier Computational ZK Simulator

First, let us examine the view of this interactive proof.

We have an edge $(a, b)$
We have the commitments to each vertex coloring as $Commit (ϕ (v))$ .
We have the decommit of colors $ϕ (a), ϕ (b)$ .

Let us show the honest-verifier CZK simulator. For the graph $G = (V, E)$ the simulator will choose a random edge $(a, b) \in E$ . Then, it will pick random colors $ϕ (a), ϕ (b) \in {0, 1, 2}$ such that $ϕ (a) \neq = ϕ (b)$ . For all other vertices $v$ , it will set $ϕ (v)$ to some fixed color, and commit to all $ϕ (v)$ .

The output of simulated view is:

We have an edge $(a, b)$
We have the commitments to each vertex coloring $Commit (ϕ (v))$ .
We have the decommit of colors $ϕ (a), ϕ (b)$ .

As we can see, the views are kind of indistinguishable! They are not the same though, as the commitments to vertices other than $a, b$ are illegal colors, the simulation had no idea how to color the graph anyways. However, since the distinguisher can't see what is under the commitment (hiding property) they are computationally indistinguishable.

The simulator for all verifiers is also given in the lecture, but not noted here.

Practical Applications of ZK

So far, we have seen some ZK proofs of claims:

$N$ is the product of two primes $p, q$ .
$x$ is a square $mod$ $n$ .
Two graphs $(G_{0}, G_{1})$ are isomorphic.

There are a lot more proofs in the domain of Interactive Proofs, for example see the following claims:

Any SAT Boolean Formula has satisfying assignment.
Given encrypted inputs $E (x)$ , and some program $Prog$ , the program has output $y$ such that $y = Prog (x)$ .
Given encrypted inputs $E (x)$ , and some encrypted program $E (Prog)$ , the program has output $y$ such that $y = Prog (x)$ .

These are all provable in ZK, and when you think about the last example that is pretty awesome. Let's talk a bit more general:

You can prove properties about some message $m$ , without revealing $m$ itself but only showing $E n c (m)$ or $H a s h (m)$ .
You can prove relationships between messages $m_{1}, m_{2}$ without revealing them, such as $m_{1} = m_{2}$ or $m_{1} \neq = m_{2}$ . In fact, you can show that there is some value $v$ that when used with a poly-time function $f$ , you have $v = f (m_{1}, m_{2})$ .

In general idea: you can use ZK as a tool to enforce honest behavior in protocols without revealing any information. To do that, imagine that a protocol player sends a message $m$ and along with that new message, it sends a ZKP that $m = Protocol (h, r)$ for some history $h$ and randomness $r$ . Furthermore, they will commit to this randomness as $c = Commit (r)$ . This makes honest behavior with ZK possible since $L = {\exists r : m = Protocol (h, r) and c = Commit (r)}$ is in NP.

Some more real-world applications are:

Computation Delegation
ZK and Nuclear Disarmament
ZK and Forensics
ZCash: Bitcoin with Privacy and Anonymity
ZK and Verification Dilemmas in Law

Complexity Theory: Randomized Analogue to NP

	No Randomizations	With Randomizations (Coin toss)
Efficiently Solvable	P (Polynomial time)	BPP (Bounded-error Probabilistic Polynomial time)
Efficiently Verifiable	NP (Non-deterministic Polynomial time)	IP (Interactive Polynomial time)

Is IP greater than NP?

The answer is YES! Let's go over an example. Suppose that you have two graphs $G_{0}, G_{1}$ that are NOT isomorphic. The shortest classical proof of this would be to go over all possible isomorphisms (takes time in the order of factorials!) and show that none of them work.

However, there is an efficient interactive proof!

sequenceDiagram
	actor P as Prover
	actor V as Verifier

	note over P, V: claim : G_0 and G_1 not isomorphic
	loop i = 1, 2, ..., k
		note over V: flip coin: b = {0, 1}
		note over V: pick random isomorphism γ
		V ->> P: H := γ(G_b)
		alt H isomorphic to G_0
		note over P: b' = 0
	  else
		note over P: b' = 1
		end
		P ->> V: b'
		note over V: reject if b' != b
	end
	note over V: accept

Here is the idea of this proof: if $G_{0}$ and $G_{1}$ are indeed isomorphic, then the prover would have no idea whether $H$ is isomorphic to $G_{0}$ or $G_{1}$ , because it would be isomorphic to both of them! So, he would have at most 1/2 chance in guessing the correct bit. After $k$ iterations, the possibility of not being rejected by the verifier becomes $1/ 2^{k}$ , which is negligible.

Also, how does prover find isomorphisms like that so easily? Well, remember that the prover is all-powerful and computationally unbounded, so they are very well allowed to find such hard stuff.

Here is what is different about this IP though; here, the Verifier is doing more than just tossing coins. Here, it actually picks a random isomorphism, and creates the graph $H$ from that!

Completeness and soundness hold for this proof; however, it is not zero-knowledge! This is because if the verifier is not honest, and it's just someone that wants to find out whether some graph $H$ is isomorphic to $G_{0}$ , they can very well find it by sending $H$ to the honest prover! Therefore, there is knowledge that is learned about $H$ depending on what the prover replies to that.

The solution to this problem is rather straightforward: have the verifier send a ZKP that they know the isomorphism $γ$ , this way the reply from prover does not change the knowledge of verifier.

Arthur-Merlin Games & is AM = IP?

In the Interactive Proof's so far, the Verifier was hiding the result of coin tosses and also was able to do some extra computations, it was a PPT Verifier so it could do anything that the time allows. However, in an Arthur-Merlin game [Babai-Moran'88], the Verifier will only acts as two things:

A public coin tosses
A decision function

So, the prover will see in clear the result of coin tosses!

sequenceDiagram
	actor P as Prover (Merlin)
	actor V as Verifier (Arthur)

	note over P, V: claim "x in L"
	loop i = 1, 2, ..., k
	V ->> P: coins_i
	P ->> V: answer_i
	note over V: reject if bad
	end
	note over V: accept

The question here: is Interactive-Proofs more powerful (i.e. can prove more claims) than Arthur-Merlin Games? Is coin privacy necessary?

The answer turns out to be no, AM = IP actually [Goldwasser-Sipser'86]!

Fiat-Shamir Paradigm

You can remove the interaction in AM protocols via Fiat-Shamir Paradigm [Fiat-Shamir'87]. Let that sink in, you are literally removing the interaction from an interactive proof, more specifically, you are removing the coin tossing machine. How is that possible?

Let $H : {0, 1}^{*} \to {0, 1}^{k}$ be a cryptographic Hash function, meaning that you can use this function as a random oracle that when you feed any string it gives you back random string of length $k$ , essentially an output of $k$ coin tosses.

The idea is the following:

Normally, we had a Prover that sent some answer $a_{i}$ and got back $co in s_{i}$ from the Verifier.
Now, the Prover will send $a_{i}$ but instead of waiting for coins by the Verifier, it will generate its own coins simply via $H (a_{i})$ .

What if the Prover needs coins before sending any answer? In that case, the first message of coins is posted "publicly" for all to see, and then Fiat-Shamir heuristics is applied to the rest.

However, Fiat-Shamir does not mean you can make all interactive proof's into non-interactive proofs! Yet, many specific AM protocols with an efficient prover can benefit from this heuristic.

Efficient Verification

Problem Class	Objective Idea	Classical Proofs	Interactive Proofs
NP	$\exists$ a solution	YES	YES
Co-NP	0 solutions	?	YES
#P	$k$ many solutions	?	YES
PSPACE	$\forall\exists\forall\exists \dots$	?	YES

It was shown by [Fortnow-Karloff-Lund-Nissan'89] and [Shamir'89], that IP can be used to prove many many more problems than what classical proofs are able to! In fact, this brought the question: what if we have something more powerful than IP, for example, what if there is a second prover?

Indeed, as shown in [Benor-Goldwasser-Kilian-Wigderson'88], you can prove a lot more with two provers! In fact, you now have unconditional PZK for NP problems. [Babai-Fortnow-Lund'91] has shown that you can even prove NEXPTIME (non-deterministic exponential time) problems with interactive proofs.

Going even further, [Reichardt-Unger-Vazirani'13] has shown that a classical verifier can verify the computation of two entangled but non-communicating polynomial time quantum algorithms. Finally, a recent work [Ji-Natarajan-Vidick-Wright-Yuen'20] has shown that with two not necessarily efficient quantum provers, and a classical verifier, you can prove all Recursively Enumerable Languages. Kind of meaning like everything; that's one bizarre result!

SNARK

In the previous lecture, we have discussed interactive proofs (IP) in general. Now, we will mostly be talking about non-interactive proofs, in particular SNARKs.

A SNARK stands for a succinct proof that a certain statement is true. Succinct here is meaning that the proof is "short". For example, I have a statement:

I know an $m$ such that $SHA256 (m) = 0$ .

In SNARK, the proof should be short and fast to verify. A trivial proof of the above statement is to simply send $m$ to the verifier. However, that proof is not short; it is as big as $m$ . Verification is also not fast, as the verifier has to hash the entire message to actually see the proof.

A SNARK can have a proof size of few KBs and verification should take at most seconds.

zk-SNARK

In the case of a zk-SNARK, the proof reveals nothing about $m$ . zk-SNARKs have many applications:

Private transactions: Tornado Cash, ZCash, IronFish, Aleo (private dApps).
Compliance: private proofs of solvency & compliance, zero-knowledge taxes
Scalability: Rollup systems with validity proofs
and a lot more commercial interest…

Why is there so much commercial interest? Well, things go back to a paper [Babai-Fortnow-Levin-Szegedy'91] where they show that a single reliable PC can monitor the operation of a herd of supercomputers running unrealiable software.

This single reliable PC is a slow and expensive computer. An example of such a machine from today is actually a L1 Blockchain!

Blockchain Applications

SNARKs and zk-SNARKs can be used in many ways within a blockchain.

Outsourcing Computation: Even without the need of zero-knowledge, an L1-chain can quickly verify that some off-chain service has done some work correctly.
Scaling with Proof-based Rollups (zkRollup): An off-chain service processes batch of transactions, and the L1-chain verifies a succinct proof that transactions were processed correctly.
Bridging Blockchains (zkBridge): A source chain locks up some asset, so that it can be used in another destination chain. In doing so, it proves that indeed the asset is locked.
Private Transactions: A ZKP that a private transaction is valid. Many examples: TornadoCash, ZCash, Ironfish, Aleo.
Compliance: A proof that a private transaction is compliant with banking laws (e.g. Espresso), or a proof that an exchange is solvent without showing the assets (e.g. Raposa).

Non-Blockchain Applications

Blockchain is really spearheading the development in these areas, but there are many non-blockchain applications of SNARKs too. Here is one example: proving that a photo is taken at some time and at some place.

The initial attempts to this was made via C2PA, a standard of content provenance. With C2PA, the camera that captures the picture will sign its metadata with a secret key (that is assumed not to be extractable by a user). The signed metadata will ensure that the location and timestamp is valid.

However, newspapers and press that display a picture often have to apply some post-processing, such as rescaling, cropping and gray-scaling. There is actually a list of allowed operations by Associated Press. Doing any of these will break the signature, as it can be though of as tampering.

Here is the solution by [Kang-Hashimoto-Stoica-Sun'22] to this problem using zk-SNARKs: suppose that the machine that is doing the post-processing has the photo $P$ and some list of allowed operations $Op s$ . Denote the original photo as $P_{or i g}$ and $s$ as the signature. The editing software will attach a proof $π$ of claim: "I know a pair $(P_{or i g}, s)$ " such that:

$s$ is a valid C2PA signature on $P_{or i g}$
$P$ is the result of applying $Op s$ to $P_{or i g}$
Metadata of $P$ is equal to metadata of $P_{or i g}$

Et viola, you can now be sure that the processed photo is original! The proof size is less than 1KB, with verification time less than 10ms in browser. For images around 6000x4000, just a few minutes is enough to generate a proof, which is awesome.

Arithmetic Circuits

Definition: Fix a finite field $F = {0, 1, \dots, p - 1}$ for some prime $p > 2$ . A finite field is just a set of numbers where we can do addition and multiplication in modulo $p$ . An arithmetic circuit is a DAG (directed acyclic graph) $C : F^{n} \to F$ where internal nodes are labeled $+, -, \times$ and inputs are labeled $1, x_{1}, \dots, x_{n}$ . The circuit defines an $n$ -variate polynomial with an evaluation recipe.

Here is an example:

flowchart LR
	1((1)) --> -
	x2((x2)) --> -
	1((1)) --> +
	x1((x1)) --> +
	x2((x2)) --> +
	+ --> x
	- --> x
	x1 --> x
	x --> r(( ))

This circuit defines the operation $x_{1} (x_{1} + x_{2} + 1) (x_{2} - 1)$ .

For convenience, the size of the circuit refers to the number of gates, and is denoted as $∣ C ∣$ . In the example above, $∣ C ∣ = 3$ .

For example:

You can implement a circuit that does $C_{hash} (h, m) = (h - SHA256 (m))$ . This outputs 0 if $m$ is the preimage of $h$ using SHA256, and something other than 0 otherwise. This circuit uses around 20K gates, which is not bad!
You can have a $C_{sig} (p k, m, σ)$ that outputs 0 if $σ$ is a valid ECDSA signature on $m$ with respect to $p k$ .
A theorem states that all polynomial time algorithms can be captured by polynomial sized arithmetic circuits!

Structured vs. Unstructured

There are two types of arithmetic circuits in terms of structure:

Unstructured Circuit is a circuit with arbitrary wires.
Structured Circuit is a circuit with repeating structure within, denoted as $M$ , and for some input $in$ and output $o u t$ the flow of this arithmetic circuit looks like the following:

$in \to M \to M \to \dots \to M \to o u t$

$M$ is often called a virtual machine (VM), and every step of execution in this structure can be thought of as a single step of some processor, or like a clock cycle.

Some SNARK techniques only apply to structured circuits.

NARK: Non-interactive Argument of Knowledge

Suppose you have some public arithmetic circuit $C (x, w) \to F$ where,

$x \in F^{n}$ is a public statement
$w \in F^{m}$ is a secret witness

Denote $S (C)$ as a preprocessing step for this circuit, which will output a pair $(pp, v p)$ as in public parameters prover and verifier respectively.

sequenceDiagram
	actor P as Prover P(pp, x, w)
	actor V as Verifier V(vp, x)
	P ->> V: proof π that C(x,w) = 0
	note over V: accept or reject

Notice that the Verifier does not talk back to Prover, i.e. it does not interact with it! It just reads the generated proof and that's all, making the entire thing non-interactive.

More formally, a NARK is a triple $(S, P, V)$ :

$S (C) \to (pp, v p)$ is the preprocessing setup, generating public parameters for prover and verifier
$P (pp, x, w) \to π$ is the prover function, generating the proof given the public prover parameters, public inputs and the secret inputs (witness).
$V (v p, x, π) \to {0, 1}$ is the verification function, either accepting or rejecting a given proof $π$ , along with the circuit's public verifier parameters and public inputs.

A technical point to be made here is that, all of these algorithms and the adversary are assumed to have an access to a random oracle. This is most likely due to Fiat-Shamir Paradigm we have learned in the previous lecture, but we will get to more details of this later.

Requirements of a NARK

There are 2 requirements of a NARK, and an optional 3rd requirement of zero-knowledgeness:

Completeness: If the prover does indeed knows the argued knowledge, verifier should definitely accept the proof.

$\forall x, w : C (x, w) = 0 ⟹ Pr [V (v p, x, P (pp, x, w)) = accept] = 1$

Soundness: If the verifier accepts a proof, the prover should indeed know the argued knowledge. "Knowing" something is rather interesting to capture formally, but for now let's say there is an extractor algorithm $E$ that can extract a valid $w$ from the prover.

$V (v p, x, P (pp, x, w)) = accept ⟹ P knows w : C (x, w) = 0$

Zero-knowledge (optional): The view of this interaction, consisting of $(C, pp, v p, x, π)$ "reveals nothing new" of $w$ .

Trivial NARK

We can easily think of a trivial NARK, that is not zero-knowledge but has the other two properties. Simply, the proof $π = w$ . Yeah, just send the witness to the verifier! All the Verifier has to do next is check if $C (x, w) = 0$ , since both $x$ and $C$ were public anyways.

SNARK: Succinct NARK

We will introduce some constraints over the proof size and verification time, giving us two types of NARKs:

succint preprocessing NARK
strongly succinct preprocessing NARK

Let us see the first one.

A succinct preprocessing NARK is a triple $(S, P, V)$ :

$S (C) \to (pp, v p)$ is the preprocessing step, generating public parameters for prover and verifier
$P (pp, x, w) \to π$ is the prover function, where $∣ π ∣ = sublinear (∣ w ∣)$ . So, the proof length must be less than linear in the size of witness.
$V (v p, x, π) \to {0, 1}$ is the verification function, where $time (V) = O_{λ} (∣ x ∣, sublinear (∣ C ∣)$ . Note that the verification will have to read the public inputs, so it is allowed to run in $∣ x ∣$ time, but it must run sub-linear in the size of circuit $C$ .

In practice, we are even more greedy than this, so we have a much better and efficient type of NARK.

A strongly succinct preprocessing NARK is a triple $(S, P, V)$ :

$S (C) \to (pp, v p)$ is the preprocessing step, generating public parameters for prover and verifier
$P (pp, x, w) \to π$ is the prover function, where $∣ π ∣ = O (lo g (∣ C ∣))$ . The proof length must be logarithmic to the size of circuit, making the proof very tiny compared to the circuit!
$V (v p, x, π) \to {0, 1}$ is the verification function, where $time (V) = O_{λ} (∣ x ∣, lo g (∣ C ∣))$ . Again, the verification will have to read the public inputs, so it is allowed to run in $∣ x ∣$ time, but it will not have time to read the entire circuit, which is quite magical. This is actually what $v p$ public parameter is for, it is capturing a summary of the circuit for the verifier so that $lo g (∣ C ∣)$ is enough to run the verification.

A zk-SNARK is simply a SNARK proof that reveals nothing about the witness $w$ .

Trivial SNARK?

Let us again come back to the trivial proof, where $π = w$ .

Prover sends $w$ to the verifier.
Verifier checks if $C (x, w) = 0$

Why can't there be a trivial SNARK? Well, there may be several reasons:

If $w$ is long, the proof size will be too large.
If $C (x, w)$ is taking lots of time, the verifier time will be too long.
Naturally, we might want to keep $w$ secret.

Preprocessing Setup

We said that a preprocessing setup $S (C)$ is done for a circuit $C$ . Things are actually a bit more detailed than this, there are 3 types of setups:

Trusted Setup per Circuit: $S (C; r)$ is a randomized algorithm. The random $r$ is calculated per circuit, and must be kept secret from the prover; if a prover can learn $r$ then they can prove false statements!
Trusted Setup & Universal (Updatable): a random $r$ is only chosen once and is independent of the circuit. The setup phase is split in two parts: $S = (S_{ini t}, S_{in d e x})$ .
1. $S_{ini t} (λ; r) \to g p$ is a one-time setup, done in a trusted environment. $r$ must be kept secret!
2. $S_{in d e x} (g p, C) \to (pp, v p)$ is done for each circuit, and nothing here is secret! Furthermore, $S_{in d e x}$ is a deterministic algorithm.
Transparent: $S (C)$ does not use any secret data, meaning that a trusted setup is not required.

SNARKs in Practice

Notice that we had no constraints on the proof generation time. In the recent years, prover time has been reduced to be in linear time with the size of circuit $∣ C ∣$ , and this kind of enabled the SNARK revolution we are seeing these past few years.

We will go into details of 4 categories of SNARKs throughout the lecture, and again all of these have provers that run in linear time $∣ C ∣$ .

	Size of proof $π$	Verifier time	Setup	Post-Quantum?
Groth16	~200 B $O_{λ} (1)$	~1.5 ms $O_{λ} (1)$	Truster per Circuit	No
Plonk / Marlin	~400 B $O_{λ} (1)$	~3 ms $O_{λ} (1)$	Universal trusted setup	No
Bulletproofs	~1.5 KB $O_{λ} (lo g ∥ C ∥)$	~3 sec $O_{λ} (∥ C ∥)$	Transparent	No
STARK	~100 KB $O_{λ} (lo g^{2} ∥ C ∥)$	~10 ms $O_{λ} (lo g^{2} ∥ C ∥)$	Transparent	Yes

The approximations here are made for a circuit that is size $∣ C ∣ \approx 2^{20}$ gates. There are many more SNARKs out there, but these four are the ones we will go into detail of.

Also note that some of these SNARKs have constant sized proofs and constant time verifiers, which is kind of awesome considering that no matter what your circuit $C$ is, the proof size and verifier time will be approximately the same!

Knowledge Soundness

While describing the properties of a NARK, we mentioned soundness:

$V (v p, x, P (pp, x, w)) = accept ⟹ P knows w : C (x, w) = 0$

Well, what does it mean to "know" here? Informally, $P$ knows $w$ if this $w$ can be somehow extracted from the prover $P$ . The way we do that is kind of torturing the $P$ until it spits out $w$ . Let us give the formal definition now.

Formally, an argument system $(S, P, V)$ is (adaptively) knowledge-sound for some circuit $C$ , if for every polynomial time adversary $A = (A_{0}, A_{1})$ such that:

$g p \leftarrow S_{ini t} ()$
$(C, x, s t) \leftarrow A_{0} (g p)$
$(pp, v p) \leftarrow S_{in d e x} (C)$
$π \leftarrow A_{1} (pp, x, s t)$
$Pr [V (v p, x, π) = accept] > β$ for some non-negligible $β$

there is an efficient extractor $E$ that uses $A_{1}$ as a black box (oracle) such that:

$g p \leftarrow S_{ini t} ()$
$(C, x, s t) \leftarrow A_{0} (g p)$
$(pp, v p) \leftarrow S_{in d e x} (C)$
$w \leftarrow E^{A_{1} (pp, x, s t)} (g p, C, x)$
$Pr [C (x, w) = 0] > β - ϵ$ for some negligible $ϵ$ and non-negligible $β$ .

In other words, the probability that a prover can convince the verifier for some witness $w$ must be at most negligibly different than the probability that this witness $w$ is a valid witness for the circuit $C$ . In doing so, this witness $w$ must be extractable by the efficient extractor $E$ .

Building a SNARK

There are various paradigms on building SNARKs, but the general paradigm is made up of two steps:

A functional commitment scheme, which is a cryptographic object
A suitable interactive oracle proof (IOP), which is an information theoretic object

(1) Functional Commitment Scheme

Well, first, what is a commitment scheme? A cryptographic commitment is like a physical-world envelope. For instance, Bob can put a data in an envelope, and when Alice receives this envelope she can be sure that Bob has committed to whatever value is in it. Alice can later reveal that value.

The commitment scheme has two algorithms:

$co mmi t (m, r) \to co m$ for some randomly chosen $r$
$v er i f y (m, co m, r) \to accept or reject$

The scheme must have the following properties:

Binding: cannot produce two valid openings for $co m$
Hiding: $co m$ reveals nothing about the committed data

There is a standard construction using hash functions. Fix a hash function $H : M \times R \to C$ where

$co mmi t (m, r) = H (m, r)$
$v er i f y (m, co m, r) = accept if co m = H (m, r)$

Committing to a function

Choose a family of functions $F = {f : X \to Y}$ . The function $f$ can be an arithmetic circuit, a C program, whatever; but what does it really mean to commit to a function? Well, consider the following interaction:

sequenceDiagram
	actor P as Prover
	actor V as Verifier

	note over P: f ∈ F
	P ->> V: com_f := commit(f, r)
	V ->> P: x ∈ X
	P ->> V: y ∈ Y, proof π
	note over V: Accept or Reject

Here, the proof $π$ is to show that $f (x) = y$ and $f \in F$ . Also, $co m_{f}$ is a commitment to function $f$ , but we may also use the notation $f$ to indicate a commitment (think of it like $f$ in an envelope).

Formally, a functional commitment scheme for $F$ is defined by the following:

$se t u p (1^{λ}) \to g p$ outputs public parameters $g p$ .
$co mmi t (g p, f, r) \to co m_{f}$ is commitment to $f \in F$ with $r \in R$ .
- this should be a binding scheme
- optionally, it can be hiding, which is good for a zk-SNARK
$e v a l (P, V)$ with a prover $P$ and verifier $V$ , for a given $co m_{f}$ and $x \in X, y \in Y$ :
- $P (g p, f, x, y, r) \to π$ (a short proof!),
- $V (g p, co m_{f}, x, y, π) \to accept or reject$
- Basically, the $e v a l$ system is a SNARK proof wof the relations: $f (x) = y$ and $f \in F$ and $co mmi t (pp, f, r) = co m_{f}$ .

Four Important Functional Commitments

There are 4 very important functional commitment types:

Polynomial Commitments: Committing to a univariate polynomial $f (X) \in F_{p}^{(\leq d)} [X]$ where that fancy notation stands for the set of all univariate polynomials of degree at most $d$ .
Multilinear Commitments: Committing to a multilinear polynomial in $F_{p}^{(\leq 1)} [X_{1}, \dots, X_{k}]$ which is the set of all the multilinear polynomials in at most $k$ variables, each variable with degree at most 1. Here is an example multilinear polynomial: $f (x_{1}, \dots, x_{7}) = x_{1} x_{3} + x_{1} x_{4} x_{5} + x_{7}$ .
Vector Commitments: Committing to a vector $u = (u_{1}, \dots, u_{d}) \in F_{p}^{d}$ which is a vector of $d$ elements. With our commitment, we would like to be able to open any cell at a later time, such that $f_{u} (i) = u_{i}$ . An example vector commitment scheme is Merkle Trees, which you may have heard of!
Inner-Product Commitments: Committing to a vector $u \in F_{p}^{d}$ . Opening an inner product is done as $f_{u} (v) = (u, v)$ . These are also known as Linear Product Arguments.

It turns out that for these 4 functional commitments, you can obtain any of these from any other.

Polynomial Commitment Scheme (PCS)

A PCS is a functional commitment for the family $F = F_{p}^{(\leq d)} [X]$ .

The prover commits to a univariate polynomial $f \in F_{p}^{(\leq d)} [X]$ , later, they can prove that:
- $v = f (u)$ for some public $u, v \in F_{p}$
- $deg (f) \leq d$ . As this is a SNARK, the proof size and verifier time should be $O_{λ} (lo g d)$ .
The verifier has access to $(d, co m_{f}, u, v)$ .

There are some example PCSs with different mechanisms:

Using basic elliptic curves: Bulletproofs (short proof, but verifier time is $O (d)$ )
Using bilinear groups: KZG'10 (trusted setup), Dory'20 (transparent)
Using groups of unknown order: Dark'20
Using hash functions only: based on FRI (long eval proofs)

Trivial Commitment is bad!

What would be a trivial commitment scheme for a polynomial? Well, first realize that a polynomial is just $f = \sum_{i = 0}^{d} a_{i} x^{i}$ . Then, our commitment will be:

$co mmi t (f, r) = H ((a_{0}, a_{1}, \dots, a_{d}), r)$ as in simply hashing the coefficients and some random variable.
$e v a l$ will be done as follows:
- prover will send $π = ((a_{0}, a_{1}, \dots, a_{d}), r)$ to the verifier
- verifier will construct $f$ from the coefficients, and check if $f (u) = v$ and $H ((a_{0}, a_{1}, \dots, a_{d}), r) = co m_{f}$ .

This is problematic because the proof size and verification time are linear in $d$ , but we wanted a lot smaller values, such as $lo g d$ .

Zero Test: A Useful Observation

We will now make a really useful observation, which is an essential part of SNARKs and is really what makes SNARKs possible!

Consider some non-zero polynomial with degree at most $d$ , shown as $f \in F_{p}^{(\leq d)} [X]$ .

for $r \leftarrow F_{p}$ it holds that $Pr [f (r) = 0] \leq d / p$

We know that $f$ has at most $d$ roots. $r$ is chosen at random from a size $p$ , do the probability that $r$ "hits" a root value is easy to see that $d / p$ .

Now suppose that $p \approx 2^{256}$ and $d \leq 2^{40}$ . Then, $d / p$ is negligible! So it is really unlikely that a randomly chosen field element will be the root for $f$ .

With this in mind, if you do get $f (r) = 0$ for $r \leftarrow F_{p}$ then $f$ is identically zero with very high probability. This gives you a simple zero test for a committed polynomial!

[Schwartz-Zippel-DeMillo-Lipton] lemma states that this observation holds for multivariate polynomials too, where $d$ is treated as the total degree of $f$ . The total degree is calculated as the sum of degrees of all variables, for example $f (x, y) = x^{2} + y^{3}$ has degree $5 = 3 + 2$ .

Equality Test: A Following Observation

Following the zero-test observation, we can make another observation that allows to check if two polynomials are equal.

Let $f, g \in F_{p}^{(\leq d)} [X]$ . For $r \leftarrow F_{p}$ , if $f (r) = g (r)$ then $f = g$ with very high probability! This comes from the observation above, where if $f (r) - g (r) = 0$ then $f - g = 0$ with very high probability.

Here is an interactive protocol that makes use of this equality test, where a prover can commit to polynomials and show that they are equal:

sequenceDiagram
	actor P as Prover(f, com_f, g, com_g)
	actor V as Verifier(com_f, com_g)

	note over V: r ← R (random value)
  V ->> P: r
	note over P: y ← f(r)
	note over P: π_f := proof that y = f(r)
  note over P: y' ← g(r)
	note over P: π_g := proof that y' = g(r)
	P ->> V: y, π_f, y', π_g
	note over V: accept if π_f, π_g are valid and y=y'
	note over V: reject otherwise

That's cool and all, but wait, we talked about non-interactiveness the entire lecture; why are we making an interactive protocol right now? Well, $r$ here is the only interaction that a verifier makes to the prover. It is a "public coin", just some coin-toss (or a series of tosses) given by the verifier to the prover for everyone to see.

Thanks to Fiat-Shamir Transform, we can transform interactive protocols of this nature into non-interactive proofs! More specifically, Fiat-Shamir Transform can take a "public-coin interactive protocol" which means all verifier randomness is public, and transform it into a non-interactive protocol.

To be technical, Fiat-Shamir Transform isn't safe to transform ALL interactive proofs of this nature, but it is good enough for our needs right now.

Let $H : M \to R$ be a hash function. For the example above, the prover will generate $r := H (co m_{f}, co m_{g})$ and this will be used as the random challenge. Since the verifier also has access to $co m_{f}, co m_{g}$ they can generate the same $r$ during verification. That is how the interactiveness is removed!

If $H$ is modeled as a random-oracle, and $d / p$ is negligible (as we discussed in zero-test), then this is a SNARK. In practice, SHA256 is used for the hash function.

Is this a zk-SNARK? No, because the verifier learns the result of evaluating the polynomials at point $r$ .

(2) $F$ - Interactive Oracle Proof

The goal of an $F$ -IOP is to boost a commitment to function $f \in F$ to become a SNARK for general circuits. For example, you could have a polynomial function family $F = F_{p}^{(\leq d)} [X]$ and with $F$ -IOP you can turn that into a SNARK for any circuit with size $∣ C ∣ < d$ .

Definition: Let $C (x, w)$ be some arithmetic circuit. Let $x \in F_{p}^{n}$ . An $F$ -IOP is then a proof system that proves $\exists w : C (x, w) = 0$ . In other words, some prover knows a witness $w$ .

Setup: $S (C) \to (pp, v p)$ where $v p = (f_{0}, f_{- 1}, \dots, f_{- s})$ which are oracles of functions in the function family. These oracles can be though of as function commitments, where the verifier can ask to reveal a function result at some given value, equivalent to making an oracle request. Remember that we have seen setup procedures back when we discussed SNARKs!
Interactive Proof: Proving that $C (x, w) = 0$ .

sequenceDiagram
	actor P as Prover P(pp, x, w)
	actor V as Verifier V(vp, x)

	note over P, V: claim: C(x, w) = 0
	loop i = 1, 2, ..., (t-1)
	P ->> V: oracle [f_i ∈ F]
	note over V: r_i ← F_p
	V ->> P: r_i
	end

	P ->> V: oracle [f_t ∈ F]
	note over V: verify^(f_{-s}, ..., f_t)(x, r_1, ..., r_(t-1))

Let's digest what is happening in this interactive proof:

The prover starts by sending an oracle for function $f_{1}$ . In practice, this is a commitment to function $f_{1}$ which we may show as $f_{1}$ .
The verifier samples a uniformly random field element $r_{1}$ , and sends it back to the prover.
Steps 1 and 2 happen for $t - 1$ rounds.
Finally, the prover sends one last oracle $f_{t}$ , an oracle for function $f_{t}$ .
The verifier starts the verification process. This process has access to all oracles given by the prover, as well as all the generated randomness, and public inputs $x$ as well.

The IOP must have the following properties:

Completeness: If there exists a valid witness known by the prover, the verifier should definitely accept the proof.

$\exists w : C (x, w) = 1 ⟹ Pr [V accepts] = 1$

Soundness: The second property is knowledge soundness (unconditional), meaning that a malicious prover can not convince a verifier that they know a witness $w$ such that $C (x, w) = 0$ . The way to prove that is using an extractor: this extractor is given the statement $x$ and functions $f_{1}, f_{2}, \dots, f_{t}$ in clear! Why in clear? Because, the commitments to those functions were SNARKs too and the extractor can extract the functions themselves from there too. The extractor must extract the witness $w$ from this process.
Zero-knowledge (optional): The view of this IOP "reveals nothing new" of $w$ .

Example: Polynomial IOP for claim $X \subseteq W \subseteq F_{p}$

We will go over an example where the public input $X$ is a set, and secret witness $W$ is a set that contains or is equal to $X$ . Furthermore, $W$ is a subset of or equal to the finite field for prime $p$ . Suppose we capture this relation with a circuit $C$ such that the claim becomes:

$C (X, W) = 0 ⟺ X \subseteq W \subseteq F_{p}$

sequenceDiagram
	actor P as Prover P(pp, X, W)
	actor V as Verifier V(vp, X)

	note over P: compute f(Z), g(Z)
	note over V: compute g(Z)
	note over P: q(Z) := f / g
	P ->> V: oracles [f], [q]
	note over V: r ← F_p
	V ->> P: r
	note over V: query w ← f(r)
	note over V: query q' ← q(r)
	note over V: compute x ← g(r)
	note over V: accept if x * q' = w

I will explain how $f (Z)$ and $g (Z)$ are computed here, let's dig in.

Prover computes two polynomials $f$ and $g$ , a polynomial with roots in $W$ and $X$ respectively. It does that by the following:
1. $f (Z) = \prod_{w \in W} (Z - w)$
2. $g (Z) = \prod_{x \in X} (Z - x)$
Verifier computes $g (Z)$ the same way, because $X$ is public.
Prover computes a quotient polynomial $q (Z) = f / g \in F_{p}^{(\leq d)} [X]$ , which is a polynomial that is the result of dividing $f$ by $g$ . This is only a polynomial if $g$ has all the roots that $f$ has, and that implies $X \subseteq W$ . That is a key point to understand in this proof. Let me give an example: $X = {1, 2}$ and $W = {1, 2, 4}$ . Then, 1. $f (Z) = (Z - 1) (Z - 2) (Z - 4)$ 2. $g (Z) = (Z - 1) (Z - 2)$ 3. $q (Z) = f / g = (Z - 4)$ is a valid polynomial in the finite field!
Prover sends oracles of $f$ and $q$ to the verifier. In practice, it uses a polynomial commitment scheme and sends commitments to these functions, $f$ and $q$ .
Verifier samples a uniform random variable $r$ from the finite field. It sends this to the prover, but the prover makes no use of it. There is a reason why it sends $r$ anyways: it is to make $r$ public! If the verifier didn't send $r$ , we would not be able to say that it is publicly known.
Verifier queries the value of $f$ and $q$ at point $r$ , and also computes $g (r)$ . Denote these as $w, q^{'}, x$ respectively. Note that this querying happens via the polynomial commitment scheme in practice, so behind the scenes verifier sends $r$ to the prover, the prover evaluates it and sends back the result along with a proof that it is evaluated correctly and so on, check the previous section for this.
Verifier checks if $x \times q^{'} = w$ . This is only possible if $q (Z) = f / g$ indeed, think of it like checking $f \times q = g$ in this example.

Replacing the oracles with commitments, and oracle queries with commitment interactions and so on is often called "compilation step", where this Poly-IOP is "compiled" into a SNARK by adding in the PCS (poly-commitment scheme) steps.

The IOP Zoo

There are many SNARKs for general circuits.

IOP	Commitment Scheme	Examples
Poly-IOP	Poly-Commit	Sonic, Marlin, Plonk
Multilinear-IOP	Multilinear-Commit	Spartan, Clover, Hyperplonk
Vector-IOP	Vector-Commit (e.g Merkle)	STARK, Breakdown, Orion

You construct the IOP, and use the relevant commitment scheme to do the commitments and queries; et viola, you have a SNARK. However, the examples we have were interactive so far, but a SNARK is non-interactive. To do that final touch, you use the Fiat-Shamir Transform (using hash functions) to make the entire thing non-interactive.

SNARKs in Practice

flowchart LR
	D[DSL] -- compiler --> S[SNARK-friendly format]
	S -- pp, vp --> B[SNARK backend prover]
	X[x, witness] --> B
	B --> proof

In practice, you wouldn't want to write the entire circuit yourself. We use DSLs (domain-specific languages) to do that for us.

Some DSL examples are:

Circom
ZoKrates for Ethereum
Leo for Aleo
Zinc for zkSync
Cairo for StarkNet
Noir
GNARK
…

The DSL compiles the circuits for you, and outputs a SNARK friendly format. There are several formats:

Circuit
R1CS (Rank-1 Constraint System)
EVM Bytecode (yea, that is possible!)
…

Finally, with the public parameters $pp, v p$ , the public input $x$ and witness $w$ a SNARK backend prover generates a proof.

That is the end of this lecture!

Programming ZKPs

Suppose you have an idea for an application & you want to use ZK in it. What do you do?

flowchart TD
	subgraph this lecture
	idea --program--> p[program]
	p --compile--> c[circuit/constraint-system]
	end
	c --setup--> pp[public params]
	pp --prove--> zkp[ZK-proof]
	pp --> verify
	zkp --> verify
	verify --> out[accept/reject]

In this lecture, we will be doing the following:

Have a big picture on ZKP programmability
Example: Use an HDL (hardware-description language), such as Circom
Example: Use a library, such as Arkworks
Example: Use a programming language & compiler, such as ZoKrates
Overview of prominent ZKP toolchains

Recap: ZKP for a predicate $ϕ$

Let us remember what ZKP does. Suppose you have a predicate $ϕ$ , with some public inputs $x$ and private inputs (witness) $w$ . For example, $ϕ =$ I know a $w$ such that $x = SHA256 (w)$ .

Prover has access to $ϕ, x, w$ .
Verifier has access to $ϕ, x$ .
Proof $π$ will prove that $ϕ (x, w)$ holds, without revealing $w$ .

However, the key question here is: what could $ϕ$ be? What are some other examples? In theory, $ϕ$ can be any NP problem statement.

$w$ is factorization of integer $x$
$w$ is the private key that corresponds to some public key $x$
$w$ is the credential for account $x$
$w$ is a valid transaction

However, transferring these statements into the programming side of things are a bit different.

Arithmetic Circuits

In practice, $ϕ$ may be an "arithmetic circuit" over inputs $x$ and $w$ .

Think of Boolean Circuits that you see in electronic classes or circuit design classes, perhaps you have taken one during your undergrad studies. Well, we had AND gates, OR gates, NAND gates and such there, where the operations were happening on 1s and 0s.

In an Arithmetic Circuit, the operations happen on elements that belong to a finite field of order $p$ , shown as $Z_{p}$ . Usually, $p$ is a large prime number (e.g. ~255 bits). Essentially, an Arithmetic Circuit can be represented by polynomials, for example we could have:

$w_{0} \times w_{0} \times w_{0} = x$
$w_{1} \times w_{1} = x$

However, there is a much more nice way of thinking about circuits: treat them as a DAG (Directed Acyclic Graph)! In this DAG:

Nodes (or Vertices) are inputs, gates, and constants.
Edges are wires/connections.

Here is the circuit for the two polynomials above, visualized as a DAG:

flowchart LR
	w0((w_0)) --> x1[x]
	w0 --> x1[x]
	x1 --> x2[x]
	w0 --> x2[x]
	w1((w_1)) --> x3[x]
	w1 --> x3[x]
	x --> =2
	x2 --> =1[=]
	x((x)) --> =1
	x3 --> =2[=]

Rank-1 Constraint System (R1CS)

R1CS is a format for ZKP Arithmetic Circuit (AC). It is a very commonly used format. Here is how it is defined:

$x$ is a set of field elements $x_{1}, x_{2}, \dots, x_{l}$ .
$w$ is a set of field elements $w_{1}, w_{2}, \dots, w_{m - l - 1}$
$ϕ$ is made up of $n$ equations with the form $α \times β = γ$ where $α, β, γ$ are affine combinations of variables mentioned in the above bullet points.

Let's see some examples of $α \times β = γ$ .

$w_{2} \times (w_{3} - w_{2} - 1) = x_{1}$ is okay.
$w_{2} \times w_{2} = w_{2}$ is okay.
$w_{2} \times w_{2} \times w_{2} = x_{1}$ is NOT okay! You can't have two multiplications like that here! So, what can we do? We could capture this operation with the help of an extra variable, let's say $w_{4}$ :
- $w_{2} \times w_{2} = w_{4}$ is okay.
- $w_{2} \times w_{4} = x_{1}$ is okay, and these two together capture the equation above.

Matrix Definition of R1CS

There is another way of looking at R1CS, using matrices. This time, we define as follows:

$x \in Z_{p}^{l}$ is a vector of $l$ field elements.
$w \in Z_{p}^{m - l - 1}$ is a vector of $m - l - 1$ field elements.
$ϕ$ is made up of 3 matrices: $A, B, C \in Z_{p}^{n \times m}$

Now, we define a vector $z = (1∣∣ x ∣∣ w) \in Z_{p}^{m}$ which has $m$ elements. The rule for this system is that the following equation must hold:

$A z \circ B z = C z$

Here, $\circ$ means the element-wise product of these.

┌───────┐┌┐   ┌───────┐┌┐   ┌───────┐┌┐
|.......||| o |.......||| = |.......||| //--> every row here corresponds to
|   A   |└┘   |   B   |└┘   |   C   |└┘ // some rank-1 constraint!
|       |     |       |     |       |
└───────┘     └───────┘     └───────┘

Example: Writing R1CS of an AC

Consider the following AC:

flowchart LR
	w0((w_0)) --> m1[x]
	w1((w_1)) --> m1
	w1 --> m2
	x0((x_0)) --> a1
	m1 --w_2--> a1[+]
	x0 --> m2[x]
	a1 --w_3--> =
	m2 --w_4--> =

Here, we have a public input $x_{0}$ and two secret inputs (witnesses) $w_{0}, w_{1}$ . The first thing we have to do is capture the intermediate outputs, and we do that by assigning them secret input variables; in this case these would be $w_{2}, w_{3}, w_{4}$ . Then, we simply write equations in the form $α \times β = γ$ as discussed before, as one equation per gate!

$w_{0} \times w_{1} = w_{2}$
$w_{3} = w_{2} + x_{0}$ (notice that left side is actually $w_{3} \times 1$ )
$w_{1} \times x_{0} = w_{4}$
$w_{3} = w_{4}$

As simple as that.

Tutorial: Circom - Using a HDL for R1Cs

First thing to note is that Circom is NOT a Programming Language (PL), it is a Hardware Description Language (HDL).

	Programming Language	Hardware Description Language
Objects	Variables, Operations, Program & Functions	Wires, Gates, Circuit & Sub-circuits
Actions	Mutate variables, Call functions	Connect wires, Create sub-circuits

There are some known HDLs for Digital Circuits:

Verilog
SystemVerilog
VHDL
Chisel

Circom is not an HDL for digital circuits, it is an HDL for R1CS. Wires make the R1CS variables and gates make the R1CS constraints. In essence, Circom does 2 things:

Sets variable values
Creates R1CS constraints

Example: $z = x \times y$

Let's go over a basic example:

template Multiply(){
  signal input x; // private, unless explicitly stated public
  signal input y; // private, unless explicitly stated public
  signal output z; // output is always public

  z <-- x * y;
	z === x * y;
  // z <== x * y; would work too
}

// to start execution, a main component is required
// multiple mains can't exist in a code!
component main {public [x]} = Multiply();
//                      ^ explicitly state x to be public!

Let's analyze what is happening here:

A template is just a circuit (or a sub-circuit if imported by some other)
A signal is a wire, can be input, output or just some intermediate variable.
<-- operation sets signal values.
=== creates a constraint, which must be Rank-1. So, one side is linear and other is quadratic. You can't do things like x * x * x because $x^{3}$ is not quadratic.
As a shorthand, <== does both at once in a single line instead of two lines.
You can also have --> and ==> which work in a similar way.

Example: $y = f^{n} (x)$ where $f (x) = x^{2}$

Now a bit more complicated example.

template RepeatedSquaring(n){
  signal input x;  // private, unless explicitly stated public
  signal output y; // output is always public

  // intermediate signals
  signal xs[n+1];

	xs[0] <== x;
	for (var i = 0; i < n; i++) {
	  xs[i+1] <== xs[i] * xs[i];
	}
	y <== xs[n];
}

// provide template value n = 1000
component main {public [x]} = RepeatedSquaring(1000);
//                      ^ explicitly state x to be public!

Circom has very nice capabilities as demonstrated here!

You can have template arguments such as n here, that you hard-code when you are instantiating the component.
You can have arrays of signals.
You can have variables (defined with var). These are different from signals, they are mutable & are evaluated at compile time.
You can have loops, such as the good ol' for loop.
You can also have if - else statements.
You can access index i in an array with arr[i].

Example: Non-zero & Zero

template NonZero(n){
	signal input in;
	signal inverse;

	inverse <-- 1 / n; // not ok with R1CS
	1 === in * inverse; // is ok with R1CS
}

template IsZero() {
	signal input a;
	signal input b;

	component nz = NonZero();

	// check a is non-zero
	nz.in <== a;

	// b must be 0 for this to hold
	0 == a * b;

	// you could have done this much simpler as:
	// 0 === b;
	// but hey, this is an educational example! :)
}

component main {public [a, b]} = IsZero();

Here, NonZero is a sub-circuit used by IsZero. Within NonZero, we are checking if some input is not 0. However, constraints only check for equality, we don't have something like a !=== b. To check if something is non-zero, we can check if it has an inverse!

To do that, we do inverse <-- 1 / n but hey, this isn't R1! Is that a problem? Well, <-- is just an assignment operator, not a constraint! So, we can do such a thing here; in fact, signal assignment without constraints are a lot more capable than constrainted assignments. The constraints itself is in the next line: 1 === in * signal, which is R1.

Also notice that IsZero uses NonZero within, and it does that by instantiating the sub-circuit as nz. You can access the signals in a circuit with . operator, such as nz.in.

Example: Sudoku Solution

This one is a rather large example, with quite a lot of code too. I will just take notes of the circuit, for the code itself please go to https://github.com/rdi-berkeley/zkp-course-lecture3-code/tree/main/circom.

We would like to prove that we know the solution to a sudoku puzzle.

The public input $x$ will be the initial setting of the sudoku board, with 0 for empty cells and some integer $1 \leq i \leq 9$ for non-empty cells.
The private input $w$ (witness) will be our solution, again as an array of numbers.
Our predicate $ϕ$ is that we know the solution to the given Sudoku setting in the public input.

The inputs will be given as 2-dimensional arrays of size $n \times n$ . We should really like to write a generic template circuit that takes the board size $n$ as template argument.

For now, let's do an example for $n = 9$ . What will be our constraints though? Let's list them one by one:

The solution input should be composed of numbers in range $[1, 9]$ .
The solution input should have rows where every numbers occurs only once.
The solution input should have columns where every numbers occurs only once.
The solution input should have $3 \times 3$ groups of cells (as in Sudoku) where every number occurs once in each group.

Here is the circuit, along with it's sub-circuits.

// Assert that two elements are not equal.
// Done via the check if in0 - in1 is non-zero.
template NonEqual() {
	signal input in0;
  signal input in1;

  // do the inverse trick to check for zero
  signal inverse;
	inverse <-- 1 / (in0 - in1);
	1 === (in0 - in1) * inverse;
}

// Assert that all given values are unique
template Distinct(n) {
  signal input in[n];

  // create a non-equal component for each pair
  component neq[n][n];
  for (var i = 0; i < n; i++) {
		for (var j = 0; j < i; j++) {
			neq[i][j] = NonEqual();
			neq[i][j].in0 <== in[i];
			neq[i][j].in1 <== in[j];
		}
	}
}

// Assert that a given number can be represented in n-bits
// Meaning that it is in range [0, 2^n).
template Bits(n) {
	signal input in;

	signal bits[n];
	var bitsum = 0;
  for (var i = 0; i < n; i++) {
		bits[i] <-- (in >> i) & 1;
    bits[i] * (bits[i] - 1) === 0; // ensure bit is binary
    bitsum += bitsum + 2 ** i * bits[i];
	}
  bitsum == in;
}

// Check if a given signal is in range [1, 9]
template OneToNine() {
  signal input in;
  component lowerbound = Bits(4);
  component upperbound = Bits(4);
  lowerbound.in <== i - 1; // will work only if i >= 1
  upperbound.in <== i + 6; // will work only if i <= 9
}

// Main circuit!
template Sudoku(n) {
  signal input solution[n][n];
  signal input puzzle[n][n];

	// first, let's make sure everything is in range [1, 9]
	component inRange[n][n];
  for (var i = 0; i < n; i++) {
		for (var j = 0; j < n; j++) {
			inRange[i][j] = OneToNine();
			inRange[i][j].in <== solution[i][j];
		}
	}

  // then, let's make sure the solution agrees with the puzzle
	// meaning that non-empty cells of the puzzle should equal to
  // the corresponding solution value
  // other values of the puzzle must be 0 (empty cell)
  for (var i = 0; i < n; i++) {
		for (var j = 0; j < n; j++) {
      // this is valid if puzzle[i][j] == 0, OR
			// puzzle[i][j] == solution[i][j]
			puzzle[i][j] * (puzzle[i][j] - solution[i][j]) === 0;
		}
	}

  // ensure all the values in a row are unique
  component distinctRow[n];
	for (var row = 0; row < n; row++) {
		distinctRow[row] = Distinct(9);
		for (var col = 0; col < n; col++) {
			distinctRow[row].in[col] <== solution[row][col];
		}
	}

	// ensure all the values in a column are unique
  component distinctColumn[n];
	for (var col = 0; col < n; col++) {
		distinctColumn[col] = Distinct(9);
		for (var row = 0; row < n; row++) {
			distinctColumn[col].in[row] <== solution[row][col];
		}
	}

  // ensure all the values in each 3x3 square is unique
  // left as an exercise to the reader :)
}

component main{public[puzzle]} = Sudoku(9);

That is very cool, right?

So yea, Circom is great and it has direct control over constraints. However, using a custom language has it's own drawbacks. An alternative is to use an already known high-level language (e.g. Rust, Go) and have a library to help you write circuits in there.

Tutorial: Arkworks - Using a Library

The most important object in a library will be the constraint system. This guy will keep state about R1CS constraints and variables, and we will interact with it while we write our "circuit".

The key operations here will be:

Creating a variable

cs.add_var(p, v) -> id;
// cs: constraint system
// p:  visibility of variable
// v:  assigned value
// id: variable handle

Creating a linear combination of variables

// make an empty linear combination at first
lc := cs.zero();
// now fill it with variables
lc.add(c, id) -> lc';
// id: variable handle from before
// c:  coefficient
// this correspod to the following linear combination:
//   lc' := lc + c * id

Adding a constraint

// suppose you have some linear constraints lc_A, lc_B, lc_C
cs.constraint(lc_A, lc_B, lc_C)
// adds a constraint lc_A * lc_B = lc_C

These are pretty high-level, so let's take a look at a more realistic example.

fn and(cs: ConstraintSystem, a: Var, b: Var) -> Var {
  // do a simple bitwise AND on values
	let result = cs.new_witness_var(|| a.value() & b.value());
  // constraint: a * b = result, works like AND in booleans
  self.cs.enforce_constraint(
		lc!() + a,
		lc!() + b,
		lc!() + result,
	);
	result // return new result as Var
}

This is cool and all, but it has quite a bit of boilerplate, and seems very tedious & error-prone. So, we would really like a language abstraction somehow, making the library a lot more friendly.

Here is an example, where we can write the same code but apply it in a Boolean struct and overload the and operator.

struct Boolean { var: Var };

impl BitAnd for Boolean {
	fn and(self: Boolean, other: Boolean) -> Boolean {
	 // do the same code above...
		Boolean { var: result } // return result wrapped in Boolean
	}
}

// later in code, you can do stuff like this:
let a = Boolean::new_witness(|| true);
let b = Boolean::new_witness(|| false);

There are many different libraries:

libsnark: gadgetlib (C++)
arkworks: r1cs-std + crypto-primitives (Rust)
Snarky (OCaml)
GNARK (Go)
Bellman (Rust)
PLONKish (Rust)

At this point in lecture, we have an Arkworks tutorial. Please see the lecture itself for that, I will be omitting this part.

Tutorial: ZoKrates - Compiling Programs to Circuits

In the Circom example, we wrote the circuit ourselves. In the Arkworks example, we used a nice high-level language but still had to explicitly specify the wiring and constraints. What if we could have a programming language, that takes in a program and compiles it to R1CS with all it's wires and constraints?

Meet ZoKrates, a tool that does what we have just described.

type F = field;

def multiplication(public F x, private F[2] ys) {
	field y0 = y[0];
	field y1 = y[1];
  assert(x = y0 * y1);
}

def repeated_squaring<N>(field x) -> field {
	field[N] mut xs;
	xs[0] = x;
	for u32 i in 0..n {
		xs[i + 1] = xs[i] * xs[i];
	}
	return xs[N];
}

def main (public field x) -> field {
	repeated_squaring::<1000>(x);
}

ZoKrates has quite a bit of capabilities:

Custom types via structs
Variables that contain values during execution/proving
Visibility is annotated (private/public)
assert creates constraints
Integer generics <N>
Arrays & array accesses
Variables, which are mutable unlike signals
Fixed-length loops
If-else statements

I am omitting the example from this page, please see the lecture for the code.

Recap: The ZKP Toolchains

We have seen that there are generally 3 options:

HDL: A language for describing circuit synthesis.
- pros: clear constraint & elegant syntax
- cons: hard to learn & limited abstraction
Library: a library for describing circuit synthesis
- pros: clear constraint & as expressive as it's host language
- cons: need to know that language & few optimizations
PL + Compiler: a language, compiled to a circuit
- pros: easiest to learn & elegant syntax
- cons: limited witness computation

	Is NOT standalone language	Is a standalone language
Circuit	Library (Arkworks)	HDL (Circom)
Program		PL (ZoKrates)

Finally, note that all of these tools essentially output an R1CS, or more specific types of it like Plonk or AIR. So, within that process, all these tools share quite a bit of common techniques. With that in mind, a library to create ZKP languages actually exists: Circ.

You can also check out https://zkp.science/ which has a great coverage of tools, as well as the development of ZK theory.

SNARKS via IPs

A SNARK stands for a succinct proof that a certain statement is true. Succinct here is meaning that the proof is "short". For example, I have a statement:

I know an $m$ such that $SHA256 (m) = 0$ .

A SNARK can have a proof size of few KBs and verification should take at most seconds.

Interactive Proofs: Motivation & Model

First, we will see what Interactive Proofs (IP)s are and how they differ from a SNARK. Then, we will look at building a SNARK using the IPs.

In an interactive proof, there are two parties: a prover $P$ and a verifier $V$ .

$P$ solves a problem (has some claim), and tells the answer (proves the claim) to $V$ .
Then, they start to have a conversation. $P$ 's goal is to convince $V$ that the answer is correct.

sequenceDiagram
	actor P as Prover
	actor V as Verifier

	note over P, V: claim x
	loop i = 1, 2, ...
		P ->> V: answer_i
		V ->> P: question_i
	end
	note over V: accept/reject

There are two requirements to this:

Completeness: an honest $P$ can convince $V$ to accept.
(Statistical) Soundness: If $P$ is dishonest, then $V$ will catch this with very high probability. In other words, it is negligibly probable that a dishonest $P$ can convince $V$ .
- Note that this must hold even if $P$ is computationally unbounded, and is actively trying to fool $V$ .
- If soundness holds only against polynomial-time $P$ , then the protocol is actually called an interactive argument, not an interactive proof.

Soundness vs. Knowledge Soundness

With that, we must make note on the different between "soundness" and "knowledge soundness". In a previous lecture by Prof. Boneh, we have talked about knowledge soundness in particular.

So, now let us think of a circuit satisfiability case. Let $C$ be some public arithmetic circuit

$C (x, w) \to F$

where $x \in F^{n}$ is some public statement and $w \in F^{m}$ is some secret witness. Let us look at the types of "soundness" with this example:

Soundness: $V$ accepts $⟹ \exists w : C (x, w) = 0$
Knowledge soundness: $V$ accepts $⟹ P$ "knows" $\exists w : C (x, w) = 0$

As we can see, knowledge soundness is "stronger" than soundness, the prover MUST know the existing witness.

However, soundness itself can be valid in some cases even when knowledge soundness has no meaning. This is usually in cases where there is no "natural" witness. For example, $P$ can claim that the output of a program given by $V$ on the public input $x$ is 42. Well, witness is not used in the program, so there is really nothing to "know" here.

The vice-versa is true too, where knowledge soundness means something but you don't really care about soundness. This is usually in cases where the soundness is trivial. For example, $P$ knows the secret key to some Bitcoin account. Well, there does exist a private key to that account for sure. In a "sound" protocol, the verifier could just say "yep, can't argue with that" and accept, without breaking soundness itself.

SNARK's that don't have knowledge soundness are called SNARGs, they are studied too!

Public Verifiability

Interactive proofs and arguments only convince the party that is choosing/sending the random challenges, which is bad if there are many verifiers and prover must interact with each of them separately.

Thankfully, we have something called Fiat-Shamir Transform [Fiat, Shamir'87], where a public-coin protocol (an IP where randomness is public) can be made public & non-interactive! The trick is to use a random oracle model (via a hash function in practice) to generate the required randomness on your own.

So, in summary:

Interactive Proofs	SNARKs
Interactive	Non-Interactive
Information Theoretically Secure (aka Statistically Secure)	Computationally Secure (?)
Not necessarily Knowledge Sound	Knowledge Sound

Trivial SNARK is not a SNARK

What is the trivial way to prove that you know some $w$ such that $C (x, w) = 0$ ? Well, you could just send $w$ right? This has two problems, both against the "succinctness" of a SNARK:

$w$ could be large 🔴
computing $C (x, w)$ could take a lot of time 🔴
actually non-interactive when you think about it 🟢

Slightly trivial SNARKs from Interactive Proofs (IPs)

Let us look at the trivial proof from an interactive proof perspective (making it slightly less trivial). Now, the prover will send $w$ to the verifier, and somehow convince that the sent $w$ satisfies $C (x, w) = 0$ .

w could still be large 🔴
the verification is a lot faster, verifier is not computing $C (x, w)$ directly! 🟢
interactive 🔴

Note that since $w$ is sent to the verifier, the supposedly secret witness is no more secret.

Actual SNARKs!

What actually happens in a SNARK is that, instead of sending $w$ explicitly, the prover will cryptographically commit to $w$ and send that commitment. Again, an IP is used to convince that the committed $w$ satisfies $C (x, w) = 0$ .

In doing so, the prover will reveal just enough about the committed $w$ to allow the verifier to run its checks in the interactive proof.

commitment of $w$ is succinct 🟢
verification is fast 🟢
seems interactive, but can be made non-interactive using Fiat-Shamir transform. The trick there is to use a cryptographic hash function as a source of randomness 🟢

Functional Commitment Schemes

We had talked about some very important functional commitment schemes:

Polynomial Commitments: Committing to a univariate polynomial $f (X) \in F_{p}^{(\leq d)} [X]$ where that fancy notation stands for the set of all univariate polynomials of degree at most $d$ .
Multilinear Commitments: Committing to a multilinear polynomial in $F_{p}^{(\leq 1)} [X_{1}, \dots, X_{k}]$ which is the set of all the multilinear polynomials in at most $k$ variables, each variable with degree at most 1. Here is an example multilinear polynomial: $f (x_{1}, \dots, x_{7}) = x_{1} x_{3} + x_{1} x_{4} x_{5} + x_{7}$ .
Vector Commitments: Committing to a vector $u = (u_{1}, \dots, u_{d}) \in F_{p}^{d}$ which is a vector of $d$ elements. With our commitment, we would like to be able to open any cell at a later time, such that $f_{u} (i) = u_{i}$ . Merkle Tree is an example of vector commitment scheme.

Merkle Tree

Merkle Tree is a very famous vector commitment scheme, and we will look a bit deeper to that. Here is a vector commitment to the vector [m, y, v, e, c, t, o, r].

flowchart BT
	h2 --> h1
	h3 --> h1
	h4 --> h2
	h5 --> h2
  h6 --> h3
  h7 --> h3
  m --> h4
  y --> h4
	v --> h5
  e --> h5
  c --> h6
  t --> h6
  o --> h7
  r --> h7

In this binary tree, every node is made up of the hash of its children:

$h_{1} = H (h_{2}, h_{3})$
$h_{2} = H (h_{4}, h_{5})$
and so on…

The leaf nodes are the elements of the committed vector, m, y, v, and such. The root $h_{1}$ is the commitment to this vector!

When the prover is asked to show that indeed some element of the vector exists at some position, it will provide only the necessary nodes. For example, a verifier could ask "is there really a t at position 6?". The prover will give: c, t, $h_{7}$ and $h_{2}$ . The verifier will do the following:

$h_{6} = H (c, t)$
$h_{3} = H (h_{6}, h_{7})$
$h_{1} = H (h_{2}, h_{3})$

Then, the verifier will compare the calculate $h_{1}$ to the root given as a commitment by the prover. If they match, then t is indeed at that specific position in the committed vector! The way this cryptographically works is due to collision resistance in hash functions, more detailed explanation can be found in the video.

In summary, a Merkle Tree works as follows:

The root is the commitment to the vector.
The reveal a value in the commitment (which is a leaf in the tree) prover does the following:
- Send sibling hashes of all nodes on root-to-leaf path.
- Verifier checks if the hashes are consistent with the root hash.
- The size of this proof to reveal a value is $O (lo g n)$ hash values.
This is a binding scheme: once the root hash is sent, the committer is bound to the committed vector.
- Opening any leaf to two different values requires finding a hash collision, assumed to be intractable.

Example: Committing to a univariate $f (x) \in F_{n}^{(\leq d)} [X]$

Let us think about the [m, y, v, e, c, t, o, r] commitment example above. Suppose that you have a polynomial $f (x) \in F_{7}^{(\leq d)} [X]$ so this polynomial has values defined over a very small $n = 7$ . The degree should be small too, say something like $d = 3$ .

To commit to this polynomial, the prover could simply commit to the following vector:

$[f (0), f (1), f (2), f (3), f (4), f (5), f (6), *]$

Here, $*$ is just some dummy value.

flowchart BT
	h2 --> h1
	h3 --> h1
	h4 --> h2
	h5 --> h2
  h6 --> h3
  h7 --> h3
  f0 --> h4
  f1 --> h4
	f2 --> h5
  f3 --> h5
  f4 --> h6
  f5 --> h6
  f6 --> h7
  * --> h7

Basically, the prover committed to all evaluations of the polynomial. The verifier can ask for some specific evaluation, by asking to reveal some position in the tree (for example $f (3)$ is at the third leaf).

Well, is this really a good method? No, it has quite huge problems actually.

First of all, there are $∣ F_{n} ∣$ nodes in this tree. Evaluating that many elements for large $n$ like $2^{128}$ is a nightmare. We would instead want to have some total evaluation time proportional to the degree bound $d$ .
Speaking of degree, notice that the verifier has no idea if indeed the committed polynomial has degree at most $d$ .

We will see ways to solve these problems within the lecture!

Recall: SZDL Lemma

In our previous lectures, we have touched upon a very important fact: for some univariate polynomial $f \in F_{p}^{(\leq d)} [X]$ what is the probability that $f (r) = 0$ for some $r \in F_{p}$ ? Well, if it is degree $d$ then it has $d$ roots, meaning that there are exactly $d$ points where $f$ evaluates to 0. How many total points are there? The answer is $∣ F_{p} ∣$ . So in short:

$r \in F_{p} Pr [f (r) = 0] \leq \frac{d}{∣ F _{p} ∣}$

For very large $p$ and small $d$ this probability becomes negligible; meaning that you can't really come up with some random field element $r$ and find that $f (r) = 0$ , it is a tiny probability. Following on this "zero-test" fact, you can obtain an "equality-test" with the same reasoning:

$r \in F_{p} Pr [f (r) = q (r)] \leq \frac{d}{∣ F _{p} ∣}$

So if you have two polynomials $f, q$ and they both evaluate to the same thing, chances are they are the same polynomial!

Schwarts-Zippel-Demillo-Lipton Lemma is a multi-variate generalization of this fact. Let $f \neq = q$ be $l$ -variate polynomials of total degree at most $d$ . Total degree refers to the maximum sum of degrees of all variables in any term, for example $x_{1}^{2} x_{2} + x_{1} x_{2}$ has a total degree $3 = 2 + 1$ due to the first term. The lemma states that:

$r \in F_{p}^{l} Pr [f (r) = q (r)] \leq \frac{d}{∣ F _{p} ∣}$

The reason we mention SZDL in particular is that:

interactive proofs tend to make use of multivariate polynomials rather than univariate polynomials.
Instead of having a univariate polynomial with a large degree $d$ , you can have a multivariate polynomial with a smaller degree $d$ which in turn reduces the proof size and makes things much more efficient.

Low-Degree & Multilinear Extensions

We now have some math to do, but do not fear; it will be quite useful!

Definition [Extension]: Given a function $f : {0, 1}^{l} \to F$ , a $l$ -variate polynomial $g$ over $F$ is said to extend $f$ if $\forall x \in {0, 1}^{l} : f (x) = g (x)$ .

Definition [Multilinear Extension]: Any function $f : {0, 1}^{l} \to F$ has a unique multilinear extension (MLE) denoted $\tilde{f} : F^{l} \to F$ .

Multilinear means the polynomial has degree at most 1 in each variable. For example, $(1 - x_{1}) (1 - x_{2})$ is multilinear, but $x_{1}^{2} x_{2}$ is not.

Example: $f : {0, 1}^{2} \to F$

Let us think of some function $f$ defined over the domain ${(0, 0), (0, 1), (1, 0), (1, 1)}$ .

$f (0, 0) = 1$
$f (0, 1) = 2$
$f (1, 0) = 8$
$f (1, 1) = 10$

Here is the multilinear extension for $f$ , shown as $\tilde{f} : F^{2} \to F$ which is:

$\tilde{f} (x_{1}, x_{2}) = (1 - x_{1}) (1 - x_{2}) + 2 (1 - x_{1}) x_{2} + 8 x_{1} (1 - x_{2}) + 10 x_{1} x_{2}$

You can check that for $(x_{1}, x_{2}) \in {0, 1}^{2}$ it holds that $f (x_{1}, x_{2}) = \tilde{f} (x_{1}, x_{2})$ . This multilinear extension is obtained using Lagrange Interpolation, we may get to that later.

Are there other extensions? Well, we have said that multilinear extension is unique, but there are other non-multilinear extensions of $f$ . For example:

$g (x_{1}, x_{2}) = - x_{1}^{2} + x_{1} x_{2} + 8 x_{1} + x_{2} + 1$

also works for the inputs in ${0, 1}^{2}$ , but it is a quadratic extension (total degree is 2).

Relation to Interactive Proofs

The important fact we must realize about multilinear extensions is the following: consider some functions $f, g$ defined over ${0, 1}^{l}$ . Both of these functions have unique MLE's $f, g$ . The cool thing is: if there are any disagreeing inputs such that their evaluations on $f$ and $g$ are not equal, then the MLEs of these functions $f, g$ will disagree on almost all the points within their domain!

You might think of how the hash of an input changes drastically even if the input is changed slightly. This kind of resembles that, if the two functions have different evaluations on the set of points that they are defined on, then the MLE will have many many different evaluations on a lot of points.

The multilinear extensions "blow up & amplify" the tiny differences between $f, g$ , so that you can see the resulting extreme differences in the extensions $f, g$ .

Quick Evaluation of MLEs

Given as input all $2^{l}$ evaluations of a function $f : {0, 1}^{l} \to F$ , for any point $r \in F^{l}$ there is an $O (2^{l})$ -time algorithm for evaluating the MLE $\tilde{f} (r)$ .

The trick is using Lagrange Interpolation. For every input $w = (w_{1}, w_{2}, \dots, w_{l}) \in {0, 1}^{l}$ , define the multilinear Lagrange basis polynomial as follows:

$\tilde{δ}_{w} (r) = i = 1 \prod l (r_{i} w_{i} + (1 - r_{i}) (1 - w_{i}))$

It can be shown that you can get the evaluations of $\tilde{f} (r)$ using these:

$f (r) = w \in {0, 1}^{l} \sum f (w) \times δ_{w} (r)$

For each $w \in {0, 1}^{l}$ the result $\tilde{δ}_{w} (r)$ can be computed with $O (l)$ field operations. As such, the overall algorithm for $2^{l}$ points takes time $O (l 2^{l})$ . Using dynamic programming, this can be reduced to $O (2^{l})$ .

The Sum-Check Protocol

We will now examine a seminal work [Lund-Fortnow-Karloff-Nissan'90], known as the sum-check protocol.

We have a verifier $V$ with oracle access to some $l$ -variate polynomial $g$ over field $F$ . The goal of this verifier is compute the quantity:

$b_{1} \in {0, 1} \sum b_{2} \in {0, 1} \sum \dots b_{l} \in {0, 1} \sum g (b_{1}, b_{2}, \dots, b_{l})$

If you look closely, this sum is actually sum of all evaluations of $b \in {0, 1}^{l}$ in $g$ . In the naive method, the verifier would query each input, and find the sum in a total of $2^{l}$ queries. We will consider this to be a costly operation.

Instead, a prover will compute the sum and convince a verifier that this sum is correct. In doing so, the verifier will make only a single query to the oracle! Let's see how. Denote $P$ as prover and $V$ as verifier.

Start: $P$ sends the claimed answer $C_{1}$ . The protocol must check that indeed:

$C_{1} = b_{1} \in {0, 1} \sum b_{2} \in {0, 1} \sum \dots b_{l} \in {0, 1} \sum g (b_{1}, b_{2}, \dots, b_{l})$

Round 1: $P$ sends univariate polynomial $s_{1} (X_{1})$ claimed to equal $H_{1} (X_{1})$ (H standing for honest):

$H_{1} (X_{1}) = b_{2} \in {0, 1} \sum b_{3} \in {0, 1} \sum \dots b_{l} \in {0, 1} \sum g (X_{1}, b_{2}, \dots, b_{l})$

This sum is basically almost $C_{1}$ , but instead of $b_{1} \in {0, 1}$ we use the variable $X_{1}$ . Since the entire thing is a sum, and $X_{1}$ is the only variable; this whole thing is a univariate polynomial.

Round 1 Check: $V$ now checks that $C_{1} = s_{1} (0) + s_{1} (1)$ , basically filling in the missing sums for $b_{1} \in {0, 1}$ .
- If this check passes, then $V$ can now believe that $C_{1}$ is the correct answer so long as $s_{1} = H_{1}$ . Well, how can we check that $s_{1} = H_{1}$ ?
- Remember that if two polynomials agree at a random point, they are highly likely to be equal! So, $V$ picks a random point $r_{1} \in F$ and checks that $s_{1} (r_{1}) = H_{1} (r_{1})$ .
- Calculating $s_{1} (r_{1})$ is easy for the verifier, it's just some univariate polynomial with a not-so-high degree. However, the verifier does not know $H_{1}$ .
Recursion into Round 2: If you look at the form of $H_{1}$ , it looks a lot like the sum $C_{1}$ . So, you can think of doing the same operations for $C_{2} = H_{1} (r_{1})$ and then do the entire thing to verify the sum $C_{2}$ .

$C_{2} = H_{1} (r_{1}) = b_{2} \in {0, 1} \sum b_{3} \in {0, 1} \sum \dots b_{l} \in {0, 1} \sum g (r_{1}, b_{2}, b_{3}, \dots, b_{l})$

Recursion into Rounds 3, 4, …, $l$ : The verifier and prover keep doing this until the last round.
Final Round (Round $l$ ): Like before, $P$ sends univariate polynomial $s_{l} (X_{l})$ claim to equal $H_{l} (X_{l})$ which is:

$H_{l} = g (r_{1}, r_{2}, \dots, r_{l - 1}, X_{l})$

Final Round Check: $V$ now checks that $s_{l - 1} (r_{l - 1}) = s_{l} (0) + s_{l} (1)$ .
- Again, if this check passes $V$ must make sure that $s_{l} = H_{l}$ . However, we don't have to recurse anymore!
- Notice that $H_{l}$ is just a single query to the $g$ . So, $V$ can pick a random point $r_{l} \in F$ and immediately query the oracle to find $s_{l} (r_{l}) = g (r_{1}, r_{2}, \dots, r_{l})$ .
- No need for anymore rounds, just a single oracle query was enough.

Analysis

Completeness: This holds by design, if prover sends the prescribed univariate polynomial in each round, then all of verifier's checks will pass.
Soundness: If the prover does not send the prescribed messages, then the verifier rejects with probability at least $1 - \frac{l \times d}{∣ F ∣}$ where $d$ is the maximum degree of $g$ in any variable.
- For example, $∣ F ∣ \approx 2^{128}, d = 3, l = 60$ will make this probability $2^{- 120}$ which is tiny.
- You can prove this by induction on the number of variable $l$ , see the video for the proof.
Cost: Total communication is $O (d l)$ field elements.
- $P$ sends $l$ messages with each being a univariate polynomial of degree at most $d$ .
- $V$ sends $l - 1$ messages, each being a random field element.
- $P$ runtime is $O (d \times 2^{l} \times e v a l)$ and $V$ runtime is $O (d l + e v a l)$ , here $e v a l$ is the time required to evaluate $g$ at one point.

Application: Counting Triangles in a Graph

To demonstrate how sum-check protocol can be applied, we will look at an interactive proof about counting triangles in a graph.

Given Input: $A \in {0, 1}^{n \times n}$ , representing the adjacency matrix of a graph.
Desired Output: $\sum_{(i, j, k) \in n \times n \times n} A_{ij} A_{jk} A_{ik}$ which counts the triangles, as if all three points are 1 (meaning that there is an edge) then the term is counted, but if there is only a single 0 there the term is ignored.
Fastest known algorithm runs in matrix-multiplication time, currently about $n^{2.37}$ .

The protocol works as follows:

Interpret the matrix $n \times n$ matrix as if it's a function $A : {0, 1}^{l o g n} \times {0, 1}^{l o g n} \to F$ . The video has a great example showing how this works. Basically, to see what value a cell within the matrix contains, you evaluate the function $A$ with the respective inputs.
Remember SZDL lemma, meaning that there is a unique multilinear extension $\tilde{A}$ for that function $A$ .
Define the polynomial $g (X, Y, Z) = A (X, Y) A (Y, Z) \tilde{A} (X, Z)$ .
Apply the sum-check protocol to $g$ to compute:

$a \in {0, 1}^{l o g n} \sum b \in {0, 1}^{l o g n} \sum c \in {0, 1}^{l o g n} \sum g (a, b, c)$

How much is the cost of this protocol?

Total communication is $O (lo g n)$ .
Verifier runtime is $O (n^{2})$ , which is linear in the size of matrix. This runtime is dominated by evaluating $g (r_{1}, r_{2}, r_{3})$ in the final round of the sum-check protocol.
Prover runtime is $O (n^{3})$ .

SNARK for Circuit-Satisfiability

Let us get to the highlight of this lecture: how to use all this knowledge for circuit satisfiability? Recall that in this problem we have an arithmetic circuit $C$ over $F$ of size $S$ and output $y$ . The prover $P$ claims to know a witness $w$ such that $C (x, w) = y$ . For simplicitly, let's take the public input $x$ to be empty.

Transcript

We will use a notion of transcript, which is defined as an assignment of a value to every gate in the circuit. A transcript $T$ is a correct transcript if it assigns the gate values obtained by evaluating the circuit $C$ on a valid witness $w$ .

Consider the circuit below:

flowchart BT
	a_1((a_1)) --> X1[x]
	a_1 --> X1[x]
	a_2((a_2)) --> X2[x]
	a_2 --> X2[x]
	a_3((a_3)) --> X3[x]
	a_3 --> X3[x]
	a_4((a_4)) --> X4[x]
	a_4 --> X4[x]
	X1 --> A1[+]
	X2 --> A1[+]
	X3 --> A2[+]
	X4 --> A2[+]
	A1 --> A3[+]
	A2 --> A3[+]

A correct transcript for this circuit yielding output 5 would be the following assignment:

flowchart BT
	a_1((1)) --> X1[1]
	a_1 --> X1
	a_2((0)) --> X2[0]
	a_2 --> X2
	a_3((2)) --> X3[4]
	a_3 --> X3
	a_4((0)) --> X4[0]
	a_4 --> X4
	X1 --> A1[1]
	X2 --> A1
	X3 --> A2[4]
	X4 --> A2
	A1 --> A3[5]
	A2 --> A3

Remember the trick of viewing a matrix as a function back in the "counting triangles" example? Well, we can do a similar trick for transcripts too!

A transcript $T$ can be viewed as a function $T : {0, 1}^{l o g S} \to F$ . Assign each gate in $C$ a $lo g S$ -bit label and view $T$ as a function mapping gate labels to $F$ . Basically, by giving the correct gate label to this function you can select a value at the circuit transcript, something like $T (0, 0, 0, 0) = 1$ for the example above.

Polynomial-IOP for SNARK

Recall that our SNARK is all about proving that we know a secret witness $w$ such that for some public input $x$ and arithmetic circuit $C$ it holds that $C (x, w) = 0$ . Denote the circuit size as $S = ∣ C ∣$ .

First, we will construct the correct transcript of $C (x, w)$ , which we denote as $T : {0, 1}^{l o g S} \to F$ . We have talked about how this happens in the previous section.
Prover $P$ will calculate the extension of $T$ to obtain a polynomial $h : F^{l o g S} \to F$ . This extension $h$ is the first message sent to the verifier.

$\forall x \in {0, 1}^{l o g S} : h (x) = T (x)$

The verifier $V$ needs to verify that this is indeed true, but it will only make a few evaluations of $h$ in doing so.

We have talked about why using extensions was a good idea for this kind of proof. Remember that if there is even just a single tiny error in the transcript, the extension of this transcript will disagree on almost all points with respect to the correct transcript.

Alright then, how do we do it?

Step 1: Moving from $h$ to $g_{h}$

First, we will construct a $(3 lo g S)$ -variate polynomial $g_{h}$ such that: $h$ extends a correct transcript $T$ if and only if $\forall (a, b, c) \in {0, 1}^{3 l o g S} : g_{h} (a, b, c) = 0$ . To evaluate $g_{h} (r)$ for any $r \in F$ , it should suffice to evaluate $h$ at only 3 inputs.

As a sketch of the proof, define $g_{h} (a, b, c)$ as the following:

$a dd (a, b, c) (h (a) - (h (b) + h (c)) + m u lt (a, b, c) (h (a) - (h (b) h (c)))$

We have two new functions here, let's just quickly introduce what they are:

$\tilde{a dd} (a, b, c)$ is a multilinear extension of a wiring predicate of a circuit, which returns 1 if and only if $a$ is an addition gate and it's two inputs are gates $b$ and $c$ .
$\tilde{m u lt} (a, b, c)$ is a multilinear extension of a wiring predicate of a circuit, which returns 1 if and only if $a$ is a multiplication gate and it's two inputs are gates $b$ and $c$ .

With this definition, notice what happens:

If $\tilde{a dd} (a, b, c) = 1$ then $g_{h} (a, b, c) = h (a) - (h (b) + h (c))$ . For this to be zero, $h (a) = h (b) + h (c)$ is required.
If $\tilde{m u lt} (a, b, c) = 1$ then $g_{h} (a, b, c) = h (a) - h (b) h (c)$ . For this to be zero, $h (a) = h (b) h (c)$ is required.
Otherwise, $g_{h} (a, b, c) = 0$ .

As such, if the addition and multiplications in the extension $h$ behave correctly with respect to the correct transcription $T$ , then the sum of evaluating the points on $g_{h} (a, b, c)$ should be 0. As a further note, in structured circuits (circuits with repeating structure & wiring) the computation of $a dd$ and $m u lt$ can be made a bit more efficient.

What we accomplish by doing this is the following: the original claim is that $h$ extends a correct transcript $T$ . This is quite a complicated thing to show per se, there may be many things going on with $h$ . $g_{h}$ on the other hand requires a more simpler structure, just check if the result is 0 for all the inputs.

Note that ${0, 1}^{3 l o g S}$ is sometimes referred to as boolean hypercube within the lecture. This is because ${0, 1}^{3}$ is a boolean hypercube (specifically the corners of the hypercube can be labeled as the elements of this set) and we want $g_{h}$ to vanish over $lo g S$ variables using this hypercube.

Step 2: Proving $g_{h}$

So, how can the verifier check that indeed $\forall (a, b, c) \in {0, 1}^{3 l o g S} : g_{h} (a, b, c) = 0$ ? In doing so, verifier should only evaluate $g_{h}$ at a single point!

Using a Quotient Polynomial: Imagine for a moment that $g_{h}$ were a univariate polynomial $g_{h} (X)$ . In that case, this would be defined over some subset $H \subseteq F$ and we would want to check that $\forall x \in H : g_{h} (x) = 0$ .

There is a well-known result in polynomials that will be very useful here: $\forall x \in H : g_{h} (x) = 0$ if and only if it is divisible by the vanishing polynomial for $H$ . The vanishing polynomial is defined as $Z_{H}$ :

$Z_{H} (x) = a \in H \prod (x - a)$

The polynomial IOP will work as follows:

$P$ sends a polynomial $q$ such that $g_{h} (X) = q (X) \times Z_{H} (X)$ .
$V$ verifies this by picking a random $r \in H$ and checking $g_{h} (r) = q (r) \times Z_{H} (r)$ .

This approach is not really the best approach though; it has problems.

First of all, $g_{h}$ is not univariate, it is obviously $(3 lo g S)$ -variate.
Having the prover find and send the quotient polynomial $q$ is expensive.
In the final SNARK, this would mean applying polynomial commitment to additional polynomials, increasing the cost.

Well, we say that there are problems but this approach is actually used by well-known schemes: Marlin, Plonk and Groth16.

Using Sum-Check Protocol: An alternative approach for this step, which we have been foreshadowing throughout this lecture, is the sum-check protocol!

Sum-check handles multi-variate polynomials, and it doesn't require $P$ to send additional large polynomials. The sum-check we are interested in is the following:

$0 = a \in {0, 1}^{l o g S} \sum b \in {0, 1}^{l o g S} \sum b \in {0, 1}^{l o g S} \sum g_{h} (a, b, c)^{2}$

To capture the general idea, we are working with integers $Z$ instead of finite field $F$ here. When we square the result like that, the entire sum is zero if and only if $g_{h} (a, b, c)$ is zero for all inputs.

In the end, the verifier will only compute $g_{h} (r_{1}, r_{2}, r_{3})$ for some random inputs, and it suffices to compute $h (r_{1}), h (r_{2}), h (r_{3})$ for that.
Outside of these evaluations, $V$ runs in time $O (lo g S)$
$P$ performs $O (S)$ field operations given a witness $w$ .

That concludes this rather dense lecture! Don't be discouraged if you didn't understand the entire thing, I don't think any of us can really get it in a single run.

Recall: Polynomial Commitments

We will use polynomial commitments in this lecture, so let's quickly recall what they are!

The prover would like to commit to some polynomial $f \in F_{p}^{(\leq d)} [X]$ .
An $e v a l$ function uses evaluate some values for this polynomial, without revealing it. For example, pick some public $u, v \in F_{p}$ .
- Prover will convince that $f (u) = v$ and $d e g (f) \leq d$ .
- Verifier will only know $d, u, v$ and a polynomial commitment $co m_{f}$ , also shown as $f$ sometimes.
The proof size for $e v a l$ and the verifier time should both be in $O_{λ} (lo g d)$ . Spoilers: it will turn out be constant $O (1)$ which is really a magical thing to me.

KZG Poly-commit Scheme

In this lecture, we will use KZG [Kate-Zaverucha-Goldberg'10] polynomial commitment scheme.

Fix some finite cyclic group $G$ of order $p$ . This group basically has some generator value $G$ and the group consists of it's multiplications:

$G = {0, G, 2 G, 3 G, \dots, (p - 1) G}$

The group has addition operation defined on it, where you can add $a G + b G$ to obtain $c G$ where $a + b \equiv c (mod p)$ .

Trusted Setup: $se t u p (λ) \to g p$

KZG starts with a trusted setup $se t u p (λ) \to g p$ to produce public parameters. This is done as follows:

Sample some random $τ \in F_{p}$ .
Compute $\forall i \in [d] : H_{i} = τ^{i} G$ . Basically, you have $d + 1$ group elements, each multiplied to some power of tau ( $τ$ ) and multiplied by $G$ . These computed values will be the public parameters $g p$ .

$g p = (H_{0} = G, H_{1} = τ G, H_{2} = τ^{2} G, \dots, H_{d} = τ^{d} G) \in G^{d + 1}$

Finally and most importantly, delete $τ$ . If this number is leaked and in wrong hands, they can create fake proofs! This is why the setup must take place in a trusted environment. We will actually see a better way for this setup, where multiple parties will take place in the ceremony and it suffices for only one of them to be trusted!

Commitment: $co mmi t (g p, f) \to co m_{f}$

A commitment will take the public parameters $g p$ along with the polynomial $f$ to be committed, and produces the commitment.

The commitment is shown as $co mmi t (g p, f) \to co m_{f}$
Our commitment will be $co m_{f} = f (τ) G \in G$ .

But wait, $τ$ was deleted after setup so how do we obtain this? Well, think of the polynomial $f$ as follows:

$f (X) = f_{0} + f_{1} X + f_{2} X^{2} + \dots + f_{d} X^{d}$

Notice that for every $X^{i}$ (including $X^{0}$ ) you write $H_{i}$ you get the following:

$f_{0} H_{0} + f_{1} H_{1} + f_{2} H_{2} + \dots + f_{d} H_{d}$

We can very well do this because we know what each $H_{i}$ is, they are given us within the public parameters $g p$ . If you expand $H_{i}$ you notice that:

$f_{0} G + f_{1} τ G + f_{2} τ^{2} G + \dots f_{d} τ^{d} G = f (τ) G$

We got the commitment we've wanted! Note that this commitment is binding, but not hiding as is.

Evaluation: $e v a l$

Let us now see how a verifier evaluates the commitment.

Prover knows $(g p, f, u, v)$ and wants to prove that $f (u) = v$ .
Verifier knows $(g p, co m_{f}, u, v)$ .

We will have some series of if-and-only-if's now, which will connect everything really nicely.

$f (u) = v$ if and only if $u$ is a root of $\hat{f} := f - v$ . This makes sense because if indeed $f (u) = v$ then $f (u) - v = 0$ which would make $u$ a root for $\hat{f}$ .
$u$ is a root of $\hat{f}$ if and only if the polynomial $(X - u)$ divides $\hat{f}$ . You might be familiar with this property already throughout this lecture.
$(X - u)$ divides $\hat{f}$ if and only if $\exists q \in F_{p} [X]$ such that $q (X) (X - u) = \hat{f} (X) = f (X) - v$ . This is another way of saying that since $(X - u)$ divides $\hat{f}$ there will be no remainder left from this division, and there will be some resulting quotient polynomial $q$ .

With this knowledge in mind, here is the plan:

Prover computes $q (X)$ and commits to $q$ as $co m_{q}$ . Remember that commitment results in a single group element only.
Prover send the proof $π = co m_{q}$ . That's right, the entire proof is just the commitment to $q$ which means the proof size is a single group element, independent of the degree $d$ .
Verifier accepts if $(τ - u) co m_{q} = co m_{f} - v G$

You may notice that there is $τ$ here, which is supposed to be secret; and you are right. What actually happens is that something called pairing is used here to hide $τ$ while still allowing the above computation. In doing so, only $H_{0}$ and $H_{1}$ will be used, which again makes this thing independent of degree $d$ .

So are we really independent of $d$ ? Well, the prover must compute the quotient polynomial $q$ and the complexity of that is related to $d$ , so you will lose from prover time when you have large degrees.

You might ask, how to prove that this is a secure poly-commit scheme? We are not going into that today…

Properties of KZG

KZG has some cool properties!

Generalizations: It has been shown that you can use KZG to commit to $k$ -variate polynomials [Papamanthou-Shi-Tamassia'13]
Batch Proofs: Suppose you have commitments to $n$ polynomials $f_{1}, f_{2}, \dots, f_{n}$ and you have $m$ values to reveal in each of them, meaning that you basically want to prove all evaluations defined by $f_{i} (u_{i, j}) = v_{i, j}$ for $i \in [n]$ and $j \in [m]$ .
- Normally, this would require $n \times m$ evaluations, but thanks to KZG we can actually do this in a single proof that is a single group element!
Linear-time Commitments: How long does it take to commit to a polynomial of degree $d$ ? Well, we would really like this to be in linear time with $d$ , and turns out it is possible to do so. This deserves a sub-section on its own though, so let us do that.

Linear-time Commitments

The way we calculate the commitment $co m_{f} = f (τ) G$ will change based on how we represent a polynomial $f \in F_{p}^{(\leq d)} [X]$ . There are several ways.

Coefficient Representation: Simply, keep a record of $d + 1$ coefficients to construct the polynomial.
- $f (X) = f_{0} + f_{1} X + \dots f_{d} X^{d}$ would mean that we are storing an array of values $[f_{0}, f_{1}, \dots, f_{d}]$ .
- We can compute the commitment in linear time $O (d)$ since we just have to multiply $f_{i}$ with $H_{i}$ for $i \in [d]$ , giving us: $co m_{f} = f_{0} H_{0} + f_{1} H_{1} + \dots + f_{d} H_{d}$
Point-Value Representation with NTT: A polynomial of degree $d$ can be defined by $d + 1$ points. So, we have $d + 1$ points and their evaluations $(a_{0}, f (a_{0})), (a_{1}, f (a_{1})), \dots, (a_{d}, f (a_{d}))$ .
- Computing $co m_{f}$ naively would be to construct the coefficients $f_{0}, f_{1}, \dots, f_{d}$ to basically convert point-value representation to coefficient representation, and then compute the commitment as shown in that case.
- Converting from point-value to coefficient representation takes time $O (d lo g d)$ using Number Theoretic Transform (NTT) which is closely related to Fourier Transform. However, this is more than linear time, we want to do better!
Point-Value Representation with Lagrange Interpolation: Thankfully, there is a linear-time algorithm to commit to a polynomial in point-value representation. The idea is to use Lagrange Interpolation to compute the commitment.

$f (τ) = i = 0 \sum d λ_{i} (τ) f (a_{i})$

$λ_{i} (τ) = \frac{\prod _{j = 0, j \neq = i}^{d} ( τ - a _{j} )}{\prod _{j = 0, j \neq = i}^{d} ( a _{i} - a _{j} )} \in F_{p}$

The idea here is that our public parameters will actually be in Lagrange form, and the process of getting this just a linear transformation that everyone can do. So, we obtain public parameters $\overset{g p}{^}$ that looks like:

$\overset{g p}{^} = (\hat{H}_{0} = λ_{0} (τ) G, \hat{H}_{1} = λ_{1} (τ) G, \dots, \hat{H}_{d} = λ_{d} (τ) G) \in G^{d + 1}$

This way, the commitment can be compute in linear time $O (d)$ :

$co m_{f} = f (τ) G = f (a_{0}) \hat{H}_{0} + f (a_{1}) \hat{H}_{1} + \dots + f (a_{d}) \hat{H}_{d}$

Fast Multi-point Proof Generation

Let $Ω \subseteq F_{p}$ and $∣Ω∣ = d$ . Suppose that the prover needs an evaluation proof $π_{a}$ for all $a \in Ω$ . Normally, this would require $O (d^{2})$ time because proving one takes time linear in $d$ and there are $d$ values.

Thanks to [Feist-Khovratovic'20] there is a much faster algorithm to do this.

If $Ω$ is a multiplicative group then it takes time in $O (d lo g d)$
otherwise, it takes time in $O (d lo g^{2} d)$

Dory Poly-commit Scheme

KZG has some difficulties:

it requires a trusted setup to compute the public parameters $g p$
$g p$ size is linear in the degree $d$

Can we do any better? Kind of yeah! Dory [Lee'20] is a polynomial commitment scheme with the following properties:

🟢 It has transparent setup, so there is no need for a trusted setup
🟢 $co m_{f}$ is still just a single group element, independent of degree
🔴 $e v a l$ proof size is $O (lo g d)$ group elements; KZG took constant time.
🔴 $e v a l$ verification time is $O (lo g d)$ ; KZG took constant time.
🟢 prover time is $O (d)$

PCS to Commit to a Vector

Poly-commit schemes are a drop-in replacement for Merkle Trees, which we have used before to make vector commitments.

Suppose you have some vector $(u_{1}, u_{2}, \dots, u_{k}) \in F_{p}^{(\leq d)}$ . To commit to this vector, the prover will interpolate a polynomial $f$ such that $f (i) = u_{i}$ for $i \in [k]$ . Then, the prover can simply commit to this polynomial $f$ as we have described above.

If a verifier wants to query some vector elements, like "show me that $u_{2} = a$ and $u_{4} = b$ " this translate to "show me $f (2) = a$ and $f (4) = b$ " and we know we can prove this in a single group element using a batch proof thanks to KZG.

If we were to use a Merkle Tree, each evaluation proof would have size $O (lo g k)$ and for $ℓ$ proofs this would mean $O (ℓ lo g k)$ proof size, a lot bigger than the constant proof size of KZG.

For more applications of using PCS in Merkle Trees, see Verkle Trees!

Proving Properties of Committed Polynomials

Before we start, we would like to make note of a shorthand notation: when we say the verifier queries a polynomial $f$ at some random point $r$ to get $f (r)$ we actually mean that the prover computes $y = f (r)$ and a proof of this evaluation $π_{f}$ , then it sends back $y, π_{f}$ to the verifier.

Also note that everything we will do in our interactive proofs will be public-coin protocols, so although what we will do looks interactive just keep in mind that we can use Fiat-Shamir transform to make them non-interactive.

Equality Testing

Recall that in KZG, the verifier could test if $f = g$ just by knowing $co m_{f}, co m_{g}$ , also shown as $f, g$ . For bit more complex equality tests, that won't be enough.

For example, suppose that the verifier has $f, g_{1}, g_{2}, g_{3}$ and would like to see if $f = g_{1} g_{2} g_{3}$ . To do this, the verifier has to query the prover on all four polynomials at some random field element and test equality.

Important Proof Gadgets for Uni-variates

Let $Ω \subseteq F_{p}$ where $∣Ω∣ = k$ . Let $f \in F_{p}^{(\leq d)} [X]$ be a polynomial of degree $d$ and $d \geq k$ . The verifier has a commitment to this polynomial, $f$ .

We will now construct efficient poly-IOPs for the following proof gadgets:

Equality Test: prove that $f, g$ are equal. We know that evaluating them at a random point and seeing if they are equal does the trick, assuming degree is much smaller than the size of the finite field.
Zero Test: prove that $f$ is identically zero on $Ω$ , meaning that it acts like a zero-polynomial for every value in $Ω$ , but of course it can do whatever it wants for values outside of $Ω$ but in $F_{p}$ .
Sum Check: prove that $\sum_{a \in Ω} f (a) = 0$
Product Check: prove that $\prod_{a \in Ω} f (a) = 1$
Permutation Check: prove that evaluations of $f$ over $Ω$ is a permutation of evaluations of $g$ over $Ω$
Prescribed Permutation Check: prove that evaluations of $f$ over $Ω$ is a permutation of evaluations of $g$ over $Ω$ , with a "prescribed" permutation $W : Ω \to Ω$ . This permutation is a bijection $\forall i \in [k] : W (ω^{i}) = ω^{j}$

To start, we need to introduce the concept of a vanishing polynomial.

Definition: The vanishing polynomial of $Ω$ (as defined above) is:

$Z_{Ω} (X) := a \in Ω \prod (X - a)$

with degree $k$ . Then, let $ω \in F_{p}$ be a primitive $k$ -th root of unity, meaning that $ω^{k} = 1$ . If the set $Ω$ is defined as follows:

$Ω = {1, ω, ω^{2}, \dots, ω^{k - 1}} \subseteq F_{p}$

then $Z_{Ω} (X) = X^{k} - 1$ . This is really nice, because for such cases, evaluating $Z_{Ω} (r)$ for some random field element $r$ means just taking $r^{k}$ and subtracting one, which costs around $lo g k$ field operations, thanks to repeated-squaring method of multiplication.

Zero Test

In the following graph, we denote Z(r) for $Z_{Ω} (r)$ . Also remember that when we say "the verifier queries some polynomial and the prover shows its evaluation", what we mean is that in the background the prover computes them and sends the result along with an evaluation proof.

With that said, let us see the zero-test poly-IOP.

sequenceDiagram
	actor P as Prover(f)
	actor V as Verifier(com_f)

	note over P: q(X) = f(X) / Z(X)
	P ->> V: com_q
	note over V: r ← F_p
	V ->> P: query f(X) and q(X) at r
	P ->> V: show f(R) and q(R)

	note over V: accept if f(R) = q(R) * Z(R)

Let's analyze the costs in this IOP:

The verifier made two polynomial queries (although a batch proof could have been done), and also it computed $Z_{Ω} (r)$ on it's own which takes time $O (lo g k)$ .
The prover time is dominated by the time to compute $q (X)$ and then commit to it, which runs in time $O (k lo g k)$ .

Product Check and Sum Check

Prod-check and sum-check are almost identical, so we will only look at prod-check. Again, our claim is that $\prod_{a \in Ω} f (a) = 1$ and we would like to prove that.

Set $t \in F_{p}^{(\leq k)} [X]$ to be a polynomial of degree $k$ . Define the evaluations of this polynomial as follows:

$t (1) = f (1)$
$t (ω) = f (1) \times f (ω)$
$t (ω^{2}) = f (1) \times f (ω) \times f (ω^{2})$
and so on, with the final evaluation of $t (ω^{k - 1})$ being equal to the product itself!
$t (ω^{k - 1}) = \prod_{a \in Ω} f (a) = 1$

You can see that we can define $t (ω^{s}) = \prod_{i = 0}^{s} f (ω^{i})$ for $s \in [k - 1]$ . It is also important to notice the recurrence relation between $t$ and $f$ :

$\forall x \in Ω : t (ω x) = t (x) f (ω x)$

which is made possible because $Ω$ consists of powers of $ω$ . The lemma we will use with these is the following:

if $t (ω^{k - 1}) = 1$
and $t (ω x) - t (x) f (ω x) = 0$ for all $x \in Ω$
then, $\prod_{a \in Ω} f (a) = 1$

Let's write the interactive proof! The idea will to construct another polynomial $t_{1} (X)$ which is:

$t_{1} (X) = t (ω X) - t (x) f (ω X)$

which implies that if a zero-test on $t_{1} (X)$ for $Ω$ passes, then prod-check passes.

sequenceDiagram
	actor P as Prover(f)
	actor V as Verifier(com_f)

	note over P: construct t(X)
	note over P: construct t1(X) = t(ωX) - t(X) * f(ωX)

	note over P: set q(X) = t1(X)/(X^k - 1)
	P ->> V: com_q, com_t
	note over V: r ← F_p
	V ->> P: query t(X) at ω^(k-1), r, ωr
	P ->> V: show t(ω^(k-1)), t(r), t(ωr)
	V ->> P: query q(X) at r
	P ->> V: show q(r)
	V ->> P: query f(X) at ωr
	P ->> V: show f(r)

	note over V: accept if t(ω^(k-1)) = 1
	note over V: and if t(ωr) - t(r)f(ωr) = q(r)(r^k - 1)

The cost of this protocol is as follows:

Proof size is two commitments ( $q, t$ ) and five evaluations, and keeping in mind that evaluations can be batched, the entire proof size is just 3 group elements.
Prover time is dominated by computing $q (X)$ that runs in time $O (k lo g k)$
Verifier time is dominated by computing $(r^{k} - 1)$ and $ω^{k - 1}$ , both in time $O (lo g k)$

Note that almost the same protocol works for rational functions. There, our claim is $\prod_{a \in Ω} (f / g) (a) = 1$ and we construct a similar $t$ polynomial, only this time $f (x)$ is divided by $g (x)$ in the definition. Then, the lemma is also similar:

if $t (ω^{k - 1}) = 1$
and $t (ω x) g (w x) - t (x) f (ω x) = 0$ for all $x \in Ω$
then, $\prod_{a \in Ω} f (a) / g (a) = 1$

Almost the same!

Permutation Check

We have two polynomials $f, g \in F_{p}^{(\leq d)} [X]$ and we want to show that

$(f (1), f (ω), f (ω^{2}), \dots, f (ω^{k - 1})) \in F_{p}^{k}$ is just a permutation of
$(g (1), g (ω), g (ω^{2}), \dots, g (ω^{k - 1})) \in F_{p}^{k}$
essentially proving that $g (Ω)$ is same as $f (Ω)$ , but just permuted.

To prove this, we will do what is known as the Lipton's trick [Lipton'89]. We will construct two auxiliary polynomials:

$\hat{f} (X) = \prod_{a \in Ω} (X - f (a))$
$\overset{g}{^} (X) = \prod_{a \in Ω} (X - g (a))$

Now notice that $\hat{f} = \overset{g}{^}$ if and only if $f$ is a permutation of $g$ . This is because the product is a series of multiplications, which is a commutative operation.

Normally, to prove that $\hat{f} = \overset{g}{^}$ the prover would only have to show the evaluation of them at a random point, $r \in F_{p}$ given by the verifier. However, computing these polynomials are a bit expensive, so instead the prover will do a clever trick: do a prod-check on the following rational function:

$\frac{f ^ ( r )}{g ^ ( r )} = a \in Ω \prod (\frac{r - f ( a )}{r - g ( a )}) = 1$

We have just mentioned that prod-check can be done on rational functions, so we can very well do this! The cost of this proof is just two commitments, and six evaluations.

Prescribed Permutation Check

Again we have two polynomials $f, g \in F_{p}^{(\leq d)} [X]$ and a permutation $W : Ω \to Ω$ . The verifier has commitments to these $f, g, W$ . Our claim is that $f (y) = g (W (y))$ for all $y \in Ω$ , in other words, $g$ is a permutation of $f$ over $Ω$ as described by $W$ .

At a first glance, it is tempting to do a simple zero-test on $f (y) - g (W (y)) = 0$ , right? Nope, notice that $g (W (y))$ results in a polynomial of degree $∣Ω ∣^{2}$ , but we wanted to have a linear time prover; this results in a quadratic time prover!

Instead, we have a clever method that will run in linear time. We start with the following observation: if the set of pairs $(W (a), f (a))_{a \in Ω}$ is a permutation of $(a, g (a))_{a \in Ω}$ then $f (y) = g (W (y))$ for all $y \in Ω$ .

Here is a quick example of this:

Permutation: $W (ω^{0}) = ω^{2}, W (ω^{1}) = ω^{0}, W (ω^{2}) = ω^{1}$
First set of pairs: $(ω^{0}, g (ω^{0})), (ω^{1}, g (ω^{1})), (ω^{2}, g (ω^{2}))$
Second set of pairs: $(ω^{0}, f (ω^{0})), (ω^{2}, f (ω^{1})), (ω^{1}, f (ω^{2}))$

For the proof itself, we actually need bivariate polynomials; univariate polynomials will not be enough. Nevertheless, the proof is much similar to the previously described permutation check.

Define two auxiliary polynomials, which will be bivariate polynomials of total degree $∣Ω∣$ :

$\hat{f} (X, Y) = \prod_{a \in Ω} (X - Y \cdot W (a) - f (a))$
$\overset{g}{^} (X, Y) = \prod_{a \in Ω} (X - Y \cdot a - g (a))$

The lemma here is that if $\hat{f} (X, y) = \overset{g}{^} (X, Y)$ then $(W (a), f (a))_{a \in Ω}$ is a permutation of $(a, g (a))_{a \in Ω}$ . The proof of this is left as exercise, though if you want to try, you might make use of the fact that $F_{p} [X, Y]$ is a unique factorization domain.

The protocol continues with the verifier generating two random points $r, s \in F_{p}$ and sending these to the prover. Again, instead of actually evaluating the auxiliary polynomials, the prover will do a prod-check over what they describe:

$\frac{f ^ ( r , s )}{g ^ ( r , s )} = a \in Ω \prod (\frac{r - s \cdot W ( a ) - f ( a )}{r - s \cdot a - g ( a )}) = 1$

This protocol is sound and complete, assuming $2 d / p$ is negligible. The cost of this protocol is just like the cost described for prod-check.

PLONK

The time has come! PLONK [Gabizon-Williamson-Ciobotaru'19] is a poly-IOP for a general circuit $C (x, w)$ .

But, before we delve into PLONK, we must realize that PLONK itself in practice is more like an abstract IOP, that when used with some poly-commit scheme will result in a SNARK system. Here are some examples in practice:

Poly-commit Scheme	Poly-IOP	SNARK System
KGZ'10, uses pairings	PLONK	Aztec, JellyFish
Bulletproofs, no pairings required	PLONK	Halo2
FRI, uses hashes	PLONK	Plonky2

With that said, let us begin.

Step 1: Compile circuit to computation trace

We will use an example circuit with an example evaluation. Our circuits have gates with two inputs and a single input, also shown as "gate fan-in = 2".

flowchart LR
	x1((x_1)) --> a1[+]
	x2((x_2)) --> a1
	x2((x_2)) --> a2[+]
	w1((w_1)) --> a2
	a1 --> m1[x]
	a2 --> m1[x]
	m1 --> o(( ))

The circuit above computes $(x_{1} + x_{2}) (x_{2} + w_{1})$ . Suppose that the public inputs are $x_{1} = 5, x_{2} = 6$ and the secret input (witness) is $w_{1} = 1$ . As a result, the output of this circuit is 77, which is also public. The prover will try to prove that he knows a value of $w_{1}$ that makes the output 77 with the given public inputs.

flowchart LR
	x1((x_1)) --5--> a1[+]
	x2((x_2)) --6--> a1
	x2((x_2)) --6--> a2[+]
	w1((w_1)) --1--> a2
	a1 --11--> m1[x]
	a2 --7--> m1[x]
	m1 --77--> o(( ))

We compile this evaluation into a computation trace, which is simply a table that shows inputs and outputs for each gate, along with circuit inputs.

Circuit inputs are $5, 6, 1$ .
Gate traces are given in the following table.

Gate No.	Left Input	Right Input	Output
Gate 0	5	6	11
Gate 1	6	1	7
Gate 2	11	7	77

Step 1.5: Encode trace as a polynomial

We have spent a lot of time learning how to commit to polynomials, so let's get them to work! First, some definitions:

$∣ C ∣$ is the circuit size, equal to number gates in the circuit.
$∣ I ∣ = ∣ I_{x} ∣ + ∣ I_{w} ∣$ is the number of inputs to the circuit, which is the number of public inputs and the secret inputs combined.
We then let $d = 3 \times ∣ C ∣ + ∣ I ∣$
Let $Ω = {1, ω, ω^{2}, \dots, ω^{d - 1}}$ where $ω^{d} = 1$ .

The plan is to encode the entire computation trace into a polynomial $T \in F_{p}^{(\leq d)} [X]$ such that:

$T$ encodes all inputs with $T (ω^{- j})$ correspond to input $j \in [∣ I ∣]$
$T$ encodes all wires, which it does $\forall ℓ \in {0, 1, \dots, ∣ C ∣ - 1}$ :
- $T (ω^{3 l})$ corresponds to the left input of gate $ℓ$
- $T (ω^{3 l + 1})$ corresponds to the right input of gate $ℓ$
- $T (ω^{3 l + 2})$ corresponds to output of gate $ℓ$

In the circuit example above, there are 12 points, which defines a degree-11 polynomial. To interpolate a polynomial to the values in the computation trace, the prover can actually use Fast Fourier-Transform (FFT) to compute the coefficients of polynomial $T$ in time $O (d lo g d)$ .

However, in general we won't compute the coefficients, but instead use the point-value representation of the polynomial as described above.

Step 2: Prove validity of $T$

So the prover has computed $T$ , and committed to it as $T$ . It sends it to the verifier, but the verifier must make sure that $T$ indeed belongs to the correct computation trace. To do that, it must do the following:

$T$ encodes the correct inputs
Every gate is evaluated correctly
The "wiring" is implemented correctly
The output of last gate is 0. Well, in this example the output is 77, but generally the verifier expects a 0 output, remember how we say $C (x, w) = 0$ .

(1) $T$ encodes the correct inputs

Remember both prover and verifier has the statement $x$ . They will now interpolate a polynomial $v (X) \in F_{p}^{(\leq ∣ I_{x} ∣)} [X]$ that encodes the $x$ -inputs to the circuit:

for $j \in {1, 2, \dots, ∣ I_{x} ∣} : v (ω^{- j}) = input #j$
In our example: $v (ω^{- 1}) = 5, v (ω^{- 2}) = 6, v (ω^{- 3}) = 1$ so $v$ is quadratic (i.e. it is defined by 3 points).
Constructing $v (X)$ takes linear time proportional to the size of input $x$ .

Next, they will agree on the points encoding the input $Ω_{in p} := {ω^{- 1}, ω^{- 2}, \dots, ω^{- ∣ I_{x} ∣}}$ . Then, the prover will use a zero-test on $Ω_{in p}$ to prove that $T (y) - v (y) = 0$ for all $y \in Ω_{in p}$ .

It is quite easy to do this, because vanishing polynomial for $Ω_{in p}$ is often calculated quickly.

(2): Every gate is evaluated correctly

The idea here is to encode gate types using a selector polynomial $S (X)$ . Remember that in our example we encoded the two gate inputs and an output as $ω$ to the power $3 ℓ, 3 ℓ + 1, 3 ℓ + 2$ for some gate $ℓ$ . Now, we will encode the "types" of these gates.

Define $S (X) \in F_{p}^{(\leq d)} [X]$ such that $\forall ℓ \in {0, 1, \dots, ∣ C ∣ - 1}$ :

$S (ω^{3 l}) = 1$ if gate $ℓ$ is an addition gate +
$S (ω^{3 l}) = 0$ if gate $ℓ$ is a multiplication gate x

In our example, $S (ω^{0}) = 1, S (ω^{3}) = 1, S (ω^{6}) = 0$ . Notice that the selector polynomial depends on the circuit, but not on the inputs! So in fact, the selector polynomial is a product of the preprocessing setup: prover will have $S$ itself, and the verifier will have a commitment to it $S$ .

Now, we make a really nice observation: $\forall y \in Ω_{g a t es} := {1, ω^{3}, ω^{6}, \dots, ω^{3 (∣ C ∣ - 1)}}$ it should hold that:

$S (y) (T (y) + T (ω y)) + (1 - S (y)) (T (y) \times T (ω y)) = T (ω^{2} y)$

Here, $T (y), T (ω y), T (ω^{2} y)$ are the left input, right input and output respectively. Prover will use a zero-test on the set $Ω_{g a t es}$ to prove that $\forall y \in Ω_{g a t es}$ :

$S (y) \times (T (y) + T (ω y)) + (1 - S (y)) (T (y) \times T (ω y)) - T (ω^{2} y) = 0$

(3) The wiring is correct

What do we mean by wiring? Well, if you look at the circuit (or the table) you will notice that some outputs become inputs on other gates. For example, the input 6 is a right input for gate 0, and a left input for gate 1 and such. Prover will have to prove that this wiring has been done correctly.

For that, the wires of $C$ are encoded with respect to their constraints. In our example:

$T (ω^{- 2}) = T (ω^{1}) = T (ω^{3})$
$T (ω^{- 1}) = T (ω^{0})$
$T (ω^{2}) = T (ω^{6})$
$T (ω^{- 3}) = T (ω^{4})$

Define a polynomial $W : Ω \to Ω$ that implements a single left-rotation:

$W (ω^{- 2}, ω^{1}, ω^{3}) = (ω^{1}, ω^{3}, ω^{- 2})$
$W (ω^{- 1}, ω^{0}) = (ω^{- 1}, ω^{0})$
$W (ω^{2}, ω^{6}) = (ω^{6}, ω^{2})$
$W (ω^{- 3}, ω^{4}) = (ω^{4}, ω^{- 3})$

Why we do this fantastic thing is due to a lemma; if $\forall y \in Ω : T (y) = T (W (y))$ then the wire constraints are satisfied. The idea behind this bizarre method is that, if $T$ is indeed invariant (does not change its behavior) under such a rotation, then the wiring must be correct. This is because had the wirings be false, the rotation would cause a value to be different and this would not hold.

Remember that $P (W (y))$ has degree $d \times d = d^{2}$ but we want prover to work in linear time $d$ only. This is where the prescribed permutation check we have described in this lecture comes into play.

(4) Output of last gate is 0

Proving the last one is easy, just show that $T (ω^{3∣ C ∣ - 1}) = 0$ .

Final Step: The PLONK Poly-IOP

The setup procedure results in $pp \to (S, W)$ and $v p \to (S, W)$ and is transparent, no need for trust! The prover knows $(pp, x, w)$ and verifier knows $(v p, x)$ .

sequenceDiagram
	actor P as Prover
	actor V as Verifier

	note over P: construct T(X)
	P ->> V: com_T
	note over V: construct V(x)
	note over P, V: ✅ T encodes the inputs
	note over P, V: ✅ gates are evaluated correctly
	note over P, V: ✅ wiring is correct
	note over P, V: ✅ output of last gate is 0

The resulting PLONK proof is a short one, it has $O (1)$ commitments! Furthermore, the verifier is fast. Although the SNARK we have described is not zero-knowledge, it is quite easy to make it into a zkSNARK. There are actually generic transformations that can convert any poly-IOP into a zero-knowledge poly-IOP.

The paper proves that if $7∣ C ∣/ p$ is negligible, then this PLONK poly-IOP is complete and knowledge sound. Try and see for yourself where that 7 comes from.

PLONK Extensions

The main challenge is to reduce the prover time. Furthermore, just using + and x gates might feel a bit too constraining. We do have alternative solutions though! Each of the following try to improve the prover time in various ways.

HyperPlonk

What HyperPlonk [Chen-Bünz-Boneh-Zhang'22] does is that they replace $Ω$ with ${0, 1}^{t}$ where $t = lo g ∣Ω∣$ . As such, the polynomial $T$ becomes a multilinear polynomial with $t$ variables. Zero-test is then replaced by a multilinear sum-check that runs in linear time.

Plonkish Arithmetization: Custom Gates

In our example, we had gates with two inputs and an output, along with selector polynomials that cover addition and multiplication. Furthermore, each constraint was specific to the row itself. It is possible to generalize this usage to obtain custom gates, that can even make use of multiple rows! Custom gates are included in the gate check step. This is used in AIR (Algebraic Intermediate Representation).

Plookup: Lookup Tables

There is an extension of Plonkish Arithmetization, that actually enables one to ensure that some values in the computation trace are present in a pre-defined list, basically acting like a look-up argument!

Recall: How to build an Efficient SNARK?

There are various paradigms on building SNARKs, but the general paradigm is made up of two steps:

A functional commitment scheme, which is a cryptographic object
A suitable interactive oracle proof (IOP), which is an information theoretic object

In Lecture 5, we have seen PLONK, which operated with:

A univariate polynomial commitment scheme
PLONK polynomial IOP

In Lecture 4, we have seen another method using the sum-check protocol:

A multivariate polynomial commitment scheme
Sum-check protocol for IOP

In this lecture, we will dive deeper into polynomial commitment schemes, in particular those that are based on bilinear pairings and discrete logarithm.

What is a Polynomial Commitment?

Consider a family of polynomials $F$ . The prover has some polynomial $f \in F$ that is a function $f : X \to Y$ . The interactions of a polynomial commitment scheme look like the following:

sequenceDiagram
	actor P as Prover
	actor V as Verifier

	note over P: f ∈ F

	P ->> V: (1) com_f := commit(f)
	V ->> P: (2) u ∈ X
	P ->> V: (3) v ∈ Y, proof π
	note over V: (4) Accept or Reject

Let us explain each numbered step in this sequence diagram:

Prover commits to the polynomial, and sends the commitment $co m_{f}$ , also shown as $f$ .
The verifier would like to query this polynomial at some value $u$ , so it sends this to the prover.
Prover evaluates $v := f (u)$ and sends this to verifier, along with a proof $π$ that indeed $v = f (u)$ and $f \in F$ .
The verifier will verify this proof, and accept if it is valid.

A bit more formality

Let's make a more formal definition of polynomial commitments now. They are made of 4 algorithms:

$k ey g e n (λ, F) \to g p$ generate public parameters given the polynomial family and security parameter $λ$ . Note the $g p$ is also known as "common reference string", or "public key" too.
$co mmi t (g p, f) \to co m_{f}$ computes the commitment to the given polynomial.
$e v a l (g p, f, u) \to v, π$ is the evaluation method that the prover uses to compute $v := f (u)$ and generate a proof that $v = f (u)$ this is true and $f \in F$ .
$v er i f y (g p, co m_{f}, u, v, π) \to {0, 1}$ verifies an evaluation, and it will either accept or reject (which is why I used a bit to represent output).

Polynomial commitment has the following properties:

Correctness: if the prover is honest, and follows the given algorithms above, then $v er i f y$ should always accept. We won't go too much into this one yet.

Knowledge Soundness: for every polynomial time adversary $A = (A_{0}, A_{1})$ such that:

$g p \leftarrow k ey g e n (λ, F)$
$co m_{f} \leftarrow A_{0} (g p)$
$v, π \leftarrow A_{1} (g p, u)$

where $Pr [V (v p, x, π) = accept] = 1$ , then there is an efficient extractor $E$ that uses $A_{1}$ as a black box (oracle) such that:

$g p \leftarrow k ey g e n (λ, F)$
$co m_{f} \leftarrow A_{0} (g p)$
$f \leftarrow E (g p, co m_{f})$

where $Pr [f (u) = v and f \in F] > 1 - ϵ$ . Meaning that if the prover can convince the verifier about some committed polynomial $f$ , then an efficient extractor can "extract" that $f$ from the prover via some algorithm, and then find out that indeed the evaluations are correct, with at most some negligible probability of failure.

Group Theory

It will be good to refresh our memory on some of the preliminary concepts, starting with group theory!

A group is a set $G$ and an operation $*$ . It satisfies the following properties:

Closure: $\forall a, b \in G$ it holds that $a * b \in G$ . In other words, the result of this operation is still an element of this group, it is closed in this group!
Associativity: $\forall a, b, c \in G$ it holds that $(a * b) * c = a * (b * c)$ . The order of execution should not matter.
Identity: $\exists e \in G$ such that $\forall a \in G$ it holds that $e * a = a * e = a$ . We call $e$ the identity element, because the result of operating on it is identical to the operand.
Inverse: $\forall a \in G, \exists b \in G$ such that $a * b = b * a = e$ . This is very important to keep in mind.

For example, the set of integers $Z = {\dots, - 1, 0, 1, \dots}$ under addition operation $+$ satisfies all these properties. You might also be familiar with rational numbers, real numbers, complex numbers and such.

In cryptography, here are some commonly used groups:

Positive integers $mod$ some prime number $p$ , which is the set ${1, 2, \dots, p - 1}$ under the multiplication operation $\times$ . This is denoted as $Z_{p}^{*}$ .
Elliptic curves, we will not go into this yet though.

Generator of a Group

An element $g \in G$ that generates all the other elements in the group by taking its powers is called the generator of that group. For example, consider the multiplicative group $Z_{7}^{*} = {1, 2, 3, 4, 5, 6}$ . See what happens if we take powers of 3 in this group.

$1$	$2$	$3$	$4$	$5$	$6$
$3^{6}$	$3^{2}$	$3^{1}$	$3^{4}$	$3^{5}$	$3^{3}$

Notice that $3^{6} = 1$ , so you can continue to powering $3^{7} = 3$ , $3^{8} = 2$ and so on but you will keep getting the same values. Furthermore, there can be multiple generators within a group!

Discrete Logarithm Assumption

So think of a some group $G$ with $p$ elements. You could represent the group by literally writing out all its $p$ elements. Alternatively, you could just note down the generator $g$ and find its $p$ powers to obtain the group elements; keep in mind that there may be different generators too. With that in mind, let us look at the discrete logarithm problem:

given $y \in G$ find $x$ such that $g^{x} = y$

It turns out that this is very hard to do, you basically have to try your luck with $x$ . There are some clever methods too, but it is the general consensus that this problem is computationally hard, meaning that you can't solve it in polynomial time.

Quantum computers can actually solve this in polynomial time, and any scheme that uses discrete log is therefore not post-quantum secure.

Diffie-Hellman Assumption

You might remember this duo from the Diffie-Hellman Key-Exchange [Diffie-Hellamn'76]. The paper is based on the Diffie-Hellman Assumption, which is very similar to the discrete logarithm assumption.

given $G, g, g^{x}, g^{y}$ compute $g^{x y}$

This is a stronger assumption that the discrete logarithm assumption, meaning that it "assumes more stuff". In other words, if discrete logarithm assumption breaks, you can break this one too. What you could do it, simply find $x$ from $g^{x}$ and $y$ from $g^{y}$ and just compute $g^{x y}$ .

To our best knowledge, this is also a hard problem and there is no efficient solution yet.

Bilinear Pairing

Bilinear pairings are an awesome building block that we will make use of. You have the following:

$G$ : the base group, a multiplicative cyclic group
$G_{T}$ : the target group, yet another multiplicative cyclic group
$p$ : the order of both $G$ and $G_{T}$
$g$ : the generator of base group $G$
$e$ : a pairing operation $e : G \times G \to G_{T}$

The pairing must have the following bilinearity property:

$\forall P, Q, \in G : e (P^{x}, Q^{y}) = e (P, Q)^{x y}$

Note that computing $e$ itself may be efficient or not, that depends on the groups that are being used. Also note that you could have two different base groups, unlike our example above which just uses on base group for both $P$ and $Q$ .

Example: Diffie-Hellman

Consider $e (g, g)^{x y}$ , what can this be equal to?

$e (g, g)^{x y} = e (g^{x}, g^{y})$
$e (g, g)^{x y} = e (g^{x y}, g)$

That means $e (g^{x}, g^{y}) = e (g^{x y}, g)$ . We know that given $g^{x}, g^{y}$ we can't compute $g^{x y}$ that is the Diffie-Hellman assumption. However, what if someone claims that they have computed $g^{x y}$ ? Well, we can check this without learning what $x, y$ is simply with the aforementioned equality.

Example: BLS Signature

BLS signature [Boneh-Lynn-Shacham'01] is an example usage of bilinear pairings. It is a signature scheme with the following functions:

$k ey g e n (p, G, g, G_{T}, e) \to (s k = x, p k = g^{x})$ that is the secret (private) key and public key respectively
$s i g n (s k, m) \to σ = H (m)^{x}$ where $H$ is a cryptographic hash function that maps the message space to $G$
$v er i f y (p k, σ, m) \to {0, 1}$ will verify if $e (H (m), g^{x}) = e (σ, g)$ . Notice that $g^{x}$ comes from $p k$ , and $H$ is a known public hash function

KZG Poly-Commit Scheme with Pairings

Remember we had went over KZG in the previous lecture, but there at some point we had to mention "pairings". Now, we will look at KZG again but with pairings included.

Suppose you have a univariate polynomial function family $F = F_{p}^{(\leq d)} [X]$ and some polynomial that you would like to commit $f \in F$ . You also have a bilinear pairing $p, G, q, G_{T}, e$ . Let's see how KZG works with these.

$k ey g e n (λ, F) \to g p$
- Sample a random $τ \in F_{P}$
- Set $g p = (g, g^{τ}, g^{τ^{2}}, \dots, g^{τ^{d}})$
- Delete $τ$ , its toxic waste at this point and you should make sure no one gets it. If they do, they can generate fake proofs. This is why a trusted setup is required for KZG.
$co mmi t (g p, f) \to co m_{f}$
- The polynomial is represented with its coefficients $f (x) = f_{0} + f_{1} x . + f_{2} x^{2} + \dots + f_{d} x^{d}$ .
- The commitment is $co m_{f} = g^{f (τ)}$ . How can this be done without knowing $τ$ ? Well, that is where $g p$ comes into play.
- Notice that $g^{f (τ)} = g^{f_{0} = f_{1} τ + f^{2} τ^{2} + \dots + f_{d} τ^{d}}$
- Its equal to $(g)^{f_{0}} + (g^{τ})^{f_{1}} + (g^{τ^{2}})^{f_{2}} + \dots (g^{τ^{d}})^{f_{d}}$ which can be done using just the elements in the public parameters $g p$ and the coefficients of $f$ .
$e v a l (g p, f, u) \to v, π$
- A verifier wants to query this polynomial at point $u$ , and you would like to show that $f (u) = v$ along with a proof $π$ that this is indeed true.
- To do this, first you find a quotient polynomial $q (x)$ such that $f (x) - f (u) = (x - u) q (x)$ . Note that $u$ is a a root of $f (x) - f (u)$ .
- Then, your proof is $π = g^{q (τ)}$ which you can do without knowing $τ$ but instead using $g p$ , as shown in the last bullet under $co mmi t$ .
$v er i f y (g p, co m_{f}, u, v, π) \to {0, 1}$
- The idea is to check the equation at point $τ$ as $g^{f (τ) - f (u)} = g^{(τ - u) q (τ)}$ .
- The verifier knows $g^{(τ - u)}$ as $g^{τ}$ is in $g p$ and $g^{u}$ the verifier can calculate, and it also knows $g^{q (τ)}$ because that is the proof $π$ sent by the prover. However, Diffie-Hellman assumption tells us that just by knowing these two, the verifier can't compute $g^{(τ - u) q (τ)}$ . So what do we do?
- We can use a bilinear pairing! We will make use of the pairing $e (co m_{f} / g^{v}, g) = e (g^{(τ - u)}, π)$ . Notice that this pairing translates to $e (g, g)^{f (τ) - f (u)} = e (g, g)^{(τ - u) q (τ)}$ . The verifier can simply check this equality, and accepts if it is correct.

sequenceDiagram
  actor P as Prover
  actor V as Verifier

  %% keygen
  note over P, V: gp = (g, g^t, g^(t^2), ..., g^(t^d))

  %% comittment
  note over P: f ∈ F
  P ->> V: com_f := g^f(t)

  %% eval
  V ->> P: u
  note over P: v = f(u)
  note over P: f(x)-f(u) = (x-u)q(x)

  %% verify
  P ->> V: v, proof π = g^q(t)
  note over V: g^(t-u) = g^t / g^u
  note over V: e(com_f / g^v, g) ?= e(g^(t-u), π)
  note over V: if true, Accept, otherwise Reject

q-Strong Bilinear Diffie-Hellman

What about the properties of KZG?

Correctness: if the prover is honest, then the verifier will always accept.
Soundness: how likely is a fake proof to be verified?

The answer to this comes from something called "q-Strong Bilinear Diffie-Hellman" (short for q-SBDH) assumption. That is, given $(p, G, g, G_{T}, e)$ and $(g, g^{τ}, g^{τ^{2}}, \dots, g^{τ^{d}})$ it is hard to compute $e (g, g)^{\frac{1}{τ - u}}$ for any $u$ . This is a stronger assumption that computational Diffie-Hellman.

Let us prove the soundness then! We will do a Proof by Contradiction. Suppose $v^{*} \neq = f (u)$ yet a fake proof $π^{*}$ passes the verification. We begin the proof by writing the bilinear pairing equivalence that the verifier checks for:

$e (co m_{f} / g^{v^{*}}, g) = e (g^{τ - u}, π^{*})$

We will assume that the prover knows $f$ , which is a strong assumption but we will explain it later (see Knowledge of Exponent). Therefore, we have the following equivalence:

$e (g^{f (τ) - v^{*}}, g) = e (g^{τ - u}, π^{*})$

Now the trick of the proof: we add $f (u) - f (u)$ to the leftmost exponent, which is effectively adding 0 so it does not change anything.

$e (g^{f (τ) - f (u) + f (u) - v^{*}}, g) = e (g^{τ - u}, π^{*})$

Now define $δ = f (u) - v^{*}$ , which is non-zero because we have said above that $v^{*} \neq = f (u)$ . We rewrite the left hand-side:

$e (g^{(τ - u) q (τ) + δ}, g) = e (g^{τ - u}, π^{*})$

As a result of pairing, we can move the exponents outside $e$ as:

$e (g, g)^{(τ - u) q (τ) + δ} = e (g, π^{*})^{τ - u}$

Dividing both sides by $e (g, g)^{τ - u}$ we get:

$e (g, g)^{δ} = (e (g, π^{*}) / e (g, g)^{q (τ)})^{τ - u}$

Finally, taking both sides to power $- 1/ (τ - u)$ leads to a contradiction!

$e (g, g)^{\frac{δ}{τ - u}} = e (g, π^{*}) / e (g, g)^{q (τ)}$

Notice the left side that is $e (g, g)^{\frac{δ}{τ - u}}$ , which is what q-SBDH was about! Even more, the right side of the equation only has globally available variables, so this means that the prover can break q-SBDH assumption.

Knowledge of Exponent (KoE)

We have said that we assume the prover knows $f$ such that $co m_{f} = g^{f (τ)}$ . How can we make sure of this? Here is how:

Again you have $g p = (g, g^{τ}, g^{τ^{2}}, \dots, g^{τ^{d}})$
Sample some random $α$ compute $g p^{α} = (g^{α}, g^{ατ}, g^{α τ^{2}}, \dots, g^{α τ^{d}})$
Now compute two commitments instead of one, $co m_{f} = g^{f (τ)}$ and $co m_{f}^{'} = g^{α f (τ)}$

Now we can make use of bilinear pairings: if $e (co m_{f}, g^{α}) = e (co m_{f}^{'}, g)$ there exists an extractor $E$ that extracts $f$ such that $co m_{f} = g^{f (τ)}$ . This extractor will extract $f$ in our proof above, where we assumed the prover knows $f$ .

Let us then describe the KZG with knowledge soundness:

Keygen generates both $g p$ and $g p^{α}$
Commit computes both $co m_{f}$ and $co m_{f}^{'}$
Verify additionally checks $e (co m_{f}, g^{α}) = e (co m_{f}^{'}, g)$

Note that this doubles the cost of key generation and commitments, as well as the verifier time.

Generic Group Model (GGM)

GGM [Shoup'97], [Maurer'05] can replace the KoE assumption and reduce commitment size in KZG. The informal definition is that the adversary is only given an oracle to compute the group operation, that is: given $(g, g^{τ}, g^{τ^{2}}, \dots, g^{τ^{d}})$ the adversary can only compute their linear combinations.

Properties of KZG

So let us write down the properties of KZG.

Keygen requires trusted setup
Commit requires $O (d)$ group exponentiations and $O (1)$ commitment size
Eval requires $O (d)$ group exponentiations, and $q (x)$ can be computed efficiently in linear time
Proof size is $O (1)$ for the single group element
Verifier time is $O (1)$ for the pairing check

Powers of Tau Ceremony

Okay, trusted setup sucks but we can make it a bit nicer: instead of letting one party come up with the global parameters, we will distribute this process to multiple parties. In doing so, even if only one party is honest and gets rid of the "toxic waste", then that is enough for us so that no one will be able to reconstruct the trapdoor. Here is how:

Suppose your global parameters are $g p$ right now:

$g p = (g^{τ}, g^{τ^{2}}, \dots, g^{τ^{d}}) = (g_{1}, g_{2}, \dots, g_{d})$

As a participant in this ceremony, you sample some random $s$ , and obtain new $g p^{'}$ as:

$g p^{'} = (g_{1}^{s}, g_{2}^{s^{2}}, \dots, g_{d}^{s^{d}}) = (g^{s τ}, g^{(s τ)^{2}}, \dots, g^{(s τ)^{d}})$

This can go on for many participants, see for example Perpetual Powers of Tau.

A new method [Nikolaenko-Ragsdale-Bonneau-Boneh'22] provides a way to check the correctness of $g p^{'}$ too. The idea is:

the contributor knows $s$ such that $g_{1}^{'} = (g_{1})^{s}$
and $g p^{'}$ consists of consecutive powers $e (g_{i}^{'}, g_{1}^{'}) = e (g_{i + 1}^{'}, g)$ where $g_{1}^{'} \neq = 1$

Variants of KZG

We will look at several extensions of KZG univariate polynomial commitment scheme.

Multivariate KZG

[Papamanthou-Shi-Tamassia'13] describes a way to use KZG for multivariate polynomials. The idea is:

$f (x_{1}, x_{2}, \dots, x_{k}) - f (u_{1}, u_{2}, \dots, u_{k}) = i = 1 \sum k (x_{i} - u_{i}) q_{i} (x)$

Keygen will sample $τ = τ_{1}, τ_{2}, \dots, τ_{k}$ to compute $g p$ as $g$ raised to all possible monomials of $τ$
Commit will compute $co m_{f} = g^{f (τ_{1}, τ_{2}, \dots, τ_{k})}$
Eval will compute a group element for each polynomial $π_{i} = g^{q_{i} (τ)}$
Verify will check $e (co m_{f} / g^{v}, g) = \prod_{i = 1}^{k} e (g^{τ_{i} - u_{i}}, π_{i})$

Notice that the proof size and verifier time is greater here.

You can find in practice vSQL [ZGKPP'17] and Libra [XZZPS'19] that makes use of multivariate KZG as the commitment scheme and Sum-check protocol or GKR protocol as the IOP to obtain a SNARK.

Achieving Zero-Knowledge

Plain KZG is not zero-knowledge, e.g. $co m_{f} = g^{f (τ)}$ is deterministic. Also remember that to formally show zero-knowledgeness, we need a simulator construction that can simulate the view of the commitment scheme.

[ZGKPP'18] shows a method to do this by masking with randomizers.

Commit will compute the masked $co m_{f} = g^{f (τ) + rη}$
Eval will also be masked as follows:

$f (x) + ry - f (u) = (x - u) (q (x) + r^{'} y) + y (r - r^{'} (x - u))$

The proof will therefore be $π = g^{q (τ) + r^{'} η}, g^{r - r^{'} (τ - u)}$ . Note that this looks much like the bivariate extension of KZG, but the other polynomial is like some added randomness to provide zero-knowledge property.

Batch Proofs on a Single Polynomial

Prover wants to prove $f$ at multiple points $u_{1}, u_{2}, \dots, u_{m}$ for $m < d$ . The key idea is to extrapolate $f (u_{1}), f (u_{2}), \dots, f (u_{m})$ to obtain $h (x)$ . Then,

we find a quotient polynomial from $f (x) - h (x) = \prod_{i = 1}^{m} (x - u_{i}) q (x)$
the proof then becomes $π = g^{q (τ)}$
verifier will check $e (co m_{f} / g^{h (τ)}, g) = e (g^{\prod_{i = 1}^{m} (τ - u_{i})}, π)$

Batch Proofs on Multiple Polynomials

Prover wants to prove $f$ at multiple points and polynomials $f_{i} (u_{i, j}) = v_{i, j}$ for $i \in [n], j \in [m]$ . The key idea is to extrapolate $f_{i} (u_{i, 1}), f_{i} (u_{i, 2}), \dots, f_{i} (u_{i, m})$ to obtain $h_{i} (x)$ . Then,

we find quotient polynomials from $f_{i} (x) - h_{i} (x) = \prod_{j = 1}^{m} (x - u_{i, j}) q_{i} (x)$
the proof will have all $q_{i} (x)$ in a random linear combination

Bulletproofs

Although the powers-of-tau ceremony helps on the "trusted setup" problem, Bulletproofs [BCCGP'16], [BBBPWM'18] completely remove the "trusted setup" problem of KZG!

$k ey g e n$
- Bulletproofs have a "transparent setup" phase, which is to simply $d + 1$ randomly sampled elements from a group $G$ , resulting in $g p = (g_{0}, g_{1}, \dots, g_{d})$
$co mmi t (g p, f) \to co m_{f}$
- suppose you have a polynomial $f (x) = f_{0} + f_{1} x + f_{2} x^{2} + \dots f_{d} x^{d}$
- commitment is $co m_{f} = g_{0}^{f_{0}} g_{1}^{f_{1}} \dots g_{d}^{f_{d}}$
- notice that this is a "vector commitment" version of a Pedersen Commitment

Then, do $e v a l$ and $v er i f y$ recursively around $lo g d$ times:

$e v a l (g p, f, u)$
- find $v = f (u)$
- compute $L, R, v_{l}, v_{r}$
- receive a random $r$ from verifier and reduce $f$ to $f^{'}$ of degree $d /2$
- update the global parameter to be $g p^{'}$
$v er i f y (g p, co m_{f}, u, v, π)$
- check $v = v_{L} + v_{R} \times u^{d /2}$
- generate $r$ randomly
- update $co m^{'} = L^{r} \times co m_{f} \times R^{r^{- 1}}$ (this is the magic trick)
- update the global parameter to be $g p^{'}$
- set $v^{'} = r v_{L} + r_{R}$

The idea of Bulletproofs is to recursively divide a polynomial in two polynomials, and commit to those smaller polynomials, eventually reducing whatever degree you have started with to 1.

The Magic Trick

So let's go over what happens in that magical line where we obtain the new commitment given $L, R$ (left & right respectively). Suppose we have a polynomial of degree 3

$f (x) = f_{0} + f_{1} x + f_{2} x^{2} + f_{3} x^{3}$

and our commitment is:

$co m_{f} = g_{0}^{f_{0}} g_{1}^{f_{1}} g_{2}^{f_{2}} g_{3}^{f_{3}}$

The prover splits the polynomial into two polynomials of half the degree:

left half: $f_{L} (x) = f_{0} + f_{1} x$
right half: $f_{R} (x) = f_{2} + f_{3} x$

It will also commit the each half:

left half: $L = g_{2}^{f_{0}} g_{3}^{f_{1}}$
right half: $R = g_{0}^{f_{2}} g_{1}^{f_{3}}$

Did you notice the criss-cross between the group elements and their exponents? The $g$ terms are fitting nicely with the coefficients, but the exponents are actually belonging to the other polynomial! This is a way of "relating" both halves together, so to restrain the prover to some extent and keep the computations sound.

The magical line is the following:

$co m^{'} = L^{r} \times co m_{f} \times R^{r^{- 1}}$

$co m^{'} = (g_{0}^{f_{0} + r^{- 1} f_{2}} g_{2}^{r f_{0} + f_{2}}) (g_{1}^{f_{1} + r^{- 1} f_{3}} g_{3}^{r f_{1} + f_{3}})$

$co m^{'} = (g_{0}^{r^{- 1}} g_{2})^{r f_{0} + f_{2}} \times (g_{1}^{r^{- 1}} g_{3})^{r f_{1} + f_{3}}$

Now notice that this commitment is what you would have gotten if you had:

a polynomial $f^{'} (x) = (r f_{0} + f_{2}) + (r f_{1} + f_{3}) x$
and $g p^{'} = (g_{0}^{r^{- 1}} g_{2}, g_{1}^{r^{- 1}} g_{3})$

Both are half the size of what we had started with! If you keep doing this recursively, you will end up with a degree-1 polynomial in around $lo g d$ steps. Without caring about zero-knowledge property, the prover could simply send the constant sized polynomial for the last step to prove the evaluation.

Also to mention, you could take "odd-elements & even-elements" instead of "left-half & right-half" for this process, which is similar to what is done in FFT, and it would still work!

sequenceDiagram
	actor P as Prover
	actor V as Verifier

  %% keygen
	note over P, V: gp = (g_0, g_1, g_2, ..., g_d)

	%% comittment
	note over P: f ∈ F
	P ->> V: com_f := g_0f^0 * g_1f^1 * ... * g_df^d

  %% eval
	V ->> P: u
  note over P: v = f(u)
	P ->> V: v

	loop while deg(f) > 1
	note over V: sample random r
	V ->> P: r

  note over P: L, R := split(f)
	P ->> V: v_L = L(u), com_L, v_R = R(u), com_R
	note over P: f' := reduce(L, R, r)

  %% verify
  note over V: assert: v == v_L + v_R * u^2
	note over V: com' = L^r * com_f * R^{r^-1}
	note over V: update gp as gp'
	note over V: v' = r * v_L + v_R
	note over V, P: recurse with f', com', v', gp'
	end

	note over V, P: prove eval of a const-sized polynomial
	note over V: accept or reject

More Poly-Commit Schemes

More advanced schemes based on d-log with transparent setup are out there, and we will go over them quickly.

Hyrax

Hyrax [Wahby-Tzialla-Shelat-Thaler-Walfish'18] improves the verifier time to $O (d)$ by representing the coefficients as a 2D matrix. This way, it commits to the matrix row-wise, and does reduction column-wise. Proof size also becomes $O (d)$ .

Dory

Dory [Lee'21] delegates the structured computation to the prover using inner pairing product arguments [BMMTV'21]. This way, verifier time becomes $O (lo g d)$ and prover time becomes $O (d)$ exponentiations plus $O (d)$ field operations, so prover time is still linear but in practice it is a bit more efficient.

Dark

Dark [Bünz-Fisch-Szepieniec'20] achieves $O (lo g d)$ proof size and verifier time! The trick here is to use group of an unknown order.

Summary

Here is a quick summary of all the methods covered in this lecture.

Scheme	Prover Time	Proof Size	Verifier Time	Setup Phase	Cryptographic primitive
KZG	$O (d)$	$O (1)$	$O (1)$	Trusted	Pairing
Bulletproofs	$O (d)$	$O (lo g d)$	$O (d)$	Transparent	Discrete-log
Hyrax	$O (d)$	$O (d)$	$O (d)$	Transparent	Discrete-log
Dory	$O (d)$	$O (lo g d)$	$O (lo g d)$	Transparent	Pairing
Dark	$O (d)$	$O (lo g d)$	$O (lo g d)$	Transparent	Unknown order Group

Polynomial Commitments

In the previous lecture, we have seen some polynomial commitment schemes based on pairings and discrete-log.

In this lecture, we will look at polynomial commitment schemes based on error-correcting codes. They are quite an awesome tool because:

they are plausibly post-quantum secure (recall that d-log is not)
no group exponentiations; instead, prover only uses hashes, additions and multiplications
small global parameters

Nevertheless, there are some drawbacks too:

large proof size
not homomorphic & hard to aggregate

Background: Error-correcting Code

An error-correcting code encodes a message of length $k$ into a codeword of length $n$ , where $n > k$ . The minimum distance (Hamming) between any two codewords is shown as $Δ$ . These parameters are important, and we may refer to an error-correcting code as:

$[n, k, Δ]$ code

Example: Repetitions Code

Imagine messages of 2 bits and codewords of 6 bits, where the encoding is to repeat each bit 3 times.

enc(00) = 000000
enc(01) = 000111
enc(10) = 111000
enc(11) = 111111

Note that the minimum distance between any two codewords is 3. So our parameters are $n = 2, k = 6, Δ = 3$ . This code can correct 1 error during the transmission, for example:

dec(010111) = 01
// 010 should be 000

As shown above, encoding is usually shown as enc and decoding is usually shown as dec. In our poly-commit schemes, we won't actually be using the decoding function at all, so we don't have to care about efficient decoders!

Rate & Relative Distance

Given the $[n, k, Δ]$ code, we define:

rate as $k / n$
relative distance as $Δ/ n$

We want both of these to be as high as possible, but generally there is a trade-off between them.

Linear Code

For linear codes, the condition is that "any linear combination of codewords is also a codeword". The results of this condition are:

the encoding can always be represented as a vector-matrix multiplication between the message and the generator matrix
minimum distance $Δ$ is the same as the codeword with the minimum number of non-zeros (weight)

Reed-Solomon Code

Reed-Solomon is a widely used error-correcting code.

the message is viewed as a unique degree $k - 1$ univariate polynomial
the codeword is the evaluation of this polynomial at $n$ publicly known points
- for example, $(ω, ω^{2}, \dots, ω^{n})$ for $n$ -th root of unity $ω^{n} \equiv 1 (mod p)$
distance is $Δ = n - k + 1$ which is very good
- this is because a degree $k - 1$ polynomial has at most $k - 1$ roots
- since the codeword is $n$ evaluations, we subtract the number of roots from this to get the minimum number of non-zeros
encoding time is $O (n lo g n)$ using the FFT (Fast-Fourier Transform)

For $n = 2 k$ , the rate is $1/2$ and relative distance is $1/2$ which turns out to be the best you can get!

Polynomial as a 2D Matrix

To begin constructing our poly-commit scheme, we will first take a different approach on representing our polynomial. Remember that there was a "coefficient representation" where we simply stored the list of coefficients as a vector. Now, we will use a matrix to do that.

Suppose you have a polynomial of degree $d$ where $d$ is a perfect square:

$f (x) = i = 1 \sum d j = 1 \sum d f_{i, j} u^{i - 1 + (j - 1) d}$

The coefficients of this polynomial can be represented by the following matrix:

$f_{1, 1} f_{2, 1} ⋮ f_{d, 1} f_{1, 2} f_{2, 2} ⋮ f_{d, 2} \dots \dots ⋱ \dots f_{1, d} f_{2, d} ⋮ f_{d, d}$

Evaluation of this polynomial at some point $u$ can then be shown as some matrix-vector multiplication:

$f (u) = [1, u, u^{2}, \dots, u^{d - 1}] \times f_{1, 1} f_{2, 1} ⋮ f_{d, 1} f_{1, 2} f_{2, 2} ⋮ f_{d, 2} \dots \dots ⋱ \dots f_{1, d} f_{2, d} ⋮ f_{d, d} \times 1 u^{d} u^{2 d} ⋮ u^{d - d}$

With this, we will be able to reduce a polynomial commitment of proof size $d$ to an argument for vector-matrix product into as shown below:

$[1, u, u^{2}, \dots, u^{d - 1}] \times f_{1, 1} f_{2, 1} ⋮ f_{d, 1} f_{1, 2} f_{2, 2} ⋮ f_{d, 2} \dots \dots ⋱ \dots f_{1, d} f_{2, d} ⋮ f_{d, d} = d [\dots]$

The prover could send this resulting vector to the verifier, and the verifier could continue with the next multiplication (shown above) which is a column vector made of $u$ which the user knows. This way, a size $d$ commitment is used to commit to a polynomial of degree $d$ .

The verifier also knows the vector on the left side, as it is also made of $u$ . The problem here is to somehow convince the verifier that the prover has used the correct coefficients in the 2D matrix. For this, the prover does the following:

Transform the $d \times d$ matrix into a $d \times n$ matrix, where each row of length $d$ is encoded into a codeword of length $n$ using a linear code.
Then, the resulting $d \times n$ matrix is committed using a Merkle Tree, where each column is a leaf.
The public parameter is just made of the decided Hash function to be used in Merkle Tree, there is no trusted setup required!

So in summary, rows → codewords, and then columns → Merkle Tree leaves. The Merkle Root of this tree becomes the commitment.

With that said, the entire algorithm can be split into two steps:

Proximity Test/Check: Test if the committed matrix indeed consists of $d$ codewords, encoded from the original rows.
1. The verifier can learn the number of columns by looking at the path in Merkle Tree
2. but it can't know if the rows are indeed codewords that belong to the original matrix rows
Consistency Test/Check: Test if the result of the vector-matrix multiplication is indeed what is claimed to be by the prover.

We will go into detail of each step.

Step 1: Proximity Test

For the proximity test, the Verifier sends a random vector $r = [r_{1}, r_{2}, \dots, r_{d}]$ (size $d$ ). Then, the prover multiplies the vector with the matrix (size $d \times n$ ) to obtain another vector of size $n$ . Afterwards, the verifier asks to reveal several columns of this matrix, and the prover reveals them.

With that, the verifier checks the following:

The resulting vector is a codeword, which should be true because any linear combination of codewords is a codeword.
Columns are as committed in the Merkle Tree.
Inner product between $r$ and each column is consistent. This is done simply by looking at the corresponding elements in the size $n$ vector.

If all these are correct, then the proximity test is passed with overwhelming probability.

Soundness (Intuition)

So why is this secure? Let us tackle each point one by one:

If an adversarial prover tries to use a different matrix, by the linear property of codewords, the resulting vector will NOT be a codeword. The first check ensures this.
The second check ensures that columns are as committted.
By the first check, the prover has to use the correct matrix, but it can still send a different result vector (one that is still a codeword). In that case, due to the distance property, this new vector must have some elements that are different than the original vector. Reed-Solomon has distance 1/2, so the new vector is different at around half of the points! The second check ensured the columns to be correct, so the prover's fake vector will most likely fail the third check because at least half of the points are different!

Soundness (Formal)

Things are a bit more complex formally. A new parameter $e$ is introduced. For $e < Δ/4$ , if the committed matrix $C$ is $e$ -far from any codeword (meaning that the minimum distance of all rows to any codeword in the linear code is at least $e$ ), then:

$Pr [w = r^{T} C is e close to any codeword] \leq \frac{e + 1}{F}$

So, if $w = r^{T} C$ is $e$ -far from any codeword then finally:

$Pr [Check 3 passes for t random columns] \leq (1 - \frac{e}{n})^{t}$

Discovery

This test was discovered independently by the two papers:

Ligero [Ames-Hazai-Ishai-Venkitasubramaniam'17] called this the "Interleaved Test". They were using Reed-Solomon code.
[Bootle-Cerulli-Ghadafi-Groth-Hajiabadi-Jakobsen'17] introduced "Ideal Linear Commitment Model" and they also introduced a new encoding scheme called "Linear-time Encodable Code". This was the first SNARK with linear prover time!

Both of these constructions were targeted to general-purpose SNARKs!

Optimization

The prover can actually send a message $m$ instead of the size $n$ result vector, such that the encoding of $m$ is equal to the codeword that is the resulting vector! This is good because:

the message (size $d$ ) is smaller than the vector (size $n$ )
check 1 is implicitly passed, because the resulting vector is literally the encoding of $m$
furthermore, a very cool property is that the message is actually equal to $r$ multiplied by the original coefficient matrix!

Step 2: Consistency Test

The algorithm for consistency test is almost the same as the optimized proximity test. The prover sends a message $m$ , which is the multiplication of $u$ (that is $f (u)$ ) with the coefficient matrix $C$ . Then, the verifier finds the encoding of this message.

Columns are ensured to be the committed ones, because we have already made that check in the proximity test. Furthermore, using the same randomly picked columns (corresponding to elements in the codeword) the verifier will check whether the multiplication is consistent.

In short:

The resulting vector is a codeword, true because vector was created from the encoding of $m$
Columns are as committed in the Merkle Tree, true because this was done in the previous test.
Inner product between $u$ and each column is consistent, which is checked using the same randomly picked columns (for efficiency).

Soundness (Intuition)

By the proximity test, the committed matrix $C$ is close to a codeword. Furthermore, there exists an efficient extractor that extracts $F$ by Merkle Tree commitment, and then decoding that to find $C$ such that $u \times F = m$ with overwhelming probability.

Polynomial Commitment based on Linear Code

Let us now describe the polynomial commitment scheme that makes us of linear codes (with constant relative distance).

Keygen: Sample a hash function
- Hash functions are public, so this is a transparent setup!
- $O (1)$ complexity, yummy.
Commit: Encode the coefficient matrix of $f$ row-wise with a linear code, and commit to it using Merkle Tree
- Encoding takes $O (d lo g d)$ field operations using Reed-Solomon code, or $O (d)$ using linear code
- Merkle Tree commitment takes $O (d)$ hashes, but commitment size is $O (1)$
Eval & Verify: You are given $m = u \times F$ , encode $m$ and do the proximity & consistency tests, then find $f (u) = ⟨ m, u^{'} ⟩$
- Eval takes $O (d)$ field operations
- Can be made non-interactive using Fiat-Shamir

This method has proof size $O (d)$ and verifier time $O (d)$

In Practice

[Bootle-Chiesa-Groth'20] dives into tensor query IOP, and they generalize the method to multiple dimensions with proof size $O (d^{ϵ})$ for some constant $ϵ < 1$ .

Brakedown [Golovnev-Lee-Setty-Thaler-Wahby'21] using tensor query IOP, made some evaluations using linear code for some polynomial of degree $d = 2^{25}$

Commit time: 36s
Eval time: 3.2s
Proof size: 49MB (ouch…)
Verifier time: 0.7s

They have also show that you can prove knowledge soundness without an efficient decoder. This is huge because normally an extractor would use the decoder to do the extraction, which was a problem if decoder was not efficient.

[Bootle-Chiesa-Liu'21] reduces the proof size to $O (polylog (d))$ using proof composition of tensor IOP and PCP of proximity [Mie'09]

Orion [Xie-Zhang-Song'22] achieves a proof size of $O (lo g^{2} d)$ using proof composition of the code-switching technique [RonZewi-Rothblum'20]

Looking at SNARKs with linear prover time in order:

Paper	Proof Size	Methodology
[Bootle-Cerulli-Ghadafi-Groth-Hajiabadi-Jakobsen'17]	$O (d)$	Ideal Linear Model
[Bootle-Chiesa-Groth'20]	$O (d^{ϵ})$	Tensor IOP
[Bootle-Chiesa-Liu'21]	$O (polylog (d))$	Tensor IOP + PCP
[Golovnev-Lee-Setty-Thaler-Wahby'21]	$O (d^{ϵ})$	Polynomial Commitment
[Xie-Zhang-Song'22]	$O (lo g^{2} d)$	Code-switching Proof Composition

Background: Linear-time Encodable Code

A linear code was introduced for binary messages back then in [Spielman'96], and then generalized to finite field elements by [Druk-Ishai'14]. The construction uses what is called "expander graphs".

Expander Graphs

Below is an example Bipartite graph, a graph that can be split in two parts (A, B here) such that no vertex within that sub-graph are connected. Furthermore, in this example each vertex on the left side is connected to 2 nodes in the right side, and each vertex on right is connected to 3 nodes on left.

flowchart LR
	subgraph left
		a1(( ))
		a2(( ))
		a3(( ))
		a4(( ))
	end

	subgraph right
		b1(( ))
		b2(( ))
		b3(( ))
		b4(( ))
	  b5(( ))
		b6(( ))
	end

  a1 --- b1
  a3 --- b1
  a2 --- b2
  a3 --- b2
  a3 --- b3
  a4 --- b3
  a1 --- b4
  a2 --- b4
  a2 --- b5
  a3 --- b5
  a1 --- b6
  a4 --- b6

You can think of larger bipartite expander graphs. The trick of using an expander as a linear code is: let each vertex in the left correspond to symbols of the message $m$ , and let the right side correspond to symbols of the codeword!

Each symbol in the codeword is simply the sum of the connected symbols in message. This relationship can be easily captured as the multiplication of the message $m$ with the adjacency matrix of the expander graph.

That sounded really good, but sadly it is not sufficient; it fails on the "constant relative distance" requirement. Take a message with a single non-zero for example, the codeword must look the same for all such messages. Obviously, this is not the case here, because codewords symbols change depending on which symbol of the message is non-zero.

Lossless Expander Graph

Let $∣ L ∣$ be the number of vertices in the left graph, and set $∣ R ∣$ to be $α ∣ L ∣$ for some constant $α$ . In the example above, $α$ is larger than 1 because right has more nodes than left, but in practice we actually have $0 < α < 1$ . Let $g$ be the number of edges per node in the left side, e.g. $g = 3$ for the example above.

What is the maximum possible expansion of a subset $S$ in the left side? Well, it is simply $g ∣ S ∣$ . We let $Γ$ denote the set of neighbors for a set, i.e. $∣Γ (S) ∣ = g ∣ S ∣$ . However, this is not true for all subsets, you must have enough nodes on the right side for that, which can be defined by the constraint:

$∣ S ∣ \leq \frac{α ∣ L ∣}{g}$

Turns out this is too good to be true! So in practice, we use a more relaxed definition. We let the maximum expansion $∣Γ (S) ∣ \geq (1 - β) g ∣ S ∣$ with the constraint:

$∣ S ∣ \leq \frac{δ ∣ L ∣}{g}$

for some $δ$ . Note that the previous "too good to be true" definition uses $β = 0$ and $δ = α$ . The smaller $β$ you have the less-relaxed this thing is.

Recursive Encoding

Lossless expander itself is not enough, we will do the encoding recursively. In this case, we will start with a message $m$ of length $k = ∣ m ∣$ . We will obtain a codeword of size $4 k$ .

For now, assume that we already have an encoder with rate 1/4.

flowchart
	m[message len=k]
	e[code len=k/2]

	subgraph codeword len=4k
	mc[message len=k]
	c1[code1 len=2k]
	c2[code2 len=k]
	end

	m --copy---> mc
	m --lossless expand--> e
	e --encode--> c1
	c1 --lossless expand--> c2

As shown above, the codeword has 3 parts:

The message itself, length $k$ . This is a common approach in error correcting codes, and such codes with message being the start of the codeword are called "systematic code".
Then, the message will be encoded using a lossless expander with $α = 1/2$ . The resulting code has size $k /2$ . This result is then encoded using an existing (assumed) encoder of rate $1/4$ . The resulting codeword has length $4 \times (k /2) = 2 k$ . Denote this as $c_{1}$ , this guy will be the middle part of our actual codeword.
Finally, use a lossless expander with $α = 1/2$ to encode $c_{1}$ and obtain $c_{2}$ of length $k$ . This is the final part of the codeword.

$codeword = m ∣∣ c_{1} ∣∣ c_{2}$

Now, about that "assumed" encoder, how do we implement it? Well, notice that the input to that encoder is of length $k /2$ . We can actually use this entire algorithm as that encoder, this time the message being of length $k /2$ instead of $k$ . This is why the name "recursive" is used. Once you get to a certain constant size, just use any code with good distance (e.g. Reed-Solomon) to do the encoding job.

Also note that we use two lossless expanders with $α = 1/2$ , but they are not the same! This is because their input sizes ( $∣ L ∣$ ) are different.

Sampling the Lossless Expander

As we can see, we need lossless expanders for these recursions, so we must be able to sample them efficiently. Are there any methods to do so?

[Capalbo-Reingold-Vadhan-Widgerson'02] shows an explicit construction of lossless expander, but they require a large hidden constant that is hard to find in the first place. Alternatively, one can argue that sampling a random graph has a $1/ poly (n)$ probability of failing to find a lossless expander.

Improvements

Brakedown [Golovnev-Lee-Setty-Thaler-Wahby'21] assigns random weights to the edges in the graph, and they show that the resulting random summations leads to better distance metrics.

Orion [Xie-Zhang-Song'22] shows a way to do the lossless expander testing with a negligible probability (instead of $1/ poly (n)$ ) which is awesome, because you can then do rejection sampling to efficiently find a good lossless expander! They do this by looking at the maximum density in a graph.

Summary

Polynomial commitment (and SNARK) based on linear code has the following properties:

🟢 Transparent setup, $O (1)$
🟢 Commit and Prover times are $O (d)$ field operations
🟢 Plausibly post-quantum secure
🟢 Field agnostic
🔴 Proof size $O (d)$ , order of MBs in practice

That is the end of this lecture!

ZKHACK Whiteboard Sessions

ZKHACK Whiteboard Sessions are a series of YouTube videos, where various topics on ZK is described by their experts. I have collected some of my notes taken from these videos here.

What is a SNARK?: In module 1, we learn about the initial set of building blocks in zero knowledge - a SNARK and how different proving systems work. We will cover what a SNARK is, how they are used and how they are built. The module is by the one-and-only, Prof. Dan Boneh.
Building a SNARK: In modules 2 and 3, we learn how to build an efficient zk-SNARK for general circuits. We will review one particular paradigm to build a SNARK by reviewing the two components that combine to make it up: a functional commitment scheme and a compatible Then, we construct a polynomial IOP by displaying algebraic ideas, such as PLONK. These modules are also by Prof. Dan Boneh.
Custom Gates & Lookups: We cover modules 5 and 6 here. First, Adrian Hamelink from Aztec introduces the concepts that make up the Plonk proving system as well as custom gates and other techniques which are commonly used to accelerate Plonk-based SNARKs. Using a toy circuit example, he comes up with a simple custom gate for decomposing large numbers into bits. We then explore how lookup tables make it easier to evaluate functions that were not designed for use within a circuit, as well as efficiently range-checking field elements. In the other module, Mary Maller, a ZK researcher from the Ethereum Foundation, dives into lookup arguments and other common techniques to make provers faster and less expensive. She reviews vector commitments in the standard and lagrange basis, the halo2 lookup argument, and wraps up the module by presenting her current team's work on the Caulk lookup argument.
zkEVM & zkID: We cover modules 10 and 12 here. First, Jordi Baylina breaks down the topic of zkEVMs by walking us through the building of low-level small circuits and demonstrating how these are then used to build zkEVMs. He goes on to clarify misconceptions around zkEVMs and reviews the logic behind how we build state machines. Then, Oleksandr Brezhniev of Polygon ID and host Bobbin Threadbare discuss the types of ID from physical to digital, centralized to self-sovereign and how their work on Polygon's zkID aims to build a system for decentralized identity. Oleksandr explains how the use of blockchain technology and ZK proofs can be used as a form of identity verification.

What is a SNARK?

A SNARK stands for a succinct proof that a certain statement is true. Succinct here is meaning that the proof is "short". For example, I have a statement:

I know an $m$ such that $SHA256 (m) = 0$ .

In SNARK, the proof should be short and fast to verify. A trivial proof of the above statement is to simply send $m$ to the verifier. However, that proof is not short; it is as big as $m$ . Verification is also not fast, as the verifier has to hash the entire message to actually see the proof.

A SNARK can have a proof size of few KBs and verification should take at most seconds.

zk-SNARK

In the case of a zk-SNARK, the proof reveals nothing about $m$ . zk-SNARKs have many applications:

Private transactions: Tornado Cash, ZCash, IronFish, Aleo (private dApps).
Compliance: private proofs of solvency & compliance, zero-knowledge taxes
Scalability: Rollup systems with validity proofs

To understand zk-SNARKs, we need quite a bit of cryptography:

Arithmetic Circuits
Argument Systems

Arithmetic Circuits

Fix a finite field $F = {0, \dots, p - 1}$ for some prime $p > 2$ . A finite field is just a set of numbers where we can do addition and multiplication in modulo $p$ .

An arithmetic circuit is a DAG (directed acyclic graph) $C : F^{n} \to F$ where internal nodes are labeled $+, -, \times$ and inputs are labeled $1, x_{1}, \dots, x_{n}$ . The circuit defines an $n$ -variate polynomial with an evaluation recipe.

Here is an example:

flowchart LR
	1((1)) --> -
	x2((x2)) --> -
	1((1)) --> +
	x1((x1)) --> +
	x2((x2)) --> +
	+ --> x
	- --> x
	x1 --> x
	x --> r(( ))

This circuit defines the operation $x_{1} (x_{1} + x_{2} + 1) (x_{2} - 1)$ .

For convenience, the size of the circuit refers to the number of gates, and is denoted as $∣ C ∣$ . In the example above, $∣ C ∣ = 3$ .

A theorem states that all polynomial time algorithms can be captured by polynomial sized arithmetic circuits!

For example:

You can implement a circuit that does $C_{hash} (h, m) = (h - SHA256 (m))$ . This outputs 0 if $m$ is the preimage of $h$ using SHA256, and something other than 0 otherwise. This circuit uses around 20K gates, which is not bad!
You can have a $C_{sig} (p k, m, σ)$ that outputs 0 if $σ$ is a valid ECDSA signature on $m$ with respect to $p k$ .

Argument Systems

Consider a public arithmetic circuit $C (x, w) \to F$ where

$x$ is a public statement in $F^{n}$
$w$ is a secret witness in $F^{m}$

There will be a Prover with access to $x, w$ and a Verifier with access to $x$ . Prover's goal is to convince a Verifier that $\exists w$ s.t. $C (x, w) = 0$ .

sequenceDiagram
	actor P as Prover
	actor V as Verifier
	note over P: knows x, w
	note over V: knows x
	loop interactions
	P ->> V: send commitments and stuff
	V ->> P: send queries and stuff
	end
	note over V: accept or reject

The above process is interactive, prover and verifier interact with each other.

We also have non-interactive preprocessing argument systems. In this case, there is a preprocessing (setup) phase $S (C) \to (S_{p}, S_{v})$ that generate two public parameters, one for prover and one for the verifier.

sequenceDiagram
	actor P as Prover P(S_p, x, w)
	actor V as Verifier V(S_v, x)
	P ->> V: proof π
	note over V: accept or reject

As we can see, this is non-interactive; Verifier does not talk back to Prover!

More formally, a preprocessing argument system is a triple $(S, P, V)$ :

$S (C)$ takes an arithmetic circuit $C$ and outputs $(S_{p}, S_{v})$ public parameters for the prover and verifier respectively.
$P (S_{p}, x, w)$ outputs a proof $π$ .
$V (S_{v}, x, π)$ accepts or rejects a given proof.

An argument system must formally have the following properties:

Completeness: $\forall x, w : C (x, w) = 0$ , it must hold that $Pr [V (S_{v}, x, P (S_{p}, x, w)) = accept] = 1$ for the honest provers.
Knowledge Soundness: If the Verifier accepts the proof by a Prover, then the Prover must definitely know some $w$ such that $C (x, w) = 0$ . Furthermore, a Prover that does not know any such $w$ can only provide a proof that a Verifier can accept with at most negligible probability.
Zero-Knowledge: An extra property is that $(C, S_{p}, S_{v}, x, π)$ should reveal nothing about $w$ .

For a preprocessing argument system to be succinct, it needs to have the following to constraints:

$∣ π ∣ = O (lo g (∣ C ∣), λ)$ meaning that length of the proof can only be logarithmic in the size of the circuit (number of gates). It can be linear in the security parameter $λ$ too.
$time (V) = O (∣ x ∣, lo g (∣ C ∣), λ)$ meaning that the time to verify should be logarithmic in the size of circuit, and linear with the size of the statement.
$λ$ here is the security parameter (e.g. 128 for 128-bit security). It is mostly omitted from the complexity notation, or something like $O_{λ} (lo g (∣ C ∣))$ is used.

Note that with these constraints, the verifier does not have enough time to read $C$ itself, as it can't be done in time $lo g (∣ C ∣)$ .

So in short, a zk-SNARK has all 4 properties above: Complete, Knowledge Sound, Zero-Knowledge, Succinct. We can go a bit more formal for the knowledge-soundness and zero-knowledge properties.

Knowledge Soundness

Formally, for an argument system $(S, P, V)$ is knowledge-sound for some circuit $C$ , if for every polynomial time adversary $A = (A_{0}, A_{1})$ such that:

$S (C) \to (S_{p}, S_{v})$
$(x, s t a t e) \leftarrow A_{0} (S_{p})$
$π \leftarrow A_{1} (S_{p}, x, s t a t e)$
$Pr [V (S_{v}, x, π) = accept] > β$ for some non-negligible $β$

there is an efficient extractor $E$ that uses $A_{1}$ as a black box (oracle) such that:

$S (C) \to (S_{p}, S_{v})$
$(x, s t a t e) \leftarrow A_{0} (S_{p})$
$w \leftarrow E^{A_{1} (S_{p}, x, s t a t e)} (S_{p}, x)$
$Pr [C (x, w) = 0] > β - ϵ$ for some negligible $ϵ$ .

In other words, the probability that you can convince the verifier for some witness $w$ must be at most negligibly different than the probability that this witness $w$ is a valid witness for the circuit $C$ .

Zero-Knowledge

Formally (simplified), for an argument system $(S, P, V)$ is zero-knowledge if for every statement $x \in F^{n}$ the proof $π$ reveals nothing about $w$ , other than its existence. By that, we mean that the Verifier is capable of generating the same proof $π$ without the knowledge of $w$ . Formally, there must exist an efficient simulator $S im$ such that $\forall x \in F^{n}$ s.t. $\exists w : C (x, w) = 0$ the distribution:

$(C, S_{p}, S_{v}, x, π) :$ where $(S_{p}, S_{v}) \leftarrow S (C), π \leftarrow P (S_{p}, x, w)$

is indistinguishable from the distribution:

$(C, S_{p}, S_{v}, x, π) :$ where $(S_{p}, S_{v}, π) \leftarrow S im (C, x)$

Types of Preprocessing Setup

We said that a preprocessing setup $S (C)$ is done for a circuit $C$ . Things are actually a bit more detailed than this, there are 3 types of setups:

Trusted Setup per Circuit: $S (C; r)$ is a randomized algorithm. The random $r$ is calculated per circuit, and must be kept secret from the prover; if a prover can learn $r$ than they can prove false statements!
Trusted Setup & Universal (Updatable): a random $r$ is only chosen once, and the setup phase is split in two parts: $S = (S_{ini t}, S_{in d e x})$ .
1. $S_{ini t} (λ; r) \to pp$ is done a single time.
2. $S_{in d e x} (pp, C) \to (S_{p}, S_{v})$ is done for each circuit, and nothing here is secret!
Transparent: $S (C)$ does not use any secret data, meaning that a trusted setup is not required.

These setups are sorted in ascending order with respect to how good they are, so Transparent is kind of the best.

A SNARK Software System

flowchart LR
	D[DSL] -- compiler --> S[SNARK-friendly format]
	S -- Sp, Sv --> B[SNARK backend prover]
	X[x, witness] --> B
	B --> proof

A SNARK software system has the above format:

A Domain-Specific Language is used to write the circuit, there are lots of languages (Circom, ZoKrates, Leo, Zinc, Cairo, Noir, …) and there is even a framework called CirC that can help you write your own DSL.
The SNARK-friendly format also has options, such as R1CS, AIR, Plonk-CG, PIL, …
A backend will run the heavy computation of generating the proof. Note that this is in time linear of $∣ C ∣$ for a circuit $C$ .
Finally, a generated proof!

Building a SNARK

There are various paradigms on building SNARKs, but the general paradigm is two step:

A functional commitment scheme, where most of cryptography takes place
A suitable interactive oracle proof (IOP), where most of the information theory takes place

Functional Commitment Scheme

Well, first, what is a commitment scheme? A cryptographic commitment is like a physical-world envelope. For instance, Bob can put a data in an envelope, and when Alice receives this envelope she can be sure that Bob has committed to whatever value is in it. Alice can later reveal that value.

The commitment scheme has two algorithms:

$co mmi t (m, r) \to co m$ for some randomly chosen $r$
$v er i f y (m, co m, r) \to accept or reject$

The scheme must have the following properties:

Binding: cannot produce two valid openings for $co m$
Hiding: $co m$ reveals nothing about the committed data

There is a standard construction using hash functions. Fix a hash function $H : M \times R \to C$ where

$co mmi t (m, r) = H (m, r)$
$v er i f y (m, co m, r) = accept if co m = H (m, r)$

Committing to a function

Choose a family of functions $F = {f : X \to Y}$ . What does it really mean to commit to a function? Well, consider the following interaction:

sequenceDiagram
	actor P as Prover
	actor V as Verifier

	note over P: f ∈ F
	P ->> V: com_f := commit(f, r)
	V ->> P: x ∈ X
	P ->> V: y ∈ Y, proof π
	note over V: Accept or Reject

Here, the proof $π$ is to show that $f (x) = y$ and $f \in F$ .

More formally, a functional commitment scheme for $F$ :

$se t u p (λ) \to pp$ is public parameters $pp$
$co mmi t (pp, f, r) \to co m_{f}$ is commitment to $f \in F$ with $r \in R$
- this should be a binding scheme
- optionally, it can be hiding, which is good for zk-SNARK
$e v a l (P, V)$ with a prover $P$ and verifier $V$ , where for a given $co m_{f}$ and $x \in X, y \in Y$ :
- $P (pp, f, x, y, r) \to short proof π$ , which is a SNARK for the relation: $f (x) = y$ and $f \in F$ and $co mmi t (pp, f, r) = co m_{f}$ .
- $V (pp, co m_{f}, x, y, π) \to accept or reject$
- Basically, the $e v a l$ system is a SNARK

Function Families

There are 3 very important functional commitment types:

Polynomial Commitments: Committing to a univariate polynomial $f (X) \in F_{p}^{(\leq d)} [X]$ where that fancy notation stands for the set of all univariate polynomials of degree at most $d$ .
Multilinear Commitments: Committing to a multilinear polynomial in $F_{p}^{(\leq 1)} [X_{1}, \dots, X_{k}]$ which is the set of all the multilinear polynomials in at most $k$ variables.
- A multilinear polynomial is when all the variables have degree at most 1. Here is an example: $f (x_{1}, \dots, x_{7}) = x_{1} x_{3} + x_{1} x_{4} x_{5} + x_{7}$ .
Linear Commitments: Committing to a linear function $f_{v} (u) = ⟨ u, v ⟩ = \sum_{i = 1}^{n} u_{i} v_{i}$ which is just the dot product of two vectors.

Different SNARK systems may use different commitments. Note that linear commitments can be transformed into multilinear commitments, and those can be transformed into polynomial commitments. A good exercise!

From here on we will talk about Polynomial Commitments.

Polynomial Commitment Scheme (PCS)

A PCS is a functional commitment for the family $F = F_{p}^{(\leq d)} [X]$ . The prover commits to a univariate polynomial $f \in F_{p}^{(\leq d)} [X]$ , later, they can prove that $v = f (u)$ for some public $u, v \in F_{p}$ . As this is a SNARK, the proof size and verifier time should be $O_{λ} (lo g d)$ .

Using basic elliptic curves: Bulletproofs
Using bilinear groups: KZG (trusted setup) (2010), Dory (transparent) (2020)
Using groups of unknown order: Dark (20)
Using hash functions only: based on FRI

We will focus on KZG, as it is much simpler and commonly used.

KZG PCS

The name stands for Kate-Zaverucha-Goldberg. It operates on a cyclic group $G := {0, G, 2 G, 3 G, \dots, (p - 1) G}$ of order $p$ where $G$ is the generator. Note that in such a setting, $pG = 0$ .

Setup

The setup phase $se t u p (λ) \to pp$ works as follows:

Sample random $α \in F_{p}$
$pp = (H_{0} = G, H_{1} = α G, H_{2} = α^{2} G, \dots, H_{d} = α^{d} G) \in G^{d + 1}$
Delete $α$ or you get in trouble! (trusted setup)

Note that you can't do something like $H_{0} / H_{1} = α$ because division is not defined for these guys!

Commitment

The commitment phase $co mmi t (pp, f) \to co m_{f}$ is as follows:

Compute $co m_{f} := f (α) G \in G$ , but wait we don't have $α$ so what do we do?
We have a univariate polynomial $f (X) = f_{0} + f_{1} X + \dots + f_{d} X^{d}$
We use the public parameters $pp$ to compute $co m_{f} = f_{0} H_{0} + f_{1} H_{1} + \dots + f_{d} H_{d}$
If you expand $H$ in there, you will notice that the entire thing is equal to $f (α) G$ .
This is a binding commitment, but it is not hiding for now.

Evaluation

We have $co mmi t (pp, f) \to co m_{f}$ where $co m_{f} = f (α) G \in G$ at this point. The evaluation phase $e v a l (P, V)$ will work as follows:

Prover $P (pp, f, u, v)$ has the goal of proving $f (u) = v$ and generate some proof $π$ .
Verifier $V (pp, co m_{f}, u, v, π)$ will verify that proof.

The trick is to see that if $f (u) = v$ then $u$ is a root of $\hat{f} = f - v$ . It is because $\hat{f} (u) = f (u) - v = v - v = 0$ . There is a very well known result of this, that $(X - u)$ divides $\hat{f}$ . This also means that there is quotient polynomial $\exists q \in F_{p} [X]$ such that $q (x) (X - u) = f (X) - v$ .

With this, the Prover will calculate the quotient polynomial $q (X)$ and will commit to it to find $co m_{q}$ . This will be the proof $π = co m_{q} \in G$ . The verifier will accept the proof $π$ only if $(α - u) co m_{q} = co m_{f} - v G$ .

Note that verifier is using $α$ here, even though it was secret. The truth is, it is not actually using $α$ but instead uses a pairing, and the only thing verifier needs to know for that is $H_{0} = G$ and $H_{1} = α G$ , which are part of public parameters $pp$ .

Computing $q (x)$ is pretty expensive for large $d$ , and this part takes most of the computational time of the entire algorithm.

Remarks

KZG has been generalized for committing to $k$ -variate polynomials (PST 2013). There are many more generalizations of it. KZG also provides batch proofs, where a prover can prove a bunch of commitments in a single computation.

The difficulty with KZG is that it requires a trusted setup for $pp$ , and $pp$ size is linear in $d$ .

Polynomial Interactive Oracle Proof

Now we are at the second part of a SNARK. Let $C (x, w)$ be some arithmetic circuit. Let $x \in F_{p}^{n}$ . Poly-IOP is a proof system that proves $\exists w : C (x, w) = 0$ as follows:

We preprocess the circuit $C$ as similar to previous steps. $se t u p (C) \to (S_{p}, S_{v})$ where $S_{v} = (co m_{f_{0}}, co m_{f_{- 1}}, \dots, co m_{f_{- s}})$ .
An interactive proof is played as shown below:

sequenceDiagram
	actor P as Prover P(S_p, x, w)
	actor V as Verifier V(S_v, x)

	loop i = 1, 2, ..., (t-1)
	P ->> V: f_i ∈ F
	note over V: r_i ← F_p
	V ->> P: r_i
	end

	P ->> V: f_t ∈ F
	note over V: verify^(f_{-s}, ..., f_t)(x, r_1, ..., r_(t-1))

Here the $v er i f y$ in the end is just an efficient function that can evaluate $f_{i}$ at any point in $F_{p}$ . Also note that $r_{i}$ is randomly chosen only after the prover has committed to $f_{i}$ .

The Verifier here is just creating random numbers. Using Fiat-Shamir transform, such an interactive proof can be made non-interactive!

As usual, we expect the following properties:

Complete: if $\exists w : C (x, w) = 0$ then verifier always accepts.
Knowledge Sound: Let $x \in F_{p}^{n}$ . For every $P *$ that convinces the verifier with some non-negligible probability $β$ , there is an efficient extractor $E$ such that for some negligible $ϵ$ :

$Pr [E (x, f_{1}, r_{1}, \dots, f_{t - 1}, r_{t - 1}, f_{t}) \to w : C (x, w) = 0] \geq β - ϵ$

Zero-Knowledge: This is optional, but will be required for a zk-SNARK.

The Resulting SNARK

The resulting SNARK will look the following: there will be $t$ number of polynomials committed, and $q$ number of evaluation queries (points) for the verification. This is parameterized as $(t, q)$ POLY-IOP.

For the SNARK:

Prover send $t$ polynomial commitments
During Poly-IOP verify, the PCS $e v a l$ is run $q$ times.
Note that $e v a l$ is made non-interactive via Fiat-Shamir transform.

The length of the SNARK proof is $t$ polynomial commitments + $q$ evaluation proofs.

Prover Time: $t \times time (co mmi t) + q \times time (p ro v e) + time (I O P_{p ro v e})$
Verifier Time: $q \times time (e v a l_{v er i f y}) + time (I O P_{v er i f y})$

Usually, both $t, q \leq 3$ so these times are really short, in fact, in constant time w.r.t the polynomial count which is awesome.

We will build a Poly-IOP called Plonk. Plonk + PCS will make up a SNARK, and optionally can be extended to a zk-SNARK.

Key Observations

We can do zero test and equality test on committed polynomials, and these are used on almost all SNARK constructions.

Zero Test

A really key fact for $0 \neq = f \in F_{p}^{(\leq d)} [X]$ (for some random non-zero polynomial with degree at most $d$ ) is as follows:

for $r \leftarrow F_{p}$ it holds that $Pr [f (r) = 0] \leq d / p$

We know that $f$ has at most $d$ roots. $r$ is chosen at random from a size $p$ , do the probability that $r$ "hits" a root value is easy to see that $d / p$ .

Suppose that $p \approx 2^{256}$ and $d \leq 2^{40}$ . Then, $d / p$ is negligible! So it is really unlikely that a randomly chosen field element will be the root for $f$ .

With this in mind, if you do get $f (r) = 0$ for $r \leftarrow F_{p}$ then $f$ is identically zero with very high probability. This gives you a simple zero test for a committed polynomial!

This condition holds even for multi-variate polynomials!

Equality Test

Here is a related observation from the Zero Test.

Let $f, g \in F_{p}^{(\leq d)} [X]$ . For $r \leftarrow F_{p}$ , if $f (r) = g (r)$ then $f = g$ with very high probability! This comes from the observation above (think of $f - r = 0$ ).

Useful Proof Gadgets

Let $w \in F_{p}$ be a primitive $k$ -th root of unity (meaning that $w^{k} = 1)$ . Set $H := {1, ω, ω^{2}, \dots, ω^{k - 1}} \subseteq F_{p}$ . Let $f \in F_{p}^{(\leq d)} [X]$ and $b, c \in F_{p}$ where $d \geq k$ .

There are efficient poly-IOPs for the following tasks:

Zero Test: Prove that $f$ is identically zero on $H$
Sum Check: Prove that $\sum_{a \in H} f (a) = b$ where the verifier has $co m_{f}, b$ .
Product Check: Prove that $\prod_{a \in H} f (a) = c$ where the verifier has $co m_{f}, c$ .

We will only look at zero test.

Zero Test on $H$

There is a cute lemma that will be used here: $f$ is zero on $H$ if and only if $f (X)$ is divisible by $X^{k} - 1$ .

sequenceDiagram
	actor P as Prover
	actor V as Verifier

	note over P: q(X) ← f(X) / (X^k - 1)
  	P ->> V: q
	note over V: r ← F_p
	V ->> P: r
	note over P: evaluate q(r) and f(r)
	P ->> V: q(r), f(r)
	note over V: accept if f(r) = q(r) * (r^k - 1)

This protocol is complete and sound, assuming $d / p$ is negligible.

PLONK

PLONK is a Poly-IOP for a general circuit $C (x, w)$ .

Step 1: Compile Circuit to a Computation Circuit

Consider the following circuit (gate fan-in: 2, meaning that gates can take 2 inputs):

flowchart LR
	x1((x1)) --> +1[+ g0]
	x2((x2)) --> +1
	x2 --> +2[+ g1]
	w1((w1)) --> +2
	+1 --> x[x g2]
	+2 --> x
	x --> r(( ))

There are 3 gates here (namely $g_{0}, g_{1}, g_{2}$ ), 2 statement inputs $x_{1}, x_{2}$ and a witness input $w_{1}$ . The circuit outputs $(x_{1} + x_{2}) (x_{2} + w_{1})$ . Consider giving $x_{1} = 5, x_{2} = 6, w_{1} = 1$ . Here is what that computation would look like:

flowchart LR
	x1((x1)) -- 5 --> +1[+ g0]
	x2((x2)) -- 6 --> +1
	x2 -- 6 --> +2[+ g1]
	w1((w1)) -- 1 --> +2
	+1 -- 11 --> x[x g2]
	+2 -- 7 --> x
	x -- 77 --> r((77))

We would like to obtain a computation trace of this circuit evaluation. A computation trace is simply a table that shows the inputs, and the state of each gate (input1, input2, output). The output of the circuit is the output of the last gate in the circuit. Here is the computation trace for the circuit evaluation above:


Inputs	5	6	1
Gate 0	5	6	11
Gate 1	6	1	7
Gate 2	11	7	77

At this point, we can forget about the circuit and focus on proving that a computation trace is valid. Note that input count does not have to be equal to number of inputs & output of a gate.

Step 1.5: Encoding the Trace as a Polynomial

First, some notation:

$∣ C ∣$ is the total number of gates in circuit $C$
$∣ I ∣$ is equal to $∣ I_{x} ∣ + ∣ I_{w} ∣$ which is total number of inputs to $C$
Let $d = 3∣ C ∣ + ∣ I ∣$ which gives us the number of entries in the computation trace. In the example, that is $d = 12$ .
$H = {1, ω, ω^{2}, \dots, ω^{d - 1}}$

The plan: the prover will interpolate a polynomial $P \in F_{p}^{(\leq d)} [X]$ that encodes the entire trace, such that:

$P$ encodes all inputs: $P (ω^{- j}) = input #j$ for $j = 1, \dots, ∣ I ∣$ .
$P$ encodes all wires: $\forall l = 0, \dots, ∣ C ∣ - 1$ :
1. $P (ω^{3 l})$ gives the left input to gate # $l$
2. $P (ω^{3 l + 1})$ gives the right input to gate # $l$
3. $P (ω^{3 l + 2})$ gives the output of gate # $l$

Prover uses FFT (Fast Fourier Transform) to compute the coefficients of $P$ in time $d lo g_{2} d$ , almost in linear time!

Step 2: Proving validity of $P$

Prover needs to prove that $P$ is a correct computation trace, which means the following:

$P$ encodes the correct inputs
Every gate is evaluated correctly
The "wiring" is implemented correctly
The output of last gate is 0

(1) $P$ encodes the correct inputs

Remember both prover and verifier has the statement $x$ . They will interpolate a polynomial $v (X) \in F_{p}^{(\leq ∣ I_{x} ∣} [X]$ that encodes the $x$ -inputs to the circuit:

for $j = 1, \dots, ∣ I_{x} ∣ : v (ω^{- j}) = input #j$

Constructing $v (X)$ takes time proportional to the size of input $x$ .

In our example: $v (ω^{- 1}) = 5, v (ω^{- 2}) = 6, v (ω^{- 3}) = 1$ so $v$ is quadratic.

Next, they will agree on the points encoding the input:

$H_{in p} := {ω^{- 1}, ω^{- 2}, \dots, ω^{- ∣ I_{x} ∣}}$

Prover will prove (1) by using a zero-test on $H_{in p}$ to prove that:

$P (y) - v (y) = 0$ for all $y \in H_{in p}$

(2): Every gate is evaluated correctly

The idea here is to encode gate types using a selector polynomial $S (X)$ . Remember that in our example we encoded the two gate inputs and an output as $ω$ to the power $3 l, 3 l + 1, 3 l + 2$ for some gate $l$ . Now, we will encode the "types" of these gates.

Define $S (X) \in F_{p}^{(\leq d)} [X]$ such that $\forall l = 0, \dots, ∣ C ∣ - 1$ :

$S (ω^{3 l}) = 1$ if gate $# l$ is an addition gate +
$S (ω^{3 l}) = 0$ if gate $# l$ is a multiplication gate x

In our example, $S (ω^{0}) = 1, S (ω^{3}) = 1, S (ω^{6}) = 0$ , so $S$ is a quadratic polynomial.

Now, we make a really nice observation: $\forall y \in H_{g a t es} := {1, ω^{3}, ω^{6}, \dots, ω^{3 (∣ C ∣ - 1)}}$ it should hold that:

$S (y) \times (P (y) + P (ω y)) + (1 - S (y)) (P (y) \times P (ω y)) = P (ω^{2} y)$

Here, $P (y), P (ω y), P (ω^{2} y)$ are left input, right input and output respectively.

Prover will use a zero-test on the set $H_{g a t es}$ to prove that $\forall y \in H_{g a t es}$ :

$S (y) \times (P (y) + P (ω y)) + (1 - S (y)) (P (y) \times P (ω y)) - P (ω^{2} y) = 0$

(3) The wiring is correct

For that, the wires of $C$ are encoded with respect to their constraints. In our example:

$P (ω^{- 2}) = P (ω^{1}) = P (ω^{3})$
$P (ω^{- 1}) = P (ω^{0})$
$P (ω^{2}) = P (ω^{6})$
$P (ω^{- 3}) = P (ω^{4})$

Define a polynomial $W : H \to H$ that implements a rotation:

$W (ω^{- 2}, ω^{1}, ω^{3}) = (ω^{1}, ω^{3}, ω^{- 2})$
$W (ω^{0}) = (ω^{- 1})$
$W (ω^{6}) = (ω^{2})$
$W (ω^{4}) = (ω^{- 3})$

Why we do this fantastic thing is due to a lemma; if $\forall y \in H : P (y) = P (W (y))$ then the wire constraints are satisfied.

However, there is a problem: $P (W (y))$ has degree $d \times d = d^{2}$ but we want prover to work in linear time $d$ only! PLONK uses a very nice trick: use product check proof to reduce this to a constraint of linear degree. This trick is called the "PLONK Permutation" trick.

(4) Output of last gate is 0

Proving the last one is easy, just show $P (3^{3∣ C ∣ - 1}) = 0$ .

Final PLONK Poly-IOP

In the setup phase:

$S e t u p (C) \to (S_{p}, S_{v})$ where
- $S_{p} := (S, W)$
- $S_{v} := (co m_{S}, co m_{W})$

The prover has $(S_{p}, x, w)$ and the verifier has $(S_{v}, x)$ .

The prover will build $P (X) \in F_{p}^{(\leq d)} [X]$ and give the commitment $co m_{P}$ to the verifier.
The verifier will then build $v (X) \in F_{p}^{(\leq ∣ I_{x} ∣)} [X]$

Finally, the prover will prove the four things described before:

Inputs

$\forall y \in H_{in p} : P (y) - v (y) = 0$

Gates

$\forall y \in H_{g a t es} : S (y) \times (P (y) + P (ω y)) + (1 - S (y)) (P (y) \times P (ω y)) - P (ω^{2} y) = 0$

Wires

$\forall y \in H : P (y) - P (W (y)) = 0$

Output

$P (3^{3∣ C ∣ - 1}) = 0$

There is a theorem that shows this PLONK Poly-IOP is knowledge sound and complete! The resulting proof is around 400 bytes, and verification takes around 6ms. PLONK can actually handle circuits with more general gates than + and x, although we only used those operations in our gate constraints. The resulting SNARK can be made into a zk-SNARK, though it is not covered here.

There is also something called PLOOKUP: efficient Poly-IOP for circuits with lookup tables, which is very very useful.

Custom Gates

Consider the circuit below:

flowchart LR
	x1((x1)) --> +1[+ gate:1]
	x2((x2)) --> +1
	x2 --> +2[+ gate:2]
	w1((w1)) --> +2
	+1 --> x[x gate:3]
	+2 --> x
	x --> r(( ))

For a gate number $g$ , denote the left input, right input, and the output as $a_{g}, b_{g}, c_{g}$ respectively. We would like the circuit to satisfy the following:

$a_{1} + b_{1} = c_{1} = a_{3}$
$a_{2} + b_{2} = c_{2} = b_{3}$
$a_{3} \times b_{3} = c_{3}$

You can write down these in a table:

	$a$ (left input)	$b$ (right input)	$c$ (output)	$s$ (selector)
Gate 1	$a_{1}$	$b_{1}$	$c_{1}$	1
Gate 2	$a_{2}$	$b_{2}$	$c_{2}$	1
Gate 3	$a_{3}$	$b_{3}$	$c_{3}$	0

Notice the selector input $s$ , where it is denoted as 1 for addition gates, and 0 for multiplication gates. With this, the entire circuit can be captured by the following equation:

$(s_{i}) (a_{i} + b_{i}) + (1 - s_{i}) (a_{i} \times b_{i}) - c_{i} = 0$

Theoretically, you can capture any kind of computation using this model, but still, using addition and multiplication alone is a bit constraining set of functionality. We would in particularly like to have specialized parts of a circuit that can be re-used, say for elliptic curve operations or hash functions; saving us a lot of time (and decreasing the number of rows in the table)!

Example: Using Hash Functions for Signature

Consider an example:

Key generation:
- For the secret key, just sample $\overset{s_{k}}{ˉ} \in B$ where $B = {0, 1}^{32}$ . The bar over $s_{k}$ denotes that it is a binary representation.
- $p_{k} = F (H (\overset{ˉ}{s k}))$ which is the hash of $\overset{s_{k}}{ˉ}$ mapped to some Field element.
Signing a message $\overset{m}{ˉ} \in B$ :
- First, pick a random $\overset{r}{ˉ} \in B$ .
- Your signature is $σ = F (H (\overset{r}{ˉ}, \overset{s_{k}}{ˉ}, \overset{m}{ˉ}))$

So let us construct our circuit $C (p k, \overset{m}{ˉ}, σ, \overset{s_{k}}{ˉ}, \overset{r}{ˉ})$ . This circuit must check the following:

Ensure that given variables are in the defined binary set: $\overset{r}{ˉ}, \overset{s_{k}}{ˉ}, \overset{m}{ˉ} \in B$
Recreate the signature: $\overset{σ}{ˉ} = H (\overset{r}{ˉ}, \overset{s_{k}}{ˉ}, \overset{m}{ˉ})$
Ensure that the given signature key matches the derived one: $σ = F (\overset{σ}{ˉ})$
Ensure that public key is derived from the secret key: $\overset{p_{k}}{ˉ} = H (\overset{ˉ}{s k})$
Ensure that the given public key matches the derived one: $p_{k} = F (\overset{p_{k}}{ˉ})$

Working with binary is expensive in circuits, because technically you are checking if a given linear combination of field elements times some multiple of two is equal to some other field element, e.g. $35 = 2^{5} \times 1 + 2^{4} \times 0 + 2^{3} \times 0 + 2^{2} \times 0 + 2^{1} \times 1 + 2^{0} \times 1$ .

Decomposing Circuits to Binary Sets

For the sake of this example, let us have 3-bit numbers only (rather than 256 as most people use). You might have a number $\overset{c}{ˉ} = [c_{3}, c_{2}, c_{1}] \in B$ where $B = {0, 1}^{3}$ . This gives us the constraint of $c = c_{3} + 2 c_{2} + 4 c_{1} = c_{3} + 2 \times (c_{2} + 2 \times (c_{1} + 2 \times 0))$ .

When we look at the second formulation, we realize that the same operation ( $\dots + 2 \times (\dots))$ is happening a lot of times repetitively! Now let us do some renaming:

$a_{0} = 0$
$a_{1} = c_{1} + 2 \times a_{0} = c_{1}$
$a_{2} = c_{2} + 2 \times a_{1} = c_{2} + 2 \times c_{1}$
$a_{3} = c_{3} + 2 \times a_{2} = c_{3} + 2 \times c_{2} + 4 \times c_{1} = c$

This way, notice that we are always doing the same operation on some other variable $a_{i}$ . Let us write a table for these.

$i$ (index)	$b$ (bits)	$a$ (accumulation)
0	$b_{0} = c$	$a_{0} = 0$
1	$b_{1} = c_{1}$	$a_{1} = c_{1}$
2	$b_{2} = c_{2}$	$a_{2} = c_{2} + 2 c_{1}$
3	$b_{3} = c_{3}$	$a_{3} = c$

There are some constraints to be written from this table:

We need to make sure $b_{i}$ is a bit for $i = 1, 2, 3$ . We can do that simply by $b_{i}^{2} - b_{i} = 0$ .
We also need to make sure $a_{i}$ is computed correctly. We can do that by: $a_{i} - b_{i} - 2 a_{i - 1} = 0$ for $i = 1, 2, 3$ .
For $a_{0}$ , we have the constraint $a_{0} = 0$ .
Finally, for $b_{0}$ we have the constraint $b_{0} - a_{3} = 0$ .

We will capture all these constraints with a polynomial. For that, we use something called Lagrange Interpolation. Denote $ω$ as a root of unity. Let $H := {1, ω, ω^{2}, ω^{3}}$ . We had 3 bits in our example, and our table has 3+1 rows, so that is why we use this degree. In this case, $ω^{4} = ω^{0} = 1$ which is as it is a root of unity.

Construct a set of Lagrange polynomials $L_{0} [X], L_{1} [X], \dots$ which gives us a really nice property:

$L_{j} (w^{i}) = {10 i = j i \neq = j$

Now consider an element $a \in F^{4}$ . We will also have a polynomial:

$\tilde{a} (x) = a_{0} L_{0} (x) + a_{1} L_{1} (x) + \dots$

This is a really neat polynomial where we can select values of $a$ such as: $\tilde{a} (ω^{i}) = a_{i}$ . You can create one for $b$ with the same procedure.

Let us revisit the constraints from before:

Applied Indices	Constraint	Selector	Polynomial Constraint
1, 2, 3	$b_{i}^{2} - b_{i} = 0$	$\tilde{S} (x)$	$b (x)^{2} - b (x) = 0$
1, 2, 3	$a_{i} - b_{i} - 2 a_{i - 1}$	$\tilde{S} (x)$	$a (x) - b (x) - 2 \tilde{a} (ω^{- 1} x) = 0$
0	$a_{i} = 0$	$1 - \tilde{S} (x)$	$\tilde{a} (x) = 0$
0	$b_{i} - a_{i - 3} = 0$	$1 - \tilde{S} (x)$	$b (x) - a (ω^{3} x) = 0$

Then you will make a single polynomial out of these, that the verifier can query at random points!

Lookup Tables

Consider a computation like $c_{i} = a_{i} \oplus b_{i}$ where $\oplus$ is the XOR operation. Calculating this every time in the proof and writing the constraints for it will be costly. Instead, you could have a large table that shows all possible operation for the XOR (for example if $a, b, c \in {0, 1}$ then the table has 4 rows, literally just the truth table for XOR operation) and simply make the argument "is there a row in this table with $a_{i}, b_{i}, c_{i}$ ? This is what is called a Lookup Argument. It is an optimization method to save from computation time, although generating the table could take time.

Think of the following problem, you have a $w$ and you would like to show that $w \in {0, 3, 4, 7}$ which is a set and is known publicly. You would like to prove $w$ is in this set without revealing what $w$ is! (Set Membership Proof)

Range Proof

For a private $w$ , we would like to prove $w \in [0, 2^{3})$ . Proofs like this appear a lot of times in SNARKs.

One of the ways this is done is via binary decompositions: $\exists b_{0}, b_{1}, b_{2} \in {0, 1}$ such that $w = b_{0} + 2 b_{1} + 4 b_{2}$ where all $b_{i}$ are secret too (otherwise you would reveal $w$ somewhat). We write this as:

$- w + b_{0} + 2 b_{1} + 4 b_{2} = 0$

There are additional constraints to ensure that $b_{i}$ is a bit, which is done by $b_{i}^{2} - b_{i} = 0$ (one for each $i$ ):

$b_{0} \times b_{0} - b_{0} = 0$
$b_{1} \times b_{1} - b_{1} = 0$
$b_{2} \times b_{2} - b_{2} = 0$

Rank-1 Constraint System

Let us construct the R1CS for this set of constraints.

$0000010000100001 w b_{0} b_{1} b_{2} \times 0000010000100001 w b_{0} b_{1} b_{2} - 1000 - 1 100 - 2 010 - 4 001 w b_{0} b_{1} b_{2} = 0000$

If you look at the calculations here for each element in the resulting vector, it captures all the four constraints! This literally captures the R1CS for the proof of $w \in [0, 2^{3})$ . Again, note that this set is public, but $w$ is not.

Using a Table

So again, let us consider the same example: for some private $w$ , prove that $w \in [0, 2^{3})$ . We will use a table this time:


0	5
1	6
2	7
3	-
4	-

We want to commit to a set of values in a table. We can do that by making a polynomial with the coefficients as these values. Suppose these values are $a_{0}, a_{1}, a_{2}$ . Then, construct a polynomial:

$f (x) = a_{0} L_{0} (x) + a_{1} L_{1} (x) + a_{2} L_{2} (x)$

where $L_{0}, L_{1}, L_{2}$ are the Lagrange polynomials over the public set $V = {ω_{0}, ω_{1}, ω_{2}}$ where $ω$ is a root of unity. What is a root of unity? It is a value such that $ω_{i}^{n} \equiv 1 (mod p)$ , also known as an $n$ -th root of unity.

The thing about Lagrange polynomial is that $L_{0} (x)$ is the unique polynomial such that:

$L_{0} (w_{0}) = 1$
$L_{0} (w_{1}) = 0$
$L_{0} (w_{2}) = 0$

When you use roots of unity, these polynomials turn out to be really quick to compute thanks to Fast Fourier Transform, but that is a bit too much detail to go into here.

Now, if you were to open $a_{1}$ on $f$ , all you do is show that $f (w_{1}) = a_{1}$ . For this, you do the following constraint:

$f (x) - a_{1} = (x - w_{1}) q (x)$

for some quotient polynomial $q (x)$ .

The rest of the video talks about HALO2, which I have not yet noted down.

zkEVM with Jordi Baylina

To start with zkEVM, first we must consider a really basic program:

flowchart LR
	Inputs --> Program --> Outputs

Inputs:

The state of Ethereum $s_{i}$
A set of transactions $t x_{1}, t x_{2}, \dots$

Output:

The new state of Ethereum $s_{i + 1}$

The proof using zkEVM is to show that given some state $s_{i}$ and transactions, you will obtain $s_{i + 1}$ . The Program is a deterministic program here though!

Note: Having lots of public inputs will make the verifier quite complex. So, instead of having all the inputs as a public input by themselves, a hash $h = H (s_{i}, t x_{1}, t x_{2}, \dots)$ is computed. This hash is then asked for a public input, and the circuit simply checks if the hashes match.

Example: Multiplication Circuit

flowchart LR
	A --> *
	B --> *
  * --> C

The relationship in this circuit is $C = A \times B$ . zkEVM will have huge amounts of these, it is very complex.

Example: State Transition Circuit

When we want to make use of a state transition circuit, the trick is to attach the output to an input.

flowchart LR
	P[Next]--> C
	subgraph Circuit
	A --> +
	B --> +
	+ --> *
	C --> *
	end
	* --> Next

This way, the output will be the input on the next "clock". We can represent this circuit as:

$C^{'} = (A + B) \times C$

We would like to work with Polynomials though. Let us represent this with polynomials then:

$C (w x) = (A (x) + B (x)) \times C (x)$

Here, $w$ will be a root of unity. This causes a "shifting" effect, the polynomial of say $F (x)$ will be the same polynomial $F (w x)$ but shifted horizontally.

Example: Fibonacci (Hello World of ZK)

flowchart LR
	NextA[A'] --> A
	NextB[B'] --> B
	subgraph Circuit
	A --> +
  B --> +
	end
	+ --> B'
	B --> A'

This circuit defines the following constraints:

$b^{'} = a + b$
$a^{'} = b$

We need to express these polynomial entities in some way. At Polygon zkEVM, they have developed PIL: Polynomial Identity Language. Let us write the PIL for this Fibonacci circuit:

// A Fibonacci circuit with 4 clocks
namespace Fibonacci(4);
	// Input Polynomials
	pol commit a;
	pol commit b;

	// Constraints
	b' = a + b
	a' = b

	// Public Input
	public r = b[3]; // value of b at clock 3 will be public input

	LL(b - r) = 0;
	pol constant LL;

This will give you the following computational trace:

Clock	$a$	$b$	$LL$
0	1	2	0
1	2	3	0
2	3	5	0
3	5	8 (result!)	1

Here, the end result 8 will be the public input.

Writing a Processor

If we can write a processor using constraints and circuits, we can do anything! That is the main idea. A processor typically has registers, a program counter, some ROM. These basically make up a state machine in the end. zkEVM has many state machines that handle various parts within (RAM, Storage, Arithmetic, etc.) but we will not cover it here.

Consider a simple machine, with the following registers:

A register that stores some value
B register that stores some value
A' register stores the new value of A in the next state
B' register stores the new value of B in the next state
PC is the program counter
ROM is a read-only memory, no details about this.
INST is a set of instructions:
- inA a Boolean
- inB a Boolean
- setA a Boolean
- setB a Boolean

Here is how this works:

namespace ExampleMain(2^28)
	pol constant inA, intB, setA, setB;
	pol commit A, B;

	pol op = inA * A + inB * B; // an intermediate value
	a' = op * setA; // if setA = 1, a will be op
	b' = op * setB; // if setB = 1, b will be op

zkID with Oleksandr Brezhniev

There are two big groups of identities:

Physical: Documents like Passport. Driver License. Diploma, and such. Mostly paper documents.
Digital: We can have three groups to consider in the digital identity world.
- Centralized Identities (Siloed): A separate email & password to login websites, one for each of them.
- Federated Identities: Uses don't want a separate identity for each website, so they do this only once (e.g. Google, Facebook) and they use this to access websites.
- Self-Sovereign Identities: Also known as Decentralized Identities, in this case the user has control over their own digital identity. They share this identity with others on request. This is more like the Web3.0 way of doing things.

Triangle of Trust + Blockchain

flowchart LR
	Issuer -- Claim --> User
Verifier -.- Issuer
	Issuer -- Store commitment --> Blockchain
	User -- ZKP --> Verifier
	Verifier -- Check commitment --> Blockchain

In this diagram, a User can be an actual User, or it could be group of entities, machines, whatever; it's digital identity in the end!

For the ZKP (zero-knowledge proof) to work, we need:

Private Data
Public Data

Actual claim data (e.g. I am above 18 years old) is stored on the user's device, but the commitment to these claims are stored on the blockchain. For this, PolygonID uses a Merkle Tree and stores only the root of the tree of claims in the blockchain. What if you lost your phone, that has the digital identity data (claims)? In this case, only the Issuer can revoke the claim.

Polygon ID

In the blockchain, PolygonID stores the identity state in a smart contract. All Issuer's use this same contract. It is the hash of three Merkle Tree roots:

Claims Tree: This tree is Private, only the Issuer has access to it.
- Claims Schema
- Identifier (to whom it was issued)
- Data
- Revocation Nonce
Revocation Tree: This tree is Public, anyone can see the nonce values.
- Revocation Nonce, if a nonce in the Claim tree is included here, the claim will be revoked.
Roots Tree
- Stores roots of Claim Tree and such, this is for optimization purposes they say.

An additional optimization by PolygonID is that Issuer can sign the claims with their private key, for others to verify. This is helpful in the case Blockchain update did not happen fast enough.

Circuit Example for Age

Here is an example circuit:

Private Inputs:
- Claim (age)
- Non Revocation
- Signature / Merkle Tree Proof
- …
Public Inputs:
- ID of the Issuer
- Query (e.g. eq, lt, in, nin)
Circuit: The circuit itself is a generic circuit designed by PolygonID. It takes the operation query as a public input so that you have less number of circuits.

Note: They are using Groth16. They have made a generic circuit that handles many many operations within. The operation itself is given as a public variable. Groth16 also helps generate these proofs on Mobile phones! They think it will take around 1 second to generate the proof.

Note: They aim to expand the queries (such as eq, lt) to provide more functionality. They are also developing a platform for Issuer's to onboard the PolygonID without worrying about key-management, and they can provide validator nodes for the blockchain that hosts the smart contract for verification. They also plan on providing KYC for this!

Introduction to Modern Cryptography

All of the content here are from my hand-written notes taken during the lectures of COMP443 - Modern Cryptography by Assoc. Prof. Alptekin Küpçü, the material from the book "Introduction to Modern Cryptography: Principles and Protocols, 2nd Edition" by Jonathan Katz & Yehuda Lindell, and several of Dan Boneh's cryptography open lectures on Coursera.

Preliminaries: We gathered all the important preliminaries at the start; there is some discrete probability, entropy and asymptotics background to cover.
Introduction: We meet symmetric ciphers, and get to know the plaintext, ciphertext and key spaces. We also provide a quick background of ciphers from history. Then, we look at the principles of modern cryptography.
Secrecy: We give a formal definition of perfect secrecy & indistinguishability, and show that One-time Pad is the only algorithm that can achieve it. Sadly, we also show that perfect is too good, and then give definitions for computational secrecy which is a bit more relaxed but still good enough for our lifetimes.
Reduction Proofs: In cryptography, to prove that something is secure we usually "reduce" that thing to some other thing that we know to be secure. This logic implies that if one were to break this new system, they should also be able to break the other system which is known to be secure. In this page, we describe how reduction proofs are made.
PRGs & PRFs: We talk about pseudo-random generators (PRG) and give an example of a PRG-based One-time Pad. Then, we look at pseudo-random functions (PRF). Both of these constructions play a huge role in cryptography!

Probability

Random Variable: A variable that takes on (discrete) values with certain probabilities
Probability Distribution for a Random Variable: The probabilities with which the variable takes on each possible value.
- Each probability must be $\in [0, 1]$
- Sum of probabilities should be equal to $1$
Event: A particular occurence in some experiment. $Pr [E] : probability of event E$
Conditional Probability: Probability that one event occurs, assuming some other event occured. For example, probability that A occurs assuming B occured: $Pr [A ∣ B] = \frac{Pr [ A \land B ]}{Pr [ B ]}$
Independent Random Variables: Two random variables $x, y$ are independent if $\forall x, y : Pr [X = x ∣ Y = y] = P r [X = x]$
Law of Total Probability: Say $E_{1}, E_{2}, \dots, E_{n}$ are a partition of all possibilities (no pair can occur at the same time). Then for any other event $A$ : $Pr [A] = i \sum Pr [A \land E_{i}] = i \sum (P r [A ∣ E_{i}] \times Pr [E_{i}])$
Bayes Theorem: A cool trick to change the orders in conditional probability. $Pr [A ∣ B] = \frac{Pr [ B ∣ A ] \times Pr [ A ]}{Pr [ B ]}$

Discrete Probability

Everything here is defined over a universe $U$ , such as ${0, 1}^{n}$ . The universe must be a finite set. As an example: ${0, 1}^{2} = {00, 01, 10, 11}$ . Also note that $Pr [U] = 1$ .
A probability distribution $P$ over $U$ is a function $P : U \to [0, 1]$ such that $\sum_{x \in U} P (x) = 1$ .
- A uniform distribution is when $\forall x \in U : P (x) = 1/∣ U ∣$ .
- A point distribution is when $P (x_{0}) = 1 \land \forall x \neq = x_{0} : P (x) = 0$ .
An event $A$ is a set $A \subseteq U$ and: $Pr [A] = x \in A \sum P (x) \in [0, 1]$
A union bound of $A_{1}$ and $A_{2}$ is: $Pr [A_{1} \cup A_{2}] \leq Pr [A_{1}] + Pr [A_{2}]$
If $A_{1}$ and $A_{2}$ are disjoint sets, then $Pr [A_{1} \cup A_{2}] = Pr [A_{1}] + Pr [A_{2}]$ .
A random variable $X$ is a function $X : U \to V$ where $U$ is the universe and $V$ is the value set.

EXAMPLE: Let $X : {0, 1}^{n} \to {0, 1}$ where $X (a) = lsb (a) \in {0, 1}$ and $lsb$ returns the least significant bit of its argument.

Let $U$ be some set such as ${0, 1}^{n}$ . We write $r R U$ to denote a uniform random variable over $U$ , in other words: $\forall a \in U : Pr [r = a] = 1/∣ U ∣$ .

A deterministic algorithm is one where $y \leftarrow F (x)$ , but a randomized algorithm is as $y \leftarrow F (x; r)$ where $r R {0, 1}^{n}$ . The output is a random variable $y R F (x)$ which defines a distribution over all possible outputs of $F$ given $x$ . For example, $Enc (k, m; r)$ is a randomized encryption!

Two events $A$ and $B$ are independent events if $Pr [A \land B] = Pr [A] \times Pr [B]$ .
Two random variables $X, Y$ taking values over $V$ are independent random variables if $\forall a, b \in V : Pr [X = a \land Y = b] = Pr [X = a] \times Pr [Y = b]$ .

XOR Operation

XOR is a very important function that will keep appearing throughout cryptography. It is a logical operation $\oplus : {0, 1} \times {0, 1} \to {0, 1}$ :

a	b	a $\oplus$ b
0	0	0
0	1	1
1	0	1
1	1	0

Theorem: For a r.v. $X$ over ${0, 1}^{n}$ and an independent uniform random r.v. $Y$ over ${0, 1}^{n}$ , $Z := X \oplus Y$ is a uniform random variable over ${0, 1}^{n}$ . This is a very important theorem!

Birthday Paradox

This is a quite surprising probabilistic result, and it will come handy in some proofs.

Theorem: Let $r_{1}, r_{2}, \dots, r_{n} \in U$ be independent identically distributed random variables. The theorem states that when $n = 1.2 \times ∣ U ∣^{1/2}$ , then:

$Pr [\exists i, j : i \neq = j \land r_{i} = r_{j}] \geq \frac{1}{2}$

Proof: We will first negate this probability:

$Pr [\exists i \neq = j : r_{i} = r_{j}] = 1 - Pr [\forall i \neq = j : r_{i} \neq = r_{j}]$

Let $B = ∣ U ∣$ , we see that:

$1 - Pr [\forall i \neq = j : r_{i} \neq = r_{j}] = 1 - (\frac{B - 1}{B}) (\frac{B - 2}{B}) \dots (\frac{B - ( n - 1 )}{B})$

$= 1 - i = 1 \prod n - 1 (1 - \frac{i}{B})$

We will use the inequality $1 - x \leq e^{- x}$ to obtain:

$= 1 - i = 1 \prod n - 1 (1 - \frac{i}{B}) \geq 1 - i = 1 \prod n - 1 e^{\frac{- i}{B}} = 1 - e^{\frac{- 1}{B} \sum_{i = 1}^{n - 1} i}$

We can further obtain another inequality as:

$1 - e^{\frac{- 1}{B} \sum_{i = 1}^{n - 1} i} \geq 1 - e^{\frac{- n ^{2}}{2 B}}$

When we plug in $n = 1.2 \times B^{1/2}$ here we get $1 - e^{- 0.72} = 0.53$ . Surely, $0.53 > 1/2$ , QED!

Entropy

In the context of discrete probability, entropy can be thought of as a measure of randomness. Suppose you have a r.v. $X$ with outcomes $x_{1}, x_{2}, ..., x_{n}$ with probabilities $p (x_{1}), p (x_{2}), ..., p (x_{n})$ .

Entropy is defined as:

$H (X) = - i = 1 \sum n p (x_{i}) lo g p (x_{i})$
Min-entropy is when $Pr [X = x] < 2^{- n}$ for any outcome $x$ .
Mutual Information measures how one random variable tells us about another. Suppose you have r.v. $X$ and $Y$ . Denote $P_{(X, Y)} (x, y) = Pr [X = x \land Y = y]$ as the probability mass function of $X$ and $Y$ , and $P_{X} (x) = Pr [X = x]$ and $P_{Y} (y) = Pr [Y = y]$ as the marginal probability mass functions of $X$ and $Y$ respectively. The mutual information of $X$ and $Y$ is then defined as:

$I (X, Y) = x, y \sum P_{(X, Y)} (x, y) lo g \frac{P _{(X, Y)} ( x , y )}{P _{X} ( x ) P _{Y} ( y )}$

Asymptotics

There are some special properties of functions that relate how much they grow with respect to one another.

Negligible functions are literally negligible; too small that you should not worry much about them (especially if the functions represents the probability of failure). Formally, a positive function $μ (n)$ is negligible if for any positive polynomial $p (n)$ there exists a constant $n_{0}$ such that $\forall n \geq n_{0}$ we have $μ (n) < 1/ p (n)$ .
Noticeable functions are kind of like the opposite of negligible ones. A positive function $f$ is noticeable if there exists a positive polynomial $p$ and a number $n_{0}$ such that $f (n) \geq 1/ p (n)$ for all $n \geq n_{0}$ .
Non-negligible functions are functions that are not negligible. This does not necessarily mean they are noticable!
Overwhelming functions are such that if $f$ is overwhelming, then $1 - f$ is negligible.

Note that space complexity should be upper bounded by Time complexity. You can't be using more complex space than time, because you should also spend time accessing/modifying whatever data you store.

An equivalent definition of negligible functions is given by the Big-O notation: a positive function $μ (n)$ is negligible if and only if $μ (n) \in O (1/ p (n))$ for all positive polynomials $p$ . The following functions are negligible:

$1/ 2^{n}$
$1/ n!$
$1/ n^{l o g n}$

A polynomial times polynomial functions is a polynomial; however, a polynomial times negligible is a negligible function.

Negligible vs Noticable

A positive function $f (n)$ is negligible if and only if for any positive polynomial $p (n)$ such that $p (n) f (n)$ converges to 0.
A positive function $f (n)$ is noticable if and only if there exists a positive polynomial $p (n)$ such that $p (n) f (n)$ goes to $\infty$ .

Note that for negligible we use "for any", but for noticable we use "there exists".

Introduction

sequenceDiagram
	actor Alice
	actor Bob
	Note over Alice,Bob: Both parties have k #8592; Gen(1^#lambda;)

	%% encryption and decryption
	Note over Alice: c #8592; Enc(k,m)
	Alice ->> Bob: c
	Note over Bob: m #8592; Dec(k,c)

An outline of how a symmetric cipher works is given above. There are 3 main components:

Key Generator (over a key space $K$ )

$Gen (1^{λ}) \to k, k \in K$
Encryption (over a message space $M$ )

$Enc (k, m) \to c, m \in M \land c \in C$
Decryption (over a ciphertext space $C$ )

$Dec (k, c) \to m^{'}, m^{'} \in M$

We should understand that $Gen$ has a $λ$ -bit input. $λ$ is known as the security parameter.

We also expect $m = Dec (k, Enc (k, m))$ , this is known as correctness property.

Note that if we know $Gen$ we learn the key space. With that, if we know message space and $Enc$ , then we will know the ciphertext space as well. Message is also known as plaintext; and, "Private key", "Secret key", "Symmetric key" all mean the same thing usually.

Old Ciphers

We will briefly describe some notable ciphers used back in the day.

Caesar Cipher

A very old example of cryptogrpahy is by Julius Caesar back then, we simply rotate the alphabet.

$Enc (k, m) = m + 3 mod 26$
$Dec (k, c) = c - 3 mod 26$
$Gen (1^{λ}) = 3$

Notice that key generation does not care about $λ$ , and both encryption and decryption does not use the key. It is very easy to break this cipher; you could simply look at the letter frequencies, or digram (pair of letters) frequencies.

Vigenere Cipher

The key is CRYPTO.
k = C R Y P T O C R Y P T O C R Y P T
m = W H A T A N I C E D A Y T O D A Y
------------------------------------- (+ mod 26)
c = Z Z Z J U C L U D T U N W G C Q S

Every $n^{'}$ th character has the same shift. This was pretty powerful back then, but it is breakable; especially if you know the key length beforehand. In fact, the key length can be found by looking at the uniformity of the characters.

For a Vigenere cipher with key length $1$ to $L$ :

determining the key length $\approx 256 L$
determining the bytes of key $\approx 256 L$
brute-force $\approx 25 6^{L}$

Rotor Machines

Then came the rotor machines, such as Enigma Machine. Details omitted. All of the examples so far has been "substituion ciphers" where characters are mapped to some other character.

Digital Ciphers

Not that long ago there was DES (1974), AES (aka Rijndael, 2001) and Salsa20 (2008).

Modern Cryptography Principles

Precise and format definition of security must be presented.
Assumptions should be clear, minimal and basic, complete.
Rigorous proof of security must be given.

Provably secure schemes can be broken if the definition does not correspond to reality, or if the assumptions are invalid. The best assumptions are ones that are old (thus still valid against test of time), simple (thus generic enough), and shared (thus general enough).

1: Formal Definition of Secure Encryption

Let us try to define the term "secure".

❌ - "No adversary can find the secret key, no matter what the ciphertext is.": Well, $Enc (k, x) = x$ provides this, but is definitely not secure ;)
❌ - "No adversary can find the plaintext from the ciphertext.": $Enc (k, x) = last half of x$ satisfies this, but is obviously not secure.
❌ - "No adversary can determine and character of the plaintext that correspond to the ciphertext.": This sounds good, but the adversary can still learn which characters of the alphabet is used, which may be bad. For example if the adversary learns the characters $e, h, y$ and the message is 3 letters, it is probably "hey".
✔️ - "No adversary can compute any function of the plaintext from the ciphertext": Now that sounds formal, but we need to be more formal!

Note that $F (m) = ∣ m ∣$ is a function of plaintext that gives its length. It is often very hard to hide this, so the last bullet often allows this function to be computable.

2: Assumptions

To make well assumptions, one usually considers threat models, i.e. how powerful is the adversary? There are 4 threat models, with increasing attack power:

Ciphertext-only attack: The adversary can attack just by looking at one (or many) ciphertexts.
Known-plaintext attack
Chosen-plaintext attack
Chosen-ciphertext attack

A good security definition against ciphertext-only attack is: "regardless of any prior information the attacker has about the plaintext, the ciphertext should leak no additional information about the plaintext."

3. Proofs

Typical proof of a scheme $X$ will show using a constructive argument, that if $X$ is broken, some assumption $Y$ will be violated.

If there exists an algorithm $A$ for breaking $X$ , then we can construct an algorithm $B$ to break the assumption $Y$ .
If $A$ is efficient (i.e. runs in probabilistic polynomial time) then so is $B$ .
The proof cannot present $A$ (in which case $X$ is already broken) but must present the "code" of $B$ . We always assume $A$ exists.

These all assume that if assumption $Y$ holds, $X$ is secure.

Perfect Secrecy

An encryption scheme $(Gen, Enc, Dec)$ with message space $M$ and ciphertext space $C$ is perfectly secret if $\forall m \in M$ and $\forall c \in C$ where $Pr [C = c] > 0$ , it holds that:

$Pr [M = m ∣ C = c] = Pr [M = m]$

I find this definition to be a thing of beauty. It is quite simple in logic: your idea about what a message may be at the start (a priori probability, right-hand side) should be equal to what you think that message is after seeing the ciphertext (a posteriori probability, left-hand side). If that is the case, seeing the cipher-text gave you no idea whatsoever.

EXAMPLE: Consider the same shift cipher example from before, and notice that $Pr [M = "ten" ∣ C = "rqh"] = 0$ . Since $Pr [M = "ten"] = 1/2$ , we can say this scheme is not perfectly secret.

Theorem: Suppose $(M, C, K, Enc, Dec)$ be a scheme where $∣ M ∣ = ∣ C ∣ = ∣ K ∣$ , i.e. spaces have equal cardinality. This scheme offers perfect secrecy if and only if every key is used with probability $1/∣ K ∣$ and for every $m \in M$ and $c \in C$ there is a unique key $k \in K$ such that $c = Enc_{k} (m)$ .

Further note that we can convert the equation:

$Pr [M = m ∣ C = c] = Pr [M = m]$

to be:

$Pr [C = c ∣ M = m] = Pr [C = c]$

Here, $Pr [C = c] > 0$ because if not, why would $c$ be in $C$ ? This implies that $Pr [C = c ∣ M = m] > 0$ .

Perfect Indistinguishability

We will now define a game between two players (an adversary $A$ and a challenger chal), and describe the security of an encryption scheme using it. Let $Π = (G, E, D)$ an encryption scheme. We define an experiment $PrivK_{A, Π}^{eav}$ as follows:

sequenceDiagram
	actor A
	actor chal

	A ->> chal: m_0, m_1

	Note over chal: generate key k #8592; G()
  Note over chal: choose uniformly random b #8712; {0, 1}
  Note over chal: c #8592; E(k, m_b)
  chal ->> A: c

  Note over A: compute b' #8712; {0, 1}

  Note over A,chal: "A" wins if b' = b

We say that the experiment results in 1 if $A$ wins. The encryption scheme $Π = (G, E, D)$ is perfectly indistinguishable if for every $A$ it holds that:

$Pr [PrivK_{A, Π}^{eav} = 1] = \frac{1}{2}$

In otherwords, $A$ is no more successfull than flipping a coin when it comes to guessing which message Bob has encrypted. Another definition for perfect indistiguishability is that:

$Pr [C = c ∣ M = m_{1}] = Pr [C = c ∣ M = m_{2}]$

which means that the probability $c$ is the ciphertext of $m_{1}$ is equally likely to be that of $m_{2}$ .

Lemma: An encryption scheme is perfectly secret if and only if it is perfectly indistinguishable.

One-time Pad

One-time Pad is a symmetric encryption scheme found by Gilbert Vernam in 1917. The scheme is quite simple: it is defined over the spaces $M = C = K = {0, 1}^{n}$ where the key is a random bit-string as long as the message.

To encrypt, you XOR (shown with $\oplus$ ) the message with the private key, and to decyrpt you XOR the ciphertext with the private key.

$E (k, m) = k \oplus m$

$D (k, m) = k \oplus c$

Theorem: One-time Pad has perfect secrecy.

Proof: Let us look at the probabilities of how a ciphertext can be obtained given a message: $\forall m \in M, c \in C :$

$k Pr [E (k, m) = c] = \frac{# keys k \in K s.t. E ( k , m ) = c}{∣ K ∣}$

We know that $E (k, m) = c = k \oplus m$ and thus $k \oplus m \oplus m = c \oplus m = k$ . This means that for any $m, c$ pair there is only one $k$ . As such, the numerator of the probability above, $# keys k \in K s.t. E (k, m) = c$ is equal to 1.

This is exactly the requirement of perfect secrecy, recalling the theorem given above. Q.E.D.

The Bad News

Having perfect secrecy is cool, but notice that there is something very dirty in the proof:

Just by XOR'ing the message and it's ciphertext, we were able to obtain the private key!
Furthermore, the key must be as long as the message itself, which is not really practical.
You should also use the key only for one encryption (hence the name), which is yet another practicality issue.
One-time Pad is perfectly secret only against cipher-text only attacks, it is very much vulnerable to other attacks. Especially if the attacker is active (i.e. can tamper with the message), you are going to have a bad day.

Well then, can we have some other encryption scheme that is perfectly secret? We will hear the bad news from Claude Shannon:

Theorem [Shannon]: For perfect secrecy, $∣ K ∣ \geq ∣ M ∣$ .

Proof: Assume that for perfect secrecy, $∣ K ∣ < ∣ M ∣$ . Define $M (c)$ to be all possible decryptions of some ciphertext $c$ :

$M (c) = {m ∣ m = D (k, c) for some k \in K}$

Clearly, $∣ M (c) ∣ \leq ∣ K ∣$ . If $∣ K ∣ < ∣ M ∣$ then there is some $m^{'} \in M$ such that $m^{'} \neq \in M (c)$ . But if that is the case,

$Pr [M = m^{'} ∣ C = c] = 0 \neq = Pr [M = m^{'}]$

and thus this contradicts perfect secrecy. Q.E.D.

Computational Secrecy

Having "perfect" secrecy is a bit too strict. We could perhaps allow the adversary to crack our system as long as it would take them a million years or something. We need to consider practical scenarios to be applied in real-life! A great way to think about security in this case is to have security such that it costs the attacker more than what they would gain by breaking the system.

To this extent, we could:

Allow security to fail with tiny probability (i.e. probability of failure is a negligible function)
Adversaries are efficient (i.e. run in probabilistic polynomial time)

Our parameters fit the real world, but may not be practical in terms of theory. This is why we require an "asymptotic" approach.

Asymptotic Indistinguishability

Fix $Π, A$ and define a randomized experiment $PrivK_{A, Π} (n)$ as follows:

sequenceDiagram
	actor A
	actor chal

	A ->> chal: m_0, m_1 where |m_0| = |m_1|

	Note over chal: generate key k #8592; G(1^n)
  Note over chal: choose uniformly random b #8712; {0, 1}
  Note over chal: c #8592; E(k, m_b)
  chal ->> A: c

  Note over A: compute b' #8712; {0, 1}

  Note over A,chal: "A" wins if b' = b

We assume that both parties know the security parameter $n$ , and the adversary is allowed to know the message length. $Π$ is indistinguishable if for all efficient $A$ there is a negligible function $ϵ$ such that

$Pr [PrivK_{A, Π} (n) = 1] \leq \frac{1}{2} + ϵ (n)$

EXAMPLE: Consider a scheme where the best attack is a brute-force search over the key space, and $G (1^{n})$ generates a uniform $n$ -bit key. The probability of guessing the correct key is $2^{- n}$ . If $A$ runs in time $t (n)$ , we have:

$Pr [PrivK_{A, Π} (n) = 1] \leq \frac{1}{2} + t (n) \times 2^{- n}$

Since $2^{- n}$ is negligible, $t (n) \times 2^{- n}$ is negligible and thus our scheme is asymptotically indistinguishable.

NOTE: A brute-force attack requires $∣ K ∣$ time to succeed with probability at most 1; a key-guessing attack requires $O (1)$ time to succeed with probability $1/∣ K ∣$ .

Semantic Security

There is yet another definition for asymptotic secrecy, which is cool but also slightly harder to work with.

sequenceDiagram
	actor A
	actor chal

  rect rgb(220, 220, 240)
  Note over chal: chal is given b #8712; {0, 1}
  end

	A ->> chal: m_0, m_1 where |m_0| = |m_1|

	Note over chal: generate key k #8592; G(1^n)
  Note over chal: c #8592; E(k, m_b)
  chal ->> A: c

  Note over A: compute b' #8712; {0, 1}

  Note over A,chal: "A" wins if b' = b

Challenger is given a bit $b \in {0, 1}$ , and we denote the respective experiment as $EXP (b)$ . Similarly, the probability of adversary winning the experiment is shown as $W_{b} = Pr [EXP (b) = 1]$ . Now define the difference in probability of winning these experiments as:

$Adv_{ss} [A, Π] := ∣ Pr [W_{0}] - Pr [W_{1}] ∣ \in [0, 1]$

If a scheme $Π$ is semantically secure for all efficient $A$ , then $Adv_{ss} [A, Π]$ is negligible. Note that this definition of looking at the difference between winning two experiments come in handy sometimes, such as when we are defining pseudo-random generators.

Reduction Proofs
- A more verbal explanation

Reduction Proofs

Reduction proofs are the main tool we will use to prove the security of some scheme. We have given a brief overview of them while describing the third principle of modern cryptography.

Suppose we have some assumption $Y$ and an encryption scheme $X$ such that:

$Y holds ⟹ X secure$

To prove this, we can look at it's contrapositive:

$X not secure ⟹ Y doesn’t hold$

If there exists an efficient $A$ that breaks $X$ , then we will construct an efficient $B$ that breaks $Y$ . Similar to how we had an adversary and a challenger in the previous interactions:

$B$ will be an adversary to an outside challenger. The interactions between $B$ and this challenger is defined by $Y$
$B$ will use $A$ within, and it will act as the challenger for $A$ . The interactions between $B$ and $A$ is defined by $X$

sequenceDiagram
	actor chal
  actor B
  actor A


  chal ->> B: #160;

  Note over B,A: B can interact with A
  B ->> A: #160;
  A ->> B: #160;

  Note over B,chal: can interact in between too
  B ->> chal: #160;

  chal ->> B: #160;

  B ->> A: #160;
  A ->> B: #160;

  B ->> chal: #160;

There are three points to consider in a reduction proof:

Efficiency: $A$ and $B$ must both be efficient, that is run in probabilistic polynomial time. As such, the interactions must be polynomially many times.
Simulation: $B$ must simulate a challenger to $A$ as in the definition of $X$ . That is the main requirement of interacting with $A$ within.
Probability: The statement that $A$ has non-negligible advantage should imply $B$ has non-negligible advantage.

Regarding the last point, notice that:

$\exists A non-negl. adv. ⟹ \exists B non-negl. adv.$

when taken the contrapositive is expressed as:

$\forall B negl. adv. ⟹ \forall A negl. adv.$

In other words, if we can't construct any $B$ to have a non-negligible advantage then we can say there can't be any $A$ with non-negligible advantage.

A more verbal explanation

We can state reduction proofs as follows:

Goal: Prove that a scheme $X$ is secure as long as $Y$ holds.
Method: If there exists an efficient adversary $A$ that breaks scheme $X$ with non-negligible probability, then we construct an efficient adversary $B$ that breaks assumption $Y$ with non-negligible probability.
Result: Since there is no known break on the assumption $Y$ (hopefully your assumptions are good enough for that), this means no such adversary $A$ exists; otherwise, assumption $Y$ would already be broken. Alternatively, if we could break $X$ we should be able to break $Y$ , therefore if we cannot break $Y$ then we cannot break $X$ .

So given an efficient adversary $A$ , we try to construct adversary $B$ such that:

$B$ uses $A$ as a subroutine,
$B$ is efficient, that is it performs at most polynomial amount of work on top of $A$ ,
The success probability of $B$ breaking $Y$ is at most negligibly worse than the success probability of $A$ breaking $X$ ,
$B$ simulates the challenger in scheme $X$ for $A$ .

Note that a proof can be black-box: the constructed $B$ need not know how $A$ works, but know that it does break $X$ with some non-negligible probability.

Pseudo-random Generators

Let us think of an experiment with an efficient distinguisher $D$ that plays the following two games:

A game where challenger gives the distinguisher a uniformly random chosen value; the distinguisher wins if it can find that this value was actually uniformly randomly generated:

sequenceDiagram
	actor D
  actor chal

  Note over chal: pick uniformly random y_unif

  chal ->> D: y_unif

  Note over D: compute b' #8712; {0, 1}

  Note over D,chal: "D" wins if b' = 1

A game where challenger gives the distinguisher a pseudo-randomly generated value, with a uniformly random chosen seed; the distinguisher wins if it can find that this value was actually pseudo-randomly generated:

sequenceDiagram
	actor D
  actor chal

  Note over chal: pick uniformly random s
  Note over chal: y_psd = G(s)

  chal ->> D: y_psd

  Note over D: compute b' #8712; {0, 1}

  Note over D,chal: "D" wins if b' = 0

With these two games, we would like to see:

$∣ Pr [D (y_{u ni f}) = 1] - Pr [D (y_{p s d}) = 1] ∣ \leq negl$

Notice that this is the statement we made in semantic security. Also notice that the seed of pseudo-random generator ( $s$ in $G (s)$ ) is assumed to be uniformly random. If the seed is known, you can easily guess which number will be generated.

A more formal definition

Let $l$ be a polynomial and let $G$ be a deterministic polynomial-time algorithm such that for any $n$ and any input $s \in {0, 1}^{n}$ , the result of $G (s)$ is a string of length $l (n)$ . We say that $G$ is a pseudo-random generator (PRG) if the followings hold:

Expansion: For every $n$ it holds that $l (n) > n$ . This $l$ is called the expansion factor of $G$ .
Pseudorandomness: For every efficient (i.e. probabilistic polynomial time) distinguisher $D$ there is a negligible function $negl$ such that:

$∣ Pr [D (r) = 1] - Pr [D (G (s)) = 1] ∣ \leq negl$

where $s$ is chosen uniformly random from ${0, 1}^{n}$ and $r$ is chosen uniformly random from ${0, 1}^{l (n)}$ .

PRG-Based Encryption: One-time Pad Example

Consider a scheme $X$ and a PRG $G : {0, 1}^{n} \to {0, 1}^{n^{'}}$ , $M = {0, 1}^{n^{'}}$ .

$Gen (1^{n}) \to k : k \leftarrow {0, 1}^{n}$

$Enc (k, m) \to c : c := G (k) \oplus m$

$Dec (k, c) \to m^{'} : m^{'} := G (k) \oplus c$

Theorem: $G$ is a secure PRG $⟹$ $X$ is a secure encryption scheme against a 1-message eavesdropper.

Proof: If $\exists$ efficient $A$ who breaks $X$ , then we construct efficient $B$ who breaks $G$ ; as in, $B$ would be able to distinguish wheter a given value is pseudo-random or uniformly random with non-negligible advantage.

sequenceDiagram
	actor chal
  actor B
  actor A

  alt uniform ("R")
  rect rgb(220, 220, 240)
  Note over chal: r #8712; {0, 1}^n'
  end
  else pseudo-random ("PR")
  rect rgb(240, 220, 220)
  Note over chal: s #8712; {0, 1}^n
  Note over chal: r = G(s)
  end
  end

  chal ->> B: 1^n, r

  B ->> A: 1^n

  Note over B: b #8712; {0, 1}

  A ->> B: m_0, m_1

  Note over B: c = r #8853; m_b

  B ->> A: c

  A ->> B: b'

  alt b = b'
  rect rgb(220, 220, 240)
  Note over B: output "R"
  end
  else b #8800; b'
  rect rgb(240, 220, 220)
  Note over B: output "PR"
  end
  end

Looking at $A$ 's game:

$Pr [b^{'} = b] = \frac{1}{2} + ϵ (n)$

where $ϵ (n)$ is either negligible or non-negligible, we do not know yet :)

Looking at $B$ 's game:

$∣ Pr [B outputs R ∣ r is R] - Pr [B outputs R ∣ r is PR] ∣ = ϵ (n)$

Since $B outputs R$ if and only if $b^{'} \neq = b$ :

$Pr [B outputs R ∣ r is R]$ is equal to $1/2$ . We know this from One-time Pad.
$Pr [B outputs R ∣ r is PR]$ is equal to $1/2 + ϵ (n)$ probability, and for $b^{'} \neq = b$ player $A$ would need to lose. So what is the probability $Pr [b^{'} \neq = b] = 1 - Pr [b^{'} = b] = 1 - (1/2 + ϵ (n))$ .

Our expression for $B$ 's game is then:

$\frac{1}{2} - (\frac{1}{2} - ϵ (n)) = ϵ (n)$

At this point, we were able to connect formally that the probability $B$ wins it's game is equal to the probability $A$ wins it's game. If $A$ were to have a non-negligible advantage in guessing $b^{'}$ , then $B$ would have a non-negligible difference between the result of its games. However, we had assumed that $G$ is a pseudo-random generator so $B$ should not have a non-negligible difference!

Therefore, $B$ must have a negligible difference ( $ϵ (n)$ is negligible) and thus $A$ has negligible advantage. $X$ is indeed a secure scheme (against a 1-message eavesdropper, which is a very puny weak defense but hey its something). Q.E.D.

Pseudo-random Functions

Denote the set of all function that map ${0, 1}^{n} \to {0, 1}^{n}$ as $Func_{n}$ . Note that this a HUGE set, with the size $2^{n 2^{n}}$ .

Now, let $F : {0, 1}^{*} \times {0, 1}^{*} \to {0, 1}^{*}$ be an efficiently computable function. Define $F_{k} (x) := F (k, x)$ . Here, we refer to $k$ as key. Assume that $F$ is length preserving : $F (k, x)$ is only defined if $∣ k ∣ = ∣ x ∣$ , in which case $∣ F (k, x) ∣ = ∣ k ∣ = ∣ x ∣$ . Notice that choosing $k \leftarrow {0, 1}^{n}$ is then equivalent to choosing the function $F_{k} : {0, 1}^{n} \to {0, 1}^{n}$ . In other words, $F$ defines a distribution over functions in $Func_{n}$ .

We define that $F$ is a pseudo-random function if $F_{k}$ for a uniform key $k$ is indistinguishable from a uniform function $f \in Func_{n}$ . More formally, for all distinguishers $D$ :

$∣ k \leftarrow {0, 1}^{n} Pr [D^{F_{k} (.)} = 1] - f \leftarrow Func_{n} Pr [D^{f (.)} = 1] ∣ \leq negl (n)$

where $F_{k} : {0, 1}^{n} \to {0, 1}^{n}$ for some $n^{'} = poly (n)$ .

It is easy to think of an interactive proof where the distinguisher keeps querying $x$ to get $f (x)$ in one scenario, and $F_{k} (x)$ in the other; if it can't distinguish the difference after polynomially many such queries, our dear $F_{k}$ is a pseudo-random function! Also note that given a PRF $F$ , we can immediately obtain a PRG $G$ . For example:

$G (k) = F_{k} (00 \dots 0) ∣∣ F_{k} (00 \dots 01)$

where $∣∣$ is concatenation.

Pseudo-random Permutations

Let $F$ be a length-preserving keyed function. Then, $F$ is a keyed-permutation if:

$F_{k}$ is a bijection for every $k$ , meaning that $F_{k}$ is invertible.
$F_{k}^{- 1}$ is efficiently computable, where $F_{k} (F_{k}^{- 1} (x)) = x$ .

Essentially, a PRF with $n^{'} = n$ and bijection is a PRP.

Cryptonotes