Investment Compliance in Hedge Funds using Zero Knowledge Proofs

Financial regulation is a form of compliance system that subjects financial institutions to certain requirements and restrictions. Investment compliance is one example: it involves investment restrictions and monitoring on behalf of investors. Hedge funds differ from traditional funds such as mutual funds in their ability to employ complex investment and hedging techniques. They are private entities with few public disclosure requirements. This is useful in one way: because their strategies stay confidential, financial agents can participate in the markets without fear of information leakage, which promotes liquidity. However, it is also often perceived as a lack of transparency. Hedge funds are expected to produce higher returns, but investors sometimes seek a risk guarantee in addition to higher returns. Too much transparency removes the incentives financial entities have for participating in the first place; on the other hand, too much secrecy may allow malicious entities to break the rules unchecked. We aim to solve this problem of protecting investors while preserving the privacy of financial bodies using zero-knowledge proofs. A proof can be viewed as a way of giving investors enough information, while the zero-knowledge property of the proof preserves the privacy of the fund manager's strategies. We propose a protocol for this scenario using Zokrates, a framework for verifiable computation using zk-SNARKs on Ethereum, to encode the constraints and export the verifier. Based on our implementation and analysis, we conclude that zero-knowledge proofs provide a variety of ways to develop compliance systems.


Introduction to Investment Compliance
In the financial context, the term hedge refers to placing limits on risk. The ability to employ complex trading strategies distinguishes hedge funds from other funds. They are generally considered risky investments, which is why only accredited investors (investors with high financial sophistication) can invest in them. Although hedge funds are not subject to many of the restrictions that apply to regulated funds, some countries passed guidelines after the 2008 financial crisis to increase government regulation of hedge funds. In addition, the SEC and other regulatory bodies have requested more transparent hedge fund practices over the years [34,38].
Hedge funds are privately owned funds that face relatively fewer regulations and conditions than other funds (e.g. mutual funds and equity funds). To protect investors, regulatory bodies such as the SEC still impose strict guidelines; for example, only investors whose income or net worth exceeds a particular value are allowed to invest. However, investors would also like to ensure that fund managers behave properly and that their investments do not exceed an agreed level of risk. On the other hand, the fund manager may not want to disclose all portfolio characteristics, as this may leak the strategies they use. The portfolio characteristics of a fund describe how its investments are allocated across different assets.
We begin by defining zero-knowledge proof systems [36]: schemes in which a prover convinces a verifier that a statement is true (for example, that the prover knows a secret witness satisfying it) without revealing anything beyond the truth of the statement. Section 2 describes zero-knowledge proofs in detail. Because the portfolio is confidential and the investment process must be regulated to protect investors' interests, this problem maps naturally onto zero-knowledge proofs. A proof can be viewed as a way of giving investors enough information, while the zero-knowledge property of the proof preserves the privacy of the fund manager's strategies.

Related Works
To address the conflict of interest between investors and fund managers, Szydlo [31] described a protocol between the two parties in 2005. Specifically, he described the portfolio characteristics and risk factors for each asset and defined a linear condition that the fund manager proves in order to convince investors that their risk measure does not exceed a predefined risk threshold. For this, he used Pedersen commitments [36] and interval proofs implemented with Shoup's NTL package [37]. Another related work, by Gowravaram [18], uses the same method of commitments and interval proofs.

Our Contribution
Because there is a lack of trust between the fund manager and investors, there needs to be a way to resolve this conflict of interest between the parties. Blockchain smart contracts can play this role: they let investors verify that the fund manager follows the rules specified by the investor (or predefined by the fund manager) without depending on any central authority. We use Ethereum smart contracts as a form of agreement between the two parties, so that investors can verify that funds follow the specified guidelines and behave properly. For this, we use the zero-knowledge proof framework Zokrates (SNARKs for Ethereum), which uses libsnark for the Pinocchio protocol (PGHR13) or bellman for Groth16. Libsnark is a C++ library for SNARKs that encodes problems as Rank-1 Constraint Systems (R1CS) and then as Quadratic Arithmetic Programs (QAPs), from which proofs are generated whose verification uses bilinear pairings and is therefore efficient. To summarise:
• The Zokrates framework generates a Solidity contract that can be deployed directly on Ethereum; verification is performed by calling a method on the contract.
• Any condition that can be encoded in libsnark can be expressed as constraints, so that proofs have constant size and verification cost does not grow with the number of constraints.
• Encoding conditions as constraints also lets us express quadratic (and higher-degree) conditions that may be required from a financial point of view.
We begin with the definition of zero knowledge proofs and cryptographic preliminaries required for the protocol in Section 2. Section 3 describes Pinocchio Protocol and Zokrates architecture. In Section 4, the problem statement is explained in detail. Section 5 describes the protocol workflow and implementation details using Zokrates. Section 6 presents the evaluation results of the proposed protocol. Finally, in Section 7, we conclude this article and suggest some scope of future work for this application.

Zero-Knowledge Proof Systems
The concept of zero knowledge was first introduced by three MIT researchers, Shafi Goldwasser, Silvio Micali and Charles Rackoff [35], in their work on interactive proof systems, in which a prover convinces a verifier that some statement holds by exchanging messages. Earlier work in this setting assumed an honest verifier and considered only a malicious prover trying to convince the verifier of the correctness of some statement. These researchers turned the problem around and considered that the verifier, too, may be malicious. Specifically, they asked how much extra information the verifier can derive from the proof transcripts beyond the fact that the statement holds. Any zero-knowledge proof system must have the following three properties:
• Completeness (correctness): If the statement is true, an honest prover can convince the verifier with overwhelming probability.
• Soundness: If the statement is false, no prover can convince the verifier, except with negligible probability.
• Zero knowledge: The verifier learns nothing except that the statement holds.
Completeness usually follows directly from the construction. Soundness can be made overwhelming by repeating the interactive protocol over multiple rounds, which gives a probabilistic guarantee: a cheating prover must succeed in every round. For proofs of knowledge, soundness is shown through a knowledge extractor that interacts with the prover and can extract the witness from the transcripts whenever the protocol completes successfully. Since the extractor can retrieve the witness from the transcripts, any prover who succeeds must actually know the witness.
The challenging part comes in proving the last property.
Researchers have shown that zero knowledge can be proven using the concept of simulation. If there exists a simulator that does not know the witness but produces transcripts identically distributed to those of the real prover, then the verifier can extract no more knowledge from the real transcripts than from the simulated ones. Since the simulated transcripts contain no information about the witness in the first place, the verifier cannot extract any information from the real transcripts either.
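The three properties and the simulation argument can be seen in a toy sketch of Schnorr's identification protocol, a classic example swapped in here for illustration; it is not part of this article's protocol. The prover shows knowledge of x with y = G^x mod P, and the simulator produces accepting transcripts without x by choosing the challenge and response first. The group parameters P = 23, Q = 11, G = 2 are assumptions chosen to be readable and offer no security. The sketch is honest-verifier zero knowledge: a prover who cannot answer more than one challenge passes a round with probability 1/Q, so repeating rounds drives the soundness error down.

```python
import secrets

# Toy group (assumption, insecure): P = 2*Q + 1 and G = 2 generates the subgroup of order Q.
P, Q, G = 23, 11, 2

def keygen():
    x = secrets.randbelow(Q - 1) + 1      # secret witness
    return x, pow(G, x, P)                # public statement: y = G^x mod P

def real_transcript(x, y):
    """Honest prover talking to an honest verifier."""
    r = secrets.randbelow(Q)
    t = pow(G, r, P)                      # prover's commitment
    c = secrets.randbelow(Q)              # verifier's random challenge
    s = (r + c * x) % Q                   # prover's response (needs x)
    return t, c, s

def simulated_transcript(y):
    """Simulator: never sees x; picks c and s first, then solves for t."""
    c = secrets.randbelow(Q)
    s = secrets.randbelow(Q)
    t = pow(G, s, P) * pow(y, (-c) % Q, P) % P   # t = G^s * y^(-c)
    return t, c, s

def verify(y, transcript):
    t, c, s = transcript
    return pow(G, s, P) == t * pow(y, c, P) % P

x, y = keygen()
print(verify(y, real_transcript(x, y)), verify(y, simulated_transcript(y)))  # True True
```

In both functions (c, s) is uniform and t is the unique value that makes the transcript accept, so the two distributions are identical; a verifier who can produce such transcripts alone learns nothing about x from the real ones.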
In Zokrates, all arithmetic operations are defined over a finite field [30], specifically a Galois field $GF(p)$ with $p$ prime. This means all operations are performed modulo $p$, where $p$ is the order of an elliptic curve group [7]. In Zokrates, $p$ is defined as
$$p = 21888242871839275222246405745257275088548364400416034343698204186575808495617.$$
This value is chosen so that it equals the group order of the BN128 curve used in Ethereum, which makes verification on the blockchain much cheaper because Ethereum provides precompiled contracts for BN128. Elliptic curve operations such as point addition and multiplication involve modular arithmetic; when that arithmetic is over a field other than $GF(p)$, it has to be emulated inside the circuit, which is very expensive, so incorporating elliptic curve cryptography becomes costly in Zokrates. This is solved with an embedded curve, BabyJubJub, whose base field is exactly $GF(p)$, the field whose order equals the group order of the system curve. Elliptic curve operations on BabyJubJub therefore reduce to native field arithmetic in Zokrates and become comparatively cheap.
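What "all operations are modulo p" means in practice can be sketched with plain Python integers. The helper names below are hypothetical; the point is that subtraction wraps around, so a "negative" value becomes a huge field element, and division is multiplication by an inverse. This is one reason comparisons and range checks in a circuit need care.

```python
# Scalar field of BN128 (alt_bn128 / BN254), the field Zokrates computes in.
P = 21888242871839275222246405745257275088548364400416034343698204186575808495617

def fadd(a: int, b: int) -> int:
    return (a + b) % P

def fsub(a: int, b: int) -> int:
    return (a - b) % P

def fmul(a: int, b: int) -> int:
    return (a * b) % P

def finv(a: int) -> int:
    # Fermat's little theorem: for prime P and a != 0, a^(P-2) is the inverse of a.
    if a % P == 0:
        raise ZeroDivisionError("0 has no inverse in GF(p)")
    return pow(a, P - 2, P)

print(fsub(3, 5) == P - 2)    # True: 3 - 5 wraps around instead of giving -2
print(fmul(7, finv(7)))       # 1: "division" is multiplication by an inverse
print(P.bit_length())         # 254
```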

Understanding Zokrates
Zokrates is a toolbox that uses SNARKs for verifiable computation. It provides all the tools needed, from specifying the constraints in its DSL to exporting the verification code as a Solidity smart contract. In this section, we first discuss the details of the Pinocchio protocol (PGHR13) [26] and then Zokrates itself.

Pinocchio Protocol
A verifiable computation scheme consists of three algorithms: KeyGen, Compute and Verify. KeyGen takes the computation function $F$ and a security parameter $\lambda$ and produces a common reference string (CRS) consisting of a proving key $EK_F$ and a verification key $VK_F$.
Compute takes the computation function, the input $u$ and the proving key, and outputs the result $y = F(u)$ together with a proof $\pi_y$.
Verify checks the proof $\pi_y$ using the verification key. In our case, the proof also needs to be zero knowledge.
We highlight the following aspects of this protocol.
• Zero knowledge: If $F(u, w)$ is a function with $u$ as the public input and $w$ as the private input, then given a proof $\pi$ and the output $y$ of $F$, there must be no way of extracting $w$ from this information.
• Efficiency: Verify must be cheaper than computing $F$ directly.
The cost of KeyGen also matters, but it depends on the underlying constraints and is paid once per function, so its amortised cost is reasonable.

KEA Assumption (Knowledge of Exponent Assumption):
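For any adversary $\mathcal{A}$ that takes as input $g$ and $g^{\alpha}$ and returns $(C, \hat{C})$ with $\hat{C} = C^{\alpha}$, there exists a knowledge extractor $\chi$ that, given the same inputs as $\mathcal{A}$, returns $c$ such that $g^{c} = C$. Equivalently, in the additive notation of elliptic curves: given two points $P$ and $Q$ with $Q = k \cdot P$ and a point $C$, the only way to compute $k \cdot C$ is when $C$ is derived from $P$; that is, there exists some $c$ such that $C = c \cdot P$ and $k \cdot C = c \cdot Q$.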
Quadratic arithmetic programs: We now take an arithmetic circuit and define a Quadratic Arithmetic Program (QAP) for it. For simplicity, we use the small circuit shown in Figure 2, with four inputs and two outputs of multiplication gates: $c_3$ and $c_4$ are the inputs to gate $g_5$; $c_1$, $c_2$ and $c_5$ are the inputs to gate $g_6$ (the sum $c_1 + c_2$ comes from an addition gate, and addition gates are not counted); and $c_5$ and $c_6$ are the outputs of gates $g_5$ and $g_6$, respectively.

Figure 2: Circuit for QAP
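A QAP $Q$ over a field $\mathbb{F}$ is defined as follows. Let $\mathcal{V} = \{v_i(x)\}$, $\mathcal{W} = \{w_i(x)\}$ and $\mathcal{Y} = \{y_i(x)\}$, for $i \in \{0, \ldots, m\}$, be three sets of $m+1$ polynomials, and let $t(x)$ be a target polynomial. Let $F$ be a function taking $n$ elements of $\mathbb{F}$ and giving $n'$ outputs, and let $N = n + n'$. Then $Q$ computes $F$ if $(c_1, \ldots, c_N) \in \mathbb{F}^N$ is a valid assignment of $F$'s inputs and outputs if and only if there exist coefficients $(c_{N+1}, \ldots, c_m)$ such that $t(x)$ divides $p(x)$, where
$$p(x) = \Big(v_0(x) + \sum_{i=1}^{m} c_i v_i(x)\Big)\cdot\Big(w_0(x) + \sum_{i=1}^{m} c_i w_i(x)\Big) - \Big(y_0(x) + \sum_{i=1}^{m} c_i y_i(x)\Big).$$
The size of $Q$ is $m$ and its degree is $\deg(t(x))$.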
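Now we select a root $r_g \in \mathbb{F}$ for each multiplication gate $g$ and define the target polynomial as $t(x) = \prod_g (x - r_g)$. The polynomials in $\mathcal{V}$, $\mathcal{W}$ and $\mathcal{Y}$ are defined so that, at each root $r_g$, $\mathcal{V}$ encodes the left input of gate $g$, $\mathcal{W}$ encodes its right input and $\mathcal{Y}$ encodes its output. Also, the setup chooses a secret evaluation point $s$ and a secret $k$ and publishes the points $G \cdot s^0, G \cdot s^1, \ldots$ and $G \cdot k \cdot s^0, G \cdot k \cdot s^1, \ldots$, where $G$ is a generator of the curve group. If $k$ is known, forging such pairs becomes trivial; therefore, $k$ must be hidden or thrown away after use so that it cannot be used again. Discarding this toxic waste is important, and the whole task of generating these points is known as a trusted setup, which must only be performed by someone trustworthy. Given this, the only way to produce a pair of points $(C, C')$ such that $k \cdot C = C'$ is when $C$ is a linear combination of $(G \cdot s^0, G \cdot s^1, \ldots)$ and $C'$ is the same linear combination of $(G \cdot k \cdot s^0, G \cdot k \cdot s^1, \ldots)$, which implies that the prover knows the coefficients.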
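The QAP construction can be checked by hand on the Figure 2 circuit. The sketch below assumes our reading of that circuit ($c_5 = c_3 c_4$ and $c_6 = (c_1 + c_2) c_5$), assigns the hypothetical roots 1 and 2 to the two gates, and works over a small prime field (97) instead of the 254-bit field, purely for readability. It builds $v_i$, $w_i$, $y_i$ by Lagrange interpolation, forms $t(x) = (x-1)(x-2)$, and tests whether $t(x)$ divides $p(x)$ for a given assignment.

```python
from functools import reduce

Q = 97  # small prime field for readability (Zokrates itself uses the 254-bit BN128 field)

# Hypothetical circuit: gate g5 computes c5 = c3 * c4 (root 1),
# gate g6 computes c6 = (c1 + c2) * c5 (root 2). Entries: (root, left, right, output).
GATES = [(1, {3}, {4}, {5}), (2, {1, 2}, {5}, {6})]
M = 6  # variables c1..c6 (v0 = w0 = y0 = 0 in this example)

def padd(a, b):
    n = max(len(a), len(b))
    a, b = a + [0] * (n - len(a)), b + [0] * (n - len(b))
    return [(x + y) % Q for x, y in zip(a, b)]

def pscale(a, k):
    return [(x * k) % Q for x in a]

def pmul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % Q
    return out

def pdivmod(num, den):
    """Polynomial long division over GF(Q); coefficients are stored low -> high."""
    num = num[:]
    inv_lead = pow(den[-1], Q - 2, Q)
    quot = [0] * max(len(num) - len(den) + 1, 1)
    for i in range(len(num) - len(den), -1, -1):
        coef = num[i + len(den) - 1] * inv_lead % Q
        quot[i] = coef
        for j, d in enumerate(den):
            num[i + j] = (num[i + j] - coef * d) % Q
    return quot, num[:len(den) - 1]

def interpolate(points):
    """Lagrange interpolation through (x, y) pairs over GF(Q)."""
    result = [0]
    for i, (xi, yi) in enumerate(points):
        basis, denom = [1], 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                basis = pmul(basis, [(-xj) % Q, 1])  # multiply by (x - xj)
                denom = denom * (xi - xj) % Q
        result = padd(result, pscale(basis, yi * pow(denom, Q - 2, Q)))
    return result

def selector(i, side):
    """v_i (side=1), w_i (side=2) or y_i (side=3): 1 at r_g if c_i feeds that slot of gate g."""
    return interpolate([(g[0], 1 if i in g[side] else 0) for g in GATES])

V = {i: selector(i, 1) for i in range(1, M + 1)}
W = {i: selector(i, 2) for i in range(1, M + 1)}
Y = {i: selector(i, 3) for i in range(1, M + 1)}
T = reduce(pmul, ([(-r) % Q, 1] for r, *_ in GATES), [1])  # t(x) = (x - 1)(x - 2)

def qap_check(c):
    """True iff t(x) divides p(x) = V(x) * W(x) - Y(x) for the assignment c."""
    comb = lambda polys: reduce(padd, (pscale(polys[i], c[i]) for i in polys), [0])
    p = padd(pmul(comb(V), comb(W)), pscale(comb(Y), Q - 1))
    _, rem = pdivmod(p, T)
    return all(r == 0 for r in rem)

good = {1: 2, 2: 3, 3: 4, 4: 5, 5: 20, 6: 100 % Q}
print(qap_check(good), qap_check({**good, 6: 4}))  # True False
```

Changing any output so that a gate equation no longer holds makes $p(r_g) \neq 0$ at that gate's root, so the remainder is non-zero.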
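Verifiable computation: In a real-world scenario, the polynomials $V(x) = v_0(x) + \sum_i c_i v_i(x)$, $W(x)$ and $Y(x)$ (defined analogously) are usually very large, so we cannot use them directly. Instead, they are evaluated at the secret point $s$ of the trusted setup and converted into elliptic curve points, which also hides the coefficients while still allowing their correctness to be checked. Formally, using the published points $G \cdot v_i(s)$ and $G \cdot k \cdot v_i(s)$ (and likewise for $w_i$ and $y_i$), where $G$ is a generator of the curve group, the prover sends, instead of the polynomials themselves, the elliptic curve points
$$\pi_V = G \cdot V(s),\; \pi'_V = G \cdot k \cdot V(s),\; \pi_W = G \cdot W(s),\; \pi'_W = G \cdot k \cdot W(s),\; \pi_Y = G \cdot Y(s),\; \pi'_Y = G \cdot k \cdot Y(s),\; \pi_H = G \cdot H(s),$$
where $H(x) = p(x)/t(x)$ and $\pi_H$ is computed from the published powers $G \cdot s^j$. To make sure all these linear combinations use the same coefficients, the setup also publishes $G \cdot b \cdot (v_i(s) + w_i(s) + y_i(s))$ for each $i$, from which the prover forms $\pi_K = G \cdot b \cdot (V(s) + W(s) + Y(s))$; $b$ is again toxic waste. With $G \cdot k$, $G \cdot b$ and $G \cdot t(s)$ in the verification key, we then use an elliptic curve pairing $e$ to verify that $V(s) \cdot W(s) - Y(s) = H(s) \cdot t(s)$:
$$e(\pi_V, \pi_W) / e(\pi_Y, G) = e(\pi_H, G \cdot t(s)).$$
We check that each pair was formed with the secret $k$:
$$e(\pi'_V, G) = e(\pi_V, G \cdot k),\quad e(\pi'_W, G) = e(\pi_W, G \cdot k),\quad e(\pi'_Y, G) = e(\pi_Y, G \cdot k).$$
To check that all combinations use the same coefficients, we again use pairings and verify that $\pi_K$ matches the provided $\pi_V + \pi_W + \pi_Y$:
$$e(\pi_K, G) = e(\pi_V + \pi_W + \pi_Y, G \cdot b).$$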
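The algebra behind these checks can be traced in a deliberately insecure toy model, which is an assumption made purely for illustration: a "point" $G \cdot a$ is represented by the scalar $a$ itself modulo a small prime, and the pairing $e(G \cdot a, G \cdot b)$ is modeled as $a \cdot b$. Nothing is hidden in this model and real implementations use BN128 points and a real pairing; the sketch only shows that an honest proof passes all three groups of checks and a proof for a wrong output fails. It reuses the hypothetical circuit $c_5 = c_3 c_4$, $c_6 = (c_1 + c_2) c_5$ with roots 1 and 2, whose selector polynomials are written out directly.

```python
import secrets

Q = 101  # toy prime; a "point" G*a is modeled by the scalar a mod Q (NOT secure)

# Hypothetical QAP: root 1 encodes c5 = c3*c4, root 2 encodes c6 = (c1 + c2)*c5.
V_SEL = {1: lambda x: x - 1, 2: lambda x: x - 1, 3: lambda x: 2 - x}
W_SEL = {4: lambda x: 2 - x, 5: lambda x: x - 1}
Y_SEL = {5: lambda x: 2 - x, 6: lambda x: x - 1}
ZERO = lambda x: 0
VARS = range(1, 7)

def setup():
    """Trusted setup; s, k and b are toxic waste and would be discarded afterwards."""
    s, k, b = (secrets.randbelow(Q - 3) + 3 for _ in range(3))  # s avoids the roots 1, 2
    crs = {}
    for i in VARS:
        v, w, y = (sel.get(i, ZERO)(s) % Q for sel in (V_SEL, W_SEL, Y_SEL))
        crs[i] = (v, w, y, k * v % Q, k * w % Q, k * y % Q, b * (v + w + y) % Q)
    vk = (k, b, (s - 1) * (s - 2) % Q)   # models G*k, G*b, G*t(s)
    return crs, vk

def prove(crs, c):
    """Prover: only forms linear combinations of published values with its witness c."""
    pi = [sum(c[i] * crs[i][j] for i in VARS) % Q for j in range(7)]
    # V and W have degree 1 and t(x) is monic of degree 2, so H is lead(V) * lead(W).
    lead_v = (c[1] + c[2] - c[3]) % Q
    lead_w = (c[5] - c[4]) % Q
    return pi + [lead_v * lead_w % Q]    # last entry models pi_H = G*H

def verify(vk, proof):
    k, b, ts = vk
    pv, pw, py, pv2, pw2, py2, pk, ph = proof
    e = lambda x, y: x * y % Q           # toy "pairing" e(G*x, G*y)
    kea_ok = all(e(p2, 1) == e(p, k) for p, p2 in ((pv, pv2), (pw, pw2), (py, py2)))
    div_ok = (e(pv, pw) - e(py, 1)) % Q == e(ph, ts)
    same_coeffs = e(pk, 1) == e((pv + pw + py) % Q, b)
    return kea_ok and div_ok and same_coeffs

crs, vk = setup()
good = {1: 2, 2: 3, 3: 4, 4: 5, 5: 20, 6: 100}
print(verify(vk, prove(crs, good)), verify(vk, prove(crs, {**good, 6: 0})))  # True False
```

With the wrong output $c_6 = 0$, $p(x) - H\,t(x) = 100(x-1)$, which is non-zero at every $s \geq 3$, so the division check fails regardless of the random setup.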

Zokrates
Zokrates uses the idea of delegated computation. Instead of every node executing the computation, as in a traditional blockchain, the computation is delegated to a single node, which executes the logic and publishes the result on-chain (Figure 3). This approach has two advantages.

Figure 3: Delegated Computation in Zokrates
• The delegate node can use private information to execute the computation and publish only the result, which is not possible in the traditional blockchain setting.
• The delegate node writes only the result to the blockchain, which improves efficiency because the other nodes only need to store the result rather than re-execute the computation.
However, the delegate node then has to be trusted. To remove this trust, the idea of verifiable computation is applied using the Pinocchio protocol: the delegate node becomes the prover and computes a proof of the computation, which is then verified by the nodes on the blockchain. Privacy can be maintained by using zero-knowledge proofs.

Architecture
Zokrates supports writing code in a high-level language and converting it into a verification smart contract, so that it can be deployed and proofs can be verified on-chain. It has several built-in components for these processes, summarised below.
• Compiler: The compiler inside Zokrates parses and flattens the code. After flattening, the program is in a form that can easily be converted into R1CS constraints.
• Witness Generator: Before a proof can be generated, the program must be given a valid assignment of its input variables. The witness generator takes these inputs, interprets the flattened code and generates the witness (a value for every variable).
• Circuit Importer: Developers sometimes hand-optimise flattened code. The circuit importer supports importing such constraints directly into the Zokrates toolbox.
• Setup and Proof Generator: Setup takes the compiled program and generates a proving (evaluation) key and a verification key. The proof generator uses the proving key and the witness to generate a proof, which is later checked with the verification key.
• Contract Generator: From the verification key, a Solidity contract is generated. It includes support for elliptic curve operations through the bn256g2 library and for elliptic curve pairing operations in the verifyTx method, which is called to verify a proof.
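To give a feel for what the compiler and witness generator produce, here is a hand-flattened R1CS for a two-asset risk measure $r = f_1 w_1 + f_2 w_2$ (risk factors public, weights private). Each constraint has the form $(A \cdot z) \times (B \cdot z) = C \cdot z$ over the witness vector $z$. The variable names, the layout of $z$ and the helper functions are hypothetical; this is not Zokrates' internal representation.

```python
# BN128 scalar field used by Zokrates.
P = 21888242871839275222246405745257275088548364400416034343698204186575808495617
VARS = ["one", "r", "f1", "f2", "w1", "w2", "t1", "t2"]  # hypothetical layout of z

def vec(**coeffs):
    """Sparse-to-dense helper: a coefficient for each named variable, 0 elsewhere."""
    return [coeffs.get(name, 0) for name in VARS]

# Flattened program:  t1 = w1 * f1;  t2 = w2 * f2;  r = t1 + t2
CONSTRAINTS = [
    (vec(w1=1), vec(f1=1), vec(t1=1)),          # w1 * f1 = t1
    (vec(w2=1), vec(f2=1), vec(t2=1)),          # w2 * f2 = t2
    (vec(t1=1, t2=1), vec(one=1), vec(r=1)),    # (t1 + t2) * 1 = r
]

def dot(a, z):
    return sum(x * y for x, y in zip(a, z)) % P

def compute_witness(f1, f2, w1, w2):
    """Witness generation: run the flattened program and record every wire value."""
    t1, t2 = w1 * f1 % P, w2 * f2 % P
    values = {"one": 1, "r": (t1 + t2) % P, "f1": f1, "f2": f2,
              "w1": w1, "w2": w2, "t1": t1, "t2": t2}
    return [values[name] for name in VARS]

def satisfied(z):
    """Every R1CS row must hold: (A.z) * (B.z) == C.z in the field."""
    return all(dot(a, z) * dot(b, z) % P == dot(c, z) for a, b, c in CONSTRAINTS)

z = compute_witness(f1=3, f2=5, w1=100, w2=200)
print(z[VARS.index("r")], satisfied(z))  # 1300 True
```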

Figure 4: Zokrates Components
Zokrates internal processes are summarised in Figure 4.
Zokrates currently supports three proving schemes: PGHR13, Groth16 and GM17. In our application, we mainly used PGHR13 and Groth16. Groth16 has a shorter proof (only 3 curve points, compared to 8 in PGHR13), which makes it more efficient.

Problem Statement
Hedge funds are private investment firms. After collecting the investment from all investors, the fund manager invests it, using different strategies and statistical techniques to allocate the amount across different assets. This allocation is private to the firm and is not disclosed by the fund manager, as it might leak the strategies used. We define a set $A$ containing all the assets in which the fund manager makes any investment:
$$A = \{a_1, a_2, \ldots, a_n\},$$
such that $|A| = n$, $n \in \mathbb{N}$.
For any investor, his/her investment is allocated across the assets in $A$. We describe these allocations by weights (the fraction of the total investment assigned to a particular asset), also called portfolio weights. An investor's allocation across the assets defines their portfolio, and the portfolio weights are kept private by the fund manager. Here $W = (w_1, \ldots, w_n)$ is the portfolio, where $w_i$ is the fraction of the total investment invested in asset $a_i$. Investors in these funds expect higher returns, but they also expect the amount of risk not to be too high. For example, investing too much of the investment in an asset with a high degree of risk may create a conflict of interest with the investors; an investor might not be comfortable with too much of the amount assigned to a single asset.
To estimate the risk of each asset in the market, the fund manager calculates a risk factor $f_i$ for each asset $a_i$, giving $F = (f_1, \ldots, f_n)$. These quantities are public.
The fund manager needs to convince the investor that they are following the guidelines and not investing too much of the money in risky assets. The condition is therefore
$$X \le \sum_{i=1}^{n} f_i w_i \le Y,$$
where $X$ and $Y$ are the limits specified by the investor. Sometimes risk factors are specified for pairs of assets as $f_{ij}$, describing the risk when both $a_i$ and $a_j$ are held in high or low proportion. Correspondingly, non-linear conditions can be defined as
$$X \le \sum_{i=1}^{n}\sum_{j=1}^{n} f_{ij} w_i w_j \le Y.$$
Sometimes the investor also wants the portfolio weight of any single asset not to exceed a certain quantity. This gives the following individual condition:
$$w_i \le h \quad \text{for all } i \in \{1, \ldots, n\},$$
where $h$ is the individual risk threshold for each asset.
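Before these conditions are written in the Zokrates DSL, it can help to state them as plain integer arithmetic, which is what the circuit ultimately checks. In the sketch below, weights and risk factors are scaled to 10-bit fixed-point integers (the weight precision mentioned in Scope for Future Work); the scaling convention, function names and sample numbers are assumptions. A product of two scaled values carries scale $2^{20}$, so $X$ and $Y$ for the linear condition must be given on that scale, and on $2^{30}$ for the quadratic one. The integer comparison agrees with the circuit only while the sums stay below the field modulus.

```python
SCALE_BITS = 10  # assumed fixed-point precision for weights and risk factors

def to_fixed(x: float, terms: int = 1) -> int:
    """Scale a real number; a product of `terms` scaled values has scale 2^(terms*SCALE_BITS)."""
    return round(x * (1 << (terms * SCALE_BITS)))

def linear_ok(w, f, X, Y):
    """X <= sum_i f_i * w_i <= Y, with X and Y on the scale 2^(2*SCALE_BITS)."""
    return X <= sum(fi * wi for fi, wi in zip(f, w)) <= Y

def quadratic_ok(w, F, X, Y):
    """X <= sum_i sum_j f_ij * w_i * w_j <= Y, with X and Y on the scale 2^(3*SCALE_BITS)."""
    n = len(w)
    return X <= sum(F[i][j] * w[i] * w[j] for i in range(n) for j in range(n)) <= Y

def individual_ok(w, h):
    """w_i <= h for every asset (h on the same scale as the weights)."""
    return all(wi <= h for wi in w)

w = [to_fixed(x) for x in (0.5, 0.25, 0.25)]   # hypothetical private weights
f = [to_fixed(x) for x in (0.2, 0.6, 0.9)]     # hypothetical public risk factors
print(linear_ok(w, f, X=0, Y=to_fixed(0.5, terms=2)))  # True: risk is about 0.475
print(individual_ok(w, h=to_fixed(0.4)))               # False: 0.5 exceeds 0.4
```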

Protocol Workflow
In this section, we present a protocol to be used by the fund manager and investors that allows investors to be convinced that fund managers are behaving properly. After that, we discuss some implementation details.

Participants
• Fund Manager/Prover: The fund manager follows the protocol to convince the investor that the specified conditions hold. (Alternatively, the financial body may employ an auditor to carry out the proving.)
• Investor/Verifier: Investors specify the conditions or agree to predefined ones, participate in the protocol and wait for the prover to convince them.
• Government Regulatory Body: The regulatory body provides the guidelines that the fund manager/prover must follow to avoid conflicts of interest with the investors and to ensure a degree of transparency.

Protocol
There are two phases in this protocol: the Initial Phase and the Use Phase.

Initial Phase
i. The fund manager publishes the details of the portfolio characteristics, including the universe of assets (A), the risk factors (F) and the public key to be used for convincing the verifier. Investors invest only if they agree to these terms.
ii. The fund manager deploys the Record contract and publishes the contract address and ABI.
iii. Investors register on the Record smart contract and send the obtained ID to the fund manager over a secure private channel, confirming their participation.

Use Phase
i. Each investor compiles the DSL program, which specifies all the conditions and the public key of the prover, and exports the verification smart contract for their constraints.
ii. The investor deploys the contract on the blockchain and sets the contract address and the proving key hash on the Record smart contract.
iii. Investors can also provide custom conditions and limits if the fund manager agreed to this initially (optional).
iv. The prover/fund manager compiles the imposed DSL program and makes sure that the resulting bytecode matches the deployed smart contract. The prover then computes the witness and generates the proof in JSON format, using their private key and the proving key shared by the investor.
v. The prover submits the proof from the JSON file by calling the verifyTx method on the smart contract.
vi. The verifier watches for the Success event on the deployed smart contract to be convinced that all the conditions are satisfied and that the proof could only have been generated by the fund manager. If the event is not triggered, the investor can report to the regulatory body.
Figure 5 gives a basic illustration of the protocol.
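The split of tool invocations between the two roles in the Use Phase can be sketched as a small script. It only assembles and prints ZoKrates CLI commands (compile, setup, export-verifier on the investor side; compile, compute-witness, generate-proof on the fund manager's side) and runs nothing unless asked. The file name `compliance.zok` and the argument values are hypothetical, and the `--proving-scheme` flag is an assumption whose name and accepted values differ between ZoKrates releases, so check `zokrates <command> --help` for the installed version.

```python
import shlex
import subprocess

def _scheme(scheme):
    # Assumption: proving-scheme flag; its name and values depend on the ZoKrates version.
    return ["--proving-scheme", scheme]

def investor_steps(source, scheme="g16"):
    """Investor side: compile the DSL, run the setup, export the Solidity verifier."""
    return [
        ["zokrates", "compile", "-i", source],
        ["zokrates", "setup", *_scheme(scheme)],
        ["zokrates", "export-verifier", *_scheme(scheme)],
    ]

def prover_steps(source, args, scheme="g16"):
    """Fund-manager side (proving key fetched via its IPFS hash beforehand)."""
    return [
        ["zokrates", "compile", "-i", source],
        ["zokrates", "compute-witness", "-a", *map(str, args)],
        ["zokrates", "generate-proof", *_scheme(scheme)],
    ]

def run(steps, execute=False):
    """Print each command; only execute when explicitly asked to."""
    for cmd in steps:
        print(shlex.join(cmd))
        if execute:
            subprocess.run(cmd, check=True)

run(investor_steps("compliance.zok"))
run(prover_steps("compliance.zok", [512, 256, 256, 12345]))  # hypothetical weights and key
```

Keeping the steps as data makes it easy to check that both parties compile the same source file before the fund manager proves anything.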

Implementation
The Record contract deployed by the fund manager is written in pure Solidity. The full code can be found in Appendix A. The contract has three methods.
• Register(): This method is called by investors in the Initial Phase. It generates a unique, incrementing ID for each investor, stores the ID in a mapping keyed by the investor's address and returns the ID.

• Set(): This method is called by an investor in the Use Phase to set their verifier address and proving key hash. It ensures that investors can set these values only for themselves.

• Get(): This method is called by the fund manager in the Use Phase. It returns the verifier address and proving key hash for a given investor ID, and ensures that only the fund manager (owner) can call it.
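The access rules of these three methods can be made precise with an off-chain model. The Python class below is a hypothetical model, not the Solidity contract from Appendix A: addresses are plain strings, the `msg_sender` argument stands in for Solidity's `msg.sender`, a `PermissionError` models a reverted call, and IDs starting at 1 is an assumption.

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    """Hypothetical off-chain model of the Record contract's access rules."""
    owner: str                                    # fund manager who deployed the contract
    next_id: int = 1
    ids: dict = field(default_factory=dict)       # investor address -> investor ID
    entries: dict = field(default_factory=dict)   # investor ID -> (verifier address, key hash)

    def register(self, msg_sender: str) -> int:
        """Initial Phase: assign the next ID to the calling investor."""
        self.ids[msg_sender] = self.next_id
        self.next_id += 1
        return self.ids[msg_sender]

    def set(self, msg_sender: str, investor_id: int, verifier: str, key_hash: str) -> None:
        """Use Phase: an investor may only write the entry for their own ID."""
        if self.ids.get(msg_sender) != investor_id:
            raise PermissionError("caller does not own this investor ID")
        self.entries[investor_id] = (verifier, key_hash)

    def get(self, msg_sender: str, investor_id: int) -> tuple:
        """Use Phase: only the fund manager (owner) may read an entry."""
        if msg_sender != self.owner:
            raise PermissionError("only the owner can read entries")
        return self.entries[investor_id]

rec = Record(owner="0xFundManager")
alice = rec.register("0xAlice")
rec.set("0xAlice", alice, "0xVerifierA", "QmPlaceholderHash")  # placeholder IPFS hash
print(rec.get("0xFundManager", alice))  # ('0xVerifierA', 'QmPlaceholderHash')
```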
The Zokrates DSL program is prewritten and contains values such as the public key of the fund manager and the risk factors. Values such as X, Y and h are injected by the investor before compiling. Because the proving key is very large, storing it on the smart contract is not viable, so the investor first uploads the key file to IPFS and stores the resulting IPFS hash on the Record contract; the prover then retrieves the key using this hash. There are n+1 private arguments for n portfolio weights: the n weights plus the fund manager's private key on the BabyJubJub curve. The Zokrates ECC library provides cryptographic support for this Edwards curve (the embedded curve in Zokrates), which fits well within the Zokrates setting.
After compiling, the constraints are converted to a QAP and finally exported as a Solidity smart contract. The investor deploys this contract on the Ropsten testnet and shares the contract address and key hash on the Record smart contract. The fund manager then gets the contract address from the Record contract. The Record contract is written so that only the fund manager (owner) can read an investor's data and nobody other than the investor can set their details, such as the contract address. After getting the address, the fund manager computes the witness and generates the proof as JSON, which is used directly to call the verifyTx function.

Evaluation and Results
In this section, we analyse and evaluate the processes involved in our protocol. We divide our evaluation into two parts: (1) On-chain verification and (2) off-chain processes like generating keys, generating proof, etc.

Verification on-chain
On-chain verification is the most significant part of the protocol. We performed our tests on the Ropsten testnet. The cost of the verifyTx method depends only on the proof and the number of public inputs, so in our application verification takes constant time regardless of the size of the asset list. Therefore, even with many constraints, verification remains efficient. We compared verification for two proving schemes, PGHR13 and Groth16. Because the Groth16 proof is smaller (only three elliptic curve points), Groth16 performed better than PGHR13, using ≈ 0.2 million gas compared to ≈ 0.5 million for PGHR13. For deployment, Groth16 used ≈ 0.9 million gas, whereas PGHR13 used ≈ 1.4 million. These values are averages over 20 transactions on the Ropsten network.

Off-chain Processes
Off-chain processes include compilation, key generation, exporting the verifier, computing the witness and proof generation. The PGHR13 scheme in Zokrates uses libsnark as its backend: compilation and exporting the verifier are core Zokrates processes, while key and proof generation are done by libsnark components. First, we ran these steps with the PGHR13 proving scheme in Zokrates and obtained the constraint system data for each number of assets:

Assets (n)   Constraint system data
10           17892    11    16005
20           29602    21    26144
50           64732    51    56565
100          123282   101   107265
200          240382   201   208665

Then, to measure performance, we ran a libsnark profiling routine for key generation and proof generation with the PGHR13 proving scheme, as given in [29], using the data obtained. This routine uses a dense synthetic R1CS structure, so all these results are upper bounds. For the other processes (compilation, exporting the smart contract and computing the witness), we used the time command on a Linux machine. Below is the data we obtained.
These results are averages over 3 execution runs for each step. From these results, we found that setup is the bottleneck for the verifier and takes most of the time. From the graph in Figure 6, we conclude that for a few hundred assets, the verifier can complete its steps in approximately 8 to 9 minutes, while the remaining steps can be completed in about a minute for a single investor.

Conclusion
The protocol presented enables the use of zero-knowledge proofs in the financial regulatory system. Based on our implementation and analysis, we conclude that Zokrates (or SNARKs in general) offers a variety of ways to build compliance systems. With this approach, many real-world bottlenecks such as paper trails and account-keeping can be avoided. Also, since every financial organisation must comply with a regulatory body such as the SEC, this use case serves as an introductory solution for many regulatory environments.

Scope for Future Work
In our implementation, we made some assumptions that could be relaxed to improve the application and explore other opportunities. For example, we assumed a precision of up to 10 bits for weight quantities. This could be extended when the number of assets is small, as long as the resulting risk measure still fits within the Zokrates field type. We could also try other types of conditions that might be important from a financial point of view. In addition, a different protocol could use other proving schemes, such as Bulletproofs combined with a refereed delegation approach, to make verification cheaper.
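A back-of-the-envelope check for this precision trade-off: with $b$-bit weights and risk factors and $n$ assets, the linear risk measure needs at most $2b + \lceil \log_2 n \rceil$ bits and the quadratic one at most $3b + \lceil \log_2 n^2 \rceil$ bits. The sketch compares that worst case against a usable bit budget. Taking the budget as 252 bits, some headroom below the 254-bit BN128 field, is our assumption (the exact limit for comparisons depends on the Zokrates version), so the numbers are indicative only, and they say nothing about the constraint cost of higher precision.

```python
BUDGET_BITS = 252  # assumption: headroom below the 254-bit BN128 scalar field

def ceil_log2(n: int) -> int:
    """Smallest k with 2**k >= n, for n >= 1."""
    return (n - 1).bit_length()

def linear_bits(n: int, b: int) -> int:
    # Each f_i * w_i < 2^(2b); a sum of n such terms < 2^(2b + ceil(log2 n)).
    return 2 * b + ceil_log2(n)

def quadratic_bits(n: int, b: int) -> int:
    # Each f_ij * w_i * w_j < 2^(3b); there are n^2 terms.
    return 3 * b + ceil_log2(n * n)

def max_precision(n: int, quadratic: bool = False) -> int:
    """Largest bit precision b whose worst-case risk measure fits the budget."""
    need = quadratic_bits if quadratic else linear_bits
    b = 1
    while need(n, b + 1) <= BUDGET_BITS:
        b += 1
    return b

print(linear_bits(200, 10), quadratic_bits(200, 10))               # 28 46
print(max_precision(200), max_precision(200, quadratic=True))      # 122 78
```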