When buzzwords collide — zkML

Extropy.IO
Feb 14, 2024 · 4 min read

Zero Knowledge Proofs and Machine Learning are a marriage made in buzzword heaven, but what is the reality beyond the hype?

zkML is a rapidly growing field with immense potential to deliver privacy-preserving, verifiable complex computation.

We have seen growing demand for complex machine learning applications, but alongside that come concerns about the privacy and integrity of the data and calculations that power them.

Large language models, such as GPT and Bard, are trained on vast amounts of data and generate human-like responses to inputs.

Different approaches can be taken to provide privacy in machine learning, including:

  • Federated learning
  • Differential privacy
  • Homomorphic encryption
  • Secure multi-party computation

Federated learning allows models to be trained on local devices and learn from data without the data leaving the device.
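
As a rough, illustrative sketch (the linear model, synthetic data and learning rate are assumptions, not from the article), federated averaging looks something like this: each client takes a gradient step on its own data, and only model weights are shared and averaged.

```python
# A minimal federated-averaging sketch (assumptions: a toy linear model,
# synthetic data and a fixed learning rate, none of which come from the article).
import numpy as np

def local_update(weights, X, y, lr=0.1):
    """One gradient step on a client's private data; only the updated
    weights ever leave the device, never X or y."""
    grad = X.T @ (X @ weights - y) / len(y)   # mean-squared-error gradient
    return weights - lr * grad

rng = np.random.default_rng(0)
global_w = np.zeros(3)

# Each client holds its own (X, y) locally and never uploads it.
clients = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(4)]

for _ in range(10):                       # federated rounds
    updates = [local_update(global_w, X, y) for X, y in clients]
    global_w = np.mean(updates, axis=0)   # the server only averages weights

print("aggregated model:", global_w)
```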

Differential privacy adds a form of noise so that no individual's sensitive data can be identified within a dataset.
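
A minimal sketch of this idea uses the classic Laplace mechanism; the dataset, clipping bounds and privacy budget epsilon below are illustrative assumptions.

```python
# A minimal Laplace-mechanism sketch (the dataset, clipping bounds and
# epsilon below are illustrative assumptions).
import numpy as np

def private_mean(values, lower, upper, epsilon):
    """Release a mean with noise scaled to the query's sensitivity, so the
    presence or absence of any single record is statistically hidden."""
    clipped = np.clip(values, lower, upper)
    sensitivity = (upper - lower) / len(values)   # max effect of one record
    noise = np.random.laplace(scale=sensitivity / epsilon)
    return clipped.mean() + noise

salaries = np.random.default_rng(1).normal(50_000, 10_000, size=1_000)
print("differentially private mean:", private_mean(salaries, 0, 150_000, 0.5))
```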

Homomorphic encryption allows operations such as training a model to be done on encrypted data, without needing to decrypt it first.
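
As a toy illustration of computing on encrypted data, textbook RSA happens to be multiplicatively homomorphic. This is not a scheme you would use for ML, and the parameters below are deliberately tiny and insecure; it only shows the basic idea of operating on ciphertexts.

```python
# Toy only: textbook RSA is multiplicatively homomorphic, so multiplying two
# ciphertexts gives a ciphertext of the product. The parameters are tiny and
# insecure, and real ML-oriented schemes (e.g. CKKS, BFV) support far richer
# arithmetic; this just shows "computing on encrypted data".
p, q, e = 61, 53, 17
n = p * q

def encrypt(m):
    return pow(m, e, n)

a, b = 7, 9
assert (encrypt(a) * encrypt(b)) % n == encrypt((a * b) % n)
print("product computed entirely on ciphertexts")
```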

Secure Multi-party Computation enables parties to compute a function over their inputs while keeping those inputs private. A number of parties could then collaborate to produce a model without sharing their own private data.
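
A minimal sketch of that idea, using additive secret sharing to compute a joint sum, is below; the party inputs and the modulus are illustrative assumptions.

```python
# A minimal secure-summation sketch using additive secret sharing (the party
# inputs and modulus are illustrative assumptions). Each party splits its
# value into random shares that sum to it mod P, so no single share, and no
# single party's view, reveals an individual input.
import random

P = 2**61 - 1
private_inputs = [12, 30, 7]            # one secret value per party

def share(value, n_parties):
    shares = [random.randrange(P) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % P)
    return shares

all_shares = [share(v, len(private_inputs)) for v in private_inputs]
# Party j receives the j-th share from every other party and sums them.
partial_sums = [sum(col) % P for col in zip(*all_shares)]

joint_sum = sum(partial_sums) % P       # reconstruct only the aggregate
assert joint_sum == sum(private_inputs)
print("joint sum:", joint_sum)
```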

Zero Knowledge Proofs bring two useful features:

  • The zero knowledge aspect, which can provide privacy
  • Succinct verification of the correctness of a computation

This latter feature is the basis for zk rollup-based L2s such as Starknet.

Blockchain applications

Much of the interest in this area comes from allowing ML models to be implemented on-chain. However, at the moment running a model directly on, say, the Ethereum Virtual Machine is not viable.

Zero knowledge proofs can help with scalability here: we can run a model off-chain and create a succinct proof that the model produced a certain output correctly. This proof can then be posted on-chain and used by other smart contracts.
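
To see why verification can be far cheaper than re-running the work, here is a toy analogy in Python (a plain witness check, not a zero knowledge proof, and not from the article). Real zk proof systems generalise this to arbitrary programs, including neural network inference, and can additionally keep inputs hidden.

```python
# Toy analogy for the cost asymmetry (not a zero knowledge proof): the prover
# does the expensive off-chain work, while the verifier only checks a short
# witness with one multiplication.
def prove(n):
    """Expensive step: find a non-trivial factorisation by trial division."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return (d, n // d)          # the short 'proof'
        d += 1
    raise ValueError("n is prime")

def verify(n, proof):
    """Cheap check, independent of how hard the factors were to find."""
    p, q = proof
    return 1 < p < n and p * q == n

n = 104_729 * 1_299_709                 # product of two primes
proof = prove(n)                        # heavy work, done once off-chain
assert verify(n, proof)                 # cheap enough to run 'on-chain'
```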

This article by 0xParc suggests four scenarios for checking the correctness of the output of an ML model:

  1. Private Input, Public Model:
    This situation involves a model consumer who wants to keep their input data private from the model provider. For instance, a model consumer may want to authenticate the outcome of a credit score model without revealing the personal financial data involved.
  2. Public Input, Private Model:
    This situation is often observed in Machine-Learning-as-a-Service scenarios. Here, the model provider may want to conceal their model parameters to preserve their intellectual property. Conversely, the model consumer wants assurance that the output genuinely came from the requested model (see the commitment sketch after this list).
  3. Private Input, Private Model:
    Here the input data is confidential, and the model details should also be concealed to protect intellectual property. A typical example is where healthcare data is involved. Zero knowledge proofs and fully homomorphic encryption are often used to achieve this.
  4. Public Input, Public Model:
    If both the input data and the model details can safely be made public, zkML serves a different purpose: ensuring the correctness of off-chain computation within an on-chain context. For a long-running computation it is better to verify a succinct zk proof than to re-run the computation.
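
As a rough sketch of the "private model" cases above: the provider can publish a hash commitment to its weights, and a zkML proof would then show that inference was run with the committed weights without revealing them. The snippet below covers only the commitment half; the weights and the hashing choice are illustrative assumptions, not a full zkML protocol.

```python
# A sketch of the commitment half of the 'private model' scenarios (the
# weights, hash-based commitment and checks below are illustrative
# assumptions, not a full zkML protocol).
import hashlib
import numpy as np

def commit(weights: np.ndarray) -> str:
    """A binding commitment to the model parameters."""
    return hashlib.sha256(weights.tobytes()).hexdigest()

# Model provider: keeps the weights secret, publishes only the commitment.
weights = np.random.default_rng(2).normal(size=(4, 4))
published_commitment = commit(weights)

# A zk proof would later show that inference was run with the committed
# weights without revealing them; here we only check that the commitment binds.
assert commit(weights) == published_commitment
tampered = weights.copy()
tampered[0, 0] += 1e-3
assert commit(tampered) != published_commitment   # any swap is detectable
```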

Opportunities and Innovations

The following possibilities demonstrate the potential for zkML to drive innovation in the web3 space, increasing privacy, security and engagement.

Potential applications include:

  • Trustworthy Off-Chain ML Oracles: The increasing integration of generative AI could encourage the adoption of digital signatures for news articles, images and video. Off-chain ML models could leverage this verifiable data for making predictions or classifications, such as election results or weather forecasts. These models could serve as unbiased oracles for securely settling prediction market outcomes, insurance agreements and more, by validating the inference and providing proof on the blockchain.
  • ML-driven DeFi Applications: DeFi could see increased automation through ML models. For example, a lending platform might use ML to dynamically adjust its parameters. Traditional approaches depend on off-chain models managed by organisations to determine metrics like collateral requirements or liquidation thresholds. Further decentralisation would occur if open-source models were developed and verified by the community.
  • A Decentralised Kaggle: Participants supply models whose accuracy can be validated without revealing the models' specifics. This could enable model trading and allow participation without the fear of models being copied.
  • Automated Trading Strategies: Traditional financial strategies are often presented through backtesting by market participants, but we cannot tell whether those are the models actually used in trading. zkML offers a way for participants to prove that their published models were indeed the ones used in their trades.
  • Smart Contract Fraud Detection: Rather than relying on slow manual governance or central authorities for contract management, ML models could instantly detect and respond to suspicious activities, potentially pausing contracts automatically. This is particularly relevant given the recent uptick in rug-pull projects in the DeFi space.
  • Differential Privacy and Identity Solutions: Identity is already a valuable use case for zero knowledge proofs, and it could be enhanced further with zkML.
  • Content Filtering for Web3 Social Media: Community-agreed ML models could filter posts based on agreed criteria, reducing spam on social media.
  • Dynamic In-Game Parameter Adjustment: Sophisticated models could be used to adjust game parameters to give more personalised and engaging gameplay. Furthermore, we could see new types of games, for example cooperative human vs. AI challenges.

So what’s the problem?

The major problem facing zkML is performance: adding zk to our ML calculations is an expensive overhead, particularly when training the model.

An interesting application that came out of a zk hackathon is the Zero Gravity project: instead of following the established design for neural nets, it takes a ‘weightless’ approach that has proved easier to reconcile with the maths underlying zero knowledge protocols.

In a later article we will look at how FHE can also bring privacy to machine learning.

If you want to know more about zkML, we are teaching a bootcamp for developers. Apply here: https://www.encode.club/zkml-bootcamp

Extropy.IO

Oxford-based blockchain and zero knowledge consultancy and auditing firm