Transcript: Epicenter Interview with Gensyn Cofounders Ben Fielding and Harry Grieve
Understanding the coordination of decentralized production relations to provide decentralized machine learning computing power
Source: Epicenter
Transcriber: Sunny
Personal Introduction
Ben:
My background is mainly in machine learning research, so I completed a PhD in deep learning focused on neural architecture search. This problem involves searching the space of deep neural network structures to find the most performant one for a specific task. I finished my PhD in 2019 and then transitioned into the startup world, co-founding a data privacy startup. I have a strong interest in individual data privacy and data sovereignty. I worked on that for a couple of years and later joined an accelerator program in London called Entrepreneur First, where I met Harry. We delved into what we're building with Gensyn.
Harry:
On my side, my background is in applied econometrics, which combines economics and statistics. I was introduced to machine learning during my post-graduate studies for my Master's Degree in econometrics, and I fell in love with it. I found it fascinating to be able to quantify everything. The next step for me was leading a data research team at an AI startup in London. While Ben comes from a more technical and academic background, mine has been more on the applied side commercially.
I reached a point where I really wanted to build something in this space, as I saw many scaling issues. I joined the Entrepreneur First accelerator and met Ben. For those unfamiliar with EF or Entrepreneur First, it has been described as a combination of Love Island and Shark Tank. You join as an individual and find a co-founder, and they invest in you. It's a pre-idea stage. I met Ben, and we shared a similar vision for the future of AI and a similar sense of humor. The rest is history.
Enlightenment: Blockchain as the Trust Layer of Decentralised AI Infrastructure
Host: So it seems like both of you come from extensive backgrounds in AI and deep learning. What moved you to integrate this with blockchain?
Ben:
Good question. It wasn't an instant decision; it happened over a relatively long period of time, to be honest. It was driven by technology. We knew we wanted to build massive-scale AI infrastructure. As we were researching how to achieve maximum scalability, we realized that a trustless layer was necessary. We needed to be able to unite computing power without relying on centralized onboarding of new providers. Otherwise, we would encounter administrative scaling limits, and we didn't want any limits.
So we started exploring verifiable computation research to overcome this challenge. However, we reached a point where there always had to be a trusted third party or judge involved in checking the computation.
This limitation led us to blockchain. Blockchain provides a way to break free from the need for a single decision-maker or arbiter by enabling consensus among a large group of people. It was the "light bulb" moment for us, realizing that introducing a consensus layer through blockchain was crucial for achieving the planetary-scale AI we envisioned.
Interestingly, before this realization, we were somewhat skeptical about blockchain. We hadn't delved into the space before and had taken the typical technical approach of assuming that a read-only database could achieve the same outcomes. But for me personally, recognizing the importance of the trust layer was a profound moment. It revealed the true power behind blockchain, and I became deeply involved in the space.
Harry:
Interestingly, Ben and I shared many ideals commonly championed in the wider decentralization scene.
We both strongly valued free speech and were concerned about censorship, particularly in light of events like Snowden's revelations. We bonded over these discussions even before delving into the blockchain space. It almost felt obvious to us that we should start in the blockchain realm, although interestingly, we didn't initially.
Right before we made the switch, we were exploring Federated Learning, a field in deep learning where you train numerous models across distributed data sources and then combine them to create a meta-model that can learn from all the data sources. We were working with banks on this approach. However, we soon realized that the bigger issue was accessing computing resources or processors on which these models could be trained. To address this challenge, a decentralized method of trust was needed, and that's where blockchain came into the picture.
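Illustration (not from the interview): the combination step Harry describes can be sketched with federated averaging, one common way to merge locally trained models. The interview does not say which method was used, so the averaging rule, the function names, and the two "bank" datasets below are all hypothetical.

```python
# Hypothetical sketch of federated averaging: each data source fits its own
# model locally, and only the fitted parameters are shared and combined.

def fit_local_slope(points):
    """Least-squares slope through the origin for one data source."""
    numerator = sum(x * y for x, y in points)
    denominator = sum(x * x for x, _ in points)
    return numerator / denominator

def federated_average(local_params, sample_counts):
    """Weight each local parameter by the number of samples behind it."""
    total = sum(sample_counts)
    return sum(p * n for p, n in zip(local_params, sample_counts)) / total

# Made-up data held by two separate parties; raw points never leave a party.
bank_a = [(1.0, 2.0), (2.0, 4.0)]                              # follows y = 2x
bank_b = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0), (4.0, 12.0)]     # follows y = 3x

local_slopes = [fit_local_slope(bank_a), fit_local_slope(bank_b)]
global_slope = federated_average(local_slopes, [len(bank_a), len(bank_b)])
print(local_slopes, global_slope)
```

The combined slope sits between the two local fits and leans towards the party with more data, which is the sense in which the meta-model learns from every source.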
Background: What is AI, Deep Learning, and Machine Learning?
Host: Okay, so basically, it was the platform and decentralized incentive layer that influenced your decision to integrate blockchain. Before we dive into the specifics of what Gensyn does as a blockchain protocol, let's talk about AI first. While most of our podcast listeners are familiar with blockchain to some extent, AI might be less familiar to them. So let's discuss the current state of AI, which seems to be making significant advancements.
With GPT-3 and the upcoming GPT-4, along with innovations like DALL-E, it's truly mind-blowing. Can you both talk about the recent advancements in AI over the past few years?
Ben:
Absolutely. It's interesting to witness this explosion in the AI space, especially since the field of AI and machine learning has experienced a series of mini explosions over the past seven years. However, this current wave of advancements seems to be creating real impact and valuable applications that resonate with a wider audience.
Deep learning has been a fundamental driver of these changes. When I started my PhD, the deep learning explosion had just begun, particularly in the field of computer vision. Deep neural networks demonstrated their ability to surpass benchmarks set by traditional computer vision methods.
Previously, computer vision involved manually defining filters and detecting specific features or textures. Deep learning revolutionized this process by enabling models to learn directly from the data.
It eliminated the need for extensive expert knowledge and allowed individuals with sufficient computing power to design relatively simple models that could be trained on large amounts of data. This approach delivered the desired outcomes. The current focus is on building upon these foundations, developing more advanced models, and making them accessible to consumers and developers who may not have the specific expertise of the past.
GPT-3 and similar models have accelerated this progress. Harry, would you like to add anything regarding deep learning advancements?
Harry:
Yeah, whenever we talk to crypto crowds about it at conferences, we always clarify the distinction between three terms: AI, machine learning, and deep learning. They are often used interchangeably, but they have significant differences.
The best way to visualize it is through a series of circles, like a Matryoshka doll. At the outermost layer, we have AI. By the loosest definition, AI refers to programming a machine to perform a task, such as a washing machine. It follows instructions programmatically. On the other hand, machine learning gained prominence in the '90s and early 2000s. Instead of relying on expert systems with if-then rules, machine learning leverages data to determine the probability of a decision. Deep learning builds upon this concept but allows for more intricate modeling of different concepts. It involves hierarchical feature representation, where different parts of the model learn different things.
For example, in recognizing handwritten digits, a neural network goes through multiple layers to identify specific features like closed loops or stems. Over time and with computational cycles, the model can generalize and classify new images as numbers between zero and nine.
Deep learning is where we see the major breakthroughs, including GPT-3, DALL-E, and Stable Diffusion. Transformer models have been particularly influential, especially for large language modeling tasks since 2015 and 2016.
Additionally, from a social perspective, it's fascinating to consider the advancements made in generating convincing comic books from text prompts.
If you had told people in the early 2010s that this would be possible, many wouldn't have believed it. Similarly, in the next few years, we anticipate a leap in capability of similar magnitude, especially at the consumer level. In the 2020s, you could sit down in front of a platform like Netflix and, instead of choosing a pre-made movie, simply enter a text prompt like "I want to see three technologists having an hour-long podcast discussing AI." With additional prompts and initializations, an entire movie could be generated, and you might even have the ability to steer its direction at different points.
As a final point, you could potentially change the genre of the story, transforming something like Halloween into a sci-fi movie or turning Jurassic Park into a love story, all by modifying the rendering while using the same script. Exciting things are on the horizon.
Unlock Black Box: Deterministic Vs. Probabilistic
Host: Can we discuss the paradigm shift behind this? In traditional programming, the approach is often deterministic, with a focus on if-then statements.
However, my understanding of neural networks, although limited, is that they utilize complex connectivity. Do people fully understand how decisions are reached within a neural network? Also, I assume that in your case, you're not using actual neural networks but rather simulating them on a regular computer.
Ben:
Not yet (meaning real neural networks).
Host:
This raises the question of how your modeling system differs from simply providing prompts to a computer. Can you elaborate on the distinctions and complexities of the system you're working with?
Ben:
Sure, I think the black box nature of deep learning models is due to their absolute size.
At the end of the day, you're still tracing a path through a series of decision points in the network. It's just that that path is absolutely enormous, and it's hard to kind of link the weights or the parameters within that model down to exactly why they're that sort of value because they've come to that value after being fed millions of samples. You could deterministically do that, you could track every single update, but the size of data that you would end up generating would be absolutely enormous.
I think there are two things that I see happening as we go through this.
One is the black box nature is sort of falling away a little bit as we start to understand more and more about the models that we're building. Deep learning is a kind of research area that has gone through an interesting fast period where there's been a lot of experimentation that wasn't driven by the sort of fundamentals of the research. It was more driven by seeing what we could get out of it. So we throw more data at it, we try out new architectures, and we just see what happens, rather than starting from first principles and designing this thing and knowing exactly how it works. So there's been that kind of exciting period where everything has been black box. I think a lot of the gains that happened there are starting to slow down a little bit, and we're seeing people revisit those architectures and sort of check and say, "Why does this work so well? Let's dig into it and let's kind of prove it out." So in some ways, that kind of curtain is lifting.
The other thing that's happening, which is a bit more controversial, I guess, is the shift in people's perspectives as to whether a computational system needs to be fully deterministic or whether we can live in a probabilistic world. We live in a probabilistic world as people.
The self-driving cars example is probably the clearest. When we're driving around, we accept that there are stochastic events that happen and that there can be small accidents and issues. With a self-driving car system, we don't accept that at all, and we say that this has to be a fully deterministic process. I think one of the challenges that the self-driving car industry has had has been an assumption that people would just accept that probabilistic mechanism applied to self-driving cars, and they haven't.
But I think that will change, and that's probably the controversial bit as we, as a society, go towards actually allowing probabilistic computational systems to exist alongside us. I'm not sure if it will be an easy road, but I think it will happen.
Define: Artificial Narrow Intelligence, Artificial General Intelligence, Artificial Superintelligence
Host: Yeah, thank you. Before we dive into the current landscape, there's one term I have come across often in preparing for this episode. Also, maybe that's a question for Harry because you already talked about the different kinds of machine learning, deep learning, and artificial intelligence. So basically, there's this term of artificial general intelligence.
Is that different from the three terms you already talked about?
Harry:
Yes, so it's a term which was popularized, I believe, by Ben Goertzel, who's an AI researcher and entrepreneur. The idea of AGI is similar to the concept of singularity. It refers to achieving human-level intelligence in machines. Currently, we have what you might describe as artificial narrow intelligence, where machines are good at performing specific tasks. For example, machines are very good at detecting certain types of cancer from medical scans through pattern recognition. However, AGI aims to achieve general intelligence, where a machine can perform tasks that may be relatively simple for humans but challenging to reflect in a computational system.
Host: Can you give an example of AGI?
Harry:
Yeah, a good example would be a machine being able to navigate smoothly through a crowded area while making discrete assumptions about all the inputs around it.
I can't remember which level of driverless cars it is, maybe it's level 10 or something, but it's one of the reasons that driverless cars do really well on the motorway: it's a problem that humans might feel is quite complex, but it's actually a simple mathematical problem due to the lack of variation. However, when you take that same car and put it in a city street in Rome, with cobbles and tourists walking out in front of it, it becomes extremely difficult. Some of the things we think are difficult for machines, like being good at chess, are actually quite easy, but some of the things we think are easy, like walking down the street or understanding someone's body language and emotions from a conversation, can be challenging.
For example, combining various skills like pose estimation with understanding emotions and making decisions is difficult. Artificial General Intelligence (AGI) refers to a model or system that can perform everyday tasks as well as humans.
It is important to note that the advent of AGI can lead to artificial superintelligence, where machines surpass human abilities due to the complexity of their models, increased computational power, infinite lifespan, and perfect memory.
This concept is often explored in science fiction and horror movies.
Host: But this is what Elon is afraid of?
Harry:
Yes, you hear about it a lot, and these examples are often mentioned. One pathway that people, or at least I, believe will lead us there is the fusion of humans and machines.
For instance, through a brain-computer interface or brain-machine interface (BMI), you can augment your lived experience with machine inputs. The machine learns from the patterns and workings of your brain, and in turn, helps you train your own brain. This process can accelerate the development of AGI. However, it also raises a host of ethical issues. That's basically the definition of AGI, and subsequently, artificial superintelligence (ASI).
Current Compute Landscape: AWS, Local Infra, and Cloud Infra
Host: Cool, that's super interesting. So let's take a look at what the current landscape looks like, right? So basically, if I want to run an AI model, where can I buy AI compute? I mean, I could just get an instance on AWS, or I could run it on my local machine. Can you walk me through the options?
Harry:
Yeah, so it really depends on the scale of the model you're training for. If you're a student learning about AI, maybe you're an undergrad, you'd typically just use AWS or, for small enough models, your local machine, as you mentioned.
The next level up, let's say you're a startup and you've just burned through your 100K of Amazon credits, and you're looking at the marginal cost of training models. You might go for an on-demand AWS instance, or something more fixed and permanent, which is typically cheaper when you book it out in advance.
But there comes a point, when you're training models that require significant GPU resources, where the cost and scalability limitations of AWS become a hindrance. At that point, companies often choose to build in-house infrastructure.
In our research, prior to raising our latest funding round, Ben and I spoke to around 150 machine learning researchers and engineers from various organizations, including large companies, startups, and academia.
Many academics at top universities have access to high-performance computing clusters, and companies like Facebook have their own superclusters; Facebook's is the biggest AI cluster in the world.
However, most people we spoke to struggled to achieve the desired scale. Some of them resorted to purchasing their own GPUs and managing them in-house. We heard stories of people setting up spare bedrooms with fans and multiple GPUs, resembling a Bitcoin mining operation.
Others had GPUs in their offices. It's a somewhat fragmented market.
However, the bottom line is that if you buy GPUs outright, it typically costs less in the long term compared to running them on Amazon AWS instances and paying the premium for access. So, the options are cloud-based, running AI models on your local machine, or setting up your own cluster.
In academia, there is access to high-performance computing, but there can still be bottlenecks. Another option is accessing benevolent compute networks, like Folding@home, using platforms such as BOINC from Berkeley. Although it deals with signal processing rather than machine learning, SETI@home serves as an example of distributed computing on a large scale.
Currently, Folding@home has the largest compute volume in the world, surpassing even supercomputers like Fugaku.
In summary, the journey can go from your local machine to the cloud, possibly through a high-performance cluster at your university, and ultimately back to on-premises infrastructure.
The goal of Gensyn is to provide equal access to the compute scale that is currently available only to those with on-premises clusters. Crucially, it aims to achieve fair access that cannot be controlled or turned off by a centralized entity.
Designing Gensyn's Product and Why?
Host: There have been projects like this in the blockchain space before. One of the very old projects, by blockchain standards, is called Golem. I believe they actually did their ICO in 2016, which is basically like 50 years ago in the blockchain world.
So, how does Gensyn compare to Golem?
Harry:
Yes, great question. We think of it in terms of two axes.
Firstly, the thinness of the protocol. Golem is a general compute protocol that can be used for various tasks, while we are a thin protocol that focuses specifically on training machine learning models.
Secondly, the scalability of verification.
Earlier projects often relied on reputation or less Byzantine fault-tolerant methods of replication, which didn't provide enough confidence in the results for machine learning purposes.
We had conversations with machine learning experts who agreed with this assessment. Our goal was to leverage the initial learnings from compute protocols in the crypto world and apply them specifically to machine learning. We wanted to optimize speed and cost while ensuring satisfactory levels of verification.
The verification and consensus piece is a major focus for us, and we have made progress on it since our initial white paper. Ben, do you have anything to add?
Ben:
Yeah, I would like to emphasize the general-purpose approach that most projects have taken in the past. It's initially attractive to target the largest possible market by offering general-purpose computation for any computational problem.
However, this approach quickly falls into two traps, as Harry mentioned.
The first trap is the verification problem, which is extremely difficult. Our thesis is that a narrower focus is necessary, and we envision a hierarchical stack of thin protocols at the bottom of the decentralized infrastructure. Similar to AWS in Web 3, all the functionality will exist in this stack, with protocols like Gensyn and Render Token handling specific types of computation efficiently and with strong verification.
On top of that, you can have general-purpose compute networks that rely on these underlying protocols. This is our vision for decentralized infrastructure. When launching a thin protocol, it becomes much easier to target a specific market.
For Gensyn, our market is focused solely on building machine learning models, not on other things like chess simulation. It can be tempting to expand into other areas or attach ourselves to popular trends, such as NFTs, but that splits the mindshare and confuses the product offering. We want to maintain a clear identity as a machine learning compute protocol. If that's what you need, you come to Gensyn. If you need something else, you go to a different protocol that may eventually utilize Gensyn in the backend.
Our long-term goal is to be behind the scenes, like HTTP for machine learning compute. End users and developers won't even be aware of Gensyn's existence.
They will simply experience a changed world where training machine learning models happens seamlessly through a series of apps and dApps until it eventually reaches the Gensyn protocol. We believe this hierarchical infrastructure is the best way to provide compute to the world.
Harry:
I'd add one final point to that, which is that when we consider the properties that the network must have, it needs to be targeted towards machine learning engineers and researchers.
It needs to have the verification piece, but crucially, on the permissionless side, it needs to exhibit both censorship resistance and an agnostic relationship with hardware. In the deep learning hardware space, which is dominated by companies like Nvidia, there are companies developing their own proprietary ASICs, such as Google with their TPUs (Tensor Processing Units) and Graphcore with their IPUs (Intelligence Processing Units).
However, a trap that some protocols, even outside of deep learning, have fallen into is shipping proprietary hardware. The idea of shipping one's own hardware may seem attractive, as it can solve issues related to re-running proofs and ensuring determinism for hashing, but it ultimately creates a choke point of centralization.
We have observed other compute protocols relying on secure enclaves, like Intel SGX, where they claim to perform computations in a private manner. However, this approach requires specific chips manufactured by specific companies and is only rentable on specific services. It does not align with the decentralized ethos, and it currently does not scale well.
Constraints and Assumptions
Host: That's correct. Gensyn's offering allows for the utilization of underutilized resources, which means it can make use of existing resources that are currently idle.
This is different from having to purchase dedicated hardware specifically for participating in the network.
Ben:
Exactly. I think, like Harry said, it's really attractive to go that route from a technical perspective because it's so easy. But I think it intersects with one of the biggest things that we think about when designing our verification system, which is how are we constraining the system and what assumptions are we making?
Because essentially, we have to make some assumptions and put some constraints, but a constraint like that to us is massive. We don't want to do that unless we absolutely have to. There are other things we can do, such as narrowing the space of devices temporarily or permanently, looking at certain manufacturers, or using certain libraries that provide determinism. But every time we make any decision like that, we do it deliberately.
It's quite easy to jump over those in the rush to ship something, but if you're going to build the network that we want to build, which turns the entire world into an AI supercomputer, you have to be very deliberate about it. It may take slightly longer, but it's a step change. It's almost zero or one. If you make those assumptions, you won't reach that end state.
It fits on three axes: product assumptions, research assumptions, and technical assumptions. Balancing all of those is uniquely tricky, and you need voices from each area equally valid in the company. We've focused on hiring to ensure we don't accidentally overweight a certain area.
Some protocols we've looked at before have fallen into traps. There are traps with research where you can go down the path of making the most formally verifiable system, but never ship anything. On the other hand, you can go the route of making the flashiest thing that end users like, quickly shipping something, but in the Web 3 world, that's not as good because any issues or breaks become significant problems.
So, on Web 3, it's like walking on a ridge, where attractive-looking paths on either side quickly drop off the cliff. We're being very careful to stay on that ridge.
Technical Constraints and Assumptions: Layer1, Rust, and Parachain?
Host: Before we delve into the intricacies of the protocol, let's discuss why Gensyn is built as its own layer one blockchain.
In principle, it could have been developed as a DApp on another chain. So, why did you choose the layer one route?
Ben:
Yeah, it was a significant question for us initially. We approached the blockchain world with a focus on technology, and we carefully considered various options for building Gensyn. We weighed the pros and cons of different approaches and made a comprehensive list. Ultimately, we moved from layer two to layer one because we wanted the freedom to make changes at the layer one level, particularly in terms of the consensus mechanism.
We didn't want to be limited by a specific smart contract system and wanted the flexibility to explore different possibilities. Being a layer one blockchain allows us to do more work on the node side, which wouldn't have been feasible with an EVM-based solution written in Solidity.
By building in Rust, we can leverage certain capabilities such as machine learning processes and tensor processing that are not readily available within the EVM. Essentially, it was a decision to future-proof our protocol. We didn't want to impose unnecessary constraints early on when we didn't fully understand their implications. We kept our approach open and adaptable.
Additionally, we believe in a multi-chain future. We envision a landscape where individual chains interact with each other through a generally agreed messaging protocol. Rather than having ecosystems filled with isolated chains, we see the potential for chains to collaborate and communicate. We have observed movements in the ecosystem towards developing their own messaging protocols, followed by a shift towards more general message passing. We are optimistic about the multi-chain future and feel that our decision aligns well with this vision.
Host: So you're looking at building this as a parachain. Why put it on Substrate in the Polkadot ecosystem?
Ben:
So we're not fully certain whether we'll be a parachain or not yet. The Substrate decision was essentially based on the technology. When we looked at everything, including the frameworks and libraries available, Substrate stood out as the best option for us.
As machine learning people entering the blockchain space, we wanted to leverage existing technology and focus on building our machine learning capabilities rather than reinventing consensus. Substrate allowed us to quickly iterate and build our chain, then get on with the off-chain stuff with enough flexibility for future changes.
It's written in Rust, which we're fans of as a language. About a year and a half ago, Cosmos and Substrate were the top contenders, and Substrate won due to its strong technical foundations, developer tooling, and libraries.
The decision to become a parachain will be made later. We can choose to be a parachain or not, and it depends on factors like the ecosystem filling up with compatible components such as storage layers or sovereign data layers.
If those components exist elsewhere, we may consider using IBC to interact with them and exist in the wider world. We'll see how things develop.
Host: You guys should look at solutions like Cartesi as well. These are solutions that enable you to have a legacy operating system that connects to a blockchain for provable compute. It's super interesting.
Ben:
I was just going to say it sounds interesting. I haven't come across it before, but yeah, we'll definitely check it out.
Gensyn Machine Learning Compute Protocol Works Off-chain
Participants: Submitters, Solvers, Verifiers, and Whistleblowers
Host: So let's dive into the protocol.
In the Gensyn economy, there are four main roles: submitters, solvers, verifiers, and whistleblowers. The submitters are actually the people who want work done. Now, if I were a submitter, what kind of AI problems could I submit? Am I constrained in any way?
Solvers: Gradient-based Optimisation as the Core
Ben:
Yes, currently you are constrained by your AI problem having to use gradient-based optimization at some point in the computational process. Basically, we utilize portions of the gradient calculations as part of our proof system.
As Harry mentioned earlier, we have our lite paper which is public, and we are iterating on that internally. There are lots of things in play, essentially. But for now, it's gradient-based optimization, and we use the signals from that as part of the verification mechanism. Other than that, there are no specific constraints.
Host: What does gradient-based optimization mean?
Ben:
I guess fundamentally, if we think about a neural network, it is a big set of layers that have parameters in them, and those parameters are essentially just real numbers.
There could be millions, billions, now trillions of those numbers there, but fundamentally, they are the deciding factor in the output of the network. The training of the network involves setting those parameters to realistic values that allow data to go through and trigger the desired outputs at the end of the network.
You go through lots of matrices, layers of these real numbers, and they change the current input as it's going through, and then you get the output that you want based on all those changes that have happened.
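Illustration (not from the interview): a minimal sketch of an input passing through layers of real-valued parameters, as Ben describes. The layer sizes, the weight values, and the ReLU non-linearity between layers are assumptions chosen only to keep the example small.

```python
# Hypothetical two-layer network: each layer is a matrix of real numbers
# that transforms the current input on its way to the output.

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def relu(vector):
    """Assumed non-linearity applied between layers."""
    return [max(0.0, v) for v in vector]

def forward(layers, inputs):
    """Run the input through every layer in order."""
    activation = inputs
    for weights in layers[:-1]:
        activation = relu(matvec(weights, activation))
    return matvec(layers[-1], activation)

layers = [
    [[1.0, -1.0], [0.5, 0.5]],   # layer 1: 2 inputs -> 2 hidden units
    [[2.0, 1.0]],                # layer 2: 2 hidden units -> 1 output
]
output = forward(layers, [3.0, 1.0])
print(output)
```

Changing any single number in `layers` changes `output`, which is why training comes down to finding good values for those numbers.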
In the past, this process would be done manually or using expert knowledge to set the parameters. But with neural networks, there are different ways of setting them programmatically.
One naive approach is randomly setting all the parameters, running a sample through, checking how far it is from the desired output, and then repeating the process of random updates until a smaller error value is achieved. There are more targeted strategies as well, where you perform updates based on gradients.
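Illustration (not from the interview): the naive random approach can be sketched as below for a model with a single parameter. The data, the step scale, and the fixed seed are hypothetical choices made so the sketch is repeatable.

```python
import random

def loss(weight, samples):
    """Mean squared error of the one-parameter model y = weight * x."""
    return sum((weight * x - y) ** 2 for x, y in samples) / len(samples)

def random_search(samples, steps=500, scale=0.5, seed=0):
    """Start from a random weight and keep random updates that reduce the error."""
    rng = random.Random(seed)
    best = rng.uniform(-5.0, 5.0)
    best_loss = loss(best, samples)
    for _ in range(steps):
        candidate = best + rng.uniform(-scale, scale)   # blind random update
        candidate_loss = loss(candidate, samples)
        if candidate_loss < best_loss:                  # keep only improvements
            best, best_loss = candidate, candidate_loss
    return best, best_loss

samples = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]   # made-up data following y = 3x
weight, final_loss = random_search(samples)
print(weight, final_loss)
```

This works for one parameter, but every update is a guess. With millions of parameters, most random guesses make things worse, which is the motivation for using gradients instead.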
Gradient-based optimization was a big change for neural networks and deep learning. It showed that you could use the gradient, that is, differentiate the error with respect to the parameters of each layer, as you go through the network. By applying the chain rule, you can propagate the gradients all the way back through the hierarchical network. In this way, you can determine your position on the loss surface.
If you imagine modeling the loss as a surface in Euclidean space, you would see it as a bumpy area with dips. The goal is to find the dip where the loss is minimized.
The gradients for each layer show you where you are on that surface and in which direction you should update the parameters. You use the gradients to navigate the bumpy surface and find the direction that leads to a smaller loss.
The size of the step in the update depends on the steepness of the surface. If it's steep, you make a bigger jump, and if it's not steep, you make a smaller jump. Essentially, you're just navigating this surface, looking for a dip, and the gradients help you determine your position and direction.
This was a huge leap because the gradient provides a clear signal and useful direction, rather than taking random leaps in the parameter space. It helps you know where you are on the surface and whether you're on top of a hill, in a trench, or on a flat area.
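To make the description above concrete, here is a minimal sketch of gradient-based optimization on a toy two-parameter "network". The model, the single training example, the learning rate, and all function names are assumptions chosen for illustration; nothing here comes from Gensyn's code. The chain rule carries the error gradient back from the output to the first parameter, and each update moves downhill by an amount proportional to the slope.

```python
# Toy two-layer "network": y_hat = w2 * (w1 * x), with loss = (y_hat - y) ** 2.
# All values and names are illustrative assumptions.

def forward(w1, w2, x):
    h = w1 * x        # hidden activation
    y_hat = w2 * h    # network output
    return h, y_hat

def gradients(w1, w2, x, y):
    h, y_hat = forward(w1, w2, x)
    dloss_dyhat = 2.0 * (y_hat - y)   # derivative of the squared error
    dloss_dw2 = dloss_dyhat * h       # chain rule through the output layer
    dloss_dh = dloss_dyhat * w2       # propagate the gradient back to the hidden value
    dloss_dw1 = dloss_dh * x          # chain rule through the first layer
    return dloss_dw1, dloss_dw2

def train(x=1.0, y=2.0, lr=0.05, steps=200):
    w1, w2 = 0.5, 0.5
    for _ in range(steps):
        g1, g2 = gradients(w1, w2, x, y)
        # Step against the gradient; a steeper slope gives a larger step.
        w1 -= lr * g1
        w2 -= lr * g2
    return w1, w2

w1, w2 = train()
final_loss = (forward(w1, w2, 1.0)[1] - 2.0) ** 2
```

After training, the product of the two parameters is close to the target output of 2, so `final_loss` is close to zero.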
Host: How do you determine that there's only one trench and how do you ensure that you're in the right trench? Because if there are multiple trenches, you want to end up in the deepest one. You don't want to get stuck on a smaller mound.
Ideally, you want to reach the peak, like Mount Everest. So, how do you ensure that you go as low as possible with your model?
Ben:
Very good question. That's one of the big problems in deep learning itself. Essentially, there are many techniques for addressing it. The simple answer is assuming it's convex, and then you don't have to consider the existence of any other trenches.
However, in the real world, it doesn't work like that. There are numerous regularization techniques used in deep learning training, which makes it complex and more of an art than a science. Many practitioners have their own tricks and approaches. One such technique is using learning rate schedules, where the learning rate decreases over time, ensuring smaller jumps across the loss surface and preventing accidental jumps over a trench.
Conversely, there can also be random introductions of large jumps, allowing exploration of potential better solutions in different areas. If the result is not satisfactory, it can be rolled back to the previous state.
These techniques involve a fair amount of trial and error rather than being deliberate. However, there is a growing trend towards more deliberate approaches. As people introduce regularization techniques like dropout and normalization, they are revisiting them to understand the underlying reasons and whether they truly work as intended.
In essence, it's more of an art than a science, but efforts are being made to better understand and explain the mechanisms behind these techniques.
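The learning rate schedule mentioned above can be illustrated with a small sketch. The loss function, the starting point, and the decay factor are assumptions picked so the effect is easy to see; real schedules are tuned per model. With a constant, overly large learning rate the parameter keeps jumping back and forth over the trench, while a decaying rate lets it settle at the bottom.

```python
# Minimise loss(w) = w ** 2, whose gradient is 2 * w.
# Illustrative assumption: a learning rate of 1.0 is deliberately too large.

def descend(w, steps, lr_at):
    """Run gradient descent, asking lr_at(t) for the learning rate at step t."""
    for t in range(steps):
        w -= lr_at(t) * 2.0 * w
    return w

def constant_lr(t):
    return 1.0

def decaying_lr(t, initial=1.0, decay=0.9):
    # Exponential decay: the step size shrinks by 10% every step.
    return initial * decay ** t

w_constant = descend(4.0, 50, constant_lr)   # keeps jumping over the minimum
w_decayed = descend(4.0, 50, decaying_lr)    # jumps shrink until it settles
```

The rollback of unsuccessful large jumps that Ben also describes is not shown here.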
Harry:
It can be very frustrating.
Host: Now I understand that gradient optimization problems are what I should submit. In terms of real-world problems, could you please elaborate on which ones are gradient optimizations and which ones aren't? This will give me a better idea of the types of problems I should be able to submit.
Ben:
Yeah, I mean the simplest way to put it is that almost every neural network uses gradient-based optimization.
There are other problems that also utilize it, but within the context of neural networks, all the major advancements and breakthroughs have been achieved through gradient-based optimization. It's a logical area for us to focus on while still leaving room for other possibilities. Any optimization problem that is differentiable and can use the chain rule to propagate gradients can potentially benefit from gradient-based optimization.
Some problems use other optimization techniques in conjunction with gradients as a signal, but as long as a gradient is being calculated, we can leverage it. It's a valuable tool. However, fundamentally, it all comes down to neural networks. Every couple of years, someone publishes a paper claiming that they're training neural networks using evolutionary optimization without gradients, and that it's better. But in reality, it's never better. It may perform well in specific, limited scenarios, but it never gains traction.
That's not to say it will never happen, but so far, gradients have remained the dominant approach.
Host: Okay, then let's turn the question around. What problems cannot be submitted? Which problems are not easily solvable or not well suited for neural networks?
Harry:
In general, neural networks are quite data-hungry algorithms, so if you have a problem with very low data volume, such as toy examples like the Iris dataset with a small number of rows and features, neural networks may not be well-suited to it. Statistical machine learning techniques are often a better fit in such cases. Another consideration is that certain types of neural networks can be very large, making it challenging to fit them on edge devices.
However, when it comes to the type of problem you're trying to predict in the world, there isn't something that immediately comes to mind where neural networks are explicitly considered to be bad at. Do you have any intuition about this, Ben?
Ben:
I guess you can think of a neural network as a universal function approximator. So theoretically, it can perform all the tasks that you would do with other methods.
However, as Harry mentioned, the reason for not using a neural network would typically be related to data volumes. In some cases, a statistical machine learning method could provide better results. In those instances, you wouldn't train the neural network using gradient-based optimization. But fundamentally, you could still use a neural network if desired, even though it might yield slightly worse results.
Submitter: Training Data and Model
Host: I understand that I can submit almost any problem. So do I ask for an entire program? Do I ask for something like "I want a picture of accountants in hot air balloons over a waterfall, and there should also be a rainbow with scorpions on it," and it will do that for me? Or can I say, "I'm building this car and I need an AI to drive it. Can you deliver that AI?" Are both of those within scope, or does one of them fall out of scope?
Harry:
So I think the process would be to use Gensyn to train the model itself.
You would start by defining your desired outcome, such as generating scorpion rainbow images from a text prompt. Then, you would build a model that takes a text prompt as input and generates corresponding images. The training data would be crucial for the model to learn and improve. Once you have the model architecture and training data prepared, you would submit them to the Gensyn network along with hyperparameters like the learning rate schedule and training duration.
The result of this training process would be the trained model, which you can then host. From there, you can submit a text prompt like "rainbow" to the hosted model and receive the generated images.
Host: How do I decide which untrained model to use?
Harry:
That's an excellent question. There are two approaches to consider. The first one is based on the concept of foundation models, which are currently gaining popularity.
In this approach, a large company like OpenAI or Midjourney builds the base model. Then, you can take this base model and train it on your specific training data, which may include examples of rainbows and scorpions. By training the model on this data, it becomes proficient at generating similar outputs. This option is commonly chosen by those with limited computational resources.
The second option is to build the model from scratch, deviating from the base model approach. Ben can provide more insights on this aspect from our perspective.
Ben:
Yeah, a lot of our thinking revolves around the foundation models approach as we believe it represents the future of the space. During my research for my PhD, I focused on AutoML techniques, which aim to optimize model structures and find the best architecture without requiring expertise in the field.
This approach has been adopted by platforms like AWS SageMaker and GCP's Compute Cloud, where they incorporate AutoML techniques to simplify the machine learning process for developers. For Gensyn, as a protocol, we envision it being utilized by DApps that implement evolutionary optimization techniques or similar methods.
These DApps would submit individual architectures to the Gensyn protocol for training and testing, iteratively refining the structure to build the desired model.
In our vision, Gensyn serves as a foundation for pure machine learning compute, and we encourage the development of an ecosystem around it.
All the additional features and functionalities seen in platforms like SageMaker and GCP can be built on top of Gensyn. While it may seem attractive to build them ourselves, we believe it's important to avoid falling into the trap of proprietary solutions. Regarding foundation models, we've observed researchers taking pre-trained models from large research papers that invested significant funding into exploring various architectures.
They publish these models, claiming they excel in specific computer vision tasks. Users can then take the pre-trained model, add or remove layers, and fine-tune it on their specific data. This approach, known as pre-training and fine-tuning, is common in the deep learning space.
However, we recognize the challenge of introducing bias through pre-training, as organizations may use proprietary datasets or withhold information about the training process, making it difficult to understand the decision-making process.
Our solution to this bias is not to eliminate the black box or rely on full determinism, but rather to open up the training process to everyone. By collectively designing and training foundation models on an infrastructure that is not owned by any specific entity, we can create global models that are not biased by any particular company's dataset. [federated learning as second layer]
Once we have these global foundation models, anyone can utilize them by finding the model's hash on the chain, continue training from that point with their specific dataset, and have a model that is as biased as the entire global population rather than being biased by a single company in California.
The Compute Supplier: Verifiers
Host: Okay, so basically, until we have the global foundation model (we can discuss how you plan on delivering that later), I need to decide on one of the commercially available models. Once I've submitted my problem, I'm curious to know who gets to work on it. Do servers need to meet certain prerequisites? Additionally, I'd like to understand if it's one server per problem or if parallelization is possible.
Harry:
Sure, I would say at the task level, it's one server per task.
However, a model can break out into multiple tasks.
When large language models are trained, they are designed to utilize the maximum hardware capacity available at the time. This concept can be extended to the network, taking into account the heterogeneity of devices.
For a given task, a compute supplier, such as a verifier or worker, can choose to take it from the mempool. They are randomly selected from the pool of individuals who have expressed their interest in taking on that task. If the model and data cannot fit on a particular device, but the device owner claims it can, there may be a penalty due to system congestion.
In essence, if a task can fit on a machine, who gets to run it is determined by a verifiable random function that selects a worker from the subset of available miners or workers.
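The selection step Harry describes can be sketched as follows. This is an illustration under stated assumptions: the worker names, the memory figures, and the function names are hypothetical, and a plain SHA-256 hash of a public seed stands in for a verifiable random function. A real VRF additionally produces a proof that the output was computed correctly, which this sketch does not do.

```python
import hashlib

def eligible_workers(claimed_memory_gb, required_gb):
    """Keep only workers whose claimed capacity can fit the task."""
    return sorted(w for w, mem in claimed_memory_gb.items() if mem >= required_gb)

def select_worker(candidates, task_id, seed):
    """Pick one candidate from a public seed so anyone can recompute the choice.

    SHA-256 is a stand-in for a VRF here (assumption for illustration).
    """
    if not candidates:
        raise ValueError("no eligible workers for this task")
    ordered = sorted(candidates)  # fixed order so the result does not depend on input order
    digest = hashlib.sha256(f"{seed}:{task_id}".encode()).digest()
    return ordered[int.from_bytes(digest, "big") % len(ordered)]

# Hypothetical pool of workers and their claimed memory in gigabytes.
pool = {"worker-a": 24, "worker-b": 80, "worker-c": 16, "worker-d": 48}
candidates = eligible_workers(pool, required_gb=24)
chosen = select_worker(candidates, task_id="task-42", seed="block-hash-placeholder")
```

Because the seed and task identifier are public in this sketch, every participant arrives at the same choice without trusting a coordinator.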
Submitter verifies Verifier
Host: How do you verify the capacities of the miners? For example, if someone claims to have a 16-core GPU and 400 gigabytes of RAM, how is this information verified?
Ben:
It comes out in the verification of the computation: if they don't have that compute device or that capacity, they won't be able to complete the computation, and it will be detected during the proof submission. However, there is a question regarding the size of tasks. If tasks are made too large, it could potentially lead to system issues, such as a denial-of-service (DoS) attack, where miners claim tasks but never complete them, wasting time and resources.
Therefore, the task size decision is crucial and involves considering factors like parallelization and optimizing the task structure. We are actively researching and exploring the best approach based on various constraints. When we launch our test net, we will also consider practical aspects and observe how the system functions in the real world.
We understand that defining the perfect task size is challenging, and we are prepared to adapt and make adjustments based on real-world feedback and experiences.
Zero Knowledge Proof to Shield Verification System
Host: If I'm given a specified model and the training data, how can you make sure that I've actually done the job right? It's not very deterministic, so it's not as if you can make me produce a hash that tells you whether I've done it or not. How do you build checkpoints into this process? Otherwise I could just pretend to do the work and claim that it was a lazy model: I did the work, but the model just couldn't be taught.
Ben:
Yeah. Essentially, that's the major challenge faced by the verification system. It is a significant hurdle to overcome. The simplest and most secure solution is to use a zero-knowledge proof of the entire computation. In an ideal scenario, we envision that in the future, any computation can be accompanied by a zero-knowledge proof, enabling us to definitively determine whether the computation has been performed or not. However, at present, we are still working towards achieving this capability.
Checkpoints
Host: Don't you need a new circuit for each given computation?
Ben:
Currently, the challenge of verifying machine learning computations is daunting. The computations involved are massive, requiring a domain-specific language (DSL) to define a circuit that represents the machine learning computation. It's a complex process. Our approach is a hybrid between the probabilistic mechanism and the use of checkpoints. We have followed some principles outlined in a paper called "Proof of Learning" by Nicolas Papernot and his group.
This paper introduced the concept of generating a certificate proof by traversing the gradient space, making it equally challenging to generate a realistic path as it is to perform the actual computation. By incorporating a financially rational assumption about the participants, we can infer that they would indeed perform the work. While there were some issues and flaws in the paper, it demonstrated that a relatively robust check can be established by combining a random auditing scheme with the gradient space path.
To enhance the verification process, we introduce zero knowledge proofs at specific steps and on top of the global loss of the model.
All of these components are packaged within a game-theoretic mechanism resembling TrueBit, incorporating staking and slashing, and addressing the verifier's dilemma through random jackpots, essentially with whistleblowers.
That's the overall system in a nutshell, and I'm happy to delve into specific aspects if you'd like.
Host: Yes, that's a fair summary. TrueBit is a mechanism that enables performing large computations off-chain and then proving the computation's correctness on-chain. It uses binary search on-chain to verify various elements of the computation.
Ben:
Yeah, I think that's exactly it. TrueBit demonstrated that you can take a large computation that wouldn't fit in the EVM (Ethereum Virtual Machine) or would be extremely expensive and perform it off-chain. Through the challenge mechanism and the search process you described, eventually, the proof is generated and validated on-chain with the chain performing a small operation. We apply a similar principle but incorporate it with the certificate proof concept mentioned earlier.
For a full machine learning training job, performing the entire proof on-chain would be impractical due to its enormity. Instead, we distill it into a smaller proof that still represents the larger computation. For example, we can perform checkpoints at certain intervals, reducing the size by a factor such as one in a hundred. We also draw inspiration from work in the machine learning space that applies the TrueBit mechanism to neural networks.
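The checkpoint idea can be sketched with a toy training loop. Everything here is an assumption for illustration: the one-parameter "model", the checkpoint interval, and the function names are invented, and the check compares values for exact equality, which only works because this toy is perfectly reproducible. Real GPU training is generally not bit-for-bit reproducible, so a practical check would need some tolerance. The point is only that a verifier can re-run one randomly chosen segment between two checkpoints instead of the whole job.

```python
import random

def train_segment(w, steps, lr=0.1, target=3.0):
    """Deterministic toy training: gradient descent on (w - target) ** 2."""
    for _ in range(steps):
        w -= lr * 2.0 * (w - target)
    return w

def solve(w0, total_steps, every):
    """Solver side: train and record a checkpoint every `every` steps."""
    checkpoints = [w0]
    w = w0
    for _ in range(total_steps // every):
        w = train_segment(w, every)
        checkpoints.append(w)
    return checkpoints

def audit_index(checkpoints, i, every):
    """Verifier side: recompute segment i and compare with the next checkpoint."""
    return train_segment(checkpoints[i], every) == checkpoints[i + 1]

def random_audit(checkpoints, every, rng):
    """Audit one randomly chosen segment instead of the full computation."""
    i = rng.randrange(len(checkpoints) - 1)
    return i, audit_index(checkpoints, i, every)

honest = solve(0.0, total_steps=50, every=10)
forged = list(honest)
forged[3] += 0.5   # a solver that tampers with one checkpoint
audited_segment, passed = random_audit(honest, 10, random.Random(0))
```

An honest trace passes whichever segment is drawn. The forged trace fails only when the audit lands on a segment touching the altered checkpoint, which is why the check is probabilistic and is paired with stakes in the scheme described above.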
Instead of using virtual machine instructions, it uses a graph representation of the neural network's operations, traversing a Merkle tree graph. This can be done at different granularities, such as native operations within frameworks like PyTorch or TensorFlow, matrix multiplications, and even individual floating-point operations.
Although there is overhead involved, this approach establishes the crucial link between off-chain participants and the full consensus of the chain. Being a layer one blockchain, we also have the ability to increase the computation size that the chain can handle. For instance, if the chain can perform matrix multiplications, we can bypass the step of performing floating-point operations within the matrix multiplication, which is advantageous.
Of course, it's important to consider constraints and assumptions, such as hardware requirements for validators, but we appreciate the flexibility to adjust these factors.
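As a rough illustration of committing to a trace of operations with a Merkle tree, the sketch below hashes a list of operation outputs into a single root. The leaf contents and function names are assumptions for illustration and do not reflect the layout of Gensyn's actual proofs. Changing any single operation output changes the root, so a short on-chain commitment can stand in for a long off-chain computation.

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Fold a list of byte strings into one 32-byte Merkle root."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd-sized levels
        # Hash each adjacent pair to build the next level up.
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

# Hypothetical outputs of individual operations in a network's compute graph.
operation_outputs = [b"matmul:0.125", b"relu:0.125", b"matmul:0.5", b"softmax:0.62"]
root = merkle_root(operation_outputs)

tampered = list(operation_outputs)
tampered[2] = b"matmul:0.75"
tampered_root = merkle_root(tampered)
```

The granularity of a leaf (a framework operation, a matrix multiplication, or a single floating-point operation) is the trade-off Ben describes above.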
Whistleblower
Host: I completely understand. So, essentially, by fixing the block gas limit and potentially repricing certain opcodes, it can indeed make a significant difference. Perhaps we can discuss how the blockchain itself works in a moment.
However, there are two additional parties involved in the process: the verifier and the whistleblower. The verifier's role is to ensure that the checkpoints have been properly checked, while the whistleblower's role is to ensure that the verifier performs their duties accurately. Could you provide more details about their specific responsibilities?
Harry:
Yeah, I understand. So, the verifier and the whistleblower have a relationship similar to the verifier and the worker in the TrueBit paper.
The whistleblower solves the verifier's dilemma problem, which means that they ensure the verifier's work is correct and can be trusted. The whistleblower is incentivized to do so by the verifier's forced errors. In other words, the verifier intentionally introduces errors in their work, and the whistleblower's role is to identify and expose those errors. This creates a system of checks and balances, ensuring the integrity of the verification process.
Host: Yeah, that's an interesting analogy. It's somewhat similar to the concept in the TrueBit paper. Like the dog at the baggage carousel, where if they don't find any drugs, their handlers will place a suitcase with drugs so they don't become disheartened and continue working.
In the verification process, the verifier intentionally introduces errors to test the vigilance of the whistleblower and ensure the system's effectiveness.
Harry:
That's an interesting point about the dogs needing dog treats occasionally. So, the thinking behind it is that the solver performs the work, and if the work is incorrect, the verifier detects the error and notifies the whistleblower.
The error is then recorded on the blockchain, which we'll discuss in a moment, to be verified on-chain. Periodically, and at a rate linked to the security of the system, the verifier intentionally introduces errors to keep the whistleblower engaged. If the whistleblower discovers a problem, they engage in a game called the pinpoint protocol, where they narrow down the computation to a specific point in the Merkle tree of computations for that area of the neural network.
This information then goes to the chain for arbitration. This is the simplified version of the verifier and whistleblower process, which we have further developed and researched after closing our seed round.
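The narrowing-down step can be sketched as a binary search over two traces of intermediate state hashes. This is a minimal sketch in the spirit of the TrueBit-style search discussed earlier, under stated assumptions: the hash-chain trace, the step count, and the function names are invented for illustration, and the actual pinpoint protocol may differ. Two parties who agree on the starting state but disagree on the final state can isolate a single disputed step in a logarithmic number of rounds, and only that one step needs arbitration.

```python
import hashlib

def run_trace(n_steps, faulty_step=None):
    """Build a chain of state hashes; optionally perform one step incorrectly."""
    state = hashlib.sha256(b"initial-state").digest()
    trace = [state]
    for step in range(1, n_steps + 1):
        op = b"wrong" if step == faulty_step else b"right"
        state = hashlib.sha256(state + op + step.to_bytes(4, "big")).digest()
        trace.append(state)
    return trace

def pinpoint(solver_trace, challenger_trace):
    """Return (disputed_step, rounds), or None if the final states agree."""
    if len(solver_trace) != len(challenger_trace):
        raise ValueError("traces must cover the same number of steps")
    if solver_trace[0] != challenger_trace[0]:
        raise ValueError("parties must agree on the initial state")
    if solver_trace[-1] == challenger_trace[-1]:
        return None
    lo, hi = 0, len(solver_trace) - 1   # invariant: agree at lo, disagree at hi
    rounds = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if solver_trace[mid] == challenger_trace[mid]:
            lo = mid
        else:
            hi = mid
        rounds += 1
    return hi, rounds

challenger = run_trace(1024)
solver = run_trace(1024, faulty_step=700)
disputed_step, rounds = pinpoint(solver, challenger)
```

With 1,024 steps the search needs 10 rounds, after which the parties agree on the state before `disputed_step` and disagree on the state after it.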
Gensyn Coordination Protocol Works On-chain
Host: So, let's talk about how all of these components fit together. First, someone has to build blocks on the network.
This process involves staking tokens, as it's a staking network. Now, how does all of this relate to the Gensyn protocol?
Ben:
Yeah, essentially, it's a vanilla Substrate blockchain to an extent. We use the proof-of-stake GRANDPA/BABE consensus mechanism, with validators operating in the usual manner. All the components described by Harry and me earlier occur off-chain, with various off-chain participants performing their respective tasks.
They are incentivized through staking, done via a staking pallet within Substrate or by submitting a specific number of tokens in a smart contract. These participants are rewarded when their work is ultimately verified.
The challenge lies in ensuring that the staking amounts, potential slashing amounts, and reward amounts are balanced, so there are no incentives for laziness or malicious behavior.
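One way to see why these amounts must be balanced is a simplified expected-value model. The model and every number in it are assumptions for illustration, not Gensyn's actual parameters: a cheater skips the compute cost, is caught with some probability, and loses the stake when caught. Cheating stops being rational once the stake exceeds the compute cost divided by the detection probability, minus the reward.

```python
def honest_profit(reward, compute_cost):
    """An honest solver pays for the computation and collects the reward."""
    return reward - compute_cost

def cheating_profit(reward, stake, p_caught):
    """A cheater pays no compute cost but loses the stake when caught."""
    return (1.0 - p_caught) * reward - p_caught * stake

def minimum_stake(reward, compute_cost, p_caught):
    """Smallest stake at which cheating is no more profitable than honesty.

    Derived from cheating_profit <= honest_profit:
        stake >= compute_cost / p_caught - reward
    """
    if not 0.0 < p_caught <= 1.0:
        raise ValueError("p_caught must be in (0, 1]")
    return max(0.0, compute_cost / p_caught - reward)

# Hypothetical numbers: reward 10, compute cost 4, a 10% chance of being caught.
required = minimum_stake(reward=10.0, compute_cost=4.0, p_caught=0.1)
```

In this toy model a lower detection probability pushes the required stake up sharply, which is one reason the random auditing rate and the slashing amounts have to be tuned together.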
Adding more participants, such as the whistleblower, complicates matters, but their presence is crucial for ensuring the verifier's honesty, given the scale of the computations. While we continuously explore ways to potentially eliminate the whistleblower through zero-knowledge proof techniques, we remain cautious and don't want to jump ahead prematurely. Currently, the system aligns with what's described in the lite paper, but we are actively working on simplifying each aspect.
Looking at past protocols, we often see a trend of launching with a complex system and then simplifying it once it's live. We anticipate going through a similar process. For example, in Polkadot, the fisherman mechanism was removed after the launch when it was deemed unnecessary.
Harry:
I'd also like to mention an additional point regarding our augmentation of the vanilla Substrate chain. There is a challenge within the verification system, both in its original proposal and in its current state. The issue arises when the data required to check the solver's initial work is removed or becomes inaccessible during the verification process.
This creates a standoff situation because if the verifier cannot access the data, they are unable to carry out the verification process.
Host: So you have some kind of data availability solution that plugs into it?
Harry:
Precisely, yes. So we have incorporated a proof of availability (POA) layer on top of Substrate. This layer, internally referred to as POA, uses erasure coding and other techniques to address the limitations we encountered in the wider storage layer market. If there are any developers in this space who have already implemented such a solution, I would genuinely be intrigued to learn about it.
Essentially, it is a layer where you can lock data for a period of time, pinned so that it cannot be unpinned during that period, with its existence verified on-chain. Arweave is the closest answer, but it is too expensive: think about the cost of storing, say, a terabyte of training data forever on Arweave. That is why that solution is not suitable for us. Inexpensive storage is a critical factor to consider.
Ben:
Yes, that's correct. Our need for this kind of storage extends not only to the training data but also to the intermediate proof data, which doesn't require long-term storage. For instance, it may only need to be retained for about 20 seconds while we progress through a specific number of blocks. With Arweave, however, you pay for storage that spans hundreds of years, which is unnecessary for these short-term requirements.
What we are seeking is a solution that offers the guarantees and features of Arweave but at a lower cost, considering the shorter storage duration. Thus far, we haven't come across such an option.
Harry:
Yeah, you can think of it as a temporary or transient version of the permaweb. It serves the purpose of storing data for a specific duration, rather than indefinitely.
Gensyn Token and Governance
Host: I assume that there's going to be a Gensyn token somewhere in this eventually. Tell me about that.
Ben:
Yes, sure. So, the Gensyn token is fundamentally essential for the ecosystem. It plays a crucial role in everything we have just described, such as staking, slashing, providing rewards, and maintaining consensus. Its primary purpose is to ensure the financial rationality and integrity of the system.
We use a modest inflation rate to pay out validators and leverage the game theoretic mechanism. However, it's important to note that the token serves purely technical purposes. We are very intentional about its deployment, bringing it in only when it is technically necessary and not before.
We have observed the pitfalls of launching utility tokens prematurely, which can lead to distractions and misalignment. Our goal is to avoid such complications. Ideally, the token will be introduced discreetly and at the right time to support consensus and incentivize participants in the system.
Harry:
Yeah, it's important to recognize that we are part of a minority within the deep learning community, especially in relation to the broader skepticism surrounding cryptocurrencies.
Ben and I, based on our history, were initially skeptical as well, but we have come to appreciate the technological and ideological aspects of crypto. However, when the network launches, we anticipate that the majority of deep learning users will primarily transact in fiat currency, with the conversion to tokens happening seamlessly behind the scenes.
On the supply side, solvers and participants will be actively involved with tokens, and we have received significant interest from former Ethereum miners who possess substantial GPU resources and are seeking new opportunities. It's crucial to ensure that the intimidating crypto terminology, like tokens, is removed from the user experience of deep learning and machine learning practitioners. This is an exciting use case that bridges the worlds of Web 2 and Web 3, as there is an economic rationale and the necessary technology to support its existence.
Now, it's a matter of execution, such as making users comfortable with the idea of not relying on centralized platforms like Amazon and finding decentralized ways to obfuscate the concept of variable token prices. It would be tempting to provide a centralized API frontend that automatically converts the tokens on a centralized exchange, but that approach brings its own set of problems, and we aim to avoid them.
Gensyn Roadmap
Host: What does the roadmap look like?
Harry:
Yup, the test net is planned for early next year, and it won't have any incentive initially. As mentioned earlier, the purpose of the test net is twofold: to battle-test the technology we've been developing internally and to gather feedback on its overall usability.
This will precede an incentivized test net, where users will be able to train models more extensively. We prioritize the pace at which we progress and ship updates. While we could quickly release something that might appear promising but lacks meaningful feedback, we are mindful of falling into that trap.
In the past, there have been instances where there was hype around certain ideas, like generative NFT art, and we could have provided a quick inference solution for it. However, we decided that it deviated from our core principles and didn't address the fundamental problem at hand. Additionally, it would require building several ancillary components.
Hence, we're not in a rush to release something immediately. Instead, we prefer to release a solution that is meaningful and takes the necessary time, especially considering the complexity involved in aspects like zero knowledge proofs.
Host: Where can people go to learn more about Gensyn?
Harry:
Yes, gensyn.ai is our main source of information. We have an active Discord community where a lot of discussions take place. Currently, we don't have a Telegram group. If anyone is interested in joining us and being a part of building a permissionless deep learning compute protocol, we are currently hiring and would love to hear from you.
Additionally, next year we will be hosting a zero-knowledge machine learning summit, which would be of great interest to those exploring the intersection of these fields. Lastly, as a note to traditional deep learning and machine learning professionals, if anyone is attending or sponsoring the NeurIPS conference in New Orleans next week, both Ben and I will be there representing our work and the crypto space. We would be thrilled to have a chat if anyone is attending.