Inside DeepMind’s four-year mission to solve one of biology’s greatest challenges

The Google research unit's latest breakthrough will transform our understanding of life itself. How was the technology developed and who will benefit from it?

By Oscar Williams


It takes the world’s best structural biologists several years to determine the precise shape of a protein molecule. The work requires patience and deep pockets, but it also affords scientists a far better understanding of cells, the diseases that afflict them, and the medicines that can protect them. For decades, scientists have known that a faster way to calculate how proteins “fold” – how they assemble into big, useful structures from chains of simpler molecules – could have huge benefits for public health.

On Monday 30 November, DeepMind – a British artificial intelligence (AI) firm owned by Google’s parent company, Alphabet – announced that its scientists had made the most significant breakthrough yet in the 50-year quest to improve protein modelling: an algorithm that can predict how proteins fold with an accuracy approaching that of lab-based experiments. But rather than taking months or years, DeepMind’s AlphaFold neural network – a form of “deep learning” algorithm – can complete the task in a few days.

The announcement marks one of the most significant developments in the history of computational biology, and of AI itself. It may have profound implications for the future of healthcare, not only because it will change the way drugs are developed, but because it will change who develops them. As biology moves from the laboratory to a computational environment in which the answers can only be provided by AI, companies such as DeepMind and Alphabet may become central to medical science, with potentially far-reaching implications for global healthcare.

But for John Jumper, the AlphaFold project lead, Monday’s announcement was just the latest in a long line of extraordinary events he has experienced since he joined DeepMind in 2017.

***

A fast-talking American in his mid-30s, Jumper is perhaps an unlikely character to have made a major breakthrough in biology. Like all DeepMind employees, he has a formidable academic record – the company has a reputation for hiring recent PhD graduates away from academia with starting salaries of £250,000 a year – but his background is in maths and physics rather than biology or chemistry. He moved to the UK in the late 2000s to complete a master’s in theoretical condensed matter physics at Cambridge, where he started researching small molecules and their energies, before returning to the US for a PhD in theoretical chemistry at the University of Chicago.

When he joined the AlphaFold project in 2017, Jumper was one of only two members of the now 20-strong team to have what he calls a “bio background”. All of the team’s other researchers, he told me from his home in St Albans, are physicists and computer scientists.

This meant that at the outset, few of the people working on AlphaFold had a very detailed understanding of what a protein was. Jumper was faced with a question: “Should we even be trying to teach people what a protein is?” What he discovered, however, was that by working for a year or more on AlphaFold as a computing problem, his colleagues learned a great deal about proteins. “Mostly they had become structural biologists by that point anyway.”

This is typical of how DeepMind approaches the problems it picks up – as ways to further its groundbreaking research into neural networks. Google paid £400m for DeepMind in 2014, and DeepMind’s office in King’s Cross is now regarded as the most advanced artificial intelligence research lab in Europe, if not the world. Its mission statement is simple but extreme: to “solve intelligence”.

Computer scientists remain divided as to whether a “general artificial intelligence” – a sentient machine – is any closer, if indeed it is possible. But few would argue that what DeepMind has achieved in recent years isn’t revolutionary.

***

DeepMind made headlines around the world five months before Jumper joined the organisation in 2017, when a programme called AlphaGo defeated the world’s best player of Go, a game widely regarded as the hardest of all board games.

DeepMind’s co-founder and chief executive, Demis Hassabis, learned chess when he was four and had reached master standard by the time he was 13. He said that Ke Jie, the human opponent to his machine, had played a “perfect” game of Go. But although Ke Jie had, as Hassabis put it, pushed the AlphaGo software “right to the limit”, he lost all three games of the match.

This breakthrough had arrived a decade earlier than expected. It had been driven by a neural network that studied patterns in other players’ performances and devised new strategies that weren’t to be found in any previous game. It showed the formidable power of a “deep learning” machine – but it had no immediate applications beyond the game of Go.

The AlphaFold project, however, will have significant real-world consequences. Accelerated insights into the ways proteins fold will give scientists a greater understanding of the underlying chemistry of cells. This level of knowledge is necessary to understand the molecular processes that lead to diseases such as Alzheimer’s, Parkinson’s, cystic fibrosis and cancer, and to develop drugs to slow or prevent them.


AlphaGo is an example of a branch of AI called deep reinforcement learning. Researchers trained the algorithm by showing it very large numbers of possible scenarios, and rewarding it for recognising patterns and taking more sophisticated and effective decisions, until the point where it surpassed a human’s ability to “strategise” in the same way. This is of course a very brief description of a much more complicated and expensive process.

Jumper says that while the systems are “quite different”, there are “ideas you can trace between AlphaFold and AlphaGo”. Like AlphaGo, AlphaFold’s neural network spots patterns, but it doesn’t depend on deep reinforcement learning to make decisions.

This is because a protein is not like a Go board. As a complex, moving, three-dimensional molecule, a protein is not, as Jumper puts it, “an environment an algorithm interacts with”. And unlike a Go game, which begins simply and becomes more complex, protein folding is more a process of deduction: “we know what the answer is supposed to be”.

There is also the fact – especially pertinent in a global pandemic – that AlphaFold is more than just an exercise. “Our goal is really to find something that works. It’s not about solving it in the right way […] It’s how in the world are we going to solve this very, very difficult scientific problem, and we take a very pragmatic approach to doing it.”

***

It was in the autumn of last year that Jumper and his colleagues began to see advances in the performance of AlphaFold that were “quite scarily good”. An earlier iteration of the software had already been entered into the 2018 edition of Casp, a biennial competition for protein-folding prediction models, and had performed well.

“We came top, but it’s not our goal to be good in a relative sense,” says Jumper. “We wanted to be really, really useful, and we wanted to have lots of applications.”

Working directly with Hassabis, Jumper and his colleagues sought to devise a system that had what “we started internally calling ‘biological relevance’ – which should help people do better experiments. They should learn more about the world because they see our predictions. We worked very hard and it was very unclear at different times that it was going to work.”

DeepMind employs six-month planning cycles, the first half of which tended to result in “flat performance”, says Jumper. His team would test a range of ideas, but none would take off. “Some despair would set in, every single time, but then something would start to hit and we had all sorts of innovations and you’d see it go up and then stagnate again for three months after that.” He describes the mood at that time as “exciting, punctuated equilibrium”.

When AlphaFold’s progress graph started to tick upwards last October, the team became nervous at first. “We double-, triple-checked that we hadn’t messed up. At some points we started to feel the problem was too easy and we were obsessively checking every single thing we could think of to figure out, ‘how do we make this go away so we can get back to the hard problem that we thought we had’.”

As the team looked at the predictions the algorithm made, they became worried. They appeared to be too accurate to be real predictions: “It couldn’t possibly know that! There’s no way it can predict those atoms just there!”

In April, the team suffered what appeared to be a major setback. On 5 March, Jumper and his colleagues had published a paper showing the predicted structures of proteins associated with Sars-CoV-2. After months of uncertainty surrounding the disconcerting ease with which AlphaFold was performing, the team had been nervous about putting the research into the public domain.

Less than a month later, their worst fears appeared to be confirmed. A computational biologist, whom Jumper declined to name, contacted the team about its model. “They said, ‘That’s a really interesting model, it’s probably the best I’ve seen for this protein, but it’s wrong in these three ways.’ There was this paper from 2006 and it said there should be four copies of the protein, not two, and your model can’t really make a four-copy version. And this thing is in the wrong spot.”

Jumper said the episode “was a low point” for a team that had spent months refining the programme. The computational biologist was telling the researchers precisely what they didn’t want to hear: that their models were disproven by classic, experiment-led molecular biology.

“These were good arguments, it was good science, and we tried to make the model do the right thing,” says Jumper. “But we couldn’t.” His team spent a frantic month trying to refine AlphaFold, but failed to elicit the kind of results the computational biologist suggested it should have delivered. Either AlphaFold was deeply flawed, or there was something wrong with the study it was being compared against.

“It turns out the previous experimental work was correct, but researchers had misinterpreted what it meant. In fact, almost all the conclusions we had made about the structure were right. We started saying, ‘OK, we put this out on the internet before the structure was available to anyone. This must really be right. There’s no way we could’ve made a mistake here.’

“That was the moment when it got very serious, and we started to really understand that both we had a good system and a differentially better system than everyone we know of that had put out predictions for this protein.”

In May, Jumper’s team published a blog post further detailing their findings. But it wasn’t until the results of the latest Casp competition were revealed on Monday that the wider world became aware of the significance of DeepMind’s breakthrough.

Professor Venki Ramakrishnan, a Nobel laureate and the president of the Royal Society, described it as a “stunning advance”. “It has occurred decades before many people in the field would have predicted […] It will fundamentally change biological research.”

***

When Google spent £400m on DeepMind in 2014, the tech sector was divided; some observers thought it had paid too much, others that it had paid too little. Many more were concerned about the geopolitical significance of Europe’s most promising AI lab being snapped up by an American tech giant. But DeepMind’s nine-figure price-tag pales in comparison to the amount the company has cost its parent since then. It lost $154m in 2016, $341m in 2017 and $572m in 2018, the latest year for which accounts are available. While its revenues have also risen, the depth of DeepMind’s losses has raised questions about the company’s future.

The AlphaFold project may keep those fears at bay for some time. Jumper was reticent about how DeepMind could seek to make money from AlphaFold, but a spokesperson told the New Statesman: “We remain committed to delivering on our mission of bringing the benefits of AI to the world. We look forward to working with the scientific community to understand the utility of the predictions that our system generates, and exploring where it makes sense to work with others to deliver potential applications. There is huge potential, but we are still thinking through these questions.”

DeepMind has already started to work more closely with other parts of Alphabet, including Google Health, to monetise previous research. Such partnerships have inevitably raised privacy concerns, given the sensitive nature of health data and Google’s status as the world’s largest advertising company.

But at a time when Google faces questions about its size and power on both sides of the Atlantic, a breakthrough of this kind – which Jumper compares to the invention of the microscope – must also be handled carefully.

DeepMind, which has made previous iterations of its code open-source (free to use), has committed to publishing a paper on the AlphaFold research. But perhaps the most likely avenue for opening AlphaFold up to the world would be through Google’s cloud computing division, an area of the company which has so far been outpaced by Amazon and Microsoft. While university researchers or those working on neglected diseases could be given free access, Google might choose to charge pharmaceutical firms that would use it to accelerate drug discovery.

An alliance of this kind between Big Tech and Big Pharma would attract scrutiny, but the benefits could be considerable for Google’s new corporate customers and the patients who depend on their medical discoveries. With any luck, it would generate enough revenue to protect DeepMind too.