AlphaGo Zero: Learning from scratch


Earlier versions of AlphaGo used a "policy network" to select the next move to play and a "value network" to predict the victor of the game from each position. AlphaGo bested South Korean Go master Lee Sedol in part by learning from a vast catalog of example moves by humans.

AlphaGo became the first program to defeat a world champion in the game of Go.

AlphaGo Zero was given no supervised training based on any expert moves. Three days and 4.9 million games later, Zero defeated AlphaGo Lee, the version of the AI that beat 18-time world champion Lee Sedol in 2016 and grabbed worldwide headlines.

Remarkably, during this self-teaching process AlphaGo Zero discovered numerous tricks and techniques that human Go players have developed over the past several thousand years.

The open-ended nature of the game has made Go a "grand challenge for artificial intelligence", the researchers say.

The team behind the AI said this breakthrough means AlphaGo is no longer "constrained by the limits of human knowledge". The AI learned without supervision: it simply played against itself, and soon was able to anticipate its own moves and how they would affect a game's outcome. In the middle of a game, though, the AI's moves didn't seem to follow anything a human would recognize as a strategy; instead, it would consistently find ways to edge ahead of any opponent, even if it lost ground on some moves.

The ancient game of Go is played by two players on a board, using pieces called stones. To teach Zero the game, the researchers used a process called reinforcement learning. Two copies of Zero played against each other; both started off knowing the rules of Go but were capable only of random moves. At the end of each game, a player receives +1 if it wins and -1 if it loses. As Zero was continuously trained, the system began learning advanced concepts in the game of Go on its own and picking out certain advantageous positions and sequences. There are problems that AlphaGo Zero cannot solve, however, such as games with hidden states or imperfect information, like StarCraft or Dota, and it's unlikely that self-play will help there.
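To make the self-play reward concrete, here is a minimal sketch, assuming a hypothetical toy game in place of Go: players alternately take one to three stones from a pile, and whoever takes the last stone wins. Two copies of a random player, like the untrained Zeroes, play one game, and every recorded move is then labelled +1 if the player who made it went on to win and -1 if they lost. The game and the function names (`random_self_play`, `outcome_targets`) are illustrative assumptions, not DeepMind's code.

```python
import random
from typing import List, Tuple

# One recorded move: (player, pile_before_move, stones_taken)
Move = Tuple[int, int, int]


def random_self_play(pile: int, rng: random.Random) -> Tuple[List[Move], int]:
    """Play one toy game between two copies of a random player.

    Returns the move history and the index (0 or 1) of the winner.
    """
    history: List[Move] = []
    player = 0
    while True:
        take = rng.randint(1, min(3, pile))  # a uniformly random legal move
        history.append((player, pile, take))
        pile -= take
        if pile == 0:
            return history, player  # this player took the last stone
        player = 1 - player


def outcome_targets(history: List[Move], winner: int) -> List[Tuple[int, int, int]]:
    """Label each (pile_before, take) with +1 if its mover won, else -1."""
    return [(pile_before, take, 1 if player == winner else -1)
            for player, pile_before, take in history]


rng = random.Random(0)
history, winner = random_self_play(10, rng)
for pile_before, take, reward in outcome_targets(history, winner):
    print(f"pile={pile_before} take={take} reward={reward:+d}")
```

Because the players alternate, the labels alternate too; in a real system these labels become training targets that teach the network which positions tend to lead to a win.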

For instance, in a given position the black player could map out four possible chains of next moves and predict that the third chain will be the most successful. After beating Ke Jie earlier this year, DeepMind announced AlphaGo was retiring from future competitions. Unlike earlier versions, which used separate policy and value networks, AlphaGo Zero uses a single neural network. During the development of Zero, Hassabis says, the system was trained on hardware that cost the company as much as $35 million.
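Zero folds the earlier policy and value networks into a single neural network. As a rough illustration only, here is a toy, pure-Python forward pass of one network with a shared hidden layer feeding two heads: a softmax "policy" over candidate moves and a tanh "value" in [-1, 1] predicting the game's outcome. The function name, layer sizes and weights are hypothetical; the real network is a far deeper convolutional network trained on board positions.

```python
import math
from typing import List, Tuple


def dual_head_forward(
    features: List[float],
    w_shared: List[List[float]],
    w_policy: List[List[float]],
    w_value: List[float],
) -> Tuple[List[float], float]:
    """Toy forward pass of one network with a policy head and a value head."""
    # Shared trunk: one linear layer followed by ReLU.
    hidden = [max(0.0, sum(w * x for w, x in zip(row, features))) for row in w_shared]

    # Policy head: one logit per candidate move, turned into probabilities by softmax.
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in w_policy]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    policy = [e / total for e in exps]

    # Value head: a single number squashed into [-1, 1] (-1 = loss, +1 = win).
    value = math.tanh(sum(w * h for w, h in zip(w_value, hidden)))
    return policy, value


policy, value = dual_head_forward(
    features=[1.0, 0.5],
    w_shared=[[0.2, -0.1], [0.4, 0.3]],
    w_policy=[[0.5, 0.1], [-0.2, 0.3], [0.0, 0.0]],
    w_value=[0.7, -0.4],
)
print(policy, value)
```

Sharing the trunk means one set of learned features serves both jobs, choosing promising moves and judging who is ahead, which is part of what makes the design simpler than two separate networks.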

DeepMind said that it's not releasing the code as it might for other projects.

AI programs like AlphaGo Zero that can gain mastery of various tasks without human input may be able to solve problems where human expertise falls short, says Satinder Singh, a computer scientist at the University of Michigan in Ann Arbor.

The original AlphaGo used a so-called search tree to determine how many times a move would lead to a win in a set of quickly simulated games, a process called a rollout. The new software is a distillation of previous systems DeepMind has built: it's more powerful but simpler, and doesn't require the computational power of a Google server to run.
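The rollout idea can be illustrated with a hypothetical toy pile game (take one to three stones; whoever takes the last stone wins). This is a minimal sketch of flat Monte Carlo evaluation, a simplification of full tree search: for each candidate move it finishes many games with random moves, counts how often the mover wins, and picks the move with the best win rate. The function names are illustrative assumptions. AlphaGo Zero itself dropped rollouts, relying instead on its neural network's estimate of who is winning.

```python
import random


def rollout_win_rate(pile: int, take: int, n_games: int, rng: random.Random) -> float:
    """Estimate how often taking `take` stones leads to a win.

    Each rollout finishes the game with uniformly random moves for both sides.
    """
    wins = 0
    for _ in range(n_games):
        remaining = pile - take
        if remaining == 0:
            wins += 1  # we took the last stone and win outright
            continue
        mover_is_us = False  # the opponent replies first
        while True:
            remaining -= rng.randint(1, min(3, remaining))
            if remaining == 0:
                if mover_is_us:
                    wins += 1
                break
            mover_is_us = not mover_is_us
    return wins / n_games


def choose_move(pile: int, n_games: int, rng: random.Random) -> int:
    """Pick the legal move with the highest estimated win rate."""
    moves = range(1, min(3, pile) + 1)
    return max(moves, key=lambda take: rollout_win_rate(pile, take, n_games, rng))


rng = random.Random(42)
print(choose_move(3, 200, rng))
```

For a pile of three stones, taking all three wins every simulated game, while taking two loses every one, so the sketch chooses to take three.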

While those earlier versions were strong enough to consistently beat top human players, they required expert input during training.

So the people at DeepMind decided to make a Go-playing AI that could teach itself how to play. We can also solve problems where the solution is nebulous, there are no "winners" and the rules to guide us don't exist.

Computer scientist Professor Satinder Singh, of the University of Michigan, reviewed the findings for the journal Nature and said teaching a computer to learn from scratch is a "major achievement". The system is not general-purpose, though: it couldn't, for instance, be used to translate languages. "In 10 years, I hope that these kinds of algorithms will be routinely advancing the frontiers of scientific research", says Hassabis.

Go has far more potential moves than chess and is said to be the most complex board game ever devised, making it an ideal testing ground for artificial intelligence.

"Drug discovery, proteins, quantum chemistry, material design. Material design, think about it, maybe there is a room-temperature superconductor out and about there", Hassabis says, alluding to a hypothetical material that would conduct electricity with no resistance at room temperature. "Sometimes it's actually chosen to go beyond that and discovered something that the humans hadn't even discovered in this time period". Applying this approach to real-world scenarios where there's a level of unpredictability is much harder.