DeepMind AI needs mere 4 hours of self-training to become a chess overlord


We last heard from DeepMind's dominant gaming AI in October. Unlike earlier versions of AlphaGo, which bested the world's best Go players only after the DeepMind team trained it on observations of human play, the newer AlphaGo Zero started beating pros after three days of playing against itself with no prior knowledge of the game.

Sentience, to be clear, remained a ways off. To achieve self-training success, the AI had to be confined to problems in which clear rules limited its actions and clear rules determined the outcome of a game. (Not every problem is so neatly defined, and fortunately, the outcomes of an AI uprising probably fall into the "poorly defined" category.)

This week, a new paper (PDF, not yet peer reviewed) details how quickly DeepMind's AI has improved at its self-training in such scenarios. Now evolved into AlphaZero, this latest iteration started from scratch and bested the program that beat the human Go champions after just eight hours of self-training. And when AlphaZero turned to chess, the AI defeated the reigning world-champion chess program, Stockfish, after a mere four hours of self-training. (For fun, AlphaZero also took two hours to learn shogi—"a Japanese version of chess that's played on a bigger board," according to The Verge—and then defeated one of the best bots around.)
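The "no prior knowledge" setup described above can be illustrated with a toy sketch. AlphaZero itself pairs a deep neural network with Monte Carlo tree search, which is far beyond a few lines of code, but the core loop—play games against yourself, record who won, and nudge your position evaluations toward the observed outcomes—can be shown with a stand-in game. The sketch below (all names and parameters are hypothetical, chosen for illustration) uses tabular values and the simple game of Nim: a pile of stones, each player takes 1–3 per turn, and whoever takes the last stone wins.

```python
import random

# Toy self-play learner in the spirit of AlphaZero's setup (but with
# no neural net and no tree search): a tabular agent learns Nim
# purely by playing against itself -- no human game records.

PILE, MOVES = 10, (1, 2, 3)

def self_play_train(episodes=5000, eps=0.1, alpha=0.1, seed=0):
    rng = random.Random(seed)
    # value[pile] = estimated win probability for the player to move
    value = {n: 0.5 for n in range(1, PILE + 1)}
    for _ in range(episodes):
        pile, history = PILE, []
        while pile > 0:
            legal = [m for m in MOVES if m <= pile]
            if rng.random() < eps:
                move = rng.choice(legal)  # explore a random move
            else:
                # Exploit: leave the opponent the worst-looking pile.
                # A pile of 0 means the opponent has already lost (0.0).
                move = min(legal, key=lambda m: value.get(pile - m, 0.0))
            history.append(pile)
            pile -= move
        # The player who took the last stone won. Walk the game
        # backward, crediting alternating players with the outcome.
        for i, p in enumerate(reversed(history)):
            won = 1.0 if i % 2 == 0 else 0.0
            value[p] += alpha * (won - value[p])
    return value

values = self_play_train()
```

After training, the learned table reflects the known theory of this game: piles that are multiples of 4 (here, 4 and 8) are losing positions for the player to move, so their values end up low, while other piles end up high. The same feedback loop—self-play outcomes as the only training signal—is what the DeepMind results scale up, with a network generalizing across positions instead of a lookup table.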
