♚ Play turn style chess at
1 minute per move, 100-game match. Match score: 28 wins, 72 draws. An AI landmark game: Stockfish crushed, with the bishop pair proving worth more than a knight and four pawns.

Research paper: “Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm”:

David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, Demis Hassabis (DeepMind)

Click to access 1712.01815.pdf

The game of chess is the most widely-studied domain in the history of artificial intelligence.
The strongest programs are based on a combination of sophisticated search techniques,
domain-specific adaptations, and handcrafted evaluation functions that have been
refined by human experts over several decades. In contrast, the AlphaGo Zero program
recently achieved superhuman performance in the game of Go, by tabula rasa reinforcement
learning from games of self-play. In this paper, we generalise this approach into
a single AlphaZero algorithm that can achieve, tabula rasa, superhuman performance in
many challenging domains. Starting from random play, and given no domain knowledge
except the game rules, AlphaZero achieved within 24 hours a superhuman level of play in
the games of chess and shogi (Japanese chess) as well as Go, and convincingly defeated a
world-champion program in each case ….

Read more at:

What is reinforcement learning?

“Reinforcement learning (RL) is an area of machine learning inspired by behaviourist psychology, concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward. The problem, due to its generality, is studied in many other disciplines, such as game theory, control theory, operations research, information theory, simulation-based optimization, multi-agent systems, swarm intelligence, statistics and genetic algorithms. In the operations research and control literature, the field where reinforcement learning methods are studied is called approximate dynamic programming. The problem has been studied in the theory of optimal control, though most studies are concerned with the existence of optimal solutions and their characterization, and not with the learning or approximation aspects. In economics and game theory, reinforcement learning may be used to explain how equilibrium may arise under bounded rationality.

In machine learning, the environment is typically formulated as a Markov decision process (MDP), as many reinforcement learning algorithms for this context utilize dynamic programming techniques.[1] The main difference between the classical techniques and reinforcement learning algorithms is that the latter do not need knowledge about the MDP and they target large MDPs where exact methods become infeasible.

Reinforcement learning differs from standard supervised learning in that correct input/output pairs are never presented, nor sub-optimal actions explicitly corrected. Instead the focus is on on-line performance, which involves finding a balance between exploration (of uncharted territory) and exploitation (of current knowledge).[2] The exploration vs. exploitation trade-off in reinforcement learning has been most thoroughly studied through the multi-armed bandit problem and in finite MDPs.”
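The exploration/exploitation trade-off described above is easiest to see in the multi-armed bandit setting the excerpt mentions. Below is a minimal epsilon-greedy sketch (a hypothetical illustration, not code from the paper; the arm payout probabilities are made up): with probability epsilon the agent explores a random arm, otherwise it exploits the arm with the best running reward estimate.

```python
import random

def epsilon_greedy_bandit(arm_probs, steps=10000, epsilon=0.1, seed=0):
    """Play a Bernoulli bandit; balance exploration vs. exploitation."""
    rng = random.Random(seed)
    counts = [0] * len(arm_probs)          # pulls per arm
    values = [0.0] * len(arm_probs)        # running mean reward per arm
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:                      # explore: random arm
            arm = rng.randrange(len(arm_probs))
        else:                                           # exploit: best arm so far
            arm = max(range(len(arm_probs)), key=values.__getitem__)
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
        total += reward
    return values, total / steps

# Three arms paying off 20%, 50%, and 80% of the time.
values, avg_reward = epsilon_greedy_bandit([0.2, 0.5, 0.8])
```

With enough steps the estimate for the best arm converges near its true payout, and the average reward approaches it, minus the small cost of continued exploration. Note there are no labeled input/output pairs here: the agent learns only from the rewards its own actions produce, which is the key contrast with supervised learning drawn above.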

What is this company called DeepMind?

DeepMind Technologies Limited is a British artificial intelligence company founded in September 2010.

Acquired by Google in 2014, the company has created a neural network that learns how to play video games in a fashion similar to that of humans,[4] as well as a Neural Turing machine,[5] or a neural network that may be able to access an external memory like a conventional Turing machine, resulting in a computer that mimics the short-term memory of the human brain.[6][7]

The company made headlines in 2016 in Nature after its AlphaGo program beat a human professional Go player for the first time in October 2015,[8] and again when AlphaGo beat Lee Sedol, the world champion, in a five-game match, which was the subject of a documentary film.

►Kingscrusher chess resources:
►Kingscrusher’s “Crushing the King” video course with GM Igor Smirnov:
►FREE online turn-style chess at
►Kingscrusher resources:
►Play and follow broadcasts at Chess24:
♞ Challenge KC and others for turn style chess at



Adil Ghaznavi says:

1 minute per game? Wasn't it one minute per move?

Son of the Allfather says:

It was engineers and mathematics, the human mind, that generated the proper algorithms to force this program to teach itself. It doesn't have a will of its own; it's following a set of instructions, nothing more. Credit should be given to the geniuses who wrote the incredibly complex programming.

yoxter 3423 says:

According to what I've read, AlphaZero calculates 80,000 positions per second, while Stockfish calculates 70 million positions per second. Wow, amazing, this is one of the most glorious moments of AI.

Nallanyesmar says:

And to think I'm struggling with Stockfish on level 7

Etienne 777 says:

If Stockfish would stop making desperate pawn runs it would fare much better. I see this time and again. It loses because of one illogical, stupid move; maybe the creator placed it into its DNA? A bit like Satan with his corrupt fallen nature that will destroy himself in the end, or the Illuminati.

For Goodness Ache says:

How hard is it to label who's who??

Paul L. says:

AlphaZero ran on an army of PCs. It's like expecting a program on a 1980s calculator to beat one running on a high-end PC.

Stephen Hughes says:

I just found out that Stockfish got crushed, here's my reaction: FUCK MY LIFE

Mel CHAMAND says:

I believe that A0 uses Morphy-esque moves because he thinks way more deeply than humans; in fact he is not influenced by our way of thinking at all and thus can understand the game as a whole instead of thinking about "this move leading to this one in 10 turns" or "this piece being worth 5 points"… Stockfish loses all these encounters because he is inevitably mistaken in the value of each piece, just like we would be against A0. Indeed, in order to know the exact value of each piece, one needs to find the optimal strategy from a given configuration. For Stockfish this value is hardcoded, whereas for A0 this value is approximated over a very high number of iterations.
If you analyse most of the A0 vs Stockfish encounters, A0 almost always wins due to a really surprising positional advantage. I think that for him what might happen in 10 or 20 turns if one piece is taken becomes obvious. Instead he is way more focused than we are on getting a better position in 20 or 40 turns. When fewer and fewer mistakes are made, the slightest shift can win the game.

And to get a better position you need to take the initiative as soon as possible to get that better position in the future. That is why I think A0 and all of the future machine learning based algorithms will keep playing seemingly losing moves in order to eventually get a better position several turns later. So yeah Romantic chess will revive, but only for AIs.

Well actually let me take it back, I really hope that there will be more obsessed geniuses like Bobby Fischer who will be able to develop this kind of playstyle against other humans. However, in order to be realistic they would have to work extremely hard on a few openings, thus being able to consistently win games with the white pieces.

James Micheal says:

I think there are reasons why AlphaZero beat Stockfish 8. One, they took away its opening book. Second, the hash: Stockfish 8 was given a 1 GB hash table size when it should have 32 or 64 or more. The hardware could make a great difference; you need at least 1024 MB of hash for Stockfish 8 to play very strong. There were no endgame tablebases, and ponder was off. You need to run Stockfish 8 on at least 64 cores, meanwhile AlphaZero was running on a supercomputer and Stockfish 8 was not. This game was not a fair game at all.

Dan Kelly says:

I hope AI takes away all human power because humans are scum.

Dan Kelly says:

It wasn't surprising to me. I was surprised it didn't happen earlier, because the potential of neural nets was known long before the DeepMind Alpha series. It's just that in the past we didn't have enough computing power to do the neural net training phase in a reasonable amount of time.

Dan Kelly says:

DH a great chess enthusiast? He was a prodigy.

The Robot Guy says:

wtf, i have been watching youtube for 10 years, i am glad i found your channel. You are just a guy which is really awesome in terms of passion and discovery. What amazing commentary. THIS IS WHAT I HAVE BEEN LOOKING FOR

Pintkonan says:

lets hope john connor has already been born 😮

walkabout16 says:

Audio sync off

Chris Collins says:

AI can easily win at chess. 0 theory needed. Everyone rethink your life.

Peter Petrov says:

Black looked like he was in trouble for a while. Was AlphaZero's evaluation always in Black's favor?

c j says:

Great, thanks KC, I was hoping you'd lend your insightful and passionate commentary to these games. Excited to watch them. You are hands down the best chess commentator.

Lalan Prasad Saha says:

Wonderful play

JayAr02 says:

3:08 why did Stockfish give away his knight? It doesn't make sense at all, or what am I not seeing here?

timmahtown says:

Alpha Zero is Black in case you missed it

James Brown says:

1 minute per move, NOT per game.

Anime Weeb says:

These 2 engines should have a kid named Stockfish Zero

Robert Luben says:

Rook F7 looks like a Nakamura move.

joexpoe says:

It wasn't one-minute games, it was one minute per move.

Joe Jakubiec says:

This match is not legit, every single engine says Rc1 on move 35 and that 35. Nc4 is a big mistake


