OpenAI Five
OpenAI Five
- OpenAI Five** is a computer program by OpenAI that plays the five-on-five video game *Dota 2*. Its first public appearance occurred in 2017, where it was demonstrated in a live one-on-one game against the professional player Dendi, who lost to it. The following year, the system had advanced to the point of performing as a full team of five, and began playing against and showing the capability to defeat professional teams.
By choosing a game as complex as Dota 2 to study machine learning, OpenAI thought they could more accurately capture the unpredictability and continuity seen in the real world, thus constructing more general problem-solving systems. The algorithms and code used by OpenAI Five were eventually borrowed by another neural network in development by the company, one which controlled a physical robotic hand. OpenAI Five has been compared to other similar cases of artificial intelligence (AI) playing against and defeating humans, such as AlphaStar in the video game StarCraft II, AlphaGo in the board game Go, Deep Blue in chess, and Watson on the television game show Jeopardy!.
History Development on the algorithms used for the bots began in November 2016. OpenAI decided to use Dota 2, a competitive five-on-five video game, as a base due to it being popular on the live streaming platform Twitch, having native support for Linux, and had an application programming interface (API) available. Before becoming a team of five, the first public demonstration occurred at The International 2017 in August, the annual premiere championship tournament for the game, where Dendi, a Ukrainian professional player, lost against an OpenAI bot in a live one-on-one matchup. After the match, CTO Greg Brockman explained that the bot had learned by playing against itself for two weeks of real time, and that the learning software was a step in the direction of creating software that can handle complex tasks "like being a surgeon". OpenAI used a methodology called reinforcement learning, as the bots learn over time by playing against itself hundreds of times a day for months, in which they are rewarded for actions such as killing an enemy and destroying towers. At The International 2018, OpenAI Five played in two games against professional teams, one against the Brazilian-based paiN Gaming and the other against an all-star team of former Chinese players. Although the bots lost both matches, OpenAI still considered it a successful venture, stating that playing against some of the best players in Dota 2 allowed them to analyze and adjust their algorithms for future games. The bots' final public demonstration occurred in April 2019, where they won a best-of-three series against The International 2018 champions OG at a live event in San Francisco. A four-day online event to play against the bots, open to the public, occurred the same month. There, the bots played in 42,729 public games, winning 99.4% of those games.
Architecture Each OpenAI Five bot is a neural network containing a single layer with a 4096-unit LSTM that observes the current game state extracted from the Dota developer's API. The neural network conducts actions via numerous possible action heads (no human data involved), and every head has meaning. For instance, the number of ticks to delay an action, what action to select – the X or Y coordinate of this action in a grid around the unit. In addition, action heads are computed independently. The AI system observes the world as a list of 20,000 numbers and takes an action by conducting a list of eight enumeration values. Also, it selects different actions and targets to understand how to encode every action and observe the world. using Proximal Policy Optimization, a policy gradient method.
Comparisons with other game AI systems Prior to OpenAI Five, other AI versus human experiments and systems have been successfully used before, such as Jeopardy! with Watson, chess with Deep Blue, and Go with AlphaGo. In comparison with other games that have used AI systems to play against human players, Dota 2 differs as explained below:
- Long run view**: The bots run at 30 frames per second for an average match time of 45 minutes, which results in 80,000 ticks per game. OpenAI Five observes every fourth frame, generating 20,000 moves. By comparison, chess usually ends before 40 moves, while Go ends before 150 moves.
- Partially observed state of the game**: Players and their allies can only see the map directly around them. The rest of it is covered in a fog of war which hides enemies units and their movements. Thus, playing *Dota 2* requires making inferences based on this incomplete data, as well as predicting what their opponent could be doing at the same time. By comparison, Chess and Go are "full-information games", as they do not hide elements from the opposing player. Chess champion Garry Kasparov, who lost against the Deep Blue AI in 1997, stated that despite their losing performance at The International 2018, the bots would eventually "get there, and sooner than expected".
In a conversation with MIT Technology Review, AI experts also considered OpenAI Five system as a significant achievement, as they noted that Dota 2 was an "extremely complicated game", so even beating non-professional players was impressive. PC Gamer wrote that their wins against professional players was a significant event in machine learning. In contrast, Motherboard wrote that the victory was "basically cheating" due to the simplified hero pools on both sides, as well as the fact that bots were given direct access to the API, as opposed to using computer vision to interpret pixels on the screen. The Verge wrote that the bots were evidence that the company's approach to reinforcement learning and its general philosophy about AI was "yielding milestones".
It was OpenAI's hope that the technology could have applications outside of the digital realm. In 2018, they were able to reuse the same reinforcement learning algorithms and training code from OpenAI Five for Dactyl, a human-like robot hand with a neural network built to manipulate physical objects. In 2019, Dactyl solved the Rubik's Cube.