Bard, Foerster, Chandar, et al.
presented by Albert Orozco Camacho
In the context of reinforcement learning (RL)...
Nevertheless, such games just offer testing:
Yet, games, in general, need many more abilities
GOAL: Play cards to form five stacks, one per color, each built in ascending order from 1 to 5
Players take turns doing one of three actions:
Giving a hint
The active player tells any other player a clue about the content of their hand. Hints are limited by information tokens (8 total)
Discarding a card
The active player can discard a card from their hand whenever fewer than 8 information tokens are available
That player then draws a new card from the deck, and one information token is recovered
Pick a card and play it
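The token bookkeeping behind hints and discards can be sketched as follows. This is a minimal illustration, not the Hanabi Learning Environment API: the class `TokenState` and its method names are hypothetical, and it models only the information tokens (no cards, deck, or life tokens).

```python
from dataclasses import dataclass

MAX_INFO_TOKENS = 8  # from the rules above: 8 information tokens in total


@dataclass
class TokenState:
    """Hypothetical container for the shared information-token pool."""
    info_tokens: int = MAX_INFO_TOKENS

    def can_hint(self) -> bool:
        # A hint spends one token, so at least one must be available.
        return self.info_tokens > 0

    def can_discard(self) -> bool:
        # Discarding is only allowed while the pool is below its maximum.
        return self.info_tokens < MAX_INFO_TOKENS

    def hint(self) -> None:
        if not self.can_hint():
            raise ValueError("no information tokens left")
        self.info_tokens -= 1

    def discard(self) -> None:
        if not self.can_discard():
            raise ValueError("information tokens already at maximum")
        self.info_tokens += 1  # discarding recovers one token
```

At the start of a game the pool is full, so discarding only becomes legal after at least one hint has been given.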
Game Over
Two Learning Challenges:
Both with limited and unlimited sampling regimes
Self-Play Learning: Find a joint policy that maximizes a score through repeatedly playing the game.
Ad-hoc Teams
- Teams mix agents, each trained with a particular algorithm, and/or human-like players
- Focus is on measuring an agent's ability to play with a wide range of teammates
Actor-Critic-Hanabi-Agent (ACHA)
- Asynchronous implementation of an actor-critic algorithm
- Policy is represented by a DNN
- Learns a value function as a baseline for variance reduction
- Gradient updates are applied by a centralized server, which holds the DNN parameters
- Has shown good performance on tasks such as the Arcade Learning Environment, the TORCS driving simulator, and 3D first-person environments.
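To illustrate how a learned value function acts as a baseline, the sketch below computes advantages as discounted return minus predicted value. It assumes plain Monte Carlo returns and a hypothetical list of value predictions. ACHA's actual return estimator is not described here, so this shows the general actor-critic idea rather than the agent's implementation.

```python
def discounted_returns(rewards, gamma=0.99):
    """Discounted return from each time step to the end of the episode."""
    returns = []
    running = 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def advantages(rewards, values, gamma=0.99):
    """Return minus baseline: the signal that scales the policy gradient.

    `values` are the critic's predictions V(s_t), one per time step
    (assumed inputs for this sketch).
    """
    returns = discounted_returns(rewards, gamma)
    return [g - v for g, v in zip(returns, values)]
```

Subtracting the baseline leaves the expected policy gradient unchanged but reduces its variance, since actions are credited only for doing better or worse than the critic predicted.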
Rainbow-Agent
- SOTA agent architecture for deep RL
- Combines several improvements to Deep Q-Networks into a sample-efficient, high-reward algorithm
BAD Agent
- Bayesian Action Decoder
- SOTA for the two-player unlimited regime
- Bayesian belief update conditioned on current policy of the acting agent
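A belief update conditioned on the acting agent's policy can be sketched with Bayes' rule: the posterior over the actor's hidden hand is proportional to the prior times the probability that the policy would choose the observed action with that hand. The function name, the dictionary representation, and the toy policy in the usage note are assumptions for illustration. BAD itself uses a factorised approximation that is not reproduced here.

```python
def update_belief(prior, policy, observed_action):
    """Posterior over hidden hands after observing one action.

    prior:  dict mapping each candidate hand to its prior probability
    policy: callable (hand, action) -> probability of that action
    """
    unnormalized = {
        hand: p * policy(hand, observed_action) for hand, p in prior.items()
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("observed action has zero probability under the policy")
    return {hand: w / total for hand, w in unnormalized.items()}
```

For example, with a uniform prior over the hands "playable" and "unplayable" and a policy that plays with probability 0.9 and 0.1 respectively, observing a play shifts the belief in "playable" to about 0.9.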
(that attempt to imitate human reasoning)
SmartBot
- Tracks the publicly known information about each player's cards
- Prevents other players from playing or discarding cards that they do not know to be safe
HatBot and WTFWThat
- HatBot uses a predefined protocol to determine a recommended action for all other players
- Every agent can infer other players' recommended actions according to HatBot's convention
- WTFWThat is a variant of the HatBot strategy that can play with 2 through 5 players
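One way such a convention can pass a recommendation to every teammate with a single hint is a modular-sum encoding, as in hat-guessing puzzles. The sketch below assumes that encoding. The slides do not spell out HatBot's exact protocol, so the function names and the integer action codes are hypothetical.

```python
def encode_hint(recommended_for_others, num_actions):
    """Hinter: collapse everyone else's recommended action into one value."""
    return sum(recommended_for_others) % num_actions


def decode_own(hint_value, recommended_for_visible, num_actions):
    """Receiver: subtract the recommendations it can compute for the
    other receivers (whose hands it sees) to recover its own."""
    return (hint_value - sum(recommended_for_visible)) % num_actions
```

With recommendations 2, 5, and 1 for three receivers and 8 possible actions, the hint value is 0. The second receiver sees that the others were recommended 2 and 1, so it decodes (0 - 3) mod 8 = 5, which is its own recommendation. This only works because every player applies the same public rule to compute recommendations for the hands they can see.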
FireFlower
- Implements a set of human-style conventions
- Searches over all possible actions and chooses the one that maximizes the expected value of an evaluation function
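The search step can be sketched as an argmax over actions of an averaged evaluation. The names `simulate` and `evaluate`, and the uniform weighting over sampled hidden states, are assumptions for illustration. FireFlower's actual evaluation function and search depth are not described here.

```python
def choose_action(actions, sampled_states, simulate, evaluate):
    """Pick the action with the highest average evaluation.

    sampled_states: hidden states consistent with what the agent knows
    simulate:       callable (state, action) -> resulting state
    evaluate:       callable (state) -> numeric score
    """
    def expected_value(action):
        total = sum(evaluate(simulate(s, action)) for s in sampled_states)
        return total / len(sampled_states)

    return max(actions, key=expected_value)
```

Averaging over states consistent with the agent's knowledge is what makes this an expected value under imperfect information, instead of a search over a single known state.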
The cooperative gameplay and imperfect information of Hanabi make it a compelling research challenge for
- Multiagent RL
- Game Theory
The authors evaluate SOTA deep RL algorithms showing that
- they are largely insufficient to surpass hand-coded bots;
- in ad-hoc settings, agents fail to collaborate at all
The authors believe that theory of mind plays an important role
- to learn what humans are really thinking
- to adapt to unknown teammates
- to recognize the intention of other players