(a) Parabellum
(b) Wrestling
(c) Pong
(d) Cat-and-mouse
(e) Pursuers-and-evaders
Examples of behaviors found with our method, GAME, across five environments. GAME was able to find a diversity of solutions in five adversarial domains: (a) Parabellum, a battle game with many units, (b) Wrestling, a soft-robot morphology competition, (c) Pong, a symmetrical and adversarial game, (d) Cat-and-mouse, an asymetrical game where a fast cat wants to catch a agile mouse, and (e) Pursuers-and-evaders, a multi-agent and asymetrical game in an enclosed arena.
April 23, 2026

Adversarial Coevolutionary Illumination
with Generational Adversarial MAP-Elites

Timothée AnneIT University of Copenhagen
Noah SyrkisIT University of Copenhagen
Meriem Elhosniarmasuisse Science+Technology
Florian Turatiarmasuisse Science+Technology
Alexandre Manaiarmasuisse Science+Technology
Franck Legendrearmasuisse Science+Technology
Alain Jaquierarmasuisse Science+Technology
Sebastian RisiIT University of Copenhagen and Sakana AI
GAME is a coevolutionary Quality-Diversity algorithm that finds a set of diverse and high-quality solutions for adversarial problems by alternating the execution of MTMB-ME [1] on a set of tasks (i.e., fixed solutions from the opposing side) to encourage arms race dynamics. For example, in this illustration, the blue letters could represent strategies a mouse would use to avoid a cat, and the red letters represent strategies a cat would use to catch a mouse.

Introduction

Evolving a set of diverse, high-quality solutions that optimize a fitness function while covering a behavior space is known as Quality-Diversity (QD) [2] (or Illumination). It has been successfully applied in multiple domains: robotics [3], video games [4], chemical synthesis [5], and aeronautics [6].

QD can be used to illuminate adversarial problems, for which it can be critical to identify all possible attack strategies, for example, to evaluate a system's current safety and defend accordingly. Examples of applications include: video games for automatic balancing of competitive games [7, 8], generalization evaluation of machine learning models [9, 10], and red teaming [11], i.e., finding adversarial prompts that generate harmful content [12, 13, 14, 15]. Those works illuminate only one side of the adversarial problem while fixing the other, thereby providing only partial illumination against the set of opposing problems picked by the experimenter.

Evolving both sides follows the artificial life goal of creating an open-ended evolutionary process through adversarial coevolution [16, 17]. For example, the POET algorithm [18, 19] coevolves bipedal walkers' policies and environments to create an automatic curriculum toward more robust policies; [20] coevolve Generative Adversarial Networks (GANs) to improve their performance using QD; and [21] coevolve Python scripts in a pursuer and evader game. A limitation of those works is that they are domain-specific.

More recently, Digital Red Queen (DRQ) [22] is a self-play algorithm that leverages LLMs to evolve assembly scripts that compete for survival in a memory-bound virtual machine. DRQ sequentially evolves a solution that competes against all previous elite solutions, leveraging MAP-Elites [23] at each generation to preserve diversity. DRQ applies only to adversarial environments where solutions compete against each other to survive. In contrast, we propose a coevolutionary QD method that applies to any two-opponent adversarial problem.

We present Generational Adversarial MAP-Elites (GAME), a new coevolutionary QD algorithm that illuminates both sides of an adversarial problem by alternating the evolution of solutions on one side that maximize the adversarial fitness against fixed opponents from the other side (see figure above). A second contribution is the use of a Vision Embedding Model (VEM) (we use CLIP [24]) to obtain a domain-agnostic behavior descriptor from videos, without requiring a handcrafted behavior descriptor. To handle the high dimensionality of the embedding space, GAME uses an unstructured archive [25].

This blog post includes two publications: the original GAME paper [26] (the extended version of [27]) and a following study to improve the adversarial quality and diversity of the found solutions [28].

The challenges of adversarial quality diversity

figure

The main challenge of adversarial quality diversity is that, when evaluating two opposing solutions, it is not trivial to determine whether a solution's high fitness is due to its high quality or to the opposing solution's poor quality. As in most interesting domains, it is often easier to sample a poor solution; a traditional QD algorithm would select solutions that performed well against poor ones. A second challenge is that the behavior of one solution also depends on the opposing solution; for example, a defensive strategy in a battle game will only react if the opposing units move close enough.

The current solution to solve these challenges is to freeze one side of the adversarial problem. This is not satisfying as (1) the solutions will overfit the fixed set of opponents, (2) the experimenter needs to know good solutions already, and (3) this prevents the arms-race dynamics often present in adversarial problems.

Our method: Generational Adversarial MAP-Elites (GAME)

Our solution is to coevolve both sides in a sequence of generations. At each generation, GAME selects a set of solutions for one side to serve as fixed opponents, also called tasks, and leverages a multi-task QD algorithm, MTBM-ME [1], to evolve for each opponent a set of diverse solutions that perform well against it. GAME then switches sides and selects solutions from the previous generation to serve as fixed opponents at each generation.

More formally, GAME:

Experiments

Battle game: Parabellum

Archive projection: 2D PCA of the VEM behavior space after running GAME in Parabellum for 2M evaluations over 20 generations.

Parabellum [29] is a battle game research environment implemented in JAX [30] for fast parallelization. Two sides, Blue and Red, aim to eliminate the other. At each time step, each unit can either move, stand, attack an enemy in reach, or heal an ally in reach, given its local observation (a unit can only see other units within its sight range). There are five unit types with different health, speed, attack range, and attack damage, each shown in a different color. In our experiment, each side comprises 32 units of each type (i.e., 160 units in total).

Parabellum uses behavior trees (BTs) [31] to control each unit, where all the units of one side share the same BT. Their tree structure allows the use of genetic variation operators and also makes them intuitive to design by hand and interpretable. We define a set of atomic actions and conditions in JAX that the BT combines to create a policy, which we convert into an array representation for easy vectorized evaluation, at the cost of limiting the BT's size.

In Parabellum, we compare GAME to ablations and a variant with a multi-objective fitness that secondarily minimizes the size of the BT. The main takeaways are:

Parabellum examples: 16 behaviors selected by applying clustering to a PCA projection of the VEM behavior space from one execution of GAME on Parabellum.

Soft-robot: Wrestling

Archive projection: 2D PCA of the VEM behavior space after running GAME in Wrestling for 200k evaluations over 10 generations.

Wrestling is a custom EvoGym [35] environment in which two 2D soft robots fight to be the closest to the center of the arena. While the coevolution of morphology and its control is an interesting and challenging topic [36, 35, 37, 38], it is beyond the scope of this work to explore the intersection of adversarial coevolution and body–brain coevolution. We instead focus on the adversarial coevolution of morphology in an adversarial environment, where the morphology passively provides actuation. We follow [39, 40] and define passive and active voxels that change volume according to a sine-wave pattern. GAME evolves 5x5 robots with the constraints that all non-empty voxels are connected and that there is at least one actuated voxel.

We define the fitness function as the ratio of timesteps where the robot was closest to the center. Interestingly, this zero-sum fitness makes the application of classic QD methods even more difficult, as a nearly static morphology can achieve perfect fitness if it is only slightly closer to the center, i.e., fitness is purely relative. As a behavior space, we also use the same VEM, i.e., CLIP [24], on 5 subframes of the confrontation video.

We evaluate GAME against a one-sided baseline (i.e., we use MTMB-ME against a fixed set of random opponents for the same total number of evaluations), with the following takeaways:

Wrestling examples: 18 behaviors selected by applying clustering to a PCA projection of the VEM behavior space from one execution of GAME on Wrestling.
figure
ELO Score performance of the morphological species through the generations for one run of GAME.

Additionally, we took all elites from a single execution of GAME across generations, clustered them by morphology, and computed their relative performance using an ELO score in a round-robin tournament. This tells us that the search space is not open-ended, in the sense that GAME discovers all the morphological species in the first generation. Nonetheless, at each generation, GAME improves the performance of each species. Future work should focus on integrating body–brain coevolution [36, 35, 37, 38] into GAME to enable a truly open-ended search space for artificial creatures.

Deck building: Hearthstopper

To further evaluate GAME's generality, this time without VEM, we apply it to a deck-building and battle card game. In addition, we compare its ability to find diverse and high-quality decks against a one-sided illumination QD ablation, as in [7]. We use a Python simulator of the Hearthstone digital collectible and battle card game [41], Hearthbreaker [42]. Hearthstone is a two-phase card game. In the first phase, deck building, the player chooses a class and creates a deck of 30 cards. In the second phase, the deck battle, the player battles against an opponent. The goal is to set the opponent's life from 30 to 0. Similar to [7], we consider only the deck-building phase and use Hearthbreaker's Trade Agent heuristic to play the deck. This heuristic is a greedy policy that maximizes the opponent's minion loss and then maximizes damage to their health.

We compare with a one-sided illumination (i.e., MAP-Elites [23]) that evolves a diversity of decks against only the starting deck of the opposing side. In contrast, GAME coevolves a diversity of solutions for both sides. Comparing against the starting decks: GAME's coverage is significantly worse than the one-sided method. The reason is that GAME splits the archive across multiple tasks, leaving few cells for diversity, whereas the one-sided method can allocate all its cells to diversity. This becomes apparent in this case study because we use a 2D behavior space. It is thus much easier to cover the entire space of reachable behaviors. This highlights a limitation of GAME: the independence of diversity across tasks, which leads to the storage of elites with similar behaviors across tasks, thereby reducing the archive's overall diversity. Still, even though GAME's elites are not evolved against the starting decks, they show similar performance against them.

Comparing the methods against each other: GAME finds significantly better decks for 7 out of 10 comparisons and non-significantly better decks for the remaining three. A simple reason is that the one-sided method's decks are only evolved to compete against the starting deck. In contrast, GAME coevolves a diversity of decks that prevents overfitting to a single one.

Task selection mechanism

figure

The task selection mechanism drives the coevolutionary process and should select tasks that present a greater diversity of challenges at each generation to broaden the illumination. The original GAME [26] uses a behavioral criterion to select tasks, which is not satisfying because it disregards the adversarial aspect.

In our second work [28], we propose two new task selection methods, Ranking and Pareto, that leverage a tournament between the current elites and tasks, and compare them against the original method, Behavior, and a random baseline, Random.

Pong

Cat-and-mouse

Pursuers-and-evaders

We compare the variants using metrics of adversarial quality and diversity across three adversarial problems: Pong, Cat-and-mouse, and Pursuers-and-evaders. We conclude that both tournament-informed task selection methods outperform the original method and the random baseline, and that Ranking is slightly but significantly better than Pareto. This opens the way for more performant adversarial QD, while future work should focus on proposing more sample-efficient methods that estimate the costly tournament ranking.

What Now?

Adversarial QD, what for?

Adversarial QD has multiple applications:

For each of those end-goals, the practitioner needs a different metric, which we also discuss in our second paper [28].

Open-ended coevolution

The main limitation of these two works is the non-open-endedness of the adversarial domains. We're enthusiastic about applying GAME to more open-ended problems to understand the emergence of artificial open-endedness better.

Playing with GAME

If, like us, you want to play with GAME for your own adversarial problems, we recommend looking at the examples in our second paper Repo. We are also open to collaboration and will be at Gecco '26 in San José, Costa Rica, July 13 - 17, 2026, to present our second paper.

Appendix

Acknowledgements

Funded by the armasuisse S+T project F00-007.

BibTeX citations

Main paper for GAME: PDF, Repo, and citation:

@ARTICLE{11494045,
  author={Anne, Timothée and Syrkis, Noah and Elhosni, Meriem and Turati, Florian and Legendre, Franck and Jaquier, Alain and Risi, Sebastian},
  journal={IEEE Transactions on Evolutionary Computation}, 
  title={Adversarial Coevolutionary Illumination with Generational Adversarial MAP-Elites}, 
  year={2026},
  volume={},
  number={},
  pages={1-1},
  keywords={Video games; Quality-Diversity; Adversarial Coevolution},
  doi={10.1109/TEVC.2026.3686956}
}

Second citation for the study of the task selection mechanism and the evaluations on Pong, Cat-and-mouse, and Pursuers-and-evaders: PDF, Repo, and citation:

@misc{anne2026tournamentinformedadversarialquality,
      title={Tournament Informed Adversarial Quality Diversity}, 
      author={Timothée Anne and Noah Syrkis and Meriem Elhosni and Florian Turati and Alexandre Manai and Franck Legendre and Alain Jaquier and Sebastian Risi},
      year={2026},
      eprint={2601.19562},
      archivePrefix={arXiv},
      primaryClass={cs.NE},
      url={https://arxiv.org/abs/2601.19562}, 
      note={Accepted at GECCO '26, San José, Costa Rica}
}

References

  1. T. Anne and J. Mouret, Multi-task multi-behavior map-elites, in Proceedings of the Companion Conference on Genetic and Evolutionary Computation, pp. 111–114, 2023.
  2. J. K. Pugh, L. B. Soros, and K. O. Stanley, Quality diversity: A new frontier for evolutionary computation, Frontiers in Robotics and AI, vol. 3, pp. 40, 2016.
  3. A. Cully, J. Clune, D. Tarapore, and J. Mouret, Robots that can adapt like animals, Nature, vol. 521, no. 7553, pp. 503–507, 2015.
  4. D. Gravina, A. Khalifa, A. Liapis, J. Togelius, and G. N. Yannakakis, Procedural content generation through quality diversity, in 2019 IEEE Conference on Games (CoG), pp. 1–8, 2019.
  5. Y. Jiang, D. Salley, A. Sharma, G. Keenan, M. Mullin, and L. Cronin, An artificial intelligence enabled chemical synthesis robot for exploration and optimization of nanomaterials, Science advances, vol. 8, no. 40, pp. eabo2626, 2022.
  6. L. Brevault and M. Balesdent, Bayesian Quality-Diversity approaches for constrained optimization problems with mixed continuous, discrete and categorical variables, Engineering Applications of Artificial Intelligence, vol. 133, pp. 108118, 2024.
  7. M. C. Fontaine, S. Lee, L. B. Soros, F. d. M. Silva, J. Togelius, and A. K. Hoover, Mapping hearthstone deck spaces through map-elites with sliding boundaries, in Proceedings of The Genetic and Evolutionary Computation Conference, pp. 161–169, 2019.
  8. M. C. Fontaine, J. Togelius, S. Nikolaidis, and A. K. Hoover, Covariance matrix adaptation for the rapid illumination of behavior space, in Proceedings of the 2020 genetic and evolutionary computation conference, pp. 94–102, 2020.
  9. K. Steckel and J. Schrum, Illuminating the space of beatable lode runner levels produced by various generative adversarial networks, in Proceedings of the genetic and evolutionary computation conference companion, pp. 111–112, 2021.
  10. M. Samvelyan, D. Paglieri, M. Jiang, J. Parker-Holder, and T. Rocktaschel, Multi-agent diagnostics for robustness via illuminated diversity, arXiv preprint arXiv:2401.13460, 2024.
  11. D. Ganguli, L. Lovitt, J. Kernion, A. Askell, Y. Bai, S. Kadavath, B. Mann, E. Perez, N. Schiefer, K. Ndousse, and et al., Red teaming language models to reduce harms: Methods, scaling behaviors, and lessons learned, arXiv preprint arXiv:2209.07858, 2022.
  12. M. Samvelyan, S. Raparthy, A. Lupu, E. Hambro, A. H. Markosyan, M. Bhatt, Y. Mao, M. Jiang, J. Parker-Holder, J. Foerster, T. Rocktaschel, and R. Raileanu, Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts, ArXiv, vol. abs/2402.16822, 2024.
  13. R. Wang, K. Xue, Z. Qin, Z. Li, S. Tang, H. Li, S. Liu, and C. Qian, Quality-Diversity Red-Teaming: Automated Generation of High-Quality and Diverse Attackers for Large Language Models, arXiv preprint arXiv:2506.07121, 2025.
  14. Q. Dang, C. Ngo, and T. Hy, RainbowPlus: Enhancing Adversarial Prompt Generation via Evolutionary Quality-Diversity Search, arXiv preprint arXiv:2504.15047, 2025.
  15. T. H. Nguyen and N. H. Luong, Diversifying Adversarial Attacks on Text-to-image Generation, in Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp. 315–318, 2025.
  16. M. A. Bedau, J. S. McCaskill, N. H. Packard, S. Rasmussen, C. Adami, D. G. Green, T. Ikegami, K. Kaneko, and T. S. Ray, Open problems in artificial life, Artificial life, vol. 6, no. 4, pp. 363–376, 2000.
  17. A. Dorin and S. Stepney, What Is Artificial Life Today, and Where Should It Go?, 2024.
  18. R. Wang, J. Lehman, J. Clune, and K. O. Stanley, Poet: open-ended coevolution of environments and their optimized solutions, in Proceedings of the Genetic and Evolutionary Computation Conference, pp. 142–151, 2019.
  19. R. Wang, J. Lehman, A. Rawal, J. Zhi, Y. Li, J. Clune, and K. Stanley, Enhanced poet: Open-ended reinforcement learning through unbounded invention of learning challenges and their solutions, in International conference on machine learning, pp. 9940–9951, 2020.
  20. V. Costa, N. Lourenco, J. Correia, and P. Machado, Exploring the evolution of gans through quality diversity, in Proceedings of the 2020 genetic and evolutionary computation conference, pp. 297–305, 2020.
  21. A. Dharna, C. Lu, and J. Clune, Quality-Diversity Self-Play: Open-Ended Strategy Innovation via Foundation Models, in NeurIPS 2024 Workshop on Open-World Agents, 2024.
  22. A. Kumar, R. Bahlous-Boldi, P. Sharma, P. Isola, S. Risi, Y. Tang, and D. Ha, Digital Red Queen: Adversarial Program Evolution in Core War with LLMs, arXiv preprint arXiv:2601.03335, 2026.
  23. J. Mouret and J. Clune, Illuminating search spaces by mapping elites, arXiv preprint arXiv:1504.04909, 2015.
  24. A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, and et al., Learning transferable visual models from natural language supervision, in International conference on machine learning, pp. 8748–8763, 2021.
  25. V. Vassiliades, K. Chatzilygeroudis, and J. Mouret, A comparison of illumination algorithms in unbounded spaces, in Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp. 1578–1581, 2017.
  26. Anne, T., Syrkis, N., Elhosni, M., Turati, F., Legendre, F., Jaquier, A., & Risi, S. (2025). Adversarial Coevolutionary Illumination with Generational Adversarial MAP-Elites. IEEE Transactions on Evolutionary Computation. DOI: 10.1109/TEVC.2026.3686956.
  27. T. Anne, N. Syrkis, M. Elhosni, F. Turati, F. Legendre, A. Jaquier, and S. Risi, Generational Adversarial MAP-Elites for Multi-Agent Game Illumination, ALIFE '25, Kyoto, Japan, 2025.
  28. Anne, T., Syrkis, N., Elhosni, M., Turati, F., Manai, A., Legendre, F., ... & Risi, S. (2026). Tournament Informed Adversarial Quality Diversity. Accepted as a full paper in the Genetic and Evolutionary Computation Conference.
  29. T. Anne, N. Syrkis, M. Elhosni, F. Turati, F. Legendre, A. Jaquier, and S. Risi, Harnessing language for coordination: A framework and benchmark for llm-driven multi-agent control, IEEE Transactions on Games, 2025.
  30. J. Bradbury, R. Frostig, P. Hawkins, M. J. Johnson, C. Leary, D. Maclaurin, G. Necula, A. Paszke, J. VanderPlas, S. Wanderman-Milne, and Q. Zhang, JAX: composable transformations of Python+NumPy programs, 2018.
  31. M. Colledanchise and P. Ogren, Behavior trees in robotics and AI: An introduction. CRC Press, 2018.
  32. M. Kimura, The neutral theory of molecular evolution, Scientific American, vol. 241, no. 5, pp. 98–129, 1979.
  33. J. Lehman and R. Miikkulainen, Enhancing divergent search through extinction events, in Proceedings of the 2015 Annual Conference on Genetic and Evolutionary Computation, pp. 951–958, 2015.
  34. K. O. Stanley and R. Miikkulainen, Evolving neural networks through augmenting topologies, Evolutionary computation, vol. 10, no. 2, pp. 99–127, 2002.
  35. J. Bhatia, H. Jackson, Y. Tian, J. Xu, and W. Matusik, Evolution gym: A large-scale benchmark for evolving soft robots, Advances in Neural Information Processing Systems, vol. 34, pp. 2201–2214, 2021.
  36. N. Cheney, J. Bongard, V. SunSpiral, and H. Lipson, Scalable co-optimization of morphology and control in embodied machines, Journal of The Royal Society Interface, vol. 15, no. 143, pp. 20170937, 2018.
  37. A. Mertan and N. Cheney, Modular controllers facilitate the co-optimization of morphology and control in soft robots, in Proceedings of the Genetic and Evolutionary Computation Conference, pp. 174–183, 2023.
  38. G. Nadizar, E. Medvet, and D. G. Wilson, Enhancing adaptability in embodied agents: A multi-quality-diversity approach, IEEE Transactions on Evolutionary Computation, 2025.
  39. N. Cheney, R. MacCurdy, J. Clune, and H. Lipson, Unshackling evolution: evolving soft robots with multiple materials and a powerful generative encoding, ACM SIGEVOlution, vol. 7, no. 1, pp. 11–23, 2014.
  40. S. Kriegman, N. Cheney, F. Corucci, and J. C. Bongard, A minimal developmental model can increase evolvability in soft robots, in Proceedings of the Genetic and Evolutionary Computation Conference, pp. 131–138, 2017.
  41. B. Entertainment, Hearthstone: Heroes of Warcraft, 2014.
  42. Y. Daniel and et al., Hearthbreaker: A Hearthstone Simulator, 2014.
  43. K. Deb and H. Jain, An evolutionary many-objective optimization algorithm using reference-point-based nondominated sorting approach, part I: solving problems with box constraints, IEEE transactions on evolutionary computation, vol. 18, no. 4, pp. 577–601, 2013.