New Deep Reinforcement Learning Technique Helps AI Evolve



Hundreds of millions of years of evolution have produced a variety of life forms, each intelligent in its own way. Each species has evolved to develop innate skills, learning abilities, and a physical form that ensures survival in its environment.

But despite being inspired by nature and evolution, the field of artificial intelligence has largely focused on creating the elements of intelligence separately and merging them after the development process. While this approach has yielded great results, it has also limited the flexibility of AI agents in some of the basic skills found in even the simplest of life forms.

In a new paper published in the scientific journal Nature, AI researchers at Stanford University are presenting a new technique that can help overcome some of these limits. Called “deep evolutionary reinforcement learning”, or DERL, the new technique uses a complex virtual environment and reinforcement learning to create virtual agents that can evolve both in their physical structure and their learning capacities. The findings may have important implications for the future of AI and robotics research.

Evolution is difficult to simulate

In nature, the body and the brain evolve together. Across many generations, each animal species has gone through countless cycles of mutation to develop limbs, organs, and a nervous system that support the functions it needs in its environment. Mosquitoes are equipped with thermal vision to detect body heat. Bats have wings to fly and an echolocation system to navigate dark spaces. Sea turtles have flippers for swimming and a magnetic field detection system to travel very long distances. Humans have an upright posture that frees their arms and allows them to see the distant horizon, nimble hands and fingers that can manipulate objects, and a brain that makes them the best social creatures and problem solvers on the planet.

Interestingly, all of these species descend from the first life form that appeared on Earth billions of years ago. Based on the selection pressures caused by the environment, the descendants of these early living things evolved in many directions.

Studying the evolution of life and intelligence is interesting, but replicating it is extremely difficult. An AI system that aims to recreate intelligent life the way evolution did would have to search a very large space of possible morphologies, which is extremely computationally expensive. It would need many cycles of parallel and sequential trial and error.

AI researchers use several shortcuts and predefined features to overcome some of these challenges. For example, they fix the architecture or physical design of an AI or robotic system and focus on optimizing the learnable parameters. Another shortcut is the use of Lamarckian rather than Darwinian evolution, in which AI agents pass on their learned parameters to their descendants. Yet another approach is to train different AI subsystems (vision, locomotion, language, etc.) separately and then assemble them into a final AI or robotic system. While these approaches speed up the process and reduce the costs of training and evolving AI agents, they also limit the flexibility and variety of results that can be achieved.

Deep evolutionary reinforcement learning

In their new work, the Stanford researchers aim to bring AI research closer to the real evolutionary process while keeping costs as low as possible. “Our aim is to elucidate some principles governing the relationships between environmental complexity, evolved morphology and the ability to learn intelligent control,” they wrote in their paper.

In DERL, each agent uses deep reinforcement learning to acquire the skills it needs to maximize its goals over its lifetime. DERL uses Darwinian evolution to search the morphological space for optimal solutions, which means that when a new generation of AI agents is spawned, they inherit only the physical and architectural traits of their parents (with slight mutations). None of the learned parameters are passed from one generation to the next.
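The inheritance rule described above can be illustrated with a toy loop. This is only a minimal sketch under stated assumptions, not the researchers' implementation: the `Morphology` class, the `lifetime_learning` stub (a stand-in for deep reinforcement learning from scratch), the toy fitness value, and the two-way tournament selection are all hypothetical choices made for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Morphology:
    """Toy stand-in for a full body description (hypothetical)."""
    num_limbs: int


def lifetime_learning(morph, rng):
    """Hypothetical stub for training a controller from scratch.

    Returns (learned_params, fitness). The fitness formula is a toy,
    not the reward used in the paper.
    """
    learned_params = [rng.random() for _ in range(morph.num_limbs)]
    fitness = morph.num_limbs + rng.random()
    return learned_params, fitness


def mutate(morph, rng):
    """Slightly perturb the inherited body plan."""
    delta = rng.choice([-1, 0, 1])
    return Morphology(num_limbs=max(1, morph.num_limbs + delta))


def evolve(generations=5, population_size=8, seed=0):
    rng = random.Random(seed)
    population = [Morphology(rng.randint(1, 4)) for _ in range(population_size)]
    for _ in range(generations):
        scored = []
        for morph in population:
            # Every agent learns from scratch; its learned parameters
            # are discarded and never reach the next generation.
            _learned_params, fitness = lifetime_learning(morph, rng)
            scored.append((fitness, morph))
        next_population = []
        for _ in range(population_size):
            # Assumed selection scheme: the fitter of two random agents wins.
            contenders = rng.sample(scored, 2)
            _, parent = max(contenders, key=lambda pair: pair[0])
            # Darwinian step: the child inherits only the (mutated) morphology.
            next_population.append(mutate(parent, rng))
        population = next_population
    return population
```

A Lamarckian variant would instead hand `_learned_params` to the child as its starting point, which is exactly the shortcut DERL avoids.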

“DERL opens the door to performing large-scale in silico experiments to provide scientific insight into how learning and evolution cooperatively create sophisticated relationships between environmental complexity, morphological intelligence, and learning control tasks,” the researchers wrote.

Simulating evolution

For their framework, the researchers used MuJoCo, a virtual environment that provides highly accurate physical simulation of rigid bodies. Their design space is called Universal Animal (Unimal), in which the goal is to create morphologies that learn locomotion and object-manipulation tasks on a variety of terrains.

Each agent in the environment is defined by a genotype that specifies its limbs and joints. The direct descendant of each agent inherits the genotype of its parent and undergoes mutations that can add new limbs, remove existing limbs, or make small changes to characteristics such as degrees of freedom or limb size.
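The three kinds of mutation described above (adding a limb, removing a limb, or tweaking a characteristic) could be sketched as follows. This is an illustrative assumption, not the Unimal genotype format: the flat list of `Limb` records, the field names, and the numeric ranges are all hypothetical.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Limb:
    """Hypothetical limb record; the real genotype is richer."""
    length: float
    degrees_of_freedom: int


def mutate_genotype(genotype, rng):
    """Return a mutated copy of a non-empty genotype; the parent is untouched."""
    child = list(genotype)
    operation = rng.choice(["add", "remove", "tweak"])
    if operation == "add":
        # Grow a new limb with assumed size and joint ranges.
        child.append(Limb(length=rng.uniform(0.2, 0.4),
                          degrees_of_freedom=rng.randint(1, 2)))
    elif operation == "remove" and len(child) > 1:
        # Delete a random limb, but never the last one.
        child.pop(rng.randrange(len(child)))
    else:
        # Small change to one limb: either its size or its joint freedom.
        index = rng.randrange(len(child))
        limb = child[index]
        if rng.random() < 0.5:
            new_length = max(0.05, limb.length + rng.uniform(-0.05, 0.05))
            child[index] = replace(limb, length=new_length)
        else:
            child[index] = replace(limb, degrees_of_freedom=rng.choice([1, 2]))
    return child
```

Returning a copy rather than editing in place keeps the parent's genotype available for further offspring, which mirrors the idea that each descendant starts from its parent's body plan.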

Each agent is trained with reinforcement learning to maximize rewards in various environments. The most basic task is locomotion, in which the agent is rewarded for the distance it travels during an episode. Agents whose physical structures are better suited to crossing the terrain learn more quickly to use their limbs to move around.
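One common way to express a distance-based locomotion reward is to pay the agent its forward displacement at every step, so that the episode return equals the net distance covered. The sketch below assumes that formulation and a single forward axis; the article does not specify the exact reward the researchers used, and `locomotion_rewards` is a hypothetical helper.

```python
def locomotion_rewards(x_positions):
    """Per-step reward: forward displacement between consecutive positions."""
    return [after - before
            for before, after in zip(x_positions, x_positions[1:])]


# Positions of the agent's body along the forward axis during one episode.
positions = [0.0, 0.5, 1.25, 1.0, 2.0]
rewards = locomotion_rewards(positions)   # [0.5, 0.75, -0.25, 1.0]
episode_return = sum(rewards)             # 2.0, the net distance traveled
```

Note that the backward slip in the third step earns a negative reward, so only net progress counts.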

To test the system, the researchers evolved agents in three types of terrain: flat (FT), variable (VT), and variable terrain with modifiable objects (MVT). The flat terrain exerts the least selection pressure on the morphology of the agents. Variable terrains, on the other hand, force agents to develop a more versatile physical structure that can climb slopes and navigate obstacles. The MVT variant adds the challenge of requiring agents to manipulate objects to achieve their goals.

The advantages of DERL

Above: Deep evolutionary reinforcement learning generates a variety of successful body types in different environments.

Image Credit: TechTalks

One of the interesting findings of DERL is the diversity of the results. Other approaches to evolutionary AI tend to converge on a single solution, because new agents directly inherit both the physique and the learned parameters of their parents. But in DERL, only morphological data is passed on to descendants, and the system ends up creating a diverse set of successful body types, including bipeds, tripeds, and quadrupeds with and without arms.

At the same time, the system shows features of the Baldwin effect, which suggests that agents that learn faster are more likely to reproduce and pass their genes on to the next generation. DERL shows that evolution “selects faster learners without any direct selection pressure to do so,” according to the Stanford paper.

“Interestingly, the existence of this morphological Baldwin effect could be exploited in future studies to create embodied agents with lower sample complexity and higher generalization capacity,” the researchers wrote.

Finally, the DERL framework also validates the hypothesis that more complex environments give rise to more intelligent agents. The researchers tested the evolved agents on eight different tasks, including patrolling, evasion, object manipulation, and exploration. Their results show that, in general, agents that evolved on variable terrain learn faster and perform better than agents that have only experienced flat terrain.

Their results appear to be in line with another hypothesis from DeepMind researchers that a complex environment, appropriate reward structure, and reinforcement learning can eventually lead to the emergence of all kinds of intelligent behaviors.

AI and robotics research

The DERL environment has only a fraction of the complexities of the real world. “While DERL allows us to take a significant step forward in scaling the complexity of evolutionary environments, an important line of future work will be to design evolutionary environments that are more open-ended, physically realistic and multi-agent,” the researchers wrote.

In the future, the researchers plan to expand the range of evaluation tasks to better assess how well agents can improve their ability to learn behaviors relevant to humans.

The work could have important implications for the future of AI and robotics and push researchers to use exploration methods much closer to natural evolution.

“We hope that our work will encourage further large-scale explorations of learning and evolution in other contexts to bring new scientific knowledge about the emergence of rapidly learnable intelligent behaviors, as well as new technical advances in our ability to instantiate them in machines,” the researchers wrote.

Ben Dickson is a software engineer and founder of TechTalks. He writes about technology, business and politics.

This story originally appeared on Bdtech Copyright 2021

