A promising approach is the use of so-called Foundation Models, which develop a broad understanding of actions and environments through extensive pre-training. However, the performance of these models depends strongly on the quality of the training data. Human demonstrations, which commonly serve as the data source, frequently contain inconsistencies and inaccuracies that complicate the fine-tuning of robot skills.
One method that addresses this problem is RLDG (Reinforcement Learning Distilled Generalists). The approach uses Reinforcement Learning (RL) to generate high-quality training data for Foundation Models. Instead of training robots directly on human demonstrations, specialized RL agents are first trained to discover near-optimal action sequences on their own through reward maximization. The trajectories collected by these RL agents are then used to fine-tune the Foundation Models.
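The resulting pipeline can be pictured in three steps: train a task-specific RL policy, roll it out to harvest high-quality trajectories, and fine-tune a generalist policy on those trajectories via supervised learning. The following Python sketch illustrates only this data flow; the toy environment, the stand-in "RL" controller, and the linear "generalist" are hypothetical placeholders, not the actual models or code behind RLDG:

```python
# Minimal sketch of the RLDG data flow with toy stand-ins (illustrative only):
# 1) obtain a task-specific RL policy, 2) roll it out to collect high-quality
# trajectories, 3) fine-tune a generalist policy on that data by imitation.

import numpy as np

rng = np.random.default_rng(0)

class ToyInsertionEnv:
    """Stand-in environment: move a 2-D end-effector position to the origin."""
    def reset(self):
        self.pos = rng.uniform(-1, 1, size=2)
        return self.pos.copy()

    def step(self, action):
        self.pos += 0.1 * np.clip(action, -1, 1)
        reward = -np.linalg.norm(self.pos)           # closer to the goal = better
        done = np.linalg.norm(self.pos) < 0.05
        return self.pos.copy(), reward, done

def train_rl_policy(env):
    """Placeholder for the RL stage: a trivial proportional controller stands in
    for a policy learned by reward maximization."""
    return lambda obs: -obs                          # move toward the goal

def collect_rl_rollouts(env, policy, n_episodes=50, max_steps=100):
    """Roll out the trained RL policy and keep (observation, action) pairs
    from successful episodes as fine-tuning data for the generalist."""
    dataset = []
    for _ in range(n_episodes):
        obs, traj = env.reset(), []
        for _ in range(max_steps):
            act = policy(obs)
            traj.append((obs, act))
            obs, _, done = env.step(act)
            if done:
                dataset.extend(traj)                 # keep only successful trials
                break
    return dataset

def finetune_generalist(dataset, epochs=200, lr=0.5):
    """Behavior cloning on the RL-generated data: fit a linear policy
    obs -> action by gradient descent on squared error (a stand-in for
    fine-tuning a large generalist model)."""
    X = np.array([obs for obs, _ in dataset])
    Y = np.array([act for _, act in dataset])
    W = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(epochs):
        W -= lr * X.T @ (X @ W - Y) / len(X)
    return lambda obs: obs @ W

env = ToyInsertionEnv()
rl_policy = train_rl_policy(env)
data = collect_rl_rollouts(env, rl_policy)
generalist = finetune_generalist(data)
print(f"collected {len(data)} transitions from RL rollouts")
```

The key point of the sketch is that the generalist never sees human teleoperation data in this loop; its supervision comes entirely from rollouts of the reward-maximizing policy.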
RLDG offers several advantages over conventional training with human demonstrations. First, it enables the automated generation of large amounts of high-quality training data, greatly reducing the need for human teleoperation. Second, it combines the optimization capabilities of RL with the generalization ability of Foundation Models, leading to higher performance and better adaptation to new scenarios. Third, it offers a solution for complex, multi-stage tasks by applying RL data specifically to the critical sub-steps that would otherwise limit overall performance.
In extensive experiments on precise manipulation tasks, such as connector insertion and assembly operations, Foundation Models trained with RLDG showed significantly better results than models trained on human demonstrations. Success rates increased by up to 40%, especially when generalizing to new, unseen tasks. A detailed analysis attributed this improvement to better-optimized action distributions and broader coverage of the state space.
The experiments also showed that RLDG reaches comparable or better results with significantly less data than training on human demonstrations: matching its performance would require roughly 6-10 times as many human demonstrations. For particularly demanding tasks, such as precise connector insertion, RLDG reached a 100% success rate, while models trained on human demonstrations plateaued at 90% even with considerably more data.
RLDG can also be combined flexibly with human demonstrations. For complex, multi-stage tasks, RL data can be used for the critical sub-steps, while human demonstrations remain sufficient for the remaining stages. This combination makes good use of both data sources and improves overall performance.
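A minimal sketch of how such a mixed dataset might be assembled, assuming each transition is tagged with the task stage it belongs to; the `Transition` class, the stage labels, and `build_mixed_dataset` are illustrative assumptions rather than an interface from the paper:

```python
# Hypothetical sketch of composing a mixed fine-tuning dataset: RL-generated
# transitions replace the critical sub-step (e.g. the final insertion), while
# human demonstrations cover the remaining stages.

from dataclasses import dataclass

@dataclass
class Transition:
    observation: list
    action: list
    stage: str                       # e.g. "grasp", "transport", "insert"

def build_mixed_dataset(human_demos, rl_rollouts, critical_stages=frozenset({"insert"})):
    """Keep human data for routine stages and swap in RL data for the
    critical stages that limit overall success."""
    mixed = [t for t in human_demos if t.stage not in critical_stages]
    mixed += [t for t in rl_rollouts if t.stage in critical_stages]
    return mixed

# Tiny usage example with dummy transitions.
human = [Transition([0.0], [0.1], "grasp"), Transition([0.2], [0.1], "insert")]
rl    = [Transition([0.2], [0.05], "insert")]
dataset = build_mixed_dataset(human, rl)
print([t.stage for t in dataset])    # ['grasp', 'insert'], with the RL insert step
```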
RLDG opens up promising prospects for robotics. By using Reinforcement Learning to generate training data, Foundation Models can be trained more efficiently and effectively. This synergy enables robot systems that perform complex tasks precisely, adapt flexibly to new situations, and at the same time reduce the effort of data acquisition.
The combination of Foundation Models and Reinforcement Learning through RLDG represents a significant advance for robotics, supporting robust and flexible robot systems capable of handling complex tasks in the real world. Automated data generation through RL reduces the reliance on human demonstrations and opens new possibilities for the development of autonomous robots.