RMA-Style Adaptation for In-Hand Manipulation

This is a status update on my current project for Advanced Deep Learning for Robotics. The broader question is still whether tactile information can make online adaptation more useful for in-hand manipulation.

The project is inspired by RMA-style adaptation: train with access to hidden information about the environment, then learn to infer the useful parts of that information from recent history at test time. The comparison I eventually care about is simple to state: if the controller only gets proprioceptive history, how much can it adapt, and what changes when we also give it tactile or contact information?

Hand-like MuJoCo manipulation setup

A more hand-like manipulation setup is the direction I care about. The current environment is still simpler than that, but it is now doing the part that matters for the next experiments: contact-based object rotation in simulation.

The Current Setup

Right now the environment is a MuJoCo sandbox with four controllable contact points around a box. They are not meant to be realistic fingers. They are a controlled way to study the manipulation problem before moving to a more hand-like embodiment.

The task has also changed since the first milestone. Instead of rotating toward one fixed target orientation, the policy now tries to keep rotating the object continuously. The object is held in the air rather than mostly resting on the floor, and it is allowed to translate in the plane as long as the controller keeps the contacts useful for rotation.

The environment is connected to a Gymnasium-style RL loop and trained with PPO. The reward is intentionally simple: make yaw progress while avoiding obviously bad behavior such as dropping the object. That makes the setup easier to reason about when adding adaptation.
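To make "yaw progress while avoiding drops" concrete, here is a minimal sketch of what such a reward could look like. It is not the project's actual reward code: the function name `rotation_reward`, the parameters `drop_height`, `drop_penalty`, and `progress_scale`, and the choice of positive yaw as the rewarded direction are all assumptions for illustration.

```python
import math
from typing import Tuple


def wrap_angle(angle: float) -> float:
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def rotation_reward(
    prev_yaw: float,
    yaw: float,
    object_z: float,
    *,
    drop_height: float = 0.05,    # hypothetical height below which the object counts as dropped
    drop_penalty: float = 10.0,   # hypothetical penalty for dropping
    progress_scale: float = 1.0,  # hypothetical weight on yaw progress
) -> Tuple[float, bool]:
    """Return (reward, terminated) for one environment step."""
    if object_z < drop_height:
        # Dropping the object ends the episode with a fixed penalty.
        return -drop_penalty, True
    # Wrapped difference, so crossing +/-pi still counts as a small step.
    delta_yaw = wrap_angle(yaw - prev_yaw)
    return progress_scale * delta_yaw, False
```

The wrapped difference matters for continuous rotation: without it, the step where yaw crosses from just below pi to just above -pi would look like a large backward jump and be penalized.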

What Works Now

The current policy can produce sustained contact-based rotation. That is the main thing I wanted from this stage. Before the adaptation question becomes meaningful, the base behavior has to work well enough that there is something to adapt.

One rollout from the current setup. The important point is not that this is a realistic hand yet. It is that the policy has learned to use the simplified contacts to keep rotating the object.

This is still a sandbox, but the question is no longer just “can PPO make anything move?” The setup now provides a working base behavior on which state estimation and adaptation can actually be tested.

Where RMA Comes In

The RMA-style part has started to become more concrete. The policy can be trained with privileged state information such as the object’s position and yaw. Then a separate adaptation module tries to infer part of that privileged state from recent proprioceptive history.

In the current version, the adaptation module predicts the object’s XYZ position from proprioceptive history. Yaw is still provided directly in this experiment. That is not the full tactile comparison yet, but it is an important step: the controller is starting to rely on estimated state instead of only directly available privileged information.
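A rough sketch of how this could be wired together: a fixed-length window of proprioceptive readings feeds an estimator, and the estimated XYZ takes the place of the privileged XYZ in the policy input while yaw is passed through. The names `ProprioHistory` and `policy_observation`, the zero-padding at episode start, and the observation layout (proprioception, then XYZ, then yaw) are assumptions for illustration, not the project's code. The estimator is left as an arbitrary callable.

```python
from collections import deque
from typing import Callable, Deque, List, Sequence


class ProprioHistory:
    """Fixed-length window of recent proprioceptive readings."""

    def __init__(self, horizon: int, dim: int) -> None:
        self.horizon = horizon
        self.dim = dim
        # Zero-pad so the window has a fixed size from the first step on.
        self._buf: Deque[List[float]] = deque(
            ([0.0] * dim for _ in range(horizon)), maxlen=horizon
        )

    def push(self, proprio: Sequence[float]) -> None:
        """Append the newest reading; the oldest one falls out of the window."""
        if len(proprio) != self.dim:
            raise ValueError(f"expected {self.dim} values, got {len(proprio)}")
        self._buf.append([float(x) for x in proprio])

    def flat(self) -> List[float]:
        """Readings concatenated oldest-to-newest into one feature vector."""
        return [x for step in self._buf for x in step]


def policy_observation(
    proprio: Sequence[float],
    history: ProprioHistory,
    yaw: float,
    estimate_xyz: Callable[[List[float]], Sequence[float]],
) -> List[float]:
    """Build the policy input with estimated XYZ instead of privileged XYZ."""
    xyz_hat = [float(v) for v in estimate_xyz(history.flat())]
    if len(xyz_hat) != 3:
        raise ValueError("estimator must return exactly three values")
    # Assumed layout: [proprio..., x_hat, y_hat, z_hat, yaw]; yaw is still given.
    return [float(x) for x in proprio] + xyz_hat + [float(yaw)]
```

Keeping the estimator behind a plain callable means the same evaluation path can run with ground-truth XYZ, a learned estimator, or a deliberately bad one, which makes it easy to see how much the policy depends on that input.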

I like this stage because it makes the problem more concrete. It is not just an abstract plan to “add RMA later” anymore. There is now a teacher-style privileged policy, data collection from rollouts, an estimator, and an evaluation path where predicted state can be used by the policy.

What Is Still Open

The tactile question is still open. Right now the estimator uses proprioceptive history. The next interesting comparison is what happens when the history also includes contact information, such as contact bits or contact forces.
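One way to set up that comparison is to keep the estimator fixed and only change what goes into each history step. The sketch below assumes one scalar contact force per contact point (four in the current setup) and a hypothetical threshold for turning forces into contact bits. The function names and the `mode` switch are illustrative, not part of the existing code.

```python
from typing import List, Sequence


def contact_bits(forces: Sequence[float], threshold: float = 1e-3) -> List[float]:
    """Binary contact indicators: 1.0 where the force exceeds the threshold."""
    return [1.0 if f > threshold else 0.0 for f in forces]


def step_features(
    proprio: Sequence[float],
    contact_forces: Sequence[float],
    mode: str = "none",
    threshold: float = 1e-3,  # hypothetical force threshold for "in contact"
) -> List[float]:
    """Per-step features for the history window under one comparison condition.

    mode="none"   -> proprioception only (the current estimator input)
    mode="bits"   -> proprioception plus one contact bit per contact point
    mode="forces" -> proprioception plus the raw contact force magnitudes
    """
    feats = [float(x) for x in proprio]
    if mode == "none":
        return feats
    if mode == "bits":
        return feats + contact_bits(contact_forces, threshold)
    if mode == "forces":
        return feats + [float(f) for f in contact_forces]
    raise ValueError(f"unknown mode: {mode!r}")
```

With this shape, the three conditions differ only in the per-step feature dimension, so any difference in estimation error can be attributed to the added contact signal rather than to a different model or training setup.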

There is also still the usual robotics gap between a useful sandbox and a convincing manipulation system. The current setup is deliberately simplified. That is fine for now, because it gives us a clean place to ask whether online adaptation helps before making the embodiment more complicated.

So the project has moved from a first rotation baseline to a more useful question: once the controller can rotate the object continuously, can an adaptation module infer enough about the state from history, and does touch make that inference better?
