This is an early status update on my current project for Advanced Deep Learning for Robotics. The broader question is whether tactile information can make online adaptation more useful for in-hand manipulation.

The project is inspired by RMA-style adaptation (RMA: Rapid Motor Adaptation): train with access to hidden information about the environment, then learn to infer a useful latent representation from recent history at test time. The comparison we eventually care about is simple to state: if the robot only gets proprioceptive history, how much can it adapt, and what changes when we also give it tactile or contact information?

Hand-like MuJoCo manipulation setup

A more hand-like manipulation setup is the direction I care about. The current milestone is deliberately simpler, because we first need a controlled baseline where contact-based object rotation actually works.

The Current Setup

Right now the environment is a MuJoCo sandbox with four controllable spheres around a box. The spheres act as simplified contact points rather than realistic fingers. That keeps the embodiment simple enough to debug while still forcing the policy to solve a contact-based manipulation problem.

The current task is planar object reorientation. The policy should rotate the box toward a 90-degree yaw target while keeping the object reasonably close to the center.
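
To make the objective concrete, here is a minimal sketch of what a reward for this task could look like. It is not the reward used in the project: the function names, the `drift_weight` parameter, and the choice of a plain sum of yaw error and distance from center are all assumptions for illustration.

```python
import math


def wrap_angle(angle):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def yaw_reward(yaw, box_xy, target_yaw=math.pi / 2, drift_weight=1.0):
    """Hypothetical reward: penalize yaw error and drift from the center.

    yaw          -- current box yaw in radians
    box_xy       -- (x, y) position of the box relative to the center
    target_yaw   -- 90-degree target from the task description
    drift_weight -- assumed trade-off between rotation and centering
    """
    yaw_error = abs(wrap_angle(target_yaw - yaw))
    drift = math.hypot(box_xy[0], box_xy[1])
    return -(yaw_error + drift_weight * drift)
```

Wrapping the angle difference matters because yaw is periodic: without it, a box sitting just past the target could be scored as almost a full turn away.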

Simple 90-degree planar yaw setup

The environment is connected to a Gymnasium-style RL loop and trained with PPO. At this stage, the important part is not that the setup looks like a final robotic hand. It is that the full loop is now connected: simulation, control, reward design, training, rollout, and evaluation.

The First Working Milestone

The first real milestone is that the learned policy can already rotate the object toward the target. That sounds small, but for this project it matters. Before adding adaptation modules or tactile comparisons, the baseline policy has to produce useful contact behavior in the first place.

The video shows one rollout from the current policy on the same 90-degree planar yaw task shown above: the four contact spheres turn the box toward the target orientation. The object does not just get pushed around the table at random. The policy has learned to use contact in a way that produces the rotation we want.

This is not a finished manipulation system. Drift control is still rough, and the current setup is still a sandbox. But it is the first version where the important pieces are working together: the MuJoCo environment, the action interface, the reward, PPO training, rollout, and evaluation all connect to one visible behavior.

Why This Baseline Matters

The tempting mistake would be to dismiss the four-sphere setup as only a toy environment. I do not think that is the right way to look at it.

For the next part of the project, we need a place where hidden physical variation can matter. If the policy could not first rotate the object in the normal setting, then testing adaptation would mostly test whether the basic task works at all.

Now the baseline is good enough to ask the next question: once mass, friction, size, or contact behavior changes, can the policy adapt instead of breaking?

What Comes Next

The next step is to introduce hidden variation in the environment. Once object or contact properties vary across episodes, the policy has a reason to adapt online instead of learning one fixed behavior.
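
As a rough sketch of what per-episode hidden variation could look like, the snippet below samples a set of object properties at the start of each episode. The parameter names and numeric ranges are placeholders I made up for illustration, not values from the project.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class HiddenParams:
    """Properties the policy does not observe directly (assumed set)."""
    mass: float       # kg
    friction: float   # sliding friction coefficient
    half_size: float  # box half-extent in meters


# Hypothetical sampling ranges; the real ranges are still to be chosen.
RANGES = {
    "mass": (0.05, 0.5),
    "friction": (0.3, 1.2),
    "half_size": (0.03, 0.06),
}


def sample_hidden_params(rng):
    """Draw one independent uniform sample per hidden property."""
    return HiddenParams(
        **{name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()}
    )


rng = random.Random(0)  # seeded so episodes are reproducible
episode_params = sample_hidden_params(rng)
```

In MuJoCo, values like these would typically be written into the model before the episode starts (for example through `model.body_mass`, `model.geom_friction`, and `model.geom_size`). Which fields this project will actually randomize is still open.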

After that, the project can move toward the RMA-style pipeline: privileged information during training, a latent representation of hidden properties, and an adaptation module that predicts that latent from recent history.
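
Written out, the pipeline described above has roughly the following shape. The symbols are my own notation for this sketch: $e_t$ is the privileged information, $x_t$ the observation, $a_t$ the action, $k$ the history length, $\mu$ the privileged encoder, $\pi$ the base policy, and $\phi$ the adaptation module.

```latex
\begin{aligned}
z_t &= \mu(e_t) && \text{(training only: encode the privileged parameters)} \\
a_t &= \pi(x_t, z_t) && \text{(base policy conditioned on the latent)} \\
\hat{z}_t &= \phi(x_{t-k:t-1},\, a_{t-k:t-1}) && \text{(adaptation module on recent history)} \\
\mathcal{L}(\phi) &= \lVert \hat{z}_t - z_t \rVert_2^2 && \text{(regress the predicted latent onto the encoder's latent)}
\end{aligned}
```

At test time $e_t$ is unavailable, so the policy runs as $a_t = \pi(x_t, \hat{z}_t)$. The tactile question then becomes whether adding contact signals to the history that $\phi$ sees makes $\hat{z}_t$ a better estimate.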

The comparison I am most interested in is still open: proprioception-only history versus proprioception plus tactile or contact history. The tactile signal might start simple, such as binary contact or contact forces, before trying anything richer.
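
One way to set up that comparison is to keep everything identical except what goes into each history frame. The sketch below assumes a fixed-length history, a per-sphere contact force magnitude as the raw tactile signal, and a simple force threshold for the binary version. The class name, the threshold value, and the zero-padding scheme are all hypothetical.

```python
from collections import deque


def binary_contact(forces, threshold=0.01):
    """Map per-sphere contact force magnitudes to 0/1 flags."""
    return [1.0 if f > threshold else 0.0 for f in forces]


class HistoryBuffer:
    """Fixed-length history of frames for an adaptation module (sketch)."""

    def __init__(self, horizon, use_tactile):
        self.horizon = horizon
        self.use_tactile = use_tactile
        self.frames = deque(maxlen=horizon)  # oldest frame drops out first

    def push(self, proprio, contact_forces):
        """Append one frame: proprioception, plus contact flags if enabled."""
        frame = list(proprio)
        if self.use_tactile:
            frame += binary_contact(contact_forces)
        self.frames.append(frame)

    def flat(self):
        """Flatten to one vector, zero-padding the front until full."""
        if not self.frames:
            return []
        width = len(self.frames[0])
        pad = [[0.0] * width for _ in range(self.horizon - len(self.frames))]
        return [v for frame in pad + list(self.frames) for v in frame]


# Same data, two input variants for the comparison.
proprio_only = HistoryBuffer(horizon=3, use_tactile=False)
with_tactile = HistoryBuffer(horizon=3, use_tactile=True)
for buf in (proprio_only, with_tactile):
    buf.push(proprio=[0.1, -0.2], contact_forces=[0.5, 0.0, 0.0, 0.2])
```

Keeping the tactile switch inside the buffer means both variants see the same rollouts and the same history length, so any difference in adaptation can be attributed to the contact signal rather than to the pipeline.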

So the current result is not the final answer. It is the foundation for asking the actual question in a cleaner way: once the object and contact conditions change, does touch help the policy adapt?