ADAPT: Analytical Disturbance-Aware Policy Training for Humanoid Locomotion

Bofan Lyu1,* Jindou Jia1,* Kuangji Zuo1 Yanshuo Lu1 Shijia Han1 Gen Li1 Boyu Ma1 Jingliang Li1 Geng Li1 Jianfei Yang1,†
1MARS Lab, Nanyang Technological University
*Equal contribution †Corresponding authors

ADAPT makes hidden external forces visible to humanoid policies.

Using sensorless, analytical joint-level disturbance estimates, it improves robustness to pushes and asymmetric payloads and supports task-specific reward shaping for behaviors such as light-step locomotion.

Overview of ADAPT

Overview of ADAPT. ADAPT infers whole-body disturbance torques from accessible robot dynamics, proprioception, and expected motor torques. The estimated disturbances are fed into the policy as explicit observations and can also be used for reward shaping, enabling push resistance, payload compensation, and light-footed locomotion.

Performance of ADAPT

We first present hardware demonstrations and paired simulation/real-robot comparisons, where the disturbance-aware policy improves physical interaction behavior under pushes, pulls, asymmetric payloads, and foot-ground impacts.

Stance Pulling Test on Hardware

Videos: Torso Push, Shoulder Push, Torso Pull (Baseline in the upper row, ADAPT in the lower row)

Stance pulling test. Baseline is shown in the upper row and ADAPT in the lower row. The robot receives a zero-velocity command while an operator applies external forces to its torso and shoulder. Compared with the proprioception-only baseline, the ADAPT-trained policy shows stronger resistance to the applied forces and adjusts its whole-body posture to counteract the perturbation, indicating improved whole-body robustness during physical interaction.

Asymmetric Hand Loading Test

We evaluate ADAPT under one-sided hand loads that create an asymmetric disturbance moment during standing and forward walking.

Videos: Baseline and ADAPT, 4 kg out-of-distribution load

Static postural response. Baseline is shown first and ADAPT second. With an out-of-distribution 4 kg payload attached to the right hand, the proprioception-only baseline is pulled toward the loaded side, resulting in a visibly unbalanced upper-body posture. ADAPT tilts the torso away from the loaded side to compensate for the moment caused by the load.

Panels: Simulation and Real Robot, each with Baseline above ADAPT

Walking under asymmetric hand loading. In each panel, Baseline is shown above ADAPT. In simulation, a constant downward force is applied to the right hand during forward walking; on hardware, the condition is reproduced with a physical payload attached to the same hand, with 4 kg lying outside the training distribution. In both settings, the trajectory panels report lateral drift over forward distance: ADAPT maintains a straighter walking trajectory, while the baseline exhibits larger lateral drift.

Disturbance-derived reward shaping

Beyond serving as policy observations, ADAPT's analytical residuals can provide direct training feedback for shaping how the robot interacts with the ground.

Light-step Reward Shaping

For light-step locomotion, we penalize high-impact leg-joint disturbance residual peaks during touchdown, encouraging lighter lower-body loading while preserving the support forces needed for walking.
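A minimal sketch of what such a penalty could look like. It assumes the leg-joint residuals are already scaled, that a touchdown flag is available, and that residuals below a tolerance are left unpenalized so that normal support loading is not discouraged. The function name, `threshold`, and `weight` are hypothetical and are not taken from the paper.

```python
def light_step_penalty(leg_residuals, in_touchdown, threshold=0.2, weight=1.0):
    """Return a non-positive reward term that penalizes residual peaks at touchdown.

    leg_residuals: scaled disturbance residuals of the leg joints (assumed input).
    in_touchdown:  True while a foot is inside its touchdown window (assumed input).
    threshold, weight: illustrative values, not taken from the paper.
    """
    if not in_touchdown:
        return 0.0
    # Peak magnitude across the leg joints at this control step.
    peak = max(abs(r) for r in leg_residuals)
    # Only the part of the peak above the tolerance is penalized.
    excess = max(0.0, peak - threshold)
    return -weight * excess ** 2
```

With these illustrative values, a touchdown peak of 0.5 gives a penalty of about -0.09, while peaks at or below 0.2 cost nothing.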

Figure: light-step reward shaping residual envelope, Baseline and ADAPT

Light-step reward shaping. The plots show the scaled leg disturbance residual envelope over time in simulation and on the real robot, while the snapshots compare touchdown behavior. The residual envelope can be read as a compact measure of how strongly the legs are disturbed during foot-ground contact; lower values indicate lighter leg loading and smaller foot-ground impact. Rather than merely changing walking speed or step length, the reward changes the contact transition: the baseline lands flatter and more abruptly, producing a shorter loading phase and a stiffer, stomping gait, while ADAPT rolls into support more progressively and reduces visible foot-slap.

Listen for the difference in footstep impact.

Footstep loudness, Baseline relative to ADAPT: +6.3 dB average RMS; +5.2 dB at footstep-impact p95.

Full-speed footstep audio. Both the audible comparison and the synchronized audio trace show that ADAPT lands more quietly, with lighter footstep impacts than the baseline.
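To put the reported level differences in perspective, the sketch below converts a dB difference into an amplitude ratio and shows one common way to compute an RMS level in dBFS. It assumes audio samples normalized to [-1, 1]; the helper names are illustrative, and the page does not state how its levels were measured.

```python
import math

def rms_dbfs(samples):
    """RMS level of samples normalized to [-1, 1], in dB relative to full scale."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(rms) if rms > 0 else float("-inf")

def db_to_amplitude_ratio(db):
    """Amplitude ratio that corresponds to a level difference in dB."""
    return 10.0 ** (db / 20.0)

# Level differences reported on this page (Baseline relative to ADAPT).
ratio_avg_rms = db_to_amplitude_ratio(6.3)
ratio_p95 = db_to_amplitude_ratio(5.2)
```

Under this conversion, 6.3 dB corresponds to roughly twice the RMS amplitude and 5.2 dB to roughly 1.8 times.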

Simulation robustness tests

We then evaluate the same disturbance-aware training idea in simulation, where force direction and magnitude can be swept systematically.

Torso Pulling Test in Simulation

We evaluate whether ADAPT can track commanded forward walking under sagittal torso forces, including out-of-distribution (OOD) force magnitudes beyond the training range.

Figure: torso pulling line charts. Videos: Baseline and ADAPT, backward 60 N torso force, vx = 1.1 m/s

Left: the plots summarize velocity tracking error, lateral drift, and yaw drift under sagittal torso forces across command velocities and force magnitudes. Right: the videos show Baseline first and ADAPT second under an OOD 60 N backward torso force at a commanded forward velocity of v_x^cmd = 1.1 m/s; the baseline is pulled backward, whereas ADAPT keeps walking forward.
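The three quantities in the plots could be computed from a rollout log roughly as follows. This is a sketch under assumed definitions (mean absolute forward-velocity error, net lateral displacement, and wrapped net heading change); the paper may define its metrics differently, and the function and argument names are hypothetical.

```python
import math

def tracking_metrics(vx_cmd, vx_meas, y, yaw):
    """Summarize one rollout from equal-length per-step logs.

    vx_cmd:  commanded forward velocity (m/s).
    vx_meas: measured forward velocities (m/s).
    y:       lateral positions (m).
    yaw:     headings (rad).
    Returns (velocity tracking error, lateral drift, yaw drift).
    """
    # Mean absolute deviation from the commanded forward velocity.
    vel_err = sum(abs(v - vx_cmd) for v in vx_meas) / len(vx_meas)
    # Net sideways displacement over the rollout.
    lateral_drift = abs(y[-1] - y[0])
    # Net heading change, wrapped to [-pi, pi] before taking the magnitude.
    d = yaw[-1] - yaw[0]
    yaw_drift = abs(math.atan2(math.sin(d), math.cos(d)))
    return vel_err, lateral_drift, yaw_drift
```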

Torso Perturbations from Different Directions

We further test torso perturbations from different horizontal directions to examine whether the learned policy remains robust beyond a single force direction.

Figure: radar plots for torso perturbations from different horizontal directions. Videos: Baseline and ADAPT, right lateral 40 N torso force, vx = 1.1 m/s

Left: the radar maps summarize velocity-tracking results under torso perturbations applied from different horizontal directions. Right: the videos show Baseline first and ADAPT second under a right lateral 40 N torso force at a commanded forward velocity of v_x^cmd = 1.1 m/s; compared with the baseline, ADAPT maintains better locomotion under the side perturbation.
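A directional sweep like this one can be generated by rotating a fixed-magnitude force around the vertical axis. The sketch below assumes a body frame with +x forward and +y to the left, so a right lateral push sits at 270 degrees; the frame convention, the function names, and the number of directions are assumptions, not details from the paper.

```python
import math

def horizontal_force(magnitude_n, direction_deg):
    """Force vector (fx, fy, fz) in the horizontal plane, in newtons.

    Assumed convention: 0 deg points forward (+x), 90 deg points left (+y).
    """
    a = math.radians(direction_deg)
    return (magnitude_n * math.cos(a), magnitude_n * math.sin(a), 0.0)

def direction_sweep(magnitude_n, n_directions=8):
    """Evenly spaced (angle_deg, force_vector) pairs covering the full circle."""
    step = 360.0 / n_directions
    return [(i * step, horizontal_force(magnitude_n, i * step))
            for i in range(n_directions)]

# A right lateral 40 N push under the assumed convention.
right_push = horizontal_force(40.0, 270.0)
```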

Observer Performance

Finally, we validate the analytical disturbance observer that ADAPT uses to produce structured residual information.

Sensorless disturbance estimation

Estimates residual force/torque online with accessible robot dynamics, without requiring force/torque sensors.

Physics-derived disturbance prediction

Grounded in physics rather than data, the prediction supports generalization beyond the training disturbance distribution.

Momentum observer

Avoids the noisy acceleration term by using generalized momentum and accessible nominal dynamics terms.

Structured force-related observation

Feeds the filtered, effort-scaled residual to the actor as a force-aware proprioceptive signal.
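One way to read "filtered, effort-scaled" is a low-pass filter followed by division by each joint's effort limit, with the result appended to the proprioceptive observation. The sketch below takes that reading. The exponential filter, the `alpha` value, the effort limits, and all names are assumptions rather than the paper's implementation.

```python
class ScaledResidualFilter:
    """First-order low-pass filter followed by per-joint effort scaling."""

    def __init__(self, effort_limits, alpha=0.2):
        self.effort_limits = list(effort_limits)  # N*m per joint (illustrative)
        self.alpha = alpha                        # smoothing factor in (0, 1]
        self.state = [0.0] * len(self.effort_limits)

    def update(self, residual):
        """Filter a raw residual torque vector and return the scaled result."""
        # Exponential moving average per joint.
        self.state = [(1.0 - self.alpha) * s + self.alpha * r
                      for s, r in zip(self.state, residual)]
        # Dividing by the effort limit makes the signal dimensionless.
        return [s / lim for s, lim in zip(self.state, self.effort_limits)]

def actor_observation(proprio, scaled_residual):
    """Append the force-aware signal to the proprioceptive observation."""
    return list(proprio) + list(scaled_residual)
```

Scaling by the effort limit keeps residuals from strong and weak joints in a comparable range, which is a common motivation for normalizing policy inputs.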

Performance of the disturbance observer on the Unitree G1. While the robot walks forward, known disturbance torques are injected into the left shoulder pitch, left elbow, right shoulder pitch, and waist pitch joints. Dashed curves denote the injected torques, and solid curves show the observer estimates in simulation and on hardware.
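The injected-torque experiment can be illustrated with the textbook generalized-momentum observer on a single joint. The sketch assumes a 1-DoF joint with constant inertia and no gravity or friction, so the momentum is p = inertia * velocity; the gain, time step, and torque values are illustrative, and the whole-body observer used by ADAPT is not reproduced here.

```python
def simulate_momentum_observer(inertia=0.5, gain=50.0, dt=0.001, steps=2000,
                               tau_motor=1.0, tau_ext=-0.4):
    """Estimate a hidden constant torque on a 1-DoF joint.

    Plant (assumed): inertia * qdd = tau_motor + tau_ext.
    Observer: r = gain * (p - p0 - integral of (tau_motor + r) dt).
    Returns the final residual r, which should approach tau_ext.
    """
    qd = 0.0          # joint velocity
    integral = 0.0    # running integral of (tau_motor + r)
    r = 0.0           # disturbance estimate
    p0 = inertia * qd # initial generalized momentum
    for _ in range(steps):
        # True dynamics with the hidden disturbance (explicit Euler step).
        qd += dt * (tau_motor + tau_ext) / inertia
        # The observer sees only the velocity and the expected motor torque;
        # no acceleration measurement is needed.
        integral += dt * (tau_motor + r)
        p = inertia * qd
        r = gain * (p - p0 - integral)
    return r

estimate = simulate_momentum_observer()
```

In this toy model the residual behaves like a first-order low-pass filter of the injected torque, so a larger `gain` tracks faster but passes more measurement noise on real hardware.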

BibTeX

@misc{lyu2026adaptanalyticaldisturbanceawarepolicy,
      title={ADAPT: Analytical Disturbance-Aware Policy Training for Humanoid Locomotion},
      author={Bofan Lyu and Jindou Jia and Kuangji Zuo and Yanshuo Lu and Shijia Han and Gen Li and Boyu Ma and Jingliang Li and Geng Li and Jianfei Yang},
      year={2026},
      eprint={2606.16542},
      archivePrefix={arXiv},
      primaryClass={cs.RO}
}