Adaptive Industrial Humanoid for Curiosity & Bonding RL

A research-driven humanoid framework integrating curiosity-based reinforcement learning, human feedback shaping, and adaptive behavior for trust-safe industrial collaboration.

Tags: Humanoid · Reinforcement Learning · Intrinsic Motivation · Curiosity-Driven AI · Human-Robot Collaboration · Simulation · ROS 2 · Perception · Behavior Learning · Control Systems · Embodied AI · Motion Generation · Cognitive Robotics

Overview

This ongoing humanoid robotics research framework investigates how curiosity, purpose-biased intrinsic motivation, and human feedback (“bonding” signals) can enable a humanoid robot to adapt, refine, and align its behavior within dynamic industrial environments.

Unlike scripted control pipelines, this humanoid explores and learns through novelty, human interaction cues, and an internal behavioral trait vector that shapes how it perceives, explores, and collaborates.

This framework forms the foundation of my Master’s thesis:

“Curiosity and Bonding Driven Reinforcement Learning for Adaptive Humanoid Collaboration in Industrial Environments.”


Core Idea

Industrial humanoids must handle:

  • Constantly shifting workspaces
  • Unpredictable human movements
  • Ambiguous human cues
  • Exploration without risking safety
  • Trust building through behavior

This research integrates a four-part adaptive RL system:

1. Intrinsic Curiosity (ICM/RND)

Internal rewards for surprising or informative transitions, enabling exploration beyond hard-coded behaviors.
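
As a minimal sketch (assuming a PyTorch setup; the network sizes and observation dimension are illustrative, not the framework's actual configuration), an RND-style novelty bonus can be computed as the prediction error against a fixed random target network:

```python
import torch
import torch.nn as nn

class RNDCuriosity(nn.Module):
    """Random Network Distillation: intrinsic reward = prediction error
    of a trained predictor against a fixed, randomly initialised target."""

    def __init__(self, obs_dim: int, feat_dim: int = 64):
        super().__init__()
        # Target network is frozen; it is never trained.
        self.target = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                    nn.Linear(128, feat_dim))
        for p in self.target.parameters():
            p.requires_grad_(False)
        # Predictor is trained (elsewhere) to match the target's output.
        self.predictor = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                       nn.Linear(128, feat_dim))

    def intrinsic_reward(self, obs: torch.Tensor) -> torch.Tensor:
        # Larger prediction error => more novel observation => larger bonus.
        with torch.no_grad():
            target_feat = self.target(obs)
        pred_feat = self.predictor(obs)
        return (pred_feat - target_feat).pow(2).mean(dim=-1)
```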

2. Purpose-Biased Motivation Prior

A task-domain bias that aligns curiosity with industrial affordances (tools, equipment, workspace geometry) rather than with undirected random actions.
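
A rough illustration of how such a prior could rescale the raw curiosity signal; the affordance categories and weights below are placeholder assumptions, not values used in the framework:

```python
# Illustrative affordance weights, chosen only to show how a purpose prior
# can bias curiosity toward task-relevant parts of the workspace.
AFFORDANCE_WEIGHTS = {
    "tool": 1.5,
    "equipment": 1.3,
    "workspace_geometry": 1.2,
    "background": 0.2,   # dampen curiosity about task-irrelevant regions
}

def purpose_biased_reward(intrinsic_reward: float, affordance_label: str) -> float:
    """Scale the raw curiosity reward by the task relevance of what the
    robot is currently attending to."""
    return intrinsic_reward * AFFORDANCE_WEIGHTS.get(affordance_label, 1.0)
```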

3. Bonding Engine (Human Feedback)

Human gestures, voice cues, confirmations, or corrections are transformed into social reward shaping, promoting intuitive collaboration.
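
A sketch of how discrete human cues might be turned into a decaying reward pulse; the cue names, magnitudes, and decay constant are illustrative assumptions:

```python
import math
import time

# Illustrative cue-to-reward mapping; the actual bonding signals and
# magnitudes in the framework may differ.
CUE_REWARDS = {"thumbs_up": 1.0, "verbal_confirm": 0.8, "correction_gesture": -0.5}

class BondingEngine:
    """Convert discrete human feedback events into a smoothly decaying
    shaping term added to the RL reward."""

    def __init__(self, decay_seconds: float = 2.0):
        self.decay = decay_seconds
        self.last_pulse = 0.0
        self.last_time = -math.inf

    def register_cue(self, cue: str) -> None:
        self.last_pulse = CUE_REWARDS.get(cue, 0.0)
        self.last_time = time.monotonic()

    def shaping_reward(self) -> float:
        # Exponentially decay the pulse so feedback mostly affects decisions
        # made shortly after the human cue.
        elapsed = time.monotonic() - self.last_time
        return self.last_pulse * math.exp(-elapsed / self.decay)
```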

4. Inherited Trait Vector (ITV)

A compact behavioral profile that influences risk tolerance, approach distance, proactivity, and exploration style.
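
One possible representation, sketched under the assumption that each trait is a scalar in [0, 1] concatenated with the observation; the field names and defaults are illustrative:

```python
from dataclasses import dataclass

import torch

@dataclass
class InheritedTraitVector:
    """Compact behavioural profile; each field is assumed to lie in [0, 1]."""
    risk_tolerance: float = 0.3
    approach_distance: float = 0.7   # higher = keeps more distance from humans
    proactivity: float = 0.5
    exploration_style: float = 0.4   # 0 = exploit-heavy, 1 = explore-heavy

    def as_tensor(self) -> torch.Tensor:
        # Concatenated with the observation so the policy is trait-conditioned.
        return torch.tensor([self.risk_tolerance, self.approach_distance,
                             self.proactivity, self.exploration_style])
```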

Together (see the combined reward sketch after this list), these allow the humanoid to:

  • Explore more intelligently
  • Adapt RL policies through human cues
  • Maintain safe behavior
  • Refine industrial task skills over time
  • Synchronize with human collaborators
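
For illustration, one way the signals could be folded into a single shaped reward, with the ITV (from the sketch above) modulating the weights; the weighting scheme is an assumption, not the thesis formulation:

```python
def shaped_reward(extrinsic: float,
                  curiosity: float,
                  bonding: float,
                  itv: "InheritedTraitVector") -> float:
    """Combine task reward, purpose-biased curiosity, and bonding feedback.
    The ITV modulates how strongly exploration and social shaping count;
    these coefficients are illustrative only."""
    curiosity_weight = 0.1 + 0.4 * itv.exploration_style
    bonding_weight = 0.2 + 0.3 * itv.proactivity
    return extrinsic + curiosity_weight * curiosity + bonding_weight * bonding
```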

System Architecture

Hardware & Simulation Bodies

This framework spans three humanoid platforms: the Poppy humanoid, a Unitree humanoid model, and a custom prototype developed through modular CAD.

This enables scalable testing across locomotion, manipulation, perception, and social feedback conditions.


Software Stack

  • ROS 2 Humble – distributed control and modular nodes
  • PyTorch – RL agents, curiosity models, and ITV learning
  • Gazebo / MuJoCo / Isaac Sim – full humanoid simulation
  • OpenCV – saliency, stereo depth, gesture cues
  • Audio models – speech & cue detection for bonding signals
  • C++/Python ROS 2 nodes – perception, reward shaping, control graphs (a minimal policy-node sketch follows this list)
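
The ROS 2 ↔ PyTorch interface could look roughly like the node below; the topic names, message type, and stand-in policy are placeholder assumptions for illustration:

```python
import rclpy
from rclpy.node import Node
from std_msgs.msg import Float32MultiArray
import torch

class PolicyNode(Node):
    """Subscribe to a fused observation vector, run the PyTorch policy,
    and publish the resulting action command."""

    def __init__(self, policy: torch.nn.Module):
        super().__init__("humanoid_policy_node")
        self.policy = policy.eval()
        # Topic names are placeholders, not the project's actual interfaces.
        self.sub = self.create_subscription(
            Float32MultiArray, "/perception/observation", self.on_observation, 10)
        self.pub = self.create_publisher(Float32MultiArray, "/control/action", 10)

    def on_observation(self, msg: Float32MultiArray) -> None:
        obs = torch.tensor(msg.data, dtype=torch.float32).unsqueeze(0)
        with torch.no_grad():
            action = self.policy(obs).squeeze(0)
        out = Float32MultiArray()
        out.data = action.tolist()
        self.pub.publish(out)

def main():
    rclpy.init()
    policy = torch.nn.Linear(32, 8)   # stand-in policy; the real one is learned
    rclpy.spin(PolicyNode(policy))

if __name__ == "__main__":
    main()
```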

Subsystems Developed

Curiosity Engine

ICM/RND with industrial-purpose weighting to avoid meaningless exploration.

Bonding Engine

Gesture/voice processing → reward pulses for human-aligned decision shaping.

Inherited Trait Vector (ITV)

A trait layer driving risk posture, exploration, proactivity, and cooperation tendencies.

Perception Layer

  • Stereo depth (an OpenCV sketch follows this list)
  • Attention/saliency
  • Workspace monitoring
  • Gesture recognition
  • Voice/intent detection
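
As an illustration of the stereo-depth step, a minimal OpenCV disparity computation; the SGBM parameters are untuned defaults chosen only for this sketch:

```python
import cv2
import numpy as np

def stereo_depth(left_gray: np.ndarray, right_gray: np.ndarray) -> np.ndarray:
    """Compute a disparity map from a rectified grayscale stereo pair."""
    matcher = cv2.StereoSGBM_create(
        minDisparity=0,
        numDisparities=96,        # must be divisible by 16
        blockSize=7,
        P1=8 * 7 * 7,
        P2=32 * 7 * 7,
        uniquenessRatio=10,
    )
    # SGBM returns fixed-point disparities scaled by 16.
    disparity = matcher.compute(left_gray, right_gray).astype(np.float32) / 16.0
    return disparity
```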

Control & Behavior Layer

  • Action primitives
  • Safety-first control graphs (a gating sketch follows this list)
  • Policy-conditioned motion generation
  • Interpretable behavior models
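
A simplified sketch of a safety-first gate that filters policy actions before motion generation; the distance threshold and velocity limit are illustrative assumptions, not the robot's actual safety specification:

```python
import numpy as np

# Illustrative safety limits; real values would come from the robot's safety
# specification and the current ITV risk posture.
MIN_HUMAN_DISTANCE_M = 0.6
MAX_JOINT_SPEED = 1.0  # normalised units

def safety_gate(action: np.ndarray, human_distance_m: float) -> np.ndarray:
    """Clamp or veto policy actions before they reach the motion layer."""
    # Hard veto: stop if a human is inside the protective separation distance.
    if human_distance_m < MIN_HUMAN_DISTANCE_M:
        return np.zeros_like(action)
    # Soft limit: clip commanded joint velocities to safe magnitudes.
    return np.clip(action, -MAX_JOINT_SPEED, MAX_JOINT_SPEED)
```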

Current State of Research

Completed:

  • Curiosity–bonding–ITV RL architecture
  • Saliency + stereo depth perception stack
  • Gesture/voice feedback reward injection
  • Early RL testing through Poppy & Unitree models
  • Modular CAD for custom humanoid prototype
  • ROS 2 ↔ PyTorch policy interface

Next phase: closed-loop RL experiments integrating curiosity, bonding, and ITV for task acquisition.


Why This Research Matters

Industrial humanoids need:

  • Autonomy
  • Adaptive behavior
  • Perception-driven intelligence
  • Safe exploration
  • Trust-building

This framework demonstrates how RL, curiosity, perception, and human feedback can unify into a single architecture for future industrial humanoid coworkers.

It integrates cognitive robotics, social signal processing, and reinforcement learning into a practical, scalable framework.


References

  1. Du, Y. et al., 2025. Learning Human-Humanoid Coordination for Collaborative Object Carrying.
    https://arxiv.org/abs/2510.14293

  2. Kerzel, M. et al., 2023. NICOL: A Neuro-inspired Collaborative Semi-humanoid Robot.
    https://arxiv.org/abs/2305.08528

  3. Puig, X. et al., 2023. Habitat 3.0: A Co-Habitat for Humans, Avatars and Robots.
    https://arxiv.org/abs/2310.13724

  4. Pathak, D. et al., 2017. Curiosity-driven Exploration by Self-supervised Prediction.
    https://arxiv.org/abs/1705.05363
