CoRI: Communication of Robot Intent for Physical Human-Robot Interaction

CoRL 2025

1Robotics Institute, Carnegie Mellon University
2Honda Research Institute USA

Abstract

Clear communication of robot intent fosters transparency and interpretability in physical human-robot interaction (pHRI), particularly during assistive tasks involving direct human-robot contact. We introduce CoRI, a pipeline that automatically generates natural language communication of a robot's upcoming actions directly from its motion plan and visual perception. Our pipeline first processes the robot's image view to identify human poses and key environmental features. It then encodes the planned 3D spatial trajectory (including velocity and force) onto this view, visually grounding the path and its dynamics. CoRI queries a vision-language model with this visual representation to interpret the planned action within the visual context before generating concise, user-directed statements, without relying on task-specific information. Results from a user study involving robot-assisted feeding, bathing, and shaving tasks across two different robots indicate that CoRI leads to a statistically significant difference in communication clarity compared to a baseline communication strategy. Specifically, CoRI effectively conveys not only the robot's high-level intentions but also crucial details about its motion and any necessary user cooperation.

Overview Video

CoRI Pipeline Details


Overview: Our CoRI pipeline takes as input an image observation of the environment and person, along with a planned 3D trajectory of any robot. CoRI first performs interaction-aware trajectory encoding to extract body landmarks, segment the trajectory, and visually overlay the planned motion onto the image observation. It then queries a VLM to interpret this visual information and a reasoning LLM to generate user-oriented verbal communication. CoRI is designed to be task-agnostic, requiring no task-specific information, and allows any robot to communicate to a non-expert user in the following three aspects:
  • The overall intention of the robot. This is the task-level goal that the robot aims to achieve.
  • All aspects of the motion of the robot. This includes details regarding the planned trajectory itself—starting and ending positions, shape of the trajectory, as well as velocity and force profiles. Furthermore, these descriptions are intuitive—without any numerical values, but with references to the robot's surroundings and with comparative language.
  • If needed, user cooperation. We recognize that many assistive tasks require certain behaviors from the user in order for the overall interaction to be successful (e.g., the user taking a bite during feeding). Hence, our pipeline also conveys any desired human behavior.

Application in pHRI Tasks

Bathing



Shaving



Feeding

Generated communications