Intent-Aware Predictive Haptic Guidance and its Application to Shared Control Teleoperation

This paper presents a haptic shared control paradigm that modulates the level of robotic guidance, based on predictions of human motion intentions. The proposed method incorporates robot trajectories learned from human demonstrations and dynamically adjusts the level of robotic assistance based on how closely the detected intentions match these trajectories. An experimental study is conducted to demonstrate the paradigm on a teleoperated pick-and-place task using a Franka Emika Panda robot arm, controlled via a 3D Systems Touch X haptic interface. In the experiment, the human operator teleoperates a remote robot arm while observing the environment on a 2D screen. While the human teleoperates the robot arm, the objects are tracked, and the human’s motion intentions (e.g., which object will be picked or which bin will be approached) are predicted using a Deep Q-Network (DQN). The predictions are made considering the current robot state and baseline robot trajectories that are learned from human demonstrations using Probabilistic Movement Primitives (ProMPs). The detected intentions are then used to condition the ProMP trajectories to modulate the movement and accommodate changing object configurations. Consequently, the system generates adaptive force guidance as weighted virtual fixtures that are rendered on the haptic device. The outcomes of the user study, conducted with 12 participants, indicate that the proposed paradigm can successfully guide users to robust grasping configurations and brings better performance by reducing the number of grasp attempts and improving trajectory smoothness and length.


I. INTRODUCTION
Collaborative robotics is an emerging field, thanks to its capability of putting robots in close distance to humans, and thus boosting the range of applications across domains. For example, significant contributions can be foreseen in enabling robots in hazardous environments [1], in minimally invasive surgery [2], operating in space [3] and for underwater exploration [4]. Although such robots have transformative capability by delivering beyond human performance through highly precise, rapid and stable movements in far more degrees of freedom than the human hand, fully autonomous solutions are not feasible due to environmental uncertainties and complex decision requirements of the tasks. This brings forward a need for developing shared control systems [5], [6], [7], where both humans and robots may benefit from one another's capabilities by dynamically and appropriately assigning roles to the human and the robot [8], [9]. For example, shared control can be applied on a teleoperated nuclear waste sorting scenario, in which the human may be given the control authority to decide how the heap should be manipulated, whereas the robot can support the human by completing mechanical actions in a smooth and precise way.
Even though robots are not yet capable of matching human abilities in real-world scenarios, recent years have witnessed a rise of robot learning methodologies, allowing robots to learn from humans and generalise expert knowledge on motor tasks [10], [11], [12]. The current study leverages robot learning to integrate human intent awareness with robot kinematic planning within a shared-control context. We propose a haptic guidance framework to predict human intent and, based on this prediction, dynamically modulate the level of assistance to enact robot behavior for reaching dynamic objects and placing them in static bins. Human and robot autonomy are combined to appropriately guide teleoperators, so that no excessive or misleading constraints are put on human operation. This is done through an effective blend of human and robot capabilities, where the robot continuously monitors human operation to infer the intended targets, and guides the human toward feasible robot trajectories.
In order to study the proposed paradigm, we created a teleoperated pick-and-place scenario. Figure 1 shows the human's view of the scene, where a haptic device is used to teleoperate a simulated robot arm, while the human observes the scene through a screen interface. Robot trajectories are pre-learned using Probabilistic Movement Primitives (ProMPs) to program a library of robot behaviours, which are then conditioned to adapt the motion to changing object configurations. A Deep Q-Network (DQN) is trained to predict the human intent based on the robot's end-effector pose. The DQN is used to streamline the intent detection and generate rewards in real time to modulate the level of robot assistance. Once the human intent is detected, the robot reactively responds to the intent, which is conveyed to the human in the form of kinesthetic guidance. This intent-aware guidance paradigm allows humans the flexibility to freely move around the scene and change targets as they desire. In comparison with manual operation, the proposed method improves task performance by significantly reducing the number of grasp attempts when picking objects and improving motion smoothness and trajectory length without incurring extra energy expenditure.

II. RELATED WORK
The detection of human intention during physical interaction is not a straightforward problem. In this work, we adopt a high-level definition and consider intent as a target configuration that a human aims to reach. This definition is applicable to many robotic manipulation tasks, including teleoperated picking and placing of objects. Previously, Goodrich and Jr proposed a memory-based approach to infer users' goals based on their intent history [13]. However, this approach depended greatly on data, which are hard to gather in interactive robotic tasks and may be affected by personal perspectives when labelling intentions.
Humans can communicate intent through haptic cues [14], although such mechanisms are hardly applied in robotics. Moertl et al. have demonstrated the use of force-related information to infer immediate human plans, and used this information to arbitrate robot autonomy [15]. Similarly, Medina et al. investigated the human intent to change the direction of motion [16]. Aydin et al. demonstrated the use of a fuzzy intent estimator and integrated it via a variable impedance controller to adjust robot autonomy [17]. An intent detection scheme which uses the principle of maximum entropy over trajectories was proposed in [18]. Later, this approach was modified using Bayesian filtering in a Markov model under a probabilistic framework [19] for shared-control operation. The predictions made in this method were based on the distance to the closest goal or on the history of user inputs towards the closest goal. Hence, these approaches may fail to infer intentions when the goals are too distant or too close. In contrast, our method predicts the human intent towards the closest legible trajectories to reach goals, learned directly from human demonstrations.
Another major issue in a human-robot scenario is the programming of robot behaviours. For example, in our sorting task, we have multiple activities, such as picking and moving coloured objects and placing them in a bin with the same colour. Even though this is a simple scenario, a robot needs decision making and planning. A robot, when placed in front of such a scene, would not even be capable of distinguishing between objects and bins. A human, on the other hand, could quickly develop this understanding, which can be transferred to the robot by teaching skills to handle the scene. Our intent detection method relies on learned models of previously executed task motions. We propose the combination of intent detection and Learning from Demonstration (LfD) for motion planning as an end-to-end paradigm for robot assistance in shared control human-robot systems. LfD is an easy-to-use approach to transfer human skills into robot motions through trajectory representations learned directly from human operations [20], [10]. LfD has been a popular method to support collaborative robots in various industrial applications [21], [22]. Popular LfD approaches use trajectory representations by Gaussian Mixture Regression (GMR) [23], Gaussian processes (GPs) [24] or Movement Primitives (MPs). Among these, MPs have greater potential to generalize motions to different goals using concise and simple representations with small time and memory complexity. Due to this simplicity of representation and adaptation power, in our work we use the MPs known as Probabilistic Movement Primitives (ProMPs) [25] to represent trajectories as weighted virtual fixtures to guide human motion, while adapting to dynamic object motion in the scene. We address the intent detection using reinforcement learning (RL) to select between multiple trajectories by computing a reward to match the nearest trajectory to the current robot end-effector pose.
The use of RL enables faster matching in comparison to exhaustive search over all the phase space of all trajectories (minimising the distance to each phase of each trajectory), especially with a large number of objects. Combining artificial neural networks and RL, Deep Q-Networks (DQN) offer a model-free goal-oriented approach to solve RL problems in continuous environments.
The benefits of the mental abilities of a human operator and the precision and reliability of a robot system can be simultaneously achieved through shared control in a teleoperation scenario [26], [27]. Recently, [28], [29] investigated shared control through optimization techniques to infer users' goals and, in turn, to provide assistance. A teleoperation shared-control architecture utilising learned trajectory properties for assistance is described in [30] for reaching tasks, although without considering the intent of the operator. As demonstrated in [6], intent recognition can be effectively used to choose the appropriate level of shared control autonomy. Similarly, the choice of control autonomy in our current work is framed as a role distribution problem [31], [32], where human and robot autonomy are balanced by choosing an appropriate level of robot autonomy, while selecting the best way to guide the human. Recently, Ewerton et al. introduced a framework using a combination of Gaussian mixture models (GMM) and ProMPs to infer the best trajectories when reaching for a specific target [33]. This study used RL to model virtual guidance fixtures as potential fields matching the operator's intent to reach a single known goal. Our work is essentially different, as we do not learn alternative ways to reach a target; instead, we select between multiple trajectories to reach a number of targets in a complex environment, also accommodating changing object configurations. In addition, our work enables dynamic autonomy in a novel way by weighting the guidance trajectories depending on the confidence of the intent prediction.
Fig. 2. (a) The overview of the proposed framework to enable haptic control of the Panda arm. The bottom block presents the components of the predictive haptic guidance mechanism including trajectory planning, intent detection and adaptive guidance. (b) The experimental scenario where the remote robot arm is teleoperated to sort objects according to color.

III. METHODOLOGY
A. Control Architecture
Figure 2a represents an overview of the proposed paradigm. The system is designed for bilateral teleoperation, such that the remote arm will mirror the end-effector pose of the haptic control interface, while feedback forces are rendered on the haptic device and felt by the human. The upper block of Figure 2a describes the control loop where we use inverse kinematics to compute the desired robot joint angles based on the end-effector position of the haptic device. The contact force ($F_{\text{contact}}$) is computed using the joint torque readings captured by the built-in torque sensors at each joint of the Franka Emika Panda arm, implementing the force feedback architecture proposed by Singh et al. [34]. These torques are multiplied by the Jacobian to compute the end-effector torque $F_{\text{eff}}$, which is scaled by a factor of 0.0002 to compute $F_{\text{contact}}$, so that the forces can be safely rendered by the haptic device. The proposed predictive haptic guidance paradigm (Figure 2a, bottom block) generates appropriate guidance forces ($F_{\text{guidance}}$) in a shared control context. It is composed of trajectory planning, intent detection and adaptive guidance components, which are detailed in the rest of this section.
Note that $F_{\text{contact}}$ and $F_{\text{guidance}}$ are never simultaneously rendered on the haptic device, as that could create conflicts in the perceived forces. In order to avoid such conflicting forces, the feedback force ($F_{\text{fb}}$) is generated in the following manner:

B. Trajectory Planning
In this study, we use ProMPs to learn good robot trajectories from human demonstrations to reach and grasp a set of objects and sort them by color by placing them in static bins located in the scene (See Figure 1). The ProMP trajectories are learned prior to the experiments, and are considered to work as baseline robot plans, which are selected as appropriate to the estimated intent. Each trajectory is learned through 10 demonstrations, and encodes the skill in terms of how each object shall be grasped and moved to each bin.
In order to capture the coordinated movement of the joints, each ProMP model is learned using the robot joint angles $q_t$ over time, $t \in \{0, \dots, T\}$. The time-varying variance of the trajectories from multiple demonstrations is captured using basis functions. Basis function representations significantly reduce the number of parameters learned for each phase of the learned motion, which is one of the greatest advantages of ProMPs in speeding up the learning procedure.
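The time-varying basis matrix $\Psi_t \in \mathbb{R}^{D \times KD}$, where $D$ and $K$ are the number of DoFs and basis functions, respectively, is defined as follows:
$$\Psi_t = \begin{bmatrix} \phi_t^\top & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \phi_t^\top \end{bmatrix} \tag{2}$$
Here, each $\phi_t \in \mathbb{R}^{K}$ is a basis vector that contains the normalized Gaussian basis functions
$$\phi_i(z_t) = \frac{\exp\left(-\frac{(z_t - c_i)^2}{2h}\right)}{\sum_{j=1}^{K} \exp\left(-\frac{(z_t - c_j)^2}{2h}\right)} \tag{3}$$
where $h$ is the width of the basis and $c_i$ is the center of basis function $i$. The phase variable $z$ is a monotonically increasing function of time, defined within the interval $z_t \in [0, 1]$, which enables temporal modulation of the trajectory. The basis functions $\phi_i$ are thus defined on the phase instead of time.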
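As a minimal sketch of the basis construction described above, the following Python snippet evaluates normalized Gaussian bases on the phase and stacks them into a block-diagonal matrix. It assumes Gaussian bases of the form exp(-(z - c_i)^2 / (2h)), evenly spaced centers and a linear phase z = t/T; the function names and the example values of K, D and h are illustrative choices, not values reported in this study.

```python
import math


def linear_phase(t, T):
    """Assumed phase: maps time t in [0, T] linearly onto z in [0, 1]."""
    return t / T


def gaussian_basis(z, centers, h):
    """Normalized Gaussian basis vector phi(z) of length K."""
    raw = [math.exp(-((z - c) ** 2) / (2.0 * h)) for c in centers]
    total = sum(raw)
    return [b / total for b in raw]


def block_basis(phi, num_dofs):
    """Block-diagonal matrix Psi of shape (D, K*D) with phi in each row block."""
    k = len(phi)
    psi = [[0.0] * (k * num_dofs) for _ in range(num_dofs)]
    for d in range(num_dofs):
        psi[d][d * k:(d + 1) * k] = phi
    return psi


K, D, h = 5, 7, 0.02                       # illustrative values only
centers = [i / (K - 1) for i in range(K)]  # evenly spaced on the phase
phi = gaussian_basis(linear_phase(30, 100), centers, h)
Psi = block_basis(phi, D)
```

Because the bases are normalized, the entries of `phi` sum to one at every phase, so each joint angle is a convex combination of that joint's weights.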
The basis vector $\phi_t$ is weighted by a vector $w$ to represent the trajectory for a single joint as
$$q_t = \phi_t^\top w \tag{4}$$
where $w \in \mathbb{R}^{K}$. A Gaussian distribution over the weight vector $w$ captures the variance of the trajectory at each timestep, and can be formulated using parameters $\theta = \{\mu_w, \Sigma_w\}$ as
$$p(w; \theta) = \mathcal{N}(w \mid \mu_w, \Sigma_w) \tag{5}$$

where $\mu_w$ and $\Sigma_w$ are the mean and covariance of the learned distribution, respectively. Since (4) is a linear function of $w$ for a single joint, the distribution of the states of all joints at time $t$ can be obtained using (2) as follows (refer to [25] for details):
$$p(q_t; \theta) = \mathcal{N}\left(q_t \mid \Psi_t \mu_w,\; \Psi_t \Sigma_w \Psi_t^\top\right) \tag{6}$$
This representation enables modulation through via points using ProMP conditioning. In our context, ProMP conditioning allows the robot to modify a trajectory to pass through new via points and thereby accommodate changes in object configurations in the scene. The modified ProMP parameters ($\mu_w^*$, $\Sigma_w^*$) can be obtained by feeding joint angles $q^*$ to the ProMP at a specific timestep $t$ as follows:
$$\mu_w^* = \mu_w + L\,(q^* - \Psi_t \mu_w), \qquad \Sigma_w^* = \Sigma_w - L\,\Psi_t \Sigma_w, \qquad L = \Sigma_w \Psi_t^\top \left(\Sigma_q^* + \Psi_t \Sigma_w \Psi_t^\top\right)^{-1} \tag{7}$$
where, following [25], $\Sigma_q^*$ specifies the accuracy with which the via point should be reached. This is a required feature, as the object poses can change during manipulation, for example if objects are pushed around while picking. The conditioning guarantees a smooth trajectory that blends with the robot's current behavior and allows precise task completion, as shown in Figure 3. In this example, the trajectory is conditioned to pass through two points, one of which lies midway along the motion (Figure 3a).
C. Intent Detection
This section describes how intent detection is enabled in our shared-control framework. Consider that there is a list of goals in the environment, each with a corresponding ProMP trajectory formulated in (6). To estimate the human's motion intention, we use a Deep Q-Network (DQN), which takes the robot's end-effector position as input and predicts the intended ProMP trajectory. In addition, an interpolated path is computed to reach the nearest phase on the intended ProMP trajectory. This is later blended with the conditioned ProMP trajectory to compute smooth guidance. Using the DQN allows the nearest trajectory to be selected quickly, without iterating over all phases in all ProMP trajectories. Moreover, the long-term reward maximization facilitates adding further constraints such as obstacles and joint limits.
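Returning to the via-point conditioning used in trajectory planning, the update can be sketched for a single joint, where the matrix inverse reduces to a scalar division. This is a simplified illustration under stated assumptions: it uses the standard Gaussian conditioning rule from [25], a scalar `obs_var` standing in for the desired via-point accuracy, and hypothetical function names; it is not the implementation used in the study.

```python
def predict(mu_w, sigma_w, phi):
    """Mean and variance of the joint angle at the phase encoded by phi."""
    k = len(phi)
    mean = sum(phi[i] * mu_w[i] for i in range(k))
    var = sum(phi[i] * sigma_w[i][j] * phi[j] for i in range(k) for j in range(k))
    return mean, var


def condition_promp(mu_w, sigma_w, phi, q_star, obs_var=1e-6):
    """Condition a single-joint ProMP to pass through q_star at phase phi.

    obs_var is an assumed scalar via-point accuracy (smaller = tighter).
    """
    k = len(mu_w)
    # Sigma_w @ phi; equals (phi^T Sigma_w)^T because Sigma_w is symmetric.
    sigma_phi = [sum(sigma_w[i][j] * phi[j] for j in range(k)) for i in range(k)]
    denom = obs_var + sum(phi[i] * sigma_phi[i] for i in range(k))
    gain = [v / denom for v in sigma_phi]
    residual = q_star - sum(phi[i] * mu_w[i] for i in range(k))
    mu_new = [mu_w[i] + gain[i] * residual for i in range(k)]
    sigma_new = [[sigma_w[i][j] - gain[i] * sigma_phi[j] for j in range(k)]
                 for i in range(k)]
    return mu_new, sigma_new


# Toy prior with three basis functions (illustrative numbers only).
mu = [0.0, 0.0, 0.0]
sigma = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
phi_via = [0.2, 0.5, 0.3]
mu_c, sigma_c = condition_promp(mu, sigma, phi_via, q_star=1.0)
mean_c, var_c = predict(mu_c, sigma_c, phi_via)
```

After conditioning, the predicted mean at the via-point phase moves to the requested joint angle and the variance there collapses to roughly `obs_var`, while phases whose bases barely overlap with `phi_via` are left almost unchanged.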
The DQN uses the reward function $r$, which is the negative Cartesian distance between the robot's current end-effector position $s$ and the end-effector position at the nearest phase $z$ of the intended ProMP trajectory $p(q_z; \theta)$:
$$r = -\left\lVert s - T(q_z) \right\rVert \tag{8}$$
Here $T$ is the forward kinematics transformation, which maps the joint space to Cartesian space. Since we are dealing with multiple goals and multiple plans corresponding to each of these goals, during training the intent recognition engine continuously loops through all possible trajectories to avoid overlapping, hence enabling convergence to a single trajectory. The Q value is iteratively updated using a Bellman equation, which is described in (9). In this function, $r_t + \gamma \max_a Q(s_{t+1}, a)$ plays the role of the target value.
$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_t + \gamma \max_a Q(s_{t+1}, a) - Q(s_t, a_t) \right] \tag{9}$$
where $Q(s, a)$ is the Q value for taking action $a$ at state $s$, $\alpha$ is the learning rate, $r$ is the reward value, and $\gamma$ is the discount factor. The DQN uses the loss function in (10) to perform gradient descent, where $N$ is the number of samples, $y$ is the Q target value and $Q(s, a; \theta)$ refers to the Q value of taking action $a$ at state $s$ given parameters $\theta$.
$$\mathcal{L}(\theta) = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - Q(s_i, a_i; \theta) \right)^2 \tag{10}$$
We use experience replay, randomly picking samples from a replay memory of collected experience to train the network. The system contains two networks, which are similar in structure but different in parameters, known as the target and prediction (also called evaluation) networks. The parameters of the prediction network are copied to the target network after a defined number of iterations.
Figure 5 demonstrates how the estimated intent changes with respect to the Cartesian distance calculation over a period of time. The results are presented on a simple scenario, where the human moves freely between four target objects (numbered 1 to 4) within a 3D scene. Correspondingly, four ProMP trajectories are learned, one to reach each object. The DQN model can detect the target that matches the nearest trajectory. Note that due to the ProMP representation, the learned trajectory distributions may overlap, and hence the resulting intent may oscillate when two trajectories are close to each other.
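The update rule, loss and experience replay described above can be illustrated with a few scalar helpers. This is a hedged sketch rather than the network used in the study: it operates on plain numbers instead of a neural network, and the names `td_target`, `q_update`, `mse_loss` and `ReplayMemory` are hypothetical.

```python
import random


def td_target(reward, next_q_values, gamma):
    """Target value r_t + gamma * max_a Q(s_{t+1}, a)."""
    return reward + gamma * max(next_q_values)


def q_update(q_sa, target, alpha):
    """One Bellman-style update of Q(s_t, a_t) toward the target."""
    return q_sa + alpha * (target - q_sa)


def mse_loss(targets, predictions):
    """Mean squared error between Q targets y and predicted Q values."""
    n = len(targets)
    return sum((y - q) ** 2 for y, q in zip(targets, predictions)) / n


class ReplayMemory:
    """Fixed-capacity buffer of transitions with uniform random sampling."""

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.buffer = []
        self._rng = random.Random(seed)

    def push(self, transition):
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)  # discard the oldest transition
        self.buffer.append(transition)

    def sample(self, batch_size):
        return self._rng.sample(self.buffer, min(batch_size, len(self.buffer)))
```

In a full DQN, the `next_q_values` passed to `td_target` would come from the target network, while `predictions` in `mse_loss` would come from the prediction network.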
The intent recognition engine continuously computes a reward during motion, based on the current end-effector position. This reward is used to select the target object and the corresponding trajectory. A sequence of steps is computed to attract the human toward the nearest target trajectory phase. Figure 6 illustrates the operation of the intent detection engine in the example scene used in Figure 5. Here, the human starts the motion from point A, and moves to reach point B along the blue trajectory. At this point, the DQN model outputs the current intention as Object 2, and estimates the nearest phase on the corresponding ProMP trajectory to reach this object. The guidance action is computed to attract the human toward this trajectory (marked with the red dotted line between B and the ProMP trajectory). However, the human does not comply with the guidance, and continues the motion to point C along the blue trajectory. At C, the intent recognition engine estimates that the human's intent has changed to Object 4, hence guidance is provided to lead the human to the closest phase in the corresponding trajectory along the green dotted line.
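For intuition, the quantities in this example can be computed by brute force. The sketch below is the exhaustive nearest-phase search that the DQN is meant to replace, not the DQN itself: it evaluates the negative-distance reward at every phase of every trajectory, picks the best goal, and builds a straight-line path of steps toward the nearest phase. The trajectories are assumed to be given as Cartesian points (i.e. already mapped through forward kinematics), and the function names and `step_size` parameter are illustrative.

```python
import math


def reward(ee_pos, traj_point):
    """Negative Cartesian distance between end-effector and a trajectory point."""
    return -math.dist(ee_pos, traj_point)


def nearest_phase(ee_pos, trajectory):
    """Index of the closest phase on one trajectory, and its reward."""
    idx = max(range(len(trajectory)), key=lambda i: reward(ee_pos, trajectory[i]))
    return idx, reward(ee_pos, trajectory[idx])


def detect_intent(ee_pos, trajectories):
    """Return (goal, phase_index, reward) for the nearest trajectory."""
    best = None
    for goal, traj in trajectories.items():
        idx, r = nearest_phase(ee_pos, traj)
        if best is None or r > best[2]:
            best = (goal, idx, r)
    return best


def snap_path(ee_pos, target, step_size):
    """Straight-line steps from ee_pos to target, each at most step_size long."""
    n = max(1, math.ceil(math.dist(ee_pos, target) / step_size))
    return [tuple(p + (q - p) * i / n for p, q in zip(ee_pos, target))
            for i in range(1, n + 1)]
```

The cost of `detect_intent` grows with the number of goals times the number of phases per trajectory, which is the motivation given in the text for learning the selection instead.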
In order to ensure that teleoperators retain control over robotic autonomy, we implemented a mechanism to break robot guidance if the human chooses not to move toward the guided trajectory, e.g. by moving in another direction, within a given time window (0.3 s in our experiment). In this case, guidance forces are cancelled and the DQN will detect another target.
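One way such a break mechanism could be realised is sketched below. The compliance test is an assumption made for illustration: the human is treated as complying whenever the velocity has a positive component along the guidance direction, and guidance is dropped once non-compliance has lasted for the whole window. The class name `GuidanceBreaker` and this dot-product criterion are hypothetical, not taken from the study.

```python
class GuidanceBreaker:
    """Cancels guidance after sustained non-compliance (assumed criterion)."""

    def __init__(self, window_s=0.3):
        self.window_s = window_s
        self._noncompliant_since = None

    def update(self, t, velocity, guidance_dir):
        """Return True while guidance should remain active at time t (seconds)."""
        along = sum(v * g for v, g in zip(velocity, guidance_dir))
        if along > 0.0:
            # Moving with the guidance: reset the non-compliance timer.
            self._noncompliant_since = None
            return True
        if self._noncompliant_since is None:
            self._noncompliant_since = t
        return (t - self._noncompliant_since) < self.window_s
```

When `update` returns False, the caller would zero the guidance force and request a fresh intent estimate.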

D. Adaptive Haptic Guidance
Once the intended target is estimated, we generate two consecutive motion plans. The first is the output of the intent detection engine, and describes a path to snap to the closest phase on the intended ProMP trajectory. The second is the ProMP trajectory conditioned on the (possibly changed) object pose. The robotic autonomy is programmed as haptic virtual fixtures guiding along these two trajectories. In order to avoid jerky changes in the guidance forces when starting and stopping the guidance, an exponential function is applied to the force profile to damp the forces.
As mentioned earlier, modulating the role switching behavior between robot autonomy and human control is important in shared control. If the robot is too persistent in guiding the human, the operation could restrict human decision making. It is important to note that the intent detection engine estimates a path to the intended trajectory, and this information can be used to capture the confidence in the correctness of the estimated intent. This is implemented using the number of steps required to snap to the estimated ProMP trajectory. Since this path leads the robot from its current position to the nearest phase on the trajectory, a large number of steps indicates that the human is relatively far away from the robot's plan, and thus the belief that the detected target is correct is weak. Accordingly, the adaptive guidance mechanism we propose weights the force guidance given through the haptic device by introducing a belief coefficient based on the distance to the intended ProMP trajectory. The belief coefficient ensures that the engine does not apply too much force towards one trajectory while the user is still moving undecidedly. As a result, if the user suddenly switches to a different direction, the force guidance will be switched off until the next detection.
Equation 11 denotes how we calculate our adaptive force, where $k$ is the scaling gain ($k = 9$), $N$ is the predefined threshold for the maximum number of steps to the ProMP trajectory to consider as user intention, $r$ is the number of steps taken to the intended phase on the ProMP trajectory, and $s_{t+1}$ is the next expected end-effector position from $s_t$ on the computed trajectory.
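A minimal sketch of how belief-weighted guidance of this kind could be computed from the quantities just listed is given below. The linear belief coefficient 1 - r/N, which vanishes once r reaches N, is an illustrative assumption and should not be read as the exact form of Equation 11; the function names are likewise hypothetical.

```python
def belief(steps, max_steps):
    """Assumed linear belief: 1 when on the trajectory, 0 at or beyond max_steps."""
    if steps >= max_steps:
        return 0.0
    return 1.0 - steps / max_steps


def adaptive_force(s_t, s_next, steps, max_steps, k=9.0):
    """Guidance force pulling from s_t toward s_next, scaled by gain and belief."""
    b = belief(steps, max_steps)
    return tuple(k * b * (nxt - cur) for cur, nxt in zip(s_t, s_next))
```

With this form, the force fades smoothly as the operator drifts away from the plan and is exactly zero once the number of steps reaches the threshold, which matches the qualitative behaviour described in the text.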

IV. EXPERIMENTS
In order to evaluate the proposed framework, we conducted a user study in a teleoperated pick-and-place scenario. A 3D Systems Touch X haptic device is used to control a Franka Emika Panda robot arm with 2-finger grippers as the remote system. The participants monitored the operation of the remote arm through a 2D screen.
Twelve subjects (1 female, 11 male), aged between 23 and 38 and from different academic backgrounds, participated in the study. Five participants had no prior experience working with robotic arms and three had experience using haptic devices. At the beginning of the trials, the participants were given an instruction sheet providing the details of the experiment and the required task. A practice session was provided to allow participants to become familiar with the robot and with teleoperation using the haptic device. The practice session continued until participants felt comfortable with the control interface. The participants signed an informed consent form at the beginning of the experiment. Full ethics approval (reference 2019-Jul-0802) was obtained from the Human Ethics Committee of the University of Lincoln, where the experiments took place, and a full risk assessment was completed before the commencement of the studies. Appropriate automatic and manual safety measures were installed, including physical and software-based kill switches, and moderated by the experimenter to stop the haptic device in case of emergency.
Among the 12 subjects, 2 were left-handed, and all of them used their dominant hands to control the remote robot. The Franka fingers were controlled using the button on the haptic device stylus. The experimental scene is shown in Figure 2b, where four objects and two trays, colored either red or yellow, were placed within the workspace of the Panda robot. Participants were asked to sort each object into the tray with the matching color. The colors of the trays were constant, whereas the colors of the objects were pseudo-randomly chosen (i.e. the same colors were selected in each trial, so all participants experimented with the same object color sequence). The controller was agnostic to the colors; as a result, only the human could decide which object should go to which tray. The objects were located relatively close to each other, so that the intent detection was not straightforward. We tested two conditions within the experiment:
• Manual mode: Users had complete control of the remote arm through the haptic device. They were provided with collision forces with objects and the scene.
• Guided mode: In addition to collision forces, the users were also provided with guidance forces that led the haptic stylus over a learned trajectory to reach and place objects, using the proposed intent-aware predictive haptic guidance paradigm.
A balanced within-subjects design was used in the experiment, so that all participants experimented with both conditions. The subjects were randomly allocated into two groups to eliminate ordering effects, where the first group experimented with the manual mode first, whereas the second group experimented with the guided mode first. Participants were not informed in advance which mode they were experimenting with. Each participant completed three trials in each condition, carrying out 12 picking and placing operations (3 trials × 4 objects in the scene) per condition.
As the robot does not know which tray the objects should be sorted into, the high-level decisions were made completely by the human operators. In our experimental scenario, the predictive haptic guidance was provided during both the picking and placing phases of the task. Object positions were tracked continuously to condition the ProMP trajectories. Objects that had already been sorted were removed from the list of targets considered for intent detection. We recorded timestamped joint torques, positions and velocities during the experiment. The following metrics are used to evaluate the results of the user study:
• Time: The total time to pick and place four objects is recorded for each trial.
• Number of correctly sorted objects: The number of correctly sorted objects, i.e. those placed in a tray of the same color, is counted for each trial.
• Number of grasping attempts: The number of times that the user pressed the haptic device button to close the Franka grippers is counted as the number of grasping attempts per trial.
• Energy: We compute the energy consumption of the task by integrating the human exerted power over time as $E = \int_{t=0}^{T} \tau(t)\,\dot{q}(t)\,dt$, where $\tau$ and $\dot{q}$ are the joint torques and velocities of the remote robot, respectively, and $T$ is the duration of the trial.
• Trajectory length: The total length of the trajectory in each trial is computed to measure trajectory complexity.
• Trajectory smoothness: The spectral arc length [35] of the robot joint motions is computed, and their average is used to measure motion smoothness in each trial.
At the end of each condition block, the participants were given the NASA Task Load Index (TLX) [36] to assess their perceived workload on a 5-point Likert scale.
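Two of the metrics above, energy and trajectory length, can be computed from the recorded, timestamped samples roughly as follows. This is a sketch under assumptions: the power at each sample is taken as the sum over joints of torque times velocity, the integral is approximated with the trapezoidal rule, and trajectory length is measured as the summed Euclidean distance between consecutive points in whichever space (joint or Cartesian) the trajectory is recorded. The function names are illustrative.

```python
import math


def energy(times, torques, velocities):
    """Trapezoidal integral of sum_j tau_j(t) * qdot_j(t) over the trial."""
    power = [sum(tau * qd for tau, qd in zip(tq, vel))
             for tq, vel in zip(torques, velocities)]
    return sum(0.5 * (power[i] + power[i + 1]) * (times[i + 1] - times[i])
               for i in range(len(times) - 1))


def trajectory_length(points):
    """Total path length as the sum of distances between consecutive samples."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))
```

Both functions accept non-uniform sampling, since each interval is weighted by its own timestamp difference or point-to-point distance.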
In order to investigate the statistical significance of observed differences between the conditions, we conducted t-tests for the continuous-level measures. Wilcoxon signed-rank tests were conducted for the ordinal-level survey data. Figures 7 and 8 illustrate a comparison of the quantitative metrics. The results indicate no significant effect of the conditions (manual vs. guided mode) on task completion time and energy. On the other hand, a statistically significant difference is observed in terms of the trajectory smoothness (t(36) = 3.01, p < 0.005) and trajectory length (t(36) = 2.27, p < 0.05). This indicates that the subjects moved significantly more smoothly under guidance, and completed the motions in a more targeted and concise manner.
In terms of task performance, we observed that the subjects completed the sorting task almost perfectly in both conditions (with the exception of a single object being misplaced in the manual mode over all trials). A statistically significant difference between conditions is observed in the number of grasping attempts, with t(36) = 2.02 and p < 0.05. Figure 9 shows the results of the workload assessments under each condition. No statistically significant differences are observed for any of the subjective measures. Overall, the participants assessed the workload as rather low in all dimensions of the NASA-TLX in both modes, indicating that the task was not perceived as hard or mentally demanding.

V. CONCLUSIONS AND FUTURE WORK
This study proposes a predictive haptic guidance methodology, combining intent detection, trajectory planning and adaptive assistance as a shared-control solution. This is an end-to-end mechanism, applicable to teleoperated pick-and-place tasks in industrial applications. We employed a deep reinforcement learning method for detecting an operator's intended goal and finding the shortest path towards baseline ProMP trajectories, learned from experts' demonstrations. Attraction toward dynamic objects is handled by leveraging the Gaussian conditioning property of ProMPs. Force guidance is adaptively rendered on the haptic device, based on the robot's confidence in the detected intent.
The results of the user study show that our system can provide operators with intuitive assistance. With force guidance, participants can grasp objects more precisely and optimize their trajectories, which is manifested in a significantly lower number of grasping attempts, shorter trajectory length and smoother joint trajectories.
The proposed framework has the ability to detect users' intention in real time and generate a complete trajectory from current pose to the intended target based on the learned knowledge from experts' demonstration. The current study works with a limited number of trajectories, hence its ability to work in unknown scenes is not readily demonstrated. Please note that the paradigm can be extended to iteratively learn trajectories from experience to handle situations where current robot trajectories are far away from learned models. However, this is beyond the scope of the current study and will be examined as part of future work.
Due to COVID-19 and related workplace restrictions, the experimental study was conducted in a simulation framework, where manipulating and grasping with the robot is easier than it would be with a real robot arm. Through the haptic interface, the participants were able to feel realistic sensations, and significant differences were observed during the experiment. As future work, we will implement the interaction mechanism on a physical robot.
The current paradigm does not consider strong conflicts between the plans of the user and the robot. Although the framework can handle slight changes to target configurations, larger diversions are not tested in this experiment. In addition, the paradigm does not model situations where human intentions are significantly different than the plans in the robot's repertoire. To tackle this, active learning mechanisms can be integrated in the current framework to enable iterative learning to enrich the robot's behaviour library. In addition, extending on our recent work [37], [38], we plan to integrate active conflict recognition within this framework, and as a result, handle not only collaborative but also conflicting scenarios in physical human-robot interaction.