Papers by José Santos-Victor

Gait planning for biped locomotion on slippery terrain
We propose a new biped locomotion planning method that optimizes locomotion speed subject to friction constraints. For this purpose we use approximate models of the required coefficient of friction (RCOF) as a function of gait. The methodology is inspired by findings in human gait analysis, where subjects have been shown to adapt spatial and temporal variables of gait in order to reduce RCOF in slippery environments. Here we solve the friction problem similarly, by planning in gait parameter space: namely foot step placement, step swing time, double support time and height of the center of mass (COM). We first used simulations of a 48 degrees-of-freedom robot to estimate a model of how RCOF varies with these gait parameters. Then we developed a locomotion planning algorithm that minimizes the time the robot takes to reach a goal while keeping acceptable RCOF levels. Our physics simulation results show that RCOF-aware planning can drastically reduce the amount of slippage while still maximizing efficiency in terms of locomotion speed. Also, according to our experiments, human-like stretched-knees walking can reduce slippage more than bent-knees (i.e. crouch) walking at the same speed.
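The core quantity in the abstract, the required coefficient of friction, can be illustrated with a short sketch. Assuming single support and treating the robot as a point mass (a simplification, not the paper's 48-DOF model), RCOF is the peak ratio of horizontal to vertical ground reaction force over the step; the helper name and gait profiles below are hypothetical:

```python
import numpy as np

def required_cof(com_acc_xy, com_acc_z, g=9.81):
    """Required coefficient of friction (RCOF): the minimum ground friction
    that prevents slipping, i.e. the peak ratio of horizontal to vertical
    ground reaction force (per unit mass). Illustrative point-mass model."""
    f_h = np.linalg.norm(com_acc_xy, axis=1)   # horizontal GRF / mass
    f_v = com_acc_z + g                        # vertical GRF / mass
    return np.max(f_h / f_v)

# A gait that accelerates the COM harder horizontally needs more friction,
# which is why the planner adapts step length and timing on slippery ground.
t = np.linspace(0, 1, 100)
flat = np.zeros_like(t)
gentle = required_cof(np.stack([0.5 * np.sin(2 * np.pi * t), flat], 1), flat)
brisk = required_cof(np.stack([2.0 * np.sin(2 * np.pi * t), flat], 1), flat)
```

Here `gentle` and `brisk` differ only in horizontal COM acceleration, and the brisker gait demands roughly four times the friction.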

Mechanisms and machine science, Nov 13, 2015

We propose a novel approach for detecting events in data sequences, based on a predictive method using Gaussian processes. We have applied this approach to detecting relevant events in therapeutic exercise sequences, where the obtained results, in addition to a suitable classifier, can be used directly for gesture segmentation. During exercise performance, motion data, in the sense of the 3D position of characteristic skeleton joints for each frame, are acquired using an RGBD camera. Trajectories of joints relevant for the upper-body therapeutic exercises of Parkinson's patients are modelled as Gaussian processes. Our event detection procedure using an adaptive Gaussian process predictor has been shown to outperform a first-derivative-based approach.
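The idea of prediction-based event detection can be sketched as follows: fit a Gaussian process on a sliding window of the joint trajectory and flag frames where the one-step-ahead prediction misses the observation. This is a minimal stand-in, not the paper's adaptive predictor; the kernel parameters and threshold are illustrative:

```python
import numpy as np

def rbf(a, b, ell=1.0, sf=1.0):
    """Squared-exponential kernel between two 1-D input sets."""
    d = a[:, None] - b[None, :]
    return sf**2 * np.exp(-0.5 * (d / ell)**2)

def gp_predict(x_train, y_train, x_query, noise=1e-2):
    """GP regression posterior mean at the query points."""
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    alpha = np.linalg.solve(K, y_train)
    return rbf(x_train, x_query).T @ alpha

def detect_events(t, y, window=20, thresh=0.5):
    """Flag frame i as an event when the GP one-step-ahead prediction,
    fit on the preceding window, misses the observation by > thresh."""
    events = []
    for i in range(window, len(y)):
        pred = gp_predict(t[i - window:i], y[i - window:i], t[i:i + 1])[0]
        if abs(pred - y[i]) > thresh:
            events.append(i)
    return events

# Smooth motion with one abrupt change: only frames from the change on
# can exceed the prediction-error threshold.
t = np.linspace(0, 4, 80)
y = np.sin(t)
y[50:] += 2.0
events = detect_events(t, y)
```

The same residual signal could then feed a classifier for gesture segmentation, as the abstract suggests.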

Egomotion estimation using log-polar images

We address the problem of egomotion estimation of a monocular observer moving with arbitrary translation and rotation in an unknown environment, using log-polar images. The method we propose is uniquely based on the spatio-temporal image ...
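The log-polar sampling the abstract refers to maps image pixels to (log-radius, angle) coordinates, mimicking the foveated retina. A minimal nearest-neighbour version (a sketch, not the paper's sensor model; resolution parameters are illustrative):

```python
import numpy as np

def logpolar_sample(img, n_rings=32, n_wedges=64, r_min=2.0):
    """Resample a square image on a log-polar grid: rings are spaced
    uniformly in log(r), wedges uniformly in angle, so resolution is
    highest at the image centre (the 'fovea')."""
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    r_max = min(cx, cy)
    rho = np.linspace(np.log(r_min), np.log(r_max), n_rings)
    theta = np.linspace(0, 2 * np.pi, n_wedges, endpoint=False)
    r = np.exp(rho)[:, None]
    y = np.clip(np.round(cy + r * np.sin(theta)).astype(int), 0, h - 1)
    x = np.clip(np.round(cx + r * np.cos(theta)).astype(int), 0, w - 1)
    return img[y, x]  # shape (n_rings, n_wedges)

img = np.arange(64 * 64, dtype=float).reshape(64, 64)
lp = logpolar_sample(img)
```

A useful property for egomotion work is that camera roll and zoom become simple shifts in this representation.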

Design of a Robotic Coach for Motor, Social and Cognitive Skills Training Toward Applications With ASD Children

IEEE Transactions on Neural Systems and Rehabilitation Engineering, 2021

Socially assistive robots may help the treatment of autism spectrum disorder (ASD), through games using dyadic interactions to train social skills. Existing systems are mainly based on simplified protocols which qualitatively evaluate subject performance. We propose a robotic coaching platform for training social, motor and cognitive capabilities, with two main contributions: (i) using triadic interactions (adult, robot and child), with robotic mirroring, and (ii) providing quantitative performance indicators. The key system features were carefully designed, including the type of protocols, feedback systems and evaluation metrics, contemplating the requirements for applications with ASD children. We implemented two protocols, Robot-Master and Adult-Master, where children performed different gestures guided by the robot or the adult respectively, eventually receiving feedback about movement execution. In both, the robot mirrors the subject during the movement. To assess system functionalities with a homogeneous group of subjects, tests were carried out with 28 healthy subjects; one preliminary acquisition was done with an ASD child. Data analysis was customized ...

In this paper we present an approach to the gaze stabilization problem using Adaptive Frequency Oscillators to learn the frequency, phase and amplitude of the optical flow and generate compensatory commands during robot locomotion. Assuming periodic and nearly sine-shaped motion of the robot, the system successfully stabilizes the gaze of the robot, whether the robot itself is moving or an external object is moving relative to the robot. We present experiments in simulation and with a real robotic setup, the HOAP-3, showing that the system can be successfully applied to gaze stabilization during locomotion, even when the feedback loop is very slow and noisy.
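An adaptive frequency oscillator of the kind mentioned above can be sketched in a few lines: a phase oscillator whose intrinsic frequency is adapted by the same coupling term that entrains its phase to the input, so the learned frequency converges to that of the periodic signal. The gains and duration below are illustrative, not the paper's values:

```python
import numpy as np

def afo_track(signal, dt, omega0=3.0, coupling=20.0):
    """Adaptive frequency oscillator (phase-oscillator form):
    phi' = omega - K*F(t)*sin(phi),  omega' = -K*F(t)*sin(phi).
    The adapted omega locks onto the frequency of the periodic input,
    which can then drive compensatory gaze commands."""
    phi, omega = 0.0, omega0
    for f in signal:
        e = coupling * f * np.sin(phi)  # perturbation projected on the phase
        phi += dt * (omega - e)
        omega += dt * (-e)
    return omega

# A 1.5 Hz sinusoid standing in for the periodic optical-flow signal.
true_freq = 2.0 * np.pi * 1.5  # rad/s
t = np.arange(0, 30, 0.001)
omega_hat = afo_track(np.sin(true_freq * t), 0.001)
```

After convergence the adapted frequency oscillates in a small ripple around the input frequency, which is enough to phase-lock compensatory head commands.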

IFAC Proceedings Volumes, Jul 1, 2004

An important feature for autonomous underwater vehicles equipped with video cameras in survey missions is the ability to quickly generate a wide-area view of the sea floor. This paper presents a method for the fast creation of globally consistent video mosaics. A closed-form solution for the estimation of the global image motion is presented. It uses a least-squares criterion over a residual vector which is linear in the homography parameters. Aiming at real-time operation, a fast implementation is described using recursive least-squares, which permits the creation of globally consistent mosaics during video acquisition. The application to underwater imagery is illustrated by the creation of video mosaics capable of being used for surveying or autonomous navigation.
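The recursive least-squares machinery behind such online estimation can be sketched generically: each new linear measurement refines the parameter estimate without re-solving the full batch problem. This is the textbook RLS update, not the paper's specific homography parameterization:

```python
import numpy as np

class RecursiveLS:
    """Recursive least-squares for y = h . theta: each update folds one
    measurement into the estimate in O(n^2), enabling mosaic refinement
    during video acquisition instead of after it."""
    def __init__(self, n):
        self.theta = np.zeros(n)
        self.P = 1e6 * np.eye(n)  # large initial covariance: weak prior

    def update(self, h, y):
        h = np.asarray(h, dtype=float)
        Ph = self.P @ h
        k = Ph / (1.0 + h @ Ph)                      # gain vector
        self.theta = self.theta + k * (y - h @ self.theta)
        self.P = self.P - np.outer(k, Ph)            # covariance shrink
        return self.theta

rng = np.random.default_rng(0)
true = np.array([1.5, -0.5, 2.0])  # hypothetical motion parameters
rls = RecursiveLS(3)
for _ in range(300):
    h = rng.normal(size=3)
    rls.update(h, h @ true + 1e-3 * rng.normal())
```

With residuals linear in the parameters, as in the abstract, the per-frame cost stays constant regardless of how many frames have been mosaicked.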

We propose an approach for a robot to imitate the gestures of a human demonstrator. Our framework consists solely of two components: a Sensory-Motor Map (SMM) and a View-Point Transformation (VPT). The SMM establishes an association between an arm image and the corresponding joint angles, and it is learned by the system during a period of observation of its own gestures. The VPT is widely discussed in the psychology of visual perception and is used to transform the image of the demonstrator's arm to the so-called ego-centric image, as if the robot were observing its own arm. Different structures of the SMM and VPT are proposed in accordance with observations in human imitation. The whole system relies on monocular visual information and leads to a parsimonious architecture for learning by imitation. Real-time results are presented and discussed.

Anticipation in Human-Robot Cooperation: A Recurrent Neural Network Approach for Multiple Action Sequences Prediction
Close human-robot cooperation is a key enabler for new developments in advanced manufacturing and assistive applications. Close cooperation requires robots that can predict human actions and intent, understanding human non-verbal cues. Recent approaches based on neural networks have led to encouraging results on the human action prediction problem, in both continuous and discrete spaces. Our approach extends the research in this direction. Our contributions are three-fold. First, we validate the use of gaze and body pose cues as a means of predicting human action through a feature selection method. Next, we address two shortcomings of the existing literature: predicting multiple and variable-length action sequences. This is achieved by applying an encoder-decoder recurrent neural network topology to the discrete action prediction problem. In addition, we theoretically demonstrate the importance of predicting multiple action sequences as a means of estimating the stochastic reward in a human-robot cooperation scenario. Finally, we show the ability to effectively train the prediction model on an action prediction dataset, involving human motion data, and explore the influence of the model's parameters on its performance.
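The point about multiple predicted sequences and stochastic reward can be made concrete with a toy calculation: averaging reward over the predictor's whole distribution of action sequences, rather than committing to the single most likely one. The sequences, probabilities, and reward function below are hypothetical, not the paper's formulation:

```python
def expected_reward(seq_probs, reward_fn):
    """Expected reward of a cooperation plan under the predictor's
    distribution over human action sequences."""
    return sum(p * reward_fn(seq) for seq, p in seq_probs.items())

# Toy scenario: reward = number of predicted steps the robot can assist with.
preds = {("reach", "grasp", "hand_over"): 0.6,
         ("reach", "grasp", "place"): 0.3,
         ("idle",): 0.1}

r_multi = expected_reward(preds, lambda s: len(s))  # averages over sequences
best_seq = max(preds, key=preds.get)
r_top1 = len(best_seq)                              # ignores the 40% tail
```

Planning only against the top-1 sequence overestimates the reward here (3.0 vs 2.8), illustrating why variable-length, multi-sequence prediction matters for estimating the true stochastic reward.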

Benchmarking shape completion methods for robotic grasping

Biologically Inspired Controller of Human Action Behaviour for a Humanoid Robot in a Dyadic Scenario
Humans have a particular way of moving their body when interacting with the environment and with other humans. The movement of the body is commonly understood and expresses the intention of the action. The expression of intent through movement is classified as a non-verbal cue, and from such cues it is possible to understand and anticipate the actions of humans. In robotics, humans need to understand the intention of the robot in order to interact efficiently and safely in a dyadic activity. If robots could exhibit the same non-verbal cues when executing the same actions, then humans would be capable of interacting with robots the way they interact with other humans. We propose a robotic controller capable of executing actions of moving objects on a table (placing) and handing over objects to humans (giving) in a human-like manner. Our first contribution is to model the behaviour of the non-verbal cues of a human interacting with other humans while performing placing and giving actions. From the recordings of the motion of the human, we build a computational model of the trajectory of the head, torso, and arm for the different actions. Additionally, the human motion model was consolidated with the integration of a previously developed human gaze behaviour model. As a second contribution, we embedded this model in the controller of an iCub humanoid robot, compared the generated trajectories to the real human model, and additionally compared them with the existing minimum-jerk controller for the iCub (iKin). Our results show that it is possible to model the complete upper-body human behaviour during placing and giving interactions, and the trajectories generated from the model give a better approximation of human-like behaviour in a humanoid robot than the existing inverse kinematics solver.
From this work, we can conclude that our controller is capable of achieving a human-like behaviour for the robot, which is a step towards robots capable of understanding and being understood by humans.
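The minimum-jerk baseline mentioned in the comparison has a standard closed form for point-to-point motion, which can serve as a reference when judging human-likeness. This is the generic minimum-jerk profile, not the iKin implementation itself:

```python
import numpy as np

def minimum_jerk(x0, xf, T, n=100):
    """Minimum-jerk point-to-point trajectory (Flash-Hogan closed form):
    x(t) = x0 + (xf - x0) * (10 s^3 - 15 s^4 + 6 s^5), with s = t / T.
    Velocity and acceleration are zero at both endpoints."""
    s = np.linspace(0.0, 1.0, n)
    shape = 10 * s**3 - 15 * s**4 + 6 * s**5
    return x0 + (xf - x0) * shape

# Reach 30 cm in 1 s: smooth bell-shaped velocity, monotone position.
traj = minimum_jerk(0.0, 0.3, 1.0)
```

Human placing and giving movements deviate systematically from this profile (head and torso lead the arm, for instance), which is the gap the learned controller above aims to close.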

Learning Deep Features for Robotic Inference from Physical Interactions

IEEE Transactions on Cognitive and Developmental Systems, 2022

In order to effectively handle multiple tasks that are not pre-defined, a robotic agent needs to automatically map its high-dimensional sensory inputs into useful features. As a solution, feature learning has empirically shown substantial improvements in obtaining representations that are generalizable to different tasks, compared to feature engineering approaches, but it requires a large amount of data and computational capacity. These challenges are specifically relevant in robotics due to the low signal-to-noise ratios inherent to robotic data, and to the cost typically associated with collecting this type of input. In this paper, we propose a deep probabilistic method based on Convolutional Variational Auto-Encoders (CVAEs) to learn visual features suitable for interaction and recognition tasks. We run our experiments on a self-supervised robotic sensorimotor dataset. Our data was acquired with the iCub humanoid and is based on a standard object collection, thus being readily extensible. We evaluated the learned features in terms of usability for 1) object recognition, 2) capturing the statistics of the effects, and 3) planning. In addition, where applicable, we compared the performance of the proposed architecture with other state-of-the-art models. These experiments demonstrate that our model is capable of capturing the functional statistics of action and perception (i.e. images), performing better than existing baselines, without requiring millions of samples or any hand-engineered features.
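The objective a variational auto-encoder optimizes can be written down compactly: squared reconstruction error plus the KL divergence of the diagonal-Gaussian posterior from the unit-Gaussian prior. This is the generic VAE loss as a numeric sketch, not the paper's convolutional architecture:

```python
import numpy as np

def vae_loss(x, x_recon, mu, log_var):
    """Negative evidence lower bound of a VAE with Gaussian likelihood
    and diagonal-Gaussian posterior q(z|x) = N(mu, diag(exp(log_var))):
    reconstruction term + KL(q(z|x) || N(0, I))."""
    recon = np.sum((x - x_recon)**2)
    kl = 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)
    return recon + kl

# The KL term vanishes exactly when the posterior equals the prior N(0, I).
z = np.zeros(8)
loss_prior = vae_loss(np.ones(4), np.ones(4), z, z)
```

The KL term is what regularizes the latent features toward a smooth, sample-efficient representation, which is the property the abstract credits for avoiding millions of samples.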

Mechanisms and machine science, Sep 27, 2018

In this paper, we investigate how objective movement assessment can support clinical practice in stroke treatment. The movement data are collected using the vision-based, low-cost and marker-free Kinect sensor. Sensor recordings are collected in hospital settings for stroke outpatients under the supervision of medical doctors. We propose movement performance indicators, extracted from the sensor signals, to characterize the movements. The proposed approach for movement quantification is intended to support clinical evaluations and to monitor the patients' state over time. The emphasis is on the verification of the proposed indicators and the investigation of their importance for stroke-relevant clinical aspects.
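As one example of the kind of indicator such a pipeline could extract from Kinect joint trajectories, a dimensionless normalized-jerk index is a common movement-smoothness measure in the clinical literature. This is an illustrative stand-in, not necessarily among the paper's actual indicator set:

```python
import numpy as np

def smoothness_index(pos, dt):
    """Dimensionless normalized jerk of a 3-D joint trajectory
    (lower = smoother movement): sqrt(integral(jerk^2) * T^5 / L^2),
    with T the duration and L the path length."""
    vel = np.gradient(pos, dt, axis=0)
    acc = np.gradient(vel, dt, axis=0)
    jerk = np.gradient(acc, dt, axis=0)
    T = dt * (len(pos) - 1)
    path = np.sum(np.linalg.norm(vel, axis=1)) * dt
    return np.sqrt(np.sum(jerk**2) * dt * T**5 / path**2)

# A tremor-like high-frequency component raises the index sharply.
t = np.linspace(0, 2, 200)[:, None]
dt = t[1, 0] - t[0, 0]
smooth = smoothness_index(np.hstack([np.sin(t), np.cos(t), t]), dt)
shaky = smoothness_index(
    np.hstack([np.sin(t) + 0.05 * np.sin(40 * t), np.cos(t), t]), dt)
```

Because the index is normalized by duration and path length, it lets clinicians compare movements of different speeds and amplitudes across sessions.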

Learning Conditional Postural Synergies for Dexterous Hands: A Generative Approach Based on Variational Auto-Encoders and Conditioned on Object Size and Category

Postural synergies are used in robotics to facilitate the control of dexterous artificial hands. This is achieved by learning a latent space (synergy space) from grasp postures and directly controlling the hand in this space. In this work, we propose the use of a non-linear conditional model for learning the latent space, which can incorporate the object shape and size as additional variables. While in most previous works the evaluation criterion is the reconstruction error, we propose to use the smoothness of the latent space. Our model ranks better than other non-linear models in smoothness, which is a better criterion for evaluating in-hand manipulation tasks. We validate our arguments by executing regrasp trajectories, in which our model outperforms all previous approaches.
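One simple way to quantify latent-space smoothness of the kind argued for above: decode a straight line between two latent grasps and compare the arc length of the resulting posture path with the direct endpoint distance (a ratio of 1 is perfectly smooth). This evaluation and the toy decoders are hypothetical stand-ins for a trained model:

```python
import numpy as np

def latent_smoothness(decode, z_a, z_b, steps=50):
    """Smoothness of the synergy space along a regrasp path: ratio of
    the decoded posture path's arc length to the straight-line distance
    between its endpoints. Closer to 1 means a linear latent
    interpolation produces a near-geodesic posture motion."""
    zs = np.linspace(z_a, z_b, steps)
    postures = np.array([decode(z) for z in zs])
    arc = np.sum(np.linalg.norm(np.diff(postures, axis=0), axis=1))
    direct = np.linalg.norm(postures[-1] - postures[0])
    return arc / direct

linear_dec = lambda z: np.concatenate([z, 2 * z])         # smooth decoder
bent_dec = lambda z: np.concatenate([z, np.sin(8 * z)])   # wrinkled decoder
r_lin = latent_smoothness(linear_dec, np.zeros(2), np.ones(2))
r_bent = latent_smoothness(bent_dec, np.zeros(2), np.ones(2))
```

A decoder with low reconstruction error can still score badly on this ratio, which is why the abstract argues smoothness is the better criterion for in-hand manipulation.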

Vision-based Navigation, Environmental Representations and Imaging Geometries

Springer eBooks, Aug 11, 2007

We discuss the role of spatial representations and visual geometries in vision-based navigation. To a large extent, these choices determine the complexity and robustness of a given navigation strategy. For instance, navigation systems relying on a geometric representation of the environment use most of the available computational resources for localization rather than for "progressing" towards the final destination. In most cases, however, the localization requirements can be alleviated and different (e.g. topological) representations used. In addition, these representations should be adapted to the robot's perceptual capabilities. Another aspect that strongly influences the success and complexity of a navigation system is the geometry of the visual system itself. Biological vision systems display alternative ocular geometries that have proved successful in different (and yet demanding and challenging) navigation tasks. The compound eyes of insects or the human foveated retina are clear examples. Similarly, the choice of the particular geometry of the vision system and the image sampling scheme are important design options when building a navigation system. We provide a number of examples in vision-based navigation where special spatial representations and visual geometries have been taken into consideration, resulting in added simplicity and robustness of the resulting system.

Optimizing energy consumption and preventing slips at the footstep planning level
Energy consumption and stability are two important problems for humanoid robots deployed in remote outdoor locations. In this paper we propose an extended footstep planning method to optimize energy consumption while considering motion feasibility and ground friction constraints. To do this we estimate models of energy, feasibility and slippage in physics simulation, and integrate them into a hybrid A* search and optimization-based planner. The graph search is done in footstep position space, while timing (leg swing and double support times) and COM motion (parameterized height trajectory) are obtained by solving an optimization problem at each node. We conducted experiments to validate the obtained energy model on the real robot, as well as planning experiments showing 9 to 19% energy savings. In example scenarios, the robot can correctly plan to optimally traverse slippery patches or avoid them depending on their size and friction, and uses stairs with the most beneficial dimensions in terms of energy consumption.
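The graph-search half of such a planner can be sketched as A* over discrete foot placements, with a per-step energy cost and an admissible distance heuristic. The step set and energy table below are a hypothetical stand-in for the learned energy/slippage models:

```python
import heapq

def plan_footsteps(start, goal, steps):
    """A* in footstep-position space. `steps` maps each candidate
    displacement (dx, dy) to its energy cost; the heuristic is the
    remaining Manhattan distance times the cheapest per-cell energy,
    which keeps it admissible and consistent."""
    per_cell = min(e / (abs(dx) + abs(dy)) for (dx, dy), e in steps.items())
    h = lambda p: per_cell * (abs(goal[0] - p[0]) + abs(goal[1] - p[1]))
    open_set = [(h(start), 0.0, start, [start])]
    best = {}
    while open_set:
        f, g, p, path = heapq.heappop(open_set)
        if p == goal:
            return g, path
        if best.get(p, float("inf")) <= g:
            continue  # already expanded with an equal or cheaper cost
        best[p] = g
        for (dx, dy), e in steps.items():
            q = (p[0] + dx, p[1] + dy)
            heapq.heappush(open_set, (g + e + h(q), g + e, q, path + [q]))
    return None

steps = {(1, 0): 1.0, (0, 1): 1.0, (-1, 0): 1.0, (0, -1): 1.0,
         (2, 0): 1.6}  # a longer stride that is cheaper per unit distance
cost, path = plan_footsteps((0, 0), (4, 0), steps)
```

In the hybrid planner of the abstract, each node expansion would additionally solve the timing/COM optimization; here only the discrete search layer is shown.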

Open and closed-loop task space trajectory control of redundant robots using learned models
This paper presents a comparison of open-loop and closed-loop control strategies for tracking a task space trajectory using redundant robots. We do not assume any knowledge of the analytical forward and inverse kinematics, relying instead on learning these models online while executing a desired task. Specifically, we employ a recent learning algorithm that learns a probabilistic model from which both the forward and inverse solutions can be obtained, as well as the Jacobian of the kinematics map. Such a learned model can then be used to implement both types of control. Moreover, the multi-valued solutions provided by the learned model can be applied to redundant systems in which an infinite number of inverse solutions may exist. We present experiments with a simulated version of the iCub, a highly redundant humanoid robot, in which this learned model is employed to execute both open-loop and closed-loop trajectory control. We show the advantages and drawbacks of both control strategies, and we propose a way to combine them to deal with sensor noise and failures, showing the benefits of using a learning algorithm that can simultaneously provide forward and inverse predictions.
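The closed-loop strategy compared above can be illustrated with a resolved-rate sketch on a planar 2-link arm, where an analytical model and a finite-difference Jacobian stand in for the learned forward model and its Jacobian (the link lengths and gains are illustrative):

```python
import numpy as np

def fk(q, lengths=(0.3, 0.25)):
    """Forward kinematics of a planar 2-link arm (stand-in for the
    learned forward model)."""
    l1, l2 = lengths
    return np.array([l1 * np.cos(q[0]) + l2 * np.cos(q[0] + q[1]),
                     l1 * np.sin(q[0]) + l2 * np.sin(q[0] + q[1])])

def jacobian(q, eps=1e-6):
    """Finite-difference Jacobian of the kinematics map, mimicking a
    Jacobian queried from a learned model."""
    J = np.zeros((2, 2))
    for i in range(2):
        dq = np.zeros(2); dq[i] = eps
        J[:, i] = (fk(q + dq) - fk(q)) / eps
    return J

def closed_loop_track(x_ref, q0, gain=1.0):
    """Closed-loop task-space control: at every step, feed the measured
    Cartesian error back through the pseudo-inverse Jacobian. Open-loop
    control would instead integrate the inverse model blindly."""
    q = np.array(q0, dtype=float)
    for x_des in x_ref:
        err = x_des - fk(q)                    # sensed task-space error
        q += gain * np.linalg.pinv(jacobian(q)) @ err
    return q

# Track a short straight-line reference from a non-matching start posture.
x_ref = np.linspace([0.35, 0.10], [0.30, 0.25], 50)
q_end = closed_loop_track(x_ref, [0.3, 0.8])
```

Because the error is re-measured every step, model inaccuracies do not accumulate; this is the property that makes the closed-loop variant robust to an imperfect learned model, at the cost of sensitivity to sensor noise.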

From human action understanding to robot action execution: how the physical properties of handled objects modulate non-verbal cues
Humans manage to communicate action intentions in a non-verbal way, through body posture and movement. We start from this observation to investigate how a robot can decode a human's non-verbal cues during the manipulation of an object with specific physical properties, to learn the adequate level of "carefulness" to use when handling that object. We construct dynamical models of the human behaviour using a human-to-human handover dataset consisting of 3 different cups with different levels of filling. We then included these models in the design of an online classifier that identifies the type of action based on the human wrist movement. We close the loop from action understanding to robot action execution with an adaptive and robust controller based on the learned classifier, and evaluate the entire pipeline on a collaborative task with a 7-DOF manipulator. Our results show that it is possible to correctly understand the "carefulness" behaviour of humans during object manipulation, even in a pick-and-place scenario that was not part of the training set. Humans are capable of expressing their actions and intentions by resorting to verbal and/or non-verbal communication. In verbal communication, humans use language to express, in structured linguistic terms, the desired action they wish to perform. Non-verbal communication refers to the expressiveness of human body movements during interaction with other humans, while manipulating objects, or simply navigating in the world. In a sense, all actions that require moving our musculoskeletal system contribute to expressing the intention concerning the completion of that action. Moreover, considering that all humans share a common motor repertoire, i.e. the same degrees of freedom and joint limits, and excluding cultural or society-based influences, all humans express action intentions using a common non-verbal language. From walking along a corridor, to pointing to a painting on a wall, or handing over a cup to someone, communication is provided in the form of non-verbal "cues" that express action intentions. Endowing robots with the ability to understand human action intentions from non-verbal cues will broaden the ...

Design and validation of two embodied mirroring setups for interactive games with autistic children using the NAO humanoid robot
Socially assistive robots have shown potential benefits in the therapy of child and elderly patients with social and cognitive deficits. In particular, for autistic children, humanoid robots could enhance engagement and attention, thanks to their simplified toy-like appearance and reduced set of possible movements and expressions. The recent focus on autism-related motor impairments has increased the interest in developing new robotic tools aimed at improving not only the social capabilities but also the motor skills of autistic children. To this purpose, we have designed two embodied mirroring setups using the NAO humanoid robot. Two different tracking systems were used and compared: Inertial Measurement Units and the Microsoft Kinect, a marker-less vision-based system. Both platforms were able to mirror basic upper-limb movements of two healthy subjects, an adult and a child. However, despite the lower accuracy, the Kinect-based setup was chosen as the best candidate for embodied mirroring in autism treatment, thanks to its lower intrusiveness and reduced setup time. A prototype of an interactive mirroring game was developed and successfully tested with the Kinect-based platform, paving the way to the development of a versatile and powerful tool for clinical use with autistic children.
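A basic building block of marker-less mirroring is turning three tracked 3D joint positions (e.g. shoulder, elbow, wrist from the Kinect skeleton) into a joint angle the robot can reproduce. The helper below is a hypothetical sketch of that retargeting step, not the setups' exact pipeline:

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle (degrees) at joint b formed by the segments b->a and b->c,
    e.g. the elbow angle from shoulder-elbow-wrist positions. The
    result can be sent to the corresponding robot joint for mirroring."""
    u = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    v = np.asarray(c, dtype=float) - np.asarray(b, dtype=float)
    cosang = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))

straight = joint_angle([0, 0, 0], [0.3, 0, 0], [0.6, 0, 0])   # arm extended
bent = joint_angle([0, 0, 0], [0.3, 0, 0], [0.3, 0.25, 0])    # right angle
```

Computing angles rather than copying Cartesian positions makes the mapping independent of the size difference between the child and the robot.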

On the advantages of foveal mechanisms for active stereo systems in visual search tasks

Autonomous Robots, Feb 4, 2017

In this work we study how information provided by foveated images, sampled according to the log-polar transformation, can be integrated over time in order to build accurate world representations and accomplish visual search tasks in an efficient manner. We focus on a specific visual information modality - depth - and on how to store it in a flexible memory structure. We propose a probabilistic observation model for a stereo system that relies on the Unscented Transform in order to propagate the uncertainty in stereo matching, due to spatial quantization in the retina, to the 3D Cartesian domain. Probabilistic depth measurements are integrated in a novel Sensory Ego-Sphere whose topology can be biased with foveal-like distributions, according to the autonomous agent's short-term tasks and goals. Furthermore, we investigate an Upper Confidence Bound (UCB) algorithm for the task of simultaneously finding the closest object to the observer (visual search) and learning a 3D map of the surrounding environment (mapping). The performance of task execution is assessed both with a foveated log-polar sensor and with a classical uniform one. The advantages of foveal vision and custom ego-sphere representations are illustrated in a series of experiments with a realistic simulator.
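The UCB trade-off described above, probing uncertain gaze directions (mapping) versus re-fixating the most promising one (visual search), can be sketched as a standard UCB1 bandit over ego-sphere cells. This is a toy stand-in for the paper's setting; the cell rewards and gains are illustrative:

```python
import numpy as np

def ucb_search(cell_means, rounds=2000, c=1.0, seed=0):
    """UCB1 over gaze-direction cells: each fixation returns a noisy
    'closeness' reward; the confidence bound mean + c*sqrt(ln t / n)
    balances exploring rarely fixated cells against exploiting the
    best-known one. Returns per-cell fixation counts."""
    rng = np.random.default_rng(seed)
    n = len(cell_means)
    counts = np.zeros(n)
    sums = np.zeros(n)
    for t in range(rounds):
        if t < n:
            a = t  # fixate each cell once to initialize
        else:
            ucb = sums / counts + c * np.sqrt(np.log(t) / counts)
            a = int(np.argmax(ucb))
        counts[a] += 1
        sums[a] += cell_means[a] + 0.1 * rng.normal()  # noisy observation
    return counts

# Four cells; cell 2 hides the closest object (highest mean reward).
counts = ucb_search([0.2, 0.5, 0.9, 0.4])
```

Most fixations concentrate on the best cell, yet every cell keeps receiving occasional visits, which is exactly the search/mapping balance the abstract exploits.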