Search results

    Kormushev P, Ugurlu B, Caldwell DG, Tsagarakis NG, et al., 2019,

    Learning to exploit passive compliance for energy-efficient gait generation on a compliant humanoid

    , Autonomous Robots, Vol: 43, Pages: 79-95, ISSN: 0929-5593

    Saputra RP, Kormushev P, 2018,

    Casualty Detection from 3D Point Cloud Data for Autonomous Ground Mobile Rescue Robots

    One of the most important features of mobile rescue robots is the ability to autonomously detect casualties, i.e. human bodies, which are usually lying on the ground. This paper proposes a novel method for autonomously detecting casualties lying on the ground using 3D point-cloud data obtained from an on-board sensor, such as an RGB-D camera or a 3D LIDAR, on a mobile rescue robot. In this method, the obtained 3D point-cloud data is projected onto the ground plane, i.e. the floor, detected within the point cloud. This projected point cloud is then converted into a grid map that serves as the input to the algorithm that detects human body shapes. The proposed method is evaluated by detecting a human dummy, placed in different random positions and orientations, using an on-board RGB-D camera on a mobile rescue robot called ResQbot. To evaluate the robustness of the casualty detection method to the camera angle, the camera is set to several different orientations. The experimental results show that, using the point-cloud data from the on-board RGB-D camera, the proposed method successfully detects the casualty in all tested body positions and orientations relative to the on-board camera, as well as at all tested camera angles.
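
The projection and grid-map steps described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `project_to_grid`, the cell size, and the height band used to discard floor and overhead points are all assumptions made for illustration, and the ground plane is taken as already detected and given as `normal . p + d = 0`.

```python
import math

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def project_to_grid(points, normal, d, cell_size=0.1,
                    min_height=0.02, max_height=0.6):
    """Project 3D points onto a known ground plane and count points per grid cell.

    Hypothetical helper: the plane is normal . p + d = 0, and the height band
    [min_height, max_height] is an assumed filter, not taken from the paper.
    """
    norm = math.sqrt(_dot(normal, normal))
    n = tuple(c / norm for c in normal)   # unit plane normal
    offset = d / norm                     # signed plane offset for the unit normal

    # Build two unit vectors (u, v) that span the ground plane.
    helper = (1.0, 0.0, 0.0) if abs(n[0]) < 0.9 else (0.0, 1.0, 0.0)
    h_dot_n = _dot(helper, n)
    u = tuple(h - h_dot_n * c for h, c in zip(helper, n))
    u_norm = math.sqrt(_dot(u, u))
    u = tuple(c / u_norm for c in u)
    v = _cross(n, u)

    grid = {}
    for p in points:
        height = _dot(n, p) + offset      # signed distance above the plane
        if not (min_height <= height <= max_height):
            continue                      # drop floor points and points too high up
        # u and v are orthogonal to n, so these dot products are already
        # the in-plane coordinates of the projected point.
        cell = (math.floor(_dot(u, p) / cell_size),
                math.floor(_dot(v, p) / cell_size))
        grid[cell] = grid.get(cell, 0) + 1
    return grid
```

A shape detector would then run on the resulting grid; that stage is not sketched here because the abstract does not describe it.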

    Dutordoir V, Salimbeni H, Deisenroth M, Hensman J, et al., 2018,

    Gaussian Process Conditional Density Estimation

    Conditional Density Estimation (CDE) models deal with estimating conditional distributions. The conditions imposed on the distribution are the inputs of the model. CDE is a challenging task as there is a fundamental trade-off between model complexity, representational capacity and overfitting. In this work, we propose to extend the model's input with latent variables and use Gaussian processes (GP) to map this augmented input onto samples from the conditional distribution. Our Bayesian approach allows for the modeling of small datasets, but we also provide the machinery for it to be applied to big data using stochastic variational inference. Our approach can be used to model densities even in sparse data regions, and allows for sharing learned structure between conditions. We illustrate the effectiveness and wide-reaching applicability of our model on a variety of real-world problems, such as spatio-temporal density estimation of taxi drop-offs, non-Gaussian noise modeling, and few-shot learning on Omniglot images.

    Wilson J, Hutter F, Deisenroth MP, 2018,

    Maximizing acquisition functions for Bayesian optimization

    , Advances in Neural Information Processing Systems (NIPS) 2018, Publisher: Massachusetts Institute of Technology Press, ISSN: 1049-5258

    Bayesian optimization is a sample-efficient approach to global optimization that relies on theoretically motivated value heuristics (acquisition functions) to guide its search process. Fully maximizing acquisition functions produces the Bayes' decision rule, but this ideal is difficult to achieve since these functions are frequently non-trivial to optimize. This statement is especially true when evaluating queries in parallel, where acquisition functions are routinely non-convex, high-dimensional, and intractable. We first show that acquisition functions estimated via Monte Carlo integration are consistently amenable to gradient-based optimization. Subsequently, we identify a common family of acquisition functions, including EI and UCB, whose characteristics not only facilitate but justify use of greedy approaches for their maximization.

    Wang K, Shah A, Kormushev P, 2018,

    SLIDER: A Bipedal Robot with Knee-less Legs and Vertical Hip Sliding Motion

    Creswell A, Bharath AA, 2018,

    Denoising Adversarial Autoencoders

    , IEEE Trans Neural Netw Learn Syst

    Unsupervised learning is of growing interest because it unlocks the potential held in vast amounts of unlabeled data to learn useful representations for inference. Autoencoders, a form of generative model, may be trained by learning to reconstruct unlabeled input data from a latent representation space. More robust representations may be produced by an autoencoder if it learns to recover clean input samples from corrupted ones. Representations may be further improved by introducing regularization during training to shape the distribution of the encoded data in the latent space. We suggest denoising adversarial autoencoders (AAEs), which combine denoising and regularization, shaping the distribution of latent space using adversarial training. We introduce a novel analysis that shows how denoising may be incorporated into the training and sampling of AAEs. Experiments are performed to assess the contributions that denoising makes to the learning of representations for classification and sample synthesis. Our results suggest that autoencoders trained using a denoising criterion achieve higher classification performance and can synthesize samples that are more consistent with the input data than those trained without a corruption process.

    Sæmundsson S, Hofmann K, Deisenroth MP, 2018,

    Meta reinforcement learning with latent variable Gaussian processes

    , Uncertainty in Artificial Intelligence (UAI) 2018, Publisher: Association for Uncertainty in Artificial Intelligence (AUAI)

    Learning from small data sets is critical in many practical applications where data collection is time consuming or expensive, e.g., robotics, animal experiments or drug design. Meta learning is one way to increase the data efficiency of learning algorithms by generalizing learned concepts from a set of training tasks to unseen, but related, tasks. Often, this relationship between tasks is hard coded or relies in some other way on human expertise. In this paper, we frame meta learning as a hierarchical latent variable model and infer the relationship between tasks automatically from data. We apply our framework in a model-based reinforcement learning setting and show that our meta-learning model effectively generalizes to novel tasks by identifying how new tasks relate to prior ones from minimal data. This results in up to a 60% reduction in the average interaction time needed to solve tasks compared to strong baselines.

    Pardo F, Tavakoli A, Levdik V, Kormushev P, et al., 2018,

    Time limits in reinforcement learning

    , International Conference on Machine Learning, Pages: 4042-4051

    In reinforcement learning, it is common to let an agent interact for a fixed amount of time with its environment before resetting it and repeating the process in a series of episodes. The task that the agent has to learn can either be to maximize its performance over (i) that fixed period, or (ii) an indefinite period where time limits are only used during training to diversify experience. In this paper, we provide a formal account for how time limits could effectively be handled in each of the two cases and explain why not doing so can cause state-aliasing and invalidation of experience replay, leading to suboptimal policies and training instability. In case (i), we argue that the terminations due to time limits are in fact part of the environment, and thus a notion of the remaining time should be included as part of the agent’s input to avoid violation of the Markov property. In case (ii), the time limits are not part of the environment and are only used to facilitate learning. We argue that this insight should be incorporated by bootstrapping from the value of the state at the end of each partial episode. For both cases, we illustrate empirically the significance of our considerations in improving the performance and stability of existing reinforcement learning algorithms, showing state-of-the-art results on several control tasks.
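
The two cases in this abstract can be sketched in a few lines. This is an illustration under stated assumptions rather than the paper's code: the function names, the `terminated` and `timed_out` flags, and the choice to normalize the remaining time to the range 0 to 1 are all hypothetical.

```python
def time_aware_observation(observation, step, time_limit):
    """Case (i): append the remaining time so the time limit is visible to the agent.

    Normalizing by time_limit is an assumed convention, not taken from the paper.
    """
    remaining = (time_limit - step) / time_limit
    return list(observation) + [remaining]

def td_target(reward, next_value, gamma, terminated, timed_out):
    """Case (ii): one-step TD target that keeps bootstrapping at a time-limit cut-off.

    terminated: the environment itself ended the episode (e.g. the task failed).
    timed_out:  the episode was cut short only because the training time limit hit.
    """
    if terminated and not timed_out:
        # Real termination: there is no future return to bootstrap from.
        return reward
    # Ordinary step or time-limit cut-off: bootstrap from the next state's value.
    return reward + gamma * next_value
```

For example, `td_target(1.0, 10.0, 0.9, terminated=False, timed_out=True)` still includes the discounted value of the next state, whereas a target that treated the time-out as a real termination would return the reward alone.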

    Ceran ET, Gunduz D, Gyorgy A, 2018,

    Average age of information with hybrid ARQ under a resource constraint

    , Wireless Communications and Networking Conference (WCNC), Publisher: IEEE, ISSN: 1525-3511

    Scheduling the transmission of status updates over an error-prone communication channel is studied in order to minimize the long-term average age of information (AoI) at the destination under a constraint on the average number of transmissions at the source node. After each transmission, the source receives instantaneous ACK/NACK feedback and decides on the next update without prior knowledge of the success of future transmissions. First, the optimal scheduling policy is studied under different feedback mechanisms when the channel statistics are known; in particular, the standard automatic repeat request (ARQ) and hybrid ARQ (HARQ) protocols are considered. Then, for an unknown environment, an average-cost reinforcement learning (RL) algorithm is proposed that learns the system parameters and the transmission policy in real time. The effectiveness of the proposed methods is verified through numerical simulations.
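
The trade-off between average age and transmission rate can be made concrete with a small simulation. The sketch below assumes a slotted model with standard ARQ, in which the age resets to 1 after a successful transmission and grows by 1 otherwise, and it uses a simple fixed-threshold policy chosen only for illustration. It is not the optimal policy or the learning algorithm studied in the paper, and the function name and parameters are hypothetical.

```python
import random

def simulate_threshold_policy(threshold, p_success, num_slots, seed=0):
    """Simulate the age of information under an assumed threshold policy.

    The source transmits a fresh update whenever the age at the destination
    is at least `threshold`; each transmission succeeds with probability
    `p_success`. Returns (average age, average transmissions per slot).
    """
    rng = random.Random(seed)
    age = 1
    total_age = 0
    transmissions = 0
    for _ in range(num_slots):
        total_age += age
        if age >= threshold:
            transmissions += 1
            if rng.random() < p_success:
                age = 1          # ACK: the destination now holds a fresh update
                continue
        age += 1                 # idle slot or NACK: the information grows older
    return total_age / num_slots, transmissions / num_slots
```

Raising `threshold` lowers the transmission rate at the cost of a higher average age, which is the tension that the resource constraint in the abstract formalizes.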

    Kamthe S, Deisenroth MP, 2018,

    Data-efficient reinforcement learning with probabilistic model predictive control

    , Artificial Intelligence and Statistics, Publisher: PMLR, Pages: 1701-1710

    Trial-and-error based reinforcement learning (RL) has seen rapid advancements in recent times, especially with the advent of deep neural networks. However, the majority of autonomous RL algorithms require a large number of interactions with the environment. A large number of interactions may be impractical in many real-world applications, such as robotics, and many practical systems have to obey limitations in the form of state space or control constraints. To reduce the number of system interactions while simultaneously handling constraints, we propose a model-based RL framework based on probabilistic Model Predictive Control (MPC). In particular, we propose to learn a probabilistic transition model using Gaussian Processes (GPs) to incorporate model uncertainty into long-term predictions, thereby reducing the impact of model errors. We then use MPC to find a control sequence that minimises the expected long-term cost. We provide theoretical guarantees for first-order optimality in the GP-based transition models with deterministic approximate inference for long-term planning. We demonstrate that our approach not only achieves state-of-the-art data efficiency, but is also a principled way for RL in constrained environments.

This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.
