On Learning to Guide Task and Motion Planning (TAMP)

03/15/2019

Task and motion planning involves reasoning from a high-level objective, like cleaning a kitchen, down to the geometry, physics, and kinematics needed to change the state of the robot and the environment. To do this, we have to define an interface between the high-level planner and the low-level controller. We call this interface an Operator.
An Operator is a computer program that consists of three things: input parameters, a model, and a robot motion. It takes in the input parameters, applies its model to them, and returns a robot motion.

An example is a Pick Operator.

Its inputs consist of:

  1. O: object to be picked
  2. P: robot base pose
  3. G: grasp

Its model is:

IF PathExists(P) AND IKExists(G, O)
THEN return Picked(O)

Its robot motion is the returned Pick motion.
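To make the interface concrete, here is a minimal Python sketch of how such an Operator might be structured. The helpers path_exists and ik_exists are hypothetical stand-ins for a motion planner and an inverse-kinematics check; neither is specified above.

```python
from dataclasses import dataclass
from typing import Any, Optional

def path_exists(base_pose) -> bool:
    """Placeholder for a call to a motion planner (not specified in the post)."""
    return True

def ik_exists(grasp, obj) -> bool:
    """Placeholder for an inverse-kinematics feasibility check."""
    return True

@dataclass
class PickOperator:
    """An Operator: input parameters, a model, and a robot motion."""
    obj: Any        # O: object to be picked
    base_pose: Any  # P: robot base pose
    grasp: Any      # G: grasp

    def apply(self) -> Optional[dict]:
        # The model: if a base path and an IK solution exist, the effect
        # Picked(O) holds and a pick motion is returned.
        if path_exists(self.base_pose) and ik_exists(self.grasp, self.obj):
            return {"effect": ("Picked", self.obj), "motion": "pick trajectory"}
        return None  # this instantiation of the operator is infeasible
```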

Planning Algorithms for TAMP

Planning algorithms for TAMP take in two parameters, an initial state S0 and a goal state Sg, and return a sequence of actions a1, …, aT, where each action is an operator instance. Associated with each action is a sequence of robot configurations c1, …, cH for performing it.

Inside the planner there are three components: a search module, a sampler, and a feasibility checker. The search module is a graph search algorithm such as Dijkstra, A*, or BFS, and it chooses the next node to expand. The sampler deals with continuous variables, such as the robot base pose, and samples promising values for them at the chosen node. The sampler then calls the feasibility checker, e.g. a motion planner, which checks that the sampled parameters are feasible. If they are, we simulate forward from the current node to the next node, and control returns to the search module, which selects the next node to expand. This process repeats until we arrive at a goal node.
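As a rough illustration of that loop, here is a simplified best-first-search sketch. The operators, heuristic, sampler, feasibility checker, and simulator are all assumed to be supplied by the caller; they are not part of any specific TAMP system described here.

```python
import heapq
import itertools

def plan(s0, goal_test, operators, heuristic,
         sample_continuous_params, is_feasible, simulate):
    """Best-first search over states; the sampler fills in continuous
    parameters and the feasibility checker (e.g. a motion planner) gates
    each expansion. All callables are assumed to be supplied by the caller."""
    counter = itertools.count()                         # tie-breaker for the heap
    frontier = [(heuristic(s0), next(counter), s0, [])]
    while frontier:
        _, _, state, actions = heapq.heappop(frontier)  # search module picks a node
        if goal_test(state):
            return actions                              # a1, ..., aT: operator instances
        for op in operators:
            params = sample_continuous_params(op, state)   # e.g. a base pose
            if params is None or not is_feasible(op, params, state):
                continue                                # feasibility checker rejects the sample
            next_state = simulate(state, op, params)    # simulate forward to the next node
            heapq.heappush(frontier,
                           (heuristic(next_state), next(counter),
                            next_state, actions + [(op, params)]))
    return None  # search exhausted without reaching a goal node
```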

There have been many efforts in the past few years to build better search algorithms by designing better heuristics, or to build intelligent samplers that generate samples good enough to reach the goal state:

  • Kaelbling, ICRA 2011, IJRR 2013
  • Srivastava, ICRA 2014
  • Garrett, WAFR 2014, IJRR 2017
  • Toussaint, IJCAI 2015
  • Dantam, RSS 2016, IJRR 2018

The problem with these approaches is that they are typically slow and do not improve with experience. The primary reason they are slow is the complex hybrid search space. Say a robot is trying to cook dinner. It has to decide on discrete decision variables, such as which object to manipulate in order to prepare the meal, and as the number of objects and the planning horizon grow, the search runs into a combinatorial explosion. It also has to decide on continuous variables, such as where the robot should stand in order to pick a particular object. These values must satisfy physical and geometric constraints, which are usually checked by calling an external low-level motion planner, and that call is expensive.

To avoid this computational problem, we might turn to learning: train a neural network that maps a state to a low-level robot control so as to maximize the sum of rewards over a time horizon. The premise is that there is some smoothness in the state and action spaces that the function approximator can exploit. The problem in TAMP is that a small change in the state can result in a large change in the feasible motion, and the same holds for small changes in the actions. From this observation we can foresee that learning would require a lot of data, because the network needs to capture all of these fine details. On the other hand, if we manage to train a network on as much data as it needs, almost no online computation would be required. Planning, by contrast, is computationally expensive but needs no training data.

We can think of a spectrum between complete planning and complete learning, with complete planning at the far left end and complete learning at the far right. Complete planning is difficult because of the complex search space; complete learning is difficult because it would need an enormous amount of data. A natural approach is therefore to sit in the middle of the spectrum and learn to guide planning by predicting constraints for the planning procedure. Constraints are suggestions for a subset of the decision variables. For an object-placement problem, the full set of decision variables could include every configuration involved in putting the object down. But if we restrict ourselves to predicting the robot's placement base pose, that quantity is much smoother with respect to changes in the state. The remaining decision variables, which must be collision-free, are delegated to a low-level motion planner that can adapt to changes in the environment.

The overall strategy is to leave out the variables that are difficult to learn and have the planner fill them in: the learner handles the variables it can learn easily and leaves the difficult ones for the planner to figure out.

Typically, a planning algorithm maps a problem instance directly to a solution, but this is very slow. Instead, we learn to predict constraints and feed them into the planner. This reduces the planner's search space and speeds up planning.

Overview of Approach

  1. Take past planning experience as the training dataset and design a learning algorithm that trains a predictor.
  2. The learned predictor takes in a problem instance, represented in some vector form, and outputs constraints for that instance.
  3. The planner takes in the constraints and outputs a plan that satisfies them (a sketch of this pipeline follows below).
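The three steps might be wired together as in the sketch below; `encode`, `predictor`, and `planner` are hypothetical placeholders, not a specific implementation.

```python
def solve_with_learned_constraints(problem_instance, encode, predictor, planner):
    """Schematic of the three steps: encode the problem instance, predict
    constraints (e.g. a suggested base pose), and hand them to the planner,
    which only searches over the remaining decision variables."""
    x = encode(problem_instance)                   # step 2: vector (or set) representation
    constraints = predictor(x)                     # step 2: predicted constraints
    return planner(problem_instance, constraints)  # step 3: plan satisfying the constraints
```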

This poses two challenges:

  • How to represent a complex planning problem
  • How to design a sample-efficient learning algorithm that can learn from a dataset of past planning experience, which is somewhat different from a traditional supervised learning or reinforcement learning dataset.

Why is representation difficult?

Consider a scenario where the object of interest lies on a table alongside other objects of similar geometry, and a second scenario where the same object is placed vertically on a shelf. How do we represent the states of this object in these two scenarios? Should we encode all of the objects' shapes and poses into an input vector? How do we know which pose goes into which dimension? It turns out the right representation for such problems is a set of varying cardinality rather than a fixed vector, which most standard machine learning algorithms are not designed to handle directly. Another complicating factor is that object state is both logical and geometric. For the task of cooking a meal, for example, there is a symbolic object state that encodes whether the object has been cooked, and a geometric object state that encodes its pose (x, y, z, roll, pitch, yaw). All of this makes it difficult to design a fixed-size object representation, and this is one of the challenges in TAMP.

Another challenge in TAMP is the uncertainty about where the trajectories in the search tree that do not reach the goal state would eventually lead. They could lead to the goal state if explored further, lead to a dead end, or cycle back into the tree; we call these neutral trajectories. In practice there are vastly more neutral trajectories than goal trajectories, so how to leverage neutral trajectories when learning to predict constraints is an important question.
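As an illustration of the set-of-objects view, here is a sketch of a scene represented as a variable-cardinality collection of per-object states mixing geometric and symbolic features. The field names are illustrative, not taken from any particular system.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class ObjectState:
    # Geometric state: pose as (x, y, z, roll, pitch, yaw)
    pose: Tuple[float, float, float, float, float, float]
    # Shape parameters, e.g. a bounding box (width, depth, height)
    shape: Tuple[float, float, float]
    # Symbolic state, e.g. whether the object has been cooked
    is_cooked: bool = False

# A problem instance is a set of objects whose cardinality varies from scene
# to scene, rather than a fixed-size vector with one slot per object.
Scene = List[ObjectState]

table_scene: Scene = [
    ObjectState(pose=(0.50, 0.20, 0.80, 0.0, 0.0, 0.0), shape=(0.05, 0.05, 0.10)),
    ObjectState(pose=(0.60, 0.30, 0.80, 0.0, 0.0, 0.0), shape=(0.05, 0.05, 0.10)),
]
```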

In their work on learning to reason about geometric tasks, Kim, Kaelbling, and Lozano-Perez attempt to solve this problem for geometric task and motion planning. Geometric reasoning is reasoning about reachability among obstacles. Recent advances in planning have introduced ways to plan paths for high-degree-of-freedom robots while avoiding obstacles; some have even managed to balance humanoid robots and move them to goal states while avoiding obstacles. Based on these advances, the geometric task and motion planning problem can be defined as follows:

Given an arbitrary number of objects with varying shapes and poses, change the object poses to achieve a high-level objective.

An example scenario: the robot enters a room cluttered with boxes of different shapes and sizes. Its aim is to navigate to one corner of the room, pick a specific box, and move the other boxes out of its way as it transports the box of interest out of the room. The question is: can you come up with a long sequence of collision-free motions that achieves this high-level objective?

To address the neutral-trajectory problem, they make the following assumptions:

  • Same number of objects
  • Same object sizes

These assumptions free them from the representation problem, since they can now represent a problem instance with a fixed-size vector. The search process combines random sampling and graph search: at every node, sample k actions and search over the sampled actions. One observation is that if we can sample promising actions, ones likely to lie on a goal trajectory, planning becomes more efficient. Taken to the extreme, if you can come up with a sampler that always leads to the goal state, no search is needed at all.
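A rough sketch of that sample-and-search loop, with `policy`, `simulate`, and `is_goal` as assumed placeholders (uniform random sampling would be the baseline policy):

```python
def sample_and_search(root, policy, simulate, is_goal, k=5, max_expansions=1000):
    """At every node, sample k actions from `policy` (uniform random in the
    baseline, learned later) and expand the search tree with them."""
    stack = [(root, [])]                         # depth-first expansion of the tree
    expansions = 0
    while stack and expansions < max_expansions:
        state, trajectory = stack.pop()
        if is_goal(state):
            return trajectory                    # a goal trajectory of (state, action) pairs
        expansions += 1
        for _ in range(k):                       # k sampled actions per node
            action = policy(state)
            next_state = simulate(state, action)
            stack.append((next_state, trajectory + [(state, action)]))
    return None  # only neutral trajectories found: dead ends, cycles, or unexplored branches
```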

Our objective is to learn a stochastic policy that generates promising actions using past search experience.

Problem formulation

Given a set of search trees obtained by solving n problem instances w with a uniform random policy that conditions on the problem instance w and the current state, learn a policy P that assigns high probability to goal trajectories. One approach is to train a GAN on successful trajectories: take all the state-action pairs that led to a goal state, show them to a GAN, and train the generator to produce actions similar to those on the solution trajectories.
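As a sketch of that idea, here is a minimal PyTorch-style conditional GAN: the generator maps a state plus noise to an action, and the discriminator is trained to separate generated pairs from state-action pairs taken from solution trajectories. The dimensions, architectures, and training step are illustrative assumptions, not the authors' actual model.

```python
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, NOISE_DIM = 32, 6, 8   # illustrative sizes

# Generator: maps (state, noise) -> action, so it can propose several
# distinct actions for the same state.
generator = nn.Sequential(
    nn.Linear(STATE_DIM + NOISE_DIM, 64), nn.ReLU(),
    nn.Linear(64, ACTION_DIM),
)

# Discriminator: scores (state, action) pairs; trained to output 1 for pairs
# from solution (goal-reaching) trajectories and 0 for generated pairs.
discriminator = nn.Sequential(
    nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(),
    nn.Linear(64, 1), nn.Sigmoid(),
)

g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
bce = nn.BCELoss()

def gan_step(states, expert_actions):
    """One training step on a batch of (state, action) pairs drawn from
    successful trajectories in the search trees."""
    noise = torch.randn(states.shape[0], NOISE_DIM)
    fake_actions = generator(torch.cat([states, noise], dim=1))

    # Discriminator update: real pairs -> 1, generated pairs -> 0.
    d_real = discriminator(torch.cat([states, expert_actions], dim=1))
    d_fake = discriminator(torch.cat([states, fake_actions.detach()], dim=1))
    d_loss = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Generator update: try to fool the discriminator.
    g_score = discriminator(torch.cat([states, fake_actions], dim=1))
    g_loss = bce(g_score, torch.ones_like(g_score))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    return d_loss.item(), g_loss.item()
```

At search time, the trained generator would then serve as the stochastic policy: feed in the current state together with fresh noise to obtain a proposed action.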
