Policy search methods enable a robot to perform a wide range of tasks. In practice, however, applying policy search methods requires that the separate components, namely perception, state estimation, and low-level control, be trained independently of each other. This paper pursues the question: “Does end-to-end training of the perception and control components jointly provide better performance than training the components individually?”
[TO BE CONTINUED]