Policy search is a subfield of Reinforcement Learning that involves finding good parameters for a given policy parametrization.
Policy search is well-suited for robotics because a learnt policy enables control in high-dimensional state and action spaces.
Model-free policy search methods learn a policy directly from sampled trajectories, whilst model-based methods learn a model of the environment's dynamics and sample trajectories from that model to learn the policy. Model-based methods are thought to promote generalization, since a model of the world is learnt rather than just a Q-function mapping states and actions to expected returns. Model-based methods are also typically orders of magnitude more sample-efficient than model-free methods, since once a model is learnt, policy updates no longer require interaction with the real environment.
In this primer, I’ll focus on only two model-based policy search methods, namely Backpropagation Through Time (BPTT) and Stochastic Value Gradients (SVG).
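Before diving into the details, the core idea of BPTT in a model-based setting can be sketched on a toy problem. The example below is a hypothetical illustration (not from any particular paper): the state and action are scalars, the "learnt" dynamics model is linear, s' = A·s + B·u, the policy is linear, u = θ·s, and the reward is -s². BPTT unrolls the model forward, then propagates an adjoint backwards through the trajectory to get the gradient of the return with respect to θ.

```python
# Toy model-based policy search via Backpropagation Through Time (BPTT).
# Hypothetical setup: scalar linear dynamics s' = A*s + B*u, linear
# policy u = theta*s, reward r(s) = -s**2, horizon T.
A, B, T = 0.9, 0.5, 20

def rollout(theta, s0=1.0):
    """Unroll the learnt dynamics model for T steps; return all states."""
    states = [s0]
    for _ in range(T):
        s = states[-1]
        u = theta * s                      # policy action
        states.append(A * s + B * u)       # model prediction for next state
    return states

def objective(theta, s0=1.0):
    """Return J = -sum of squared states after the initial one."""
    return -sum(s * s for s in rollout(theta, s0)[1:])

def bptt_gradient(theta, s0=1.0):
    """Reverse-mode gradient dJ/dtheta computed by BPTT."""
    states = rollout(theta, s0)
    c = A + B * theta                      # closed-loop transition factor
    lam = -2.0 * states[-1]                # adjoint dJ/ds at the final state
    grad = 0.0
    for t in range(T - 1, -1, -1):
        grad += lam * B * states[t]        # direct effect of theta at step t
        lam = -2.0 * states[t] + c * lam   # propagate adjoint one step back
    return grad

# Gradient ascent on the return drives the closed loop toward stability
# (the optimum here is theta = -A/B, which makes A + B*theta = 0).
theta = 0.0
for _ in range(300):
    theta += 0.02 * bptt_gradient(theta)
```

Because the dynamics are differentiable, the policy gradient is exact rather than estimated from Monte Carlo returns; this is precisely the property that BPTT and SVG exploit, and the reason they need a (differentiable) model in the first place.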