Train a Donkey Car with reinforcement learning

Observation: 112x112 RGB array
Action: two wheel-motor voltages, each in [-1, 1]
Reward: sum of the two voltages (-10 if the car is still)
Algorithm: D4PG
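
A minimal sketch of the reward described above, not the project's actual code. The function name `compute_reward` and the `is_still` flag are hypothetical. It assumes that the -10 penalty replaces the voltage sum when the car is still, and that voltages are clipped to the action range; how "still" is detected is left to the caller.

```python
def compute_reward(left_voltage, right_voltage, is_still):
    """Sum of the two motor voltages, or -10 when the car is not moving."""
    # Assumption: clip each voltage to the action range [-1, 1].
    left = max(-1.0, min(1.0, left_voltage))
    right = max(-1.0, min(1.0, right_voltage))
    # Assumption: the penalty replaces the sum instead of being added to it.
    if is_still:
        return -10.0
    return left + right
```

With this shape, driving both wheels forward at full voltage gives the maximum reward of 2, while spinning in place (equal and opposite voltages) gives 0.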

Training starts with experiences collected by an old model, so it only needed 1000 iterations to learn to run straight.

The experience stage and the training stage are separated: collect data for a while, then train for several days (on an old GPU).
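
The two separated stages could be sketched roughly as below. This is an illustration under assumptions, not the project's code: `collect`, `train`, `env_reset`, `env_step`, `policy`, and `update` are all hypothetical names, the buffer is a plain `deque`, and the D4PG update itself is left as a callback.

```python
import random
from collections import deque


def collect(buffer, env_reset, env_step, policy, num_steps):
    """Experience stage: run the current policy and only store transitions."""
    obs = env_reset()
    for _ in range(num_steps):
        action = policy(obs)
        # Hypothetical environment interface: returns (next_obs, reward, done).
        next_obs, reward, done = env_step(action)
        buffer.append((obs, action, reward, next_obs, done))
        obs = env_reset() if done else next_obs


def train(buffer, update, num_iterations, batch_size, rng):
    """Training stage: learn from stored transitions without driving the car."""
    for _ in range(num_iterations):
        # Sample uniformly without replacement from the stored transitions.
        batch = rng.sample(list(buffer), min(batch_size, len(buffer)))
        update(batch)  # hypothetical learner step, e.g. one D4PG update
```

Because `train` only reads from the buffer, the same buffer can be filled beforehand with transitions recorded by an older model, for example `buffer = deque(old_transitions, maxlen=100_000)`, where `old_transitions` and the capacity are placeholders.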

In the end it learned to avoid the wall, but it fails to run smoothly...

Demo Video