Observation: 112x112 RGB array
Action: two wheel-motor voltages, each in [-1, 1]
Reward: sum of the two voltages (-10 if the robot is still)
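
The reward rule above can be sketched in a few lines of Python. This is only an illustration, not the project's actual code: the names `compute_reward` and `STILL_PENALTY` are hypothetical, the `is_still` flag stands in for whatever stillness check the robot really uses, and it assumes the -10 replaces the voltage sum rather than being added to it.

```python
import numpy as np

STILL_PENALTY = -10.0  # from the notes: -10 if the robot is still


def compute_reward(voltages, is_still):
    """Return the reward for one step.

    voltages: the two wheel-motor voltages, each expected in [-1, 1].
    is_still: assumed to come from an external stillness check
              (how stillness is detected is not stated in the notes).
    """
    v = np.asarray(voltages, dtype=np.float64)
    if v.shape != (2,):
        raise ValueError("expected exactly two motor voltages")
    # Keep the action inside the documented range before scoring it.
    v = np.clip(v, -1.0, 1.0)
    if is_still:
        # Assumption: the penalty replaces the voltage sum.
        return STILL_PENALTY
    return float(v.sum())
```

With this rule, driving both wheels forward at full voltage gives the maximum reward of 2, while turning in place (equal and opposite voltages) gives 0, which is why the sum favors running straight.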
Training starts from experiences collected with an old model, so only 1000 iterations were needed to learn to run straight.
The experience stage and the training stage are separated: data is collected for a while, and then the model is trained for several days (on an old GPU).
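
A minimal sketch of that two-stage setup, assuming a simple replay buffer that is first seeded with the old model's experiences. All names here (`ReplayBuffer`, `collect`, `train`, `step_fn`, `update_fn`) are hypothetical, and the buffer capacity, batch size, and learning algorithm behind `update_fn` are not specified in the notes.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of transitions; the oldest entries are dropped first."""

    def __init__(self, capacity):
        self.data = deque(maxlen=capacity)

    def add(self, transition):
        self.data.append(transition)

    def extend(self, transitions):
        # Used to seed the buffer with experiences from an old model.
        self.data.extend(transitions)

    def sample(self, batch_size):
        # Uniform sampling without replacement, capped at the buffer size.
        return random.sample(list(self.data), min(batch_size, len(self.data)))

    def __len__(self):
        return len(self.data)


def collect(buffer, step_fn, n_steps):
    """Experience stage: only gather transitions, no model updates."""
    for _ in range(n_steps):
        buffer.add(step_fn())


def train(buffer, update_fn, n_iterations, batch_size=32):
    """Training stage: only update from stored data, no robot interaction."""
    for _ in range(n_iterations):
        update_fn(buffer.sample(batch_size))
```

In use, one would call `buffer.extend(old_experiences)` once, then alternate `collect(...)` on the robot with `train(..., n_iterations=1000)` on the GPU. Keeping the stages apart means the robot does not have to wait on slow gradient updates while it drives.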
In the end it learned to avoid the wall, but it still fails to run smoothly...