The first approach from the OpenAI Retro Contest that I started to implement, test and modify was JERK. The JERK agent is one of the baseline scripts for this contest.
You can find it here: https://github.com/openai/retro-baselines
I think it is the easiest algorithm to understand for programmers who don’t have any Machine Learning experience.
The pseudo-code for the JERK algorithm looks like this:
Why is this the easiest approach? Because this algorithm is based on rewards, though it doesn’t use them the same way Rainbow or PPO2 do. The JERK algorithm has its moves scripted in advance, so it doesn’t learn the way the other two do. Sonic runs forward and jumps, and if it scores points or progresses further through the level, it gets rewarded. It learns from those rewards and tries not to repeat its mistakes, because a mistake costs it “reward points”. It’s a bit like with us humans: we are motivated to do something if there is a possible reward at the end.
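The idea described above can be tried out with a minimal Python sketch. It is not the baseline script from the repository: `ToyLevel` is a hypothetical 1-D stand-in for a Sonic level, and `move`, `replay` and `run_jerk` are names invented for this example. Backing up when a move earns nothing and remembering the best-scoring action sequence are assumptions made here to illustrate "learning from rewards" with scripted moves.

```python
import random


class ToyLevel:
    """Hypothetical stand-in for a Sonic level: a 1-D track with walls."""

    def __init__(self, length=50, walls=(10, 25, 40)):
        self.length = length
        self.walls = set(walls)
        self.reset()

    def reset(self):
        self.pos = 0
        self.best_pos = 0

    def step(self, action):
        """Apply a (direction, jump) action; return (reward, done)."""
        direction, jump = action
        target = max(0, self.pos + direction)
        if target in self.walls and not jump:
            target = self.pos  # blocked: walls can only be crossed by jumping
        self.pos = target
        reward = max(0, self.pos - self.best_pos)  # pay only for new progress
        self.best_pos = max(self.best_pos, self.pos)
        return reward, self.pos >= self.length


def move(env, steps, rng, left=False, jump_prob=0.1):
    """Scripted move: run in one direction and jump at random."""
    total, done, actions = 0, False, []
    for _ in range(steps):
        action = (-1 if left else 1, rng.random() < jump_prob)
        reward, done = env.step(action)
        actions.append(action)
        total += reward
        if done:
            break
    return total, done, actions


def replay(env, actions):
    """Replay a remembered action sequence from the start of the level."""
    env.reset()
    total = 0
    for action in actions:
        reward, done = env.step(action)
        total += reward
        if done:
            break
    return total


def run_jerk(episodes=20, moves_per_episode=40, seed=0):
    """Play several episodes and keep the best-scoring action sequence."""
    rng = random.Random(seed)
    env = ToyLevel()
    best_reward, best_actions = 0, []
    for _ in range(episodes):
        env.reset()
        total, done, actions = 0, False, []
        for _ in range(moves_per_episode):
            reward, done, taken = move(env, 5, rng)
            total += reward
            actions += taken
            if not done and reward <= 0:
                # No progress (assumed rule): back up a little, then retry.
                _, done, taken = move(env, 3, rng, left=True)
                actions += taken
            if done:
                break
        if total > best_reward:
            best_reward, best_actions = total, actions
    return best_reward, best_actions


if __name__ == "__main__":
    best_reward, best_actions = run_jerk()
    print("best reward:", best_reward)
    print("replayed reward:", replay(ToyLevel(), best_actions))
```

Because the toy level is deterministic, replaying the remembered actions earns exactly the reward that was recorded for them. Nothing here trains a model: the only thing the agent keeps between episodes is the action sequence that scored best.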