OpenAI Retro Contest – Everything I know about JERK agent

The first approach I started to implement, test and modify in the OpenAI Retro Contest was JERK. The JERK agent is one of the baseline scripts for this contest.

You can find it here: https://github.com/openai/retro-baselines

I think it is the easiest algorithm to understand for programmers who don’t have any Machine Learning experience.

The pseudo-code for the JERK algorithm looks like this:
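(The original listing is in the Gotta Learn Fast paper; what follows is my rough paraphrase, based on the baseline script shown later in this post, rather than the exact figure.)

while the timestep budget is not used up:
    with probability EXPLOIT_BIAS + steps_taken_so_far / TOTAL_TIMESTEPS:
        replay the best action sequence recorded so far and note its reward
    otherwise:
        play a new episode:
            move right for about 100 steps, jumping from time to time
            if that chunk earned no reward, back up to the left for about 70 steps
            repeat until the episode ends
        record the episode's action sequence together with its best cumulative reward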

Why is this the easiest approach? Because the algorithm is driven by rewards, but not in the same way as Rainbow or PPO2. JERK has its moves scripted in advance, so it doesn’t learn the way the other two baselines do. Sonic runs forward and jumps, and if he scores points or progresses further through the level, the episode earns reward. The agent keeps the action sequences that earned the most reward and replays them instead of repeating mistakes, because a mistake costs “reward points”. It’s somewhat like us humans: we are motivated to do something if there is a possible reward at the end.

The name is just an acronym for “Just Enough Retained Knowledge”.

What’s the problem with JERK? It’s neither good nor bad.

In one environment it can score higher than Rainbow or PPO2, while in another it fails completely. So on some Sonic levels this approach works very well, and on others it doesn’t work at all.

And if we look at the statistics from running this algorithm on the test levels, we see exactly that: it scores higher on some levels and fails completely on others.

But we can modify this script, and in my experience modifying it lets us score higher than the standard baseline algorithm.

That’s all for the theory. Let’s start with the practice.

Environment and Scenario File

Before we start, there are some things you need to know.

The baseline algorithm connects to the ‘tmp/sock’ remote environment. To run it in a local environment we need to change that:
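In the baseline script the environment is created from the remote socket, roughly like this:

env = grc.RemoteEnv('tmp/sock')

For a local run we replace that line with a direct retro.make() call, as in the full script further below:

from retro import make

env = make(game='SonicTheHedgehog-Genesis', state='GreenHillZone.Act1', scenario='scenario.json')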

We added a scenario.json file to our environment. The problem is that we don’t know exactly what kind of scenario file is run on the OpenAI test server. Based on the Gotta Learn Fast report:

https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/retro-contest/gotta_learn_fast_report.pdf

We can read in section 3.8 (Rewards) that the contest reward uses two components: a horizontal offset (x offset) and a completion bonus (level_end_bonus). Based on this information we can only guess the rest.

So this scenario.json file works only in our local environment and is made only for testing purposes.

You can find it in the retro-contest folder: ..\retro-contest\gym-retro\data\SonicTheHedgehog-Genesis

And it looks like this:

{
  "done": {
    "variables": {
      "lives": {
        "op": "zero"
      }
    }
  },
  "reward": {
    "variables": {
      "score": {
        "reward": 10.0
      }
    }
  }
}

Based on this guide: https://github.com/openai/retro#scenario-information-scenariojson

I would add the variables “score”, “x”, “screen_x”, and “level_end_bonus” to our scenario.json file with appropriate rewards.

What kind of rewards can we apply here? We can find all the variable names in this file:

https://github.com/openai/retro/blob/master/data/SonicTheHedgehog-Genesis/data.json
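For illustration, a modified scenario.json could look something like this. The variable names come from data.json, but the reward weights are only my guesses, not the values used on the contest server:

{
  "done": {
    "variables": {
      "lives": {
        "op": "zero"
      }
    }
  },
  "reward": {
    "variables": {
      "x": {
        "reward": 1.0
      },
      "score": {
        "reward": 10.0
      },
      "level_end_bonus": {
        "reward": 1000.0
      }
    }
  }
}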

To get the scenario file running we need to copy it to our script folder.

As I said before, we can’t upload the scenario file to the docker server, but adding some extra variables will help us improve our JERK agent locally.

After changing the environment, don’t forget to add “from retro import make” to the Python script.

And to see the game while it runs, we need to call env.render() somewhere in the step loop.

This is the modified JERK script, ready to run in local evaluation:

#!/usr/bin/env python

"""
A scripted agent called "Just Enough Retained Knowledge".
"""

import random

import gym
import numpy as np

import gym_remote.client as grc
import gym_remote.exceptions as gre

from retro import make

EXPLOIT_BIAS = 0.25
TOTAL_TIMESTEPS = int(1e6)

def main():
    """Run JERK on the attached environment."""
    env = make(game='SonicTheHedgehog-Genesis', state='GreenHillZone.Act1', scenario='scenario.json')
    env = TrackedEnv(env)
    new_ep = True
    solutions = []
    while True:
        if new_ep:
            if (solutions and
                    random.random() < EXPLOIT_BIAS + env.total_steps_ever / TOTAL_TIMESTEPS):
                solutions = sorted(solutions, key=lambda x: np.mean(x[0]))
                best_pair = solutions[-1]
                new_rew = exploit(env, best_pair[1])
                best_pair[0].append(new_rew)
                print('replayed best with reward %f' % new_rew)
                continue
            else:
                env.reset()
                new_ep = False
        rew, new_ep = move(env, 100)
        if not new_ep and rew <= 0:
            print('backtracking due to negative reward: %f' % rew)
            _, new_ep = move(env, 70, left=True)
        if new_ep:
            solutions.append(([max(env.reward_history)], env.best_sequence()))

def move(env, num_steps, left=False, jump_prob=1.0 / 10.0, jump_repeat=4):
    """
    Move right or left for a certain number of steps,
    jumping periodically.
    """
    total_rew = 0.0
    done = False
    steps_taken = 0
    jumping_steps_left = 0
    while not done and steps_taken < num_steps:
        action = np.zeros((12,), dtype=bool)
        action[6] = left
        action[7] = not left
        if jumping_steps_left > 0:
            action[0] = True
            jumping_steps_left -= 1
        else:
            if random.random() < jump_prob:
                jumping_steps_left = jump_repeat - 1
                action[0] = True
        env.render()  # draw the game window so we can watch the local run
        _, rew, done, _ = env.step(action)
        total_rew += rew
        steps_taken += 1
        if done:
            break
    return total_rew, done

def exploit(env, sequence):
    """
    Replay an action sequence; pad with NOPs if needed.

    Returns the final cumulative reward.
    """
    env.reset()
    done = False
    idx = 0
    while not done:
        if idx >= len(sequence):
            _, _, done, _ = env.step(np.zeros((12,), dtype='bool'))
        else:
            _, _, done, _ = env.step(sequence[idx])
        idx += 1
    return env.total_reward

class TrackedEnv(gym.Wrapper):
    """
    An environment that tracks the current trajectory and
    the total number of timesteps ever taken.
    """
    def __init__(self, env):
        super(TrackedEnv, self).__init__(env)
        self.action_history = []
        self.reward_history = []
        self.total_reward = 0
        self.total_steps_ever = 0

    def best_sequence(self):
        """
        Get the prefix of the trajectory with the best
        cumulative reward.
        """
        max_cumulative = max(self.reward_history)
        for i, rew in enumerate(self.reward_history):
            if rew == max_cumulative:
                return self.action_history[:i+1]
        raise RuntimeError('unreachable')

    # pylint: disable=E0202
    def reset(self, **kwargs):
        self.action_history = []
        self.reward_history = []
        self.total_reward = 0
        return self.env.reset(**kwargs)

    def step(self, action):
        self.total_steps_ever += 1
        self.action_history.append(action.copy())
        obs, rew, done, info = self.env.step(action)
        self.total_reward += rew
        self.reward_history.append(self.total_reward)
        return obs, rew, done, info

if __name__ == '__main__':
    try:
        main()
    except gre.GymRemoteError as exc:
        print('exception', exc)

I can run it from my bash with python jerk_agent.py and see the result of the code on the screen.

Scripting

What parameters of the code are most important? Where can we focus our attention to get better results?

In my opinion it’s hard to say, because every parameter affects the others. To understand the JERK agent I started with the main() and move() functions.

At the beginning I modified this part of the code:

rew, new_ep = move(env, 100)
if not new_ep and rew <= 0:
    print('backtracking due to negative reward: %f' % rew)
    _, new_ep = move(env, 70, left=True)
if new_ep:

The first call is responsible for moving forward and the second for moving backward. Sonic runs right for 100 steps, and if the reward for that stretch is not positive (for example because he is stuck on an obstacle he can’t pass), he backs up 70 steps to the left. These are only two numbers, but they have a very large impact on the final score.
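For example, a simple modification is to shorten both distances. The numbers below are arbitrary and only show where to change them:

rew, new_ep = move(env, 80)
if not new_ep and rew <= 0:
    print('backtracking due to negative reward: %f' % rew)
    _, new_ep = move(env, 50, left=True)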

The other important variable is:

EXPLOIT_BIAS = 0.25

This variable influences the probability that our agent replays a previously successful run instead of exploring a new one.

In our main function we get this statement:

if new_ep:
    if (solutions and
            random.random() < EXPLOIT_BIAS + env.total_steps_ever / TOTAL_TIMESTEPS):
        solutions = sorted(solutions, key=lambda x: np.mean(x[0]))
        best_pair = solutions[-1]
        new_rew = exploit(env, best_pair[1])
        best_pair[0].append(new_rew)
        print('replayed best with reward %f' % new_rew)
        continue
    else:
        env.reset()
        new_ep = False
rew, new_ep = move(env, 45)

Which calls this function:

def exploit(env, sequence):
    """
    Replay an action sequence; pad with NOPs if needed.

    Returns the final cumulative reward.
    """
    env.reset()
    done = False
    idx = 0
    while not done:
        if idx >= len(sequence):
            _, _, done, _ = env.step(np.zeros((12,), dtype='bool'))
        else:
            _, _, done, _ = env.step(sequence[idx])
        idx += 1
    return env.total_reward

This variable helps decide whether the best successful solution found so far should be replayed, or whether a new attempt should be made.
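With the default values, that probability starts at 0.25 and grows linearly with the total number of steps taken so far:

# probability of replaying the best saved solution when a new episode starts
p = EXPLOIT_BIAS + env.total_steps_ever / TOTAL_TIMESTEPS

# with EXPLOIT_BIAS = 0.25 and TOTAL_TIMESTEPS = 1e6:
#   after      0 steps: p = 0.25
#   after 250000 steps: p = 0.50
#   after 750000 steps: p = 1.00  (from this point on the agent always replays)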

In the move() function we have the jump probability, jump_prob. This variable decides how often Sonic jumps.

The default value is 1.0 / 10.0, so on average Sonic starts a jump roughly every tenth step. Changing the denominator to 2.0 (i.e. 1.0 / 2.0) makes Sonic jump very often, while changing it to 80.0 (1.0 / 80.0) makes him jump only occasionally. And if we comment out the action[] assignments, the agent stops moving altogether.

Sonic with the denominator set to 80.0.

The other variable, jump_repeat, decides for how many consecutive steps the jump button is held. Changing it to 10 will make Sonic jump higher, and changing it to 1 will make him jump lower.

Which buttons are pressed can be seen in these lines of code:

action = np.zeros((12,), dtype=bool)
action[6] = left
action[7] = not left

The action array has 12 boolean elements, all set to False at the start. In order, they map to the buttons: “B, A, MODE, START, UP, DOWN, LEFT, RIGHT, C, Y, X, Z”.

So if we trigger

action[0] = True

We make Sonic jump, because we press the “B” button.

Looking at the controls: https://strategywiki.org/wiki/Sonic_the_Hedgehog/Controls

Pressing A, B or C makes no difference, because all those buttons trigger the same action.

As for action[6] and action[7]: the first corresponds to the LEFT button and makes Sonic run left, the second corresponds to the RIGHT button and makes him run right.
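As a small illustration (this snippet is mine, not part of the original script), here is how an action that runs right and jumps at the same time would be built:

import numpy as np

# button order: B, A, MODE, START, UP, DOWN, LEFT, RIGHT, C, Y, X, Z
B, LEFT, RIGHT = 0, 6, 7

action = np.zeros((12,), dtype=bool)
action[RIGHT] = True  # hold right
action[B] = True      # hold jump (B)
# env.step(action) would make Sonic jump while moving right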

Rewards

To see how the reward variable changes, I added these two lines of code to my script:

print("rew" , rew)
print("total_rew",  total_rew)

This way I know how much reward my agent receives and when the reward variable is reset.
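A natural place for these prints is inside the move() loop, right after the step (this is simply where I would put them; any spot after env.step() works):

        _, rew, done, _ = env.step(action)
        total_rew += rew
        print("rew", rew)
        print("total_rew", total_rew)
        steps_taken += 1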

My opinion on the JERK approach

After testing and editing the script for 2 weeks I think it is not a bad approach. But as I said before, JERK is only as good as the level Sonic has to run through. I am not 100% sure, but I would guess there are obstacles that give the JERK agent a lot of problems while the Machine Learning algorithms handle them without trouble.

All in all it is a very good beginner algorithm, so if you are starting with AI I would give it a try. It is a solid baseline on which you can build your AI knowledge.

The JERK algorithm snippet is from the Gotta Learn Fast Paper:

https://arxiv.org/pdf/1804.03720.pdf

It’s a very nice paper and definitely a “must read” before starting this contest. It also lists all the scores from the test levels for the JERK, PPO2 and Rainbow agents.


That’s all for this post. Thanks for reading.

Tutorial: How to access a variable from another class in Unity

I have dealt with this problem very often.

How do you get variables from another class in Unity? For example, I’ve got one class with public variables and I want to get the value of that variable in another class.

For example, it could be one class that adds points to the overall score, while another class displays the score as Text using the Unity UI system.

Here is the class that calls the function in the other class:

using System.Collections;
using System.Collections.Generic;
using UnityEngine;

public class Score : MonoBehaviour {

  // Assign the GameObject that holds the ScoreMenager component in the Inspector
  public GameObject scoreScript;

  public void OnTriggerEnter2D(Collider2D node)
  {
    if (node.gameObject.tag == "Apple") {
      Destroy (node.gameObject);
      // Grab the ScoreMenager component and increase the score
      ScoreMenager scorePointsScript = scoreScript.GetComponent<ScoreMenager>();
      scorePointsScript.AddScore ();
    }
  }
}

And here is the class that contains the function where the points variable gets incremented.

using System.Collections.Generic;
using UnityEngine.UI;
using UnityEngine;

public class ScoreMenager : MonoBehaviour {

  public int points;

  Text text;

  void Awake ()
  {
    // Cache the Text component on the same UI element
    text = GetComponent<Text> ();
  }

  // Called from the Score script whenever an "Apple" is collected
  public void AddScore()
  {
    points++;
  }

  // Update is called once per frame
  void Update () {
    text.text = "Score: " + points;
  }
}

Don’t forget to add the “Score” script to your character! And the same for ScoreMenager, which needs to be added to the UI element.

Also don’t forget to create the Text as a UI element:

Component > UI > Text

One more important detail: you need to set the “Tag” of the object that gets destroyed (the “Apple” in my example), otherwise the trigger will never increase the score points value.

Other tutorials in the series:

How to make the objects fall in Unity

Making Objects Fall Random On The Screen in Unity

How to Speed Up (Increase Speed) Time in Unity

How to create a simple countdown Timer in Unity

Rendering Crisp Pixelart in Phaser (2017)

I had this problem while building my pixel-art game in Phaser: my pixel art was blurry and it didn’t look nice. I searched the web for an answer but didn’t find one, and many of the tutorials were outdated, so I decided to write up my solution here.

You can find the working example here: http://www.noob-programmer.com/pixelart_example/

The example above works in Chrome and Mozilla Firefox.

The solution involves editing the CSS and one line in the Phaser script.

Here is the CSS you need to apply to your HTML element:

body {
  filter: none;
  image-rendering: -moz-crisp-edges;
  image-rendering: -webkit-crisp-edges;
  image-rendering: pixelated;
  image-rendering: crisp-edges;
}

I applied it to the body element.

The other change is this line in the Phaser script:
var game = new Phaser.Game(200, 150, Phaser.AUTO, 'gameContainer', {
      preload: preload,
      create: create,
      update: update
    }, null, false, false);

The false passed for the antialias argument (the first false after the null transparent argument) is what matters here; it needs to be set to false to disable antialiasing.

What about other browsers? What do I need to change in order to get crisp pixel art? Check this solution:

body {
  -ms-interpolation-mode: nearest-neighbor; /* IE 7+ (non-standard property) */
  image-rendering: -webkit-optimize-contrast; /* Safari 6, UC Browser 9.9 */
  image-rendering: -webkit-crisp-edges; /* Safari 7+ */
  image-rendering: -moz-crisp-edges; /* Firefox 3.6+ */
  image-rendering: -o-crisp-edges; /* Opera 12 */
  image-rendering: pixelated; /* Chrome 41+ and Opera 26+ */
}

Source: https://builtvisible.com/image-scaling-in-css/

Source: https://developer.mozilla.org/en-US/docs/Games/Techniques/Crisp_pixel_art_look

Tutorial: How to set up a git account with GitLab

Because of the projects I build for university and the requirement to use git, I was forced to abandon GitHub for a while and start using GitLab.

The reason?

Private repositories.

Every week each of us gets a set of programming assignments to complete, either alone or with a partner in a group. Of the services I know that offer private repositories, there are the previously mentioned GitLab and Bitbucket. Bitbucket, however, limits you to 5 members per team. That would still be fine, but since I had heard a lot of good things about GitLab, I decided to give it a try.
