Here's a quick overview of the outcomes. Details and instructions are given in the Jupyter notebook.
Results are not fully reproducible from one run of the notebook to the next, so yours may differ.
## Discrete Action Inverted Pendulum Environment
The `CartPole-v1` environment gives a reward of 1 for every step in which the pendulum stays upright (within ±12 degrees of vertical) and the cart stays visible in the simulation (position within ±2.4). I took an "extended rewards" approach, adding penalties for the pendulum angle and cart position on top of this base reward.
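
As a rough illustration of the extended-rewards idea, the sketch below subtracts penalties proportional to the absolute cart position and pendulum angle from the base reward. The function name `extended_reward`, the linear penalty form, and the weight values are assumptions made for this example; the notebook's actual penalty terms may differ.

```python
def extended_reward(base_reward, cart_position, pole_angle,
                    position_weight=0.1, angle_weight=1.0):
    """Return the base reward minus position and angle penalties.

    Hypothetical shaping: the linear form and the default weights are
    illustrative assumptions, not values taken from the notebook.
    """
    # Penalize the cart for drifting away from the center of the track.
    position_penalty = position_weight * abs(cart_position)
    # Penalize the pendulum for tilting away from vertical (angle in radians).
    angle_penalty = angle_weight * abs(pole_angle)
    return base_reward - position_penalty - angle_penalty


# A centered cart with a vertical pendulum keeps the full base reward.
centered = extended_reward(1.0, cart_position=0.0, pole_angle=0.0)

# An off-center cart with a tilted pendulum earns less.
drifting = extended_reward(1.0, cart_position=1.2, pole_angle=0.1)
```

In a training loop this could be applied to each transition, using the fact that the `CartPole-v1` observation is ordered as cart position, cart velocity, pole angle, pole angular velocity, so the position is `obs[0]` and the angle is `obs[2]`.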
