RL Environments
As with other RL frameworks, you will need to set up your environment before you start modeling in OctaiPipe. OctaiPipe offers the ability to build custom environments based on the Gymnasium Env class.
Custom environments need to inherit from the Gymnasium Env class and require the user to implement the __init__, reset, and step methods:

- __init__: initializes variables, e.g. the action and observation spaces
- reset: resets the environment to its original state and returns the initial observation
- step: runs one time step in the environment given an action
- render: renders the environment if desired; can be implemented as a no-op
A minimal example env class can be found below:
import gymnasium as gym
from gymnasium import spaces
import numpy as np


class MyCustomEnv(gym.Env):
    metadata = {"render_modes": ["human"], "render_fps": 30}

    def __init__(self, obs_low: int = 0, obs_high: int = 1):
        super().__init__()
        self.observation_space = spaces.Box(
            low=obs_low, high=obs_high, shape=(4,), dtype=np.float32)
        self.action_space = spaces.Discrete(2)

    def reset(self, *, seed=None, options=None):
        # Seed the env's RNG so resets are reproducible
        super().reset(seed=seed)
        self.state = self.np_random.random(4, dtype=np.float32)
        info = {}
        return self.state, info

    def step(self, action):
        self.state = self.np_random.random(4, dtype=np.float32)
        reward = 1.0
        terminated = False
        truncated = False
        info = {}
        return self.state, reward, terminated, truncated, info

    def render(self, mode='human'):
        # NOTE: render is always expected, but can be a no-op
        print(f"State: {self.state}")

    def close(self):
        pass
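Before referencing the env in an FL config, it can be worth sanity-checking it locally. The sketch below assumes the MyCustomEnv class above; check_env is part of Gymnasium itself, not OctaiPipe:

from gymnasium.utils.env_checker import check_env

env = MyCustomEnv(obs_low=0, obs_high=1)
# Raises if the env violates the core Gymnasium API
check_env(env, skip_render_check=True)

obs, info = env.reset(seed=42)
for _ in range(5):
    action = env.action_space.sample()  # random action for testing
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()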
In the FL config, we define the env within the model_params in the model_specs field. We specify the path to the env file with the path field and any parameters to pass to the __init__ method in params:
model_specs:
  type: frl
  name: test_model
  model_params:
    policy:
      name: PPO
      params: {}
    env:
      path: ./path/to/env_file.py
      params:
        obs_low: 0
        obs_high: 1
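Conceptually, the entries under params are handed to the env's __init__ as keyword arguments. The snippet below is an illustrative sketch of that mapping only, not OctaiPipe's actual internal code:

# Illustrative only: how the env section above maps to the class
params = {"obs_low": 0, "obs_high": 1}  # from the params field
env = MyCustomEnv(**params)  # path points at the file defining the class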
Data Loading for FRL
For some environments, it might be useful to read data from a file or database and use it for training the model. To do this, build the data loading into your custom environment file and use the data like any other object.
Remember that any file paths present on a device need to be specified in the FL config, so if you are reading from a file, include it as an initialization parameter for your environment in the FL config file.
In the example below, we use the env parameter init_states_path to specify a file that we wish to use for the initial states of the environment when we run the reset() method. The example code and config are below.
Data loading env
import gymnasium as gym
from gymnasium import spaces
import numpy as np
import pandas as pd


class MyCustomEnv(gym.Env):
    metadata = {"render_modes": ["human"], "render_fps": 30}

    def __init__(self, init_states_path: str):
        super().__init__()
        # Load the initial states once at initialization
        self.init_states = pd.read_csv(init_states_path)
        self.cur_state = 0
        self.observation_space = spaces.Box(
            low=-np.inf, high=np.inf,
            shape=(self.init_states.shape[1],), dtype=np.float32)
        self.action_space = spaces.Discrete(2)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        # Cycle through the loaded initial states, one row per episode
        self.state = self.init_states.iloc[self.cur_state].to_numpy(
            dtype=np.float32)
        self.cur_state += 1
        if self.cur_state >= self.init_states.shape[0]:
            self.cur_state = 0
        info = {}
        return self.state, info

    def step(self, action):
        self.state = self.np_random.random(
            self.init_states.shape[1], dtype=np.float32)
        reward = 1.0
        terminated = False
        truncated = False
        info = {}
        return self.state, reward, terminated, truncated, info

    def render(self, mode='human'):
        print(f"State: {self.state}")

    def close(self):
        pass
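To try the data-loading env locally, you can point it at a small CSV and watch reset() cycle through the rows. This is a minimal sketch; the file name init_states.csv and its contents are illustrative:

import pandas as pd

# Hypothetical two-row file of initial states
pd.DataFrame({"a": [0.1, 0.2], "b": [0.3, 0.4]}).to_csv(
    "init_states.csv", index=False)

env = MyCustomEnv(init_states_path="init_states.csv")
obs, _ = env.reset()  # first row of the file
obs, _ = env.reset()  # second row
obs, _ = env.reset()  # wraps back around to the first row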
Data loading config
model_specs:
  type: FRL
  load_existing: false
  name: test_model
  model_params:
    policy:
      name: PPO
      params: {}
    env:
      path: ./path/to/env_file.py
      params:
        init_states_path: ./path/to/file.csv
Using environments in training
The training of the RL model on each FL client is executed simply by running the learn() method of the policy class. The RL environment is set on initialization of the policy class, and any logic pertaining to training should be defined with this in mind, i.e. no arguments are handed to the learn() method at runtime.
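This mirrors the usual pattern in libraries such as Stable-Baselines3, sketched below under the assumption that the PPO policy from the config maps onto stable_baselines3.PPO; OctaiPipe performs the equivalent calls internally:

from stable_baselines3 import PPO

# The environment is bound when the policy object is constructed...
model = PPO("MlpPolicy", MyCustomEnv(obs_low=0, obs_high=1))

# ...so no environment or training arguments are passed at runtime
model.learn(total_timesteps=10_000)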
Evaluation environment setup
The evaluation of RL agents is carried out using the method below. Note that the environment defined in the env section of the FL config is handed to the test method at runtime. If the environment class has attributes for n_test_episodes, render, or max_steps_per_test_episode, these are used in evaluation; otherwise defaults are set.
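For example, these evaluation settings can be exposed as class attributes on your custom env. A minimal sketch, using the attribute names read by the test method below:

import gymnasium as gym


class MyCustomEnv(gym.Env):
    # Optional attributes read at evaluation time; when absent,
    # defaults of 5 episodes and 10000 steps per episode are used
    n_test_episodes: int = 10
    max_steps_per_test_episode: int = 500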
def test_model(self, test_env: gym.Env) -> dict:
    if hasattr(test_env, "n_test_episodes"):
        n_episodes: int = test_env.n_test_episodes
    else:
        n_episodes: int = 5
    if hasattr(test_env, "render"):
        render: bool = test_env.render
    else:
        render: bool = False
    if hasattr(test_env, "max_steps_per_test_episode"):
        max_steps_per_test_episode: int = test_env.max_steps_per_test_episode
    else:
        max_steps_per_test_episode: int = 10000

    rewards, lengths = [], []
    for _ in range(n_episodes):
        reset_out = test_env.reset()
        obs = reset_out[0] if isinstance(reset_out, tuple) else reset_out
        done = False
        total_r: float = 0.0
        steps: int = 0
        while not done and steps < max_steps_per_test_episode:
            raw_action, _ = self.predict(obs, deterministic=True)
            action = self.format_action(test_env, raw_action)
            obs, r, done, info = self.step_and_unpack(test_env, action)
            total_r += r
            steps += 1
            if render:
                test_env.render()
        rewards.append(total_r)
        lengths.append(steps)

    if not rewards or not lengths:
        raise RuntimeError('No rewards or lengths retrieved, FRL '
                           'evaluation did not complete successfully. Got\n'
                           f'rewards: {rewards}, lengths: {lengths}')

    return {
        "mean_reward": float(np.mean(rewards)),
        "std_reward": float(np.std(rewards)) if len(rewards) > 1 else np.nan,
        "min_reward": float(np.min(rewards)),
        "max_reward": float(np.max(rewards)),
        "mean_length": float(np.mean(lengths)),
    }