Tutorial - Running Federated Reinforcement Learning (FRL) with OctaiPipe#

In this tutorial we will walk through a standard Federated Learning deployment using RL with the PPO algorithm. RL differs from traditional ML in that the agent learns by interacting with an environment over time, building a policy that maximizes cumulative reward through trial and error. In RL there is no fixed dataset; data is generated dynamically as the agent acts.

The Cartpole environment from gymnasium was chosen for this tutorial.

We will go through the following steps:

  1. Set up the environment

  2. Set up the configuration file

  3. Running FL

Step 1 - Set up the environment#

The environment, defined as a Python class, is passed to the workflow in a .py file. Below is an example of the Cartpole environment. Each environment must implement the reset, step, and render methods for RL to run in a federated manner.

Note: you can change input parameters such as gravity and masscart to create your own environments for RL training.

You can go through the gymnasium documentation to explore some of the other environments yourself.

[8]:
import gymnasium as gym
from gymnasium import spaces
from typing import Tuple, Any

class CustomCartPole(gym.Env):
    metadata = {"render_modes": ["human", "rgb_array"], "render_fps": 50}

    def __init__(self, gravity: float = 9.8, masscart: float = 1.0):
        super().__init__()
        # Create a base CartPole environment internally
        self.inner_env = gym.make("CartPole-v1")

        # Modify the unwrapped gravity
        self.inner_env.unwrapped.gravity = gravity
        self.inner_env.unwrapped.masscart = masscart

        # Expose the same action and observation spaces as the inner env
        self.action_space: spaces.Space = self.inner_env.action_space
        self.observation_space: spaces.Space = self.inner_env.observation_space

    def reset(self, **kwargs) -> Tuple[Any, dict]:
        return self.inner_env.reset(**kwargs)

    def step(self, action: Any) -> Tuple[Any, float, bool, bool, dict]:
        return self.inner_env.step(action)

    def render(self, mode: str = "human"):
        return self.inner_env.render()

    def close(self):
        return self.inner_env.close()
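To see how these methods are used during training, recall that RL data comes from repeatedly calling reset and then step until an episode ends. The sketch below exercises that loop with a tiny hand-written stand-in environment (TinyEnv is hypothetical and used here only to avoid the gymnasium dependency; CustomCartPole follows the same contract):

```python
from typing import Any, Tuple

class TinyEnv:
    """Hypothetical stand-in following the same reset/step contract
    as CustomCartPole above (not part of OctaiPipe or gymnasium)."""

    def __init__(self, horizon: int = 5):
        self.horizon = horizon  # episode ends after this many steps
        self.t = 0

    def reset(self, **kwargs) -> Tuple[Any, dict]:
        self.t = 0
        return 0.0, {}  # (observation, info)

    def step(self, action: Any) -> Tuple[Any, float, bool, bool, dict]:
        self.t += 1
        terminated = self.t >= self.horizon
        # (observation, reward, terminated, truncated, info)
        return float(self.t), 1.0, terminated, False, {}

# The interaction loop every RL algorithm runs under the hood:
env = TinyEnv()
obs, info = env.reset()
total_reward, terminated = 0.0, False
while not terminated:
    obs, reward, terminated, truncated, info = env.step(0)
    total_reward += reward
print(total_reward)  # 5.0: one unit of reward per step over a 5-step episode
```

This is the loop that replaces a fixed training dataset in RL: each call to step yields a fresh transition for the learner.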

Step 2 - Set up the configuration file#

Below is an example configuration for a federated learning experiment with RL. In practice, the configuration file is saved in the configs folder of the workspace.

NOTE: You will need to edit the following in this file:

  • Device IDs in the device_ids list

  • If you wish to use a specific version of OctaiPipe, change the latest tag in the image names to the one you would like to use

  • Change the ‘path’ in the env if you are saving the environment file in a different location.

  • Feel free to play with the policy parameters (listed under model_params/policy/params) to see changes in the FRL results.
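For orientation, a fragment of what such a config might look like is sketched below. Only the items named above (device_ids, the image tag, the env path, and model_params/policy/params) come from this tutorial; the exact layout and all values are illustrative assumptions, so defer to the config file shipped in your workspace's configs folder.

```yaml
# Illustrative sketch only -- use the config in your workspace's configs folder
device_ids:            # edit: IDs of the devices taking part in training
  - device-1
  - device-2
image: fl-client:latest   # hypothetical image name; pin a version instead of latest if needed
env:
  path: configs/cartpole_env.py   # change if your environment file lives elsewhere
model_params:
  policy:
    params:                       # PPO policy parameters to experiment with
      learning_rate: 0.0003
      gamma: 0.99
```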

[ ]:
import yaml
# Display federated learning config
with open("configs/cartpole_config.yml", 'r') as file:
    cartpole_config = yaml.safe_load(file)
print(yaml.dump(cartpole_config, sort_keys=False))

Step 3 - Running FL#

[10]:
import os
import logging

os.environ['OCTAIPIPE_DEBUG'] = 'true'
logging.basicConfig(level=logging.INFO, format='%(message)s')

Set up the OctaiFL context by passing the config file path, deployment name, and description to OctaiFL.

[ ]:
from octaipipe.federated_learning.run_fl import OctaiFL

config_path = "configs/cartpole_config.yml"

octaifl = OctaiFL(
    config_path,
    deployment_name='FRL probe env',
    deployment_description='FRL deployment'
)

Set up the strategy. Note that the parameter ‘num_rounds’ sets the number of FL rounds, i.e. communication rounds between the server and the clients.

[ ]:
strategy = {
    'experiment_description': 'FRL Experiment',
    'num_rounds': 100,                                             # Number of FL rounds (server-client communication rounds)
    'min_available_clients': 2,
    'min_fit_clients': 2,
    'strategy_name': 'frl',
}

octaifl.strategy.set_strategy_values(strategy)

Finally, we can run the experiment.

[ ]:
experiment_id = octaifl.run()

Checking the processes#

While running, the experiment prints a link where you can check the results and the aggregated metrics on the server side.

There are now two processes running, the server and the clients.

The server logs will get printed in the notebook.

You can also log into the device and get the client deployment logs by running docker logs -f <container id>
