Federated Reinforcement Learning#

Reinforcement Learning is a machine learning paradigm in which an agent learns to take actions, given its current state, in order to maximize a reward function. See, for example, the Wikipedia page on Reinforcement Learning.

Reinforcement Learning has been used to train agents that successfully carry out tasks in areas such as the game of Go, computer games, and autonomous driving.

OctaiPipe implements a framework for training Reinforcement Learning agents in Federated settings. It uses the same commands and structure as other FL workloads, with some adjustment for RL specific components and configuration. A sample config file for Federated Reinforcement Learning (FRL) is shown below.

name: federated_learning

infrastructure:
  device_ids:
    - device-0
    - device-1
  image_tag: 3.1.0

model_specs:
  type: frl
  name: test_rl_model
  model_params:
    env:
      path: ./envs/env_test.py
      params:
        gravity: 9.8
        masscart: 1.0
    policy:
      name: PPO
      params:
        learning_rate: 0.0003
        gamma: 0.99
        clip_range: 0.2
        n_steps: 128
        batch_size: 64
        n_epochs: 10

run_specs:
  backend: frl

The config file follows the same structure as the configs of other FL experiments. The name at the top should be federated_learning. The infrastructure section defines which devices the experiment runs on and which version of OctaiPipe to use. In the run_specs, the only key field is backend, which should be set to frl.

The data_specs and model_specs are explained in more detail below:

Data specs#

As opposed to other FL workloads in OctaiPipe, FRL does not use input_data_specs or evaluation_data_specs. This is because some RL environments do not necessarily need to read data for training or evaluation. If data loading is required, this should be handled in the RL environment Python file. See more at RL Environments.
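If your environment does need training or evaluation data, a common pattern is to load it inside the environment itself, for example in its constructor. Below is a minimal, framework-free sketch of a Gym-style environment that reads observations from a CSV file. The class and file names here (`DataDrivenEnv`, `sensor_data.csv`) are illustrative, and the exact interface OctaiPipe expects is described at RL Environments; only the data-loading pattern is the point.

```python
import csv

class DataDrivenEnv:
    """Illustrative Gym-style environment that loads its own data.

    Hypothetical sketch: the environment interface OctaiPipe actually
    requires is documented under RL Environments. Shown here is only
    the pattern of loading data inside the environment, since FRL has
    no input_data_specs section to do it for you.
    """

    def __init__(self, data_path="sensor_data.csv"):
        # Read all observations up front from a local CSV file.
        with open(data_path) as f:
            self.rows = [list(map(float, row)) for row in csv.reader(f)]
        self.t = 0

    def reset(self):
        # Start a new episode at the beginning of the data.
        self.t = 0
        return self.rows[self.t]

    def step(self, action):
        # Advance one timestep through the loaded data.
        self.t += 1
        done = self.t >= len(self.rows) - 1
        obs = self.rows[self.t]
        reward = -abs(obs[0] - action)  # toy reward: track the first feature
        return obs, reward, done, {}
```

Because the environment owns its data access, each device in the federation can point the same environment code at its own local file.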

Model specs#

The model_specs configure the details for what model to run and how. For FRL, the type should always be frl and the name should be something that helps you understand what the model does, e.g. thermostat_control_agent.

The model_params contain two fields. The env field has a path, the local filepath to a Python file containing the RL environment, and params, the parameters to set for that environment. Details on environments can be found at RL Environments. The policy field contains a name, which specifies which FRL policy to use, and params to pass to that policy. Policies are explained in detail in RL Policies.
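The parameter names under policy params in the sample config (learning_rate, gamma, clip_range, n_steps, batch_size, n_epochs) mirror the keyword arguments of common PPO implementations such as Stable-Baselines3's. A minimal sketch of how such a params mapping is typically unpacked into a policy constructor; the `PPOPolicy` class here is hypothetical, standing in for whatever policy the framework resolves from the name field:

```python
# Hypothetical stand-in for the policy class that OctaiPipe resolves
# from the config's policy name (see RL Policies for the real ones).
class PPOPolicy:
    def __init__(self, learning_rate=3e-4, gamma=0.99, clip_range=0.2,
                 n_steps=128, batch_size=64, n_epochs=10):
        self.learning_rate = learning_rate
        self.gamma = gamma
        self.clip_range = clip_range
        self.n_steps = n_steps
        self.batch_size = batch_size
        self.n_epochs = n_epochs

# The policy params block from the sample config, as a parsed dict:
params = {
    "learning_rate": 0.0003,
    "gamma": 0.99,
    "clip_range": 0.2,
    "n_steps": 128,
    "batch_size": 64,
    "n_epochs": 10,
}

# Each key in params maps directly onto a constructor argument.
policy = PPOPolicy(**params)
```

Because the mapping is one-to-one, any parameter omitted from the config simply falls back to the policy's default value.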

RL Environments and Policies#