FL Train Step

When you trigger an FL train step, a server is set up that communicates with the FL clients hosted on edge devices. On the edge devices, this is handled by an OctaiPipe pipeline step (see OctaiPipe Steps) called the FL Train Step. This guide goes through the FL Train Step, showing how to configure it and how it can be extended using custom pipeline steps.

Methods in the FL Step

The FL step inherits from the base PipelineStep in OctaiPipe. It uses the PipelineStep methods as well as its own methods to set up and run FL training. The following methods are implemented in the FL Train Step:

  • __init__

  • _check_data

  • _get_model

  • load_datasets

  • run

The __init__ method initializes the class by initializing the PipelineStep parent class and checking the input and evaluation data specs using the _check_data method. The model is also initialized using the _get_model method.

The _check_data method goes through the input_data_specs and evaluation_data_specs to find the entry relevant to the device the FL step is running on. It returns the fields for the device ID that the step is running on, or uses the values found for default if no device ID match is found.
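The device lookup with a default fallback can be sketched as below. This is a minimal illustration, not the OctaiPipe implementation: the function name select_device_specs and the error raised when no default exists are assumptions made for the example, and only the devices/device layout is taken from the config in this guide.

```python
def select_device_specs(data_specs: dict, device_id: str) -> dict:
    """Return the spec entry for device_id, falling back to 'default'.

    Hypothetical helper: mirrors the behaviour described for _check_data,
    but is not taken from the OctaiPipe source.
    """
    # Index the list of per-device entries by their "device" field.
    entries = {entry["device"]: entry for entry in data_specs["devices"]}
    if device_id in entries:
        return entries[device_id]
    if "default" in entries:
        return entries["default"]
    raise KeyError(f"No entry for device {device_id!r} and no default entry")


input_data_specs = {
    "devices": [
        {"device": "default",
         "query_template_path": "./configs/data/influx_query_def.txt"},
        {"device": "FL-01",
         "query_template_path": "./configs/data/influx_query_1.txt"},
    ]
}

# FL-01 has its own entry; FL-09 does not and falls back to "default".
matched = select_device_specs(input_data_specs, "FL-01")
fallback = select_device_specs(input_data_specs, "FL-09")
```

The same helper would apply unchanged to evaluation_data_specs, since both sections share the devices list layout.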

The _get_model method checks the model_specs to see if the model type is in the model mapping from the default OctaiPipe models. If not, it attempts to retrieve the model from a local custom mapping or download it from blob storage.
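The lookup order described above (default models first, then a local custom mapping, then a download) can be sketched as follows. All names here are hypothetical stand-ins: DEFAULT_MODELS, resolve_model_class and the download callable are assumptions for illustration and do not reflect OctaiPipe's internal names or its blob storage client.

```python
from typing import Callable, Dict, Optional


class DefaultTorchModel:
    """Hypothetical placeholder for a built-in model class."""


# Assumed shape of the default model mapping: model type -> class.
DEFAULT_MODELS: Dict[str, type] = {"base_torch": DefaultTorchModel}


def resolve_model_class(
    model_type: str,
    custom_models: Dict[str, type],
    download: Optional[Callable[[str], type]] = None,
) -> type:
    """Resolve a model class using the three-stage order from the text."""
    if model_type in DEFAULT_MODELS:        # 1. built-in models
        return DEFAULT_MODELS[model_type]
    if model_type in custom_models:         # 2. local custom mapping
        return custom_models[model_type]
    if download is not None:                # 3. remote storage (stubbed)
        return download(model_type)
    raise ValueError(f"Unknown model type: {model_type!r}")


class MyCustomModel:
    """Hypothetical user-defined model class."""


builtin = resolve_model_class("base_torch", custom_models={})
custom = resolve_model_class("my_model", custom_models={"my_model": MyCustomModel})
```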

The load_datasets method uses the PipelineStep's _load_data method to first load the training data, then sets self._evaluation_data_specs to self._input_data_specs so that the _load_data method can be used to get the test dataset. This method is called from the setup_loaders method in the model, so that users can define their own generators using custom models. For more information on how to use custom models, see the documentation on custom FL models: Custom PyTorch Model.

The run method is the method that actually runs federated learning. It does so by calling the setup_loaders method in the model class, setting up the relevant client for the framework, and running the client. The run method takes the server_ip as an argument to hand to the client.

Configuring the FL train step

Below is an example of the config file used to set up federated learning. The infrastructure field is not passed to the FL Train Step. The run specs are popped and given to the run method, and the remaining specs are given to the step on initialization.

    name: federated_learning

    infrastructure:
      server: kubernetes
      backup_server: [deviceId]
      device_ids: [FL-01, FL-02, FL-03, FL-04]

    input_data_specs:
      devices:
        - device: default
          datastore_type: influxdb
          query_type: dataframe
          query_template_path: ./configs/data/influx_query_def.txt
          query_values:
            start: "2022-11-10T00:00:00.000Z"
            stop: "2022-11-11T00:00:00.000Z"
            bucket: cmapss-bucket
            measurement: sensors-raw
            tags: {}
        - device: FL-01
          datastore_type: influxdb
          query_type: dataframe
          query_template_path: ./configs/data/influx_query_1.txt
          query_values:
            start: "2022-11-10T00:00:00.000Z"
            stop: "2022-11-11T00:00:00.000Z"
            bucket: cmapss-bucket
            measurement: sensors-raw
            tags: {}
        - device: FL-02
          datastore_type: influxdb
          query_type: dataframe
          query_template_path: ./configs/data/influx_query_2.txt
          query_values:
            start: "2022-11-10T00:00:00.000Z"
            stop: "2022-11-11T00:00:00.000Z"
            bucket: cmapss-bucket
            measurement: sensors-raw
            tags: {}
        - device: FL-03
          datastore_type: influxdb
          query_type: dataframe
          query_template_path: ./configs/data/influx_query_3.txt
          query_values:
            start: "2022-11-10T00:00:00.000Z"
            stop: "2022-11-11T00:00:00.000Z"
            bucket: cmapss-bucket
            measurement: sensors-raw
            tags: {}
        - device: FL-04
          datastore_type: influxdb
          query_type: dataframe
          query_template_path: ./configs/data/influx_query_4.txt
          query_values:
            start: "2022-11-10T00:00:00.000Z"
            stop: "2022-11-11T00:00:00.000Z"
            bucket: cmapss-bucket
            measurement: sensors-raw
            tags: {}
      data_converter: {}

    evaluation_data_specs:
      devices:
        - device: default
          datastore_type: influxdb
          query_type: dataframe
          query_template_path: ./configs/data/influx_query_eval_def.txt
          query_values:
            start: "2022-11-10T00:00:00.000Z"
            stop: "2022-11-11T00:00:00.000Z"
            bucket: cmapss-bucket
            measurement: sensors-raw
            tags: {}
        - device: FL-01
          datastore_type: influxdb
          query_type: dataframe
          query_template_path: ./configs/data/influx_query_eval_1.txt
          query_values:
            start: "2022-11-10T00:00:00.000Z"
            stop: "2022-11-11T00:00:00.000Z"
            bucket: cmapss-bucket
            measurement: sensors-raw
            tags: {}
        - device: FL-02
          datastore_type: influxdb
          query_type: dataframe
          query_template_path: ./configs/data/influx_query_eval_2.txt
          query_values:
            start: "2022-11-10T00:00:00.000Z"
            stop: "2022-11-11T00:00:00.000Z"
            bucket: cmapss-bucket
            measurement: sensors-raw
            tags: {}
        - device: FL-03
          datastore_type: influxdb
          query_type: dataframe
          query_template_path: ./configs/data/influx_query_eval_3.txt
          query_values:
            start: "2022-11-10T00:00:00.000Z"
            stop: "2022-11-11T00:00:00.000Z"
            bucket: cmapss-bucket
            measurement: sensors-raw
            tags: {}
        - device: FL-04
          datastore_type: influxdb
          query_type: dataframe
          query_template_path: ./configs/data/influx_query_eval_4.txt
          query_values:
            start: "2022-11-10T00:00:00.000Z"
            stop: "2022-11-11T00:00:00.000Z"
            bucket: cmapss-bucket
            measurement: sensors-raw
            tags: {}
      data_converter: {}

    model_specs:
      type: base_torch
      load_existing: false
      name: test_torch
      model_load_specs:
        version: '000'
      model_params:
        loss_fn: mse
        scaling: standard
        metric: rmse
        epochs: 10
        batch_size: 32

    run_specs:
      target_label: RUL
      cycle_id: "Machine number"
      backend: pytorch
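How the config is divided between initialization and the run method can be sketched as below. This is an illustration under assumptions: the function name split_config is hypothetical, the config is shown as an already-parsed dictionary, and the exact way OctaiPipe drops the infrastructure field is not documented here.

```python
import copy
from typing import Tuple


def split_config(config: dict) -> Tuple[dict, dict]:
    """Split a parsed FL config into (init_specs, run_specs).

    Hypothetical helper, not an OctaiPipe function.
    """
    # Work on a copy so the caller's config is left untouched.
    init_specs = copy.deepcopy(config)
    # The infrastructure section is not handed to the FL Train Step.
    init_specs.pop("infrastructure", None)
    # run_specs are popped out and later given to the run method.
    run_specs = init_specs.pop("run_specs")
    return init_specs, run_specs


config = {
    "name": "federated_learning",
    "infrastructure": {"server": "kubernetes"},
    "input_data_specs": {"devices": []},
    "evaluation_data_specs": {"devices": []},
    "model_specs": {"type": "base_torch"},
    "run_specs": {
        "target_label": "RUL",
        "cycle_id": "Machine number",
        "backend": "pytorch",
    },
}

init_specs, run_specs = split_config(config)
```

After the split, init_specs holds the name, data specs and model specs, while run_specs holds target_label, cycle_id and backend.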

The input_data_specs and evaluation_data_specs define the configuration for how to get the training and evaluation data. The output_data_specs are not used in the current FL Train Step but can be used for saving any data if custom implementations are wanted.

The model_specs define which model to use, whether it is a native OctaiPipe model or a custom model. The most important field here is model_params, which is handed to the model on initialization. For the default PyTorch model, this includes settings such as the number of epochs, which loss function to use, and the batch size.

As mentioned, the run_specs are passed to the run method. They require a target_label (the outcome variable column name) to be defined. The cycle_id is the column that defines an operating cycle. The data can be grouped on this column so that the training and validation sets contain data for a certain proportion of cycles rather than a proportion of rows of data. The backend specifies which FL client to use. For example, for PyTorch this would be "pytorch".
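To illustrate what grouping on cycle_id means for the split, here is a minimal sketch in plain Python. It is not the OctaiPipe implementation: the function name split_by_cycle, the train_fraction parameter and its 0.8 value are assumptions, and rows are shown as dictionaries rather than a dataframe to keep the example dependency-free.

```python
def split_by_cycle(rows, cycle_id, train_fraction=0.8):
    """Split rows so that whole cycles land in either train or validation.

    Hypothetical helper; the proportion used by OctaiPipe is not stated
    in this guide.
    """
    # Unique cycle values, in order of first appearance.
    cycles = list(dict.fromkeys(row[cycle_id] for row in rows))
    # Take a proportion of cycles, not of rows (at least one for training).
    n_train = max(1, int(len(cycles) * train_fraction))
    train_cycles = set(cycles[:n_train])
    train = [row for row in rows if row[cycle_id] in train_cycles]
    valid = [row for row in rows if row[cycle_id] not in train_cycles]
    return train, valid


# Five machines (cycles) with three readings each.
rows = [
    {"Machine number": machine, "sensor": float(step), "RUL": 3 - step}
    for machine in (1, 2, 3, 4, 5)
    for step in range(3)
]

train, valid = split_by_cycle(rows, cycle_id="Machine number", train_fraction=0.8)
```

With this input, four machines end up in the training set and one in the validation set, so no machine's readings are shared between the two.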

Making a custom FL train step

In order to make a completely customized FL Train Step, the user can define a custom pipeline step. This guide will not go through in detail how that is done, but it is worth noting that a custom pipeline step needs to implement a run method which initializes a client and starts it, linking it to the server_ip from the run_specs.
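The shape of such a run method can be sketched as below. Every class here is a hypothetical stand-in: a real step would subclass OctaiPipe's PipelineStep and use an actual FL client for the chosen backend, neither of which is shown, and the IP address is a placeholder.

```python
class StubClient:
    """Hypothetical stand-in for a framework-specific FL client."""

    def __init__(self, model, server_ip):
        self.model = model
        self.server_ip = server_ip
        self.started = False

    def start(self):
        # A real client would connect to the FL server at server_ip here.
        self.started = True


class CustomFLTrainStep:
    """Hypothetical custom step; a real one would subclass PipelineStep."""

    def __init__(self, model):
        self._model = model
        self.client = None

    def run(self, server_ip, **run_specs):
        # Initialize the client, link it to the server, and start it.
        self.client = StubClient(self._model, server_ip)
        self.client.start()
        return self.client


step = CustomFLTrainStep(model=object())
client = step.run(server_ip="10.0.0.5", target_label="RUL", backend="pytorch")
```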

To implement a custom step, it is also important to understand any model class being used, whether it is a native OctaiPipe model or a custom model.

To get more information on custom OctaiPipe pipeline steps, see this guide: Custom Pipeline Steps

To further understand the base PyTorch model and to understand how to implement a custom PyTorch model, see this guide: Custom PyTorch Model