Tutorial - Federated Learning Anomaly Detection#

This tutorial goes through how to train a time-series anomaly detection model using federated learning. The dataset used is the Water Quality dataset from the GECCO Challenge 2018 used in for example Lai, et al. (2021) for time-series anomaly detection benchmarking.

Two models will be trained. One using the built-in OctaiPipe base PyTorch model and one using a custom model architecture.

The guide goes through the following steps:

  1. Load Anomaly Detection data and split among devices

  2. Set up FL config

  3. Train base PyTorch FL model

  4. Define custom model architecture

  5. Train custom FL model

Register devices#

If you haven’t already, register your edge devices/computers via OctaiPipe web portal. This example uses influxdb as a data source, but you can configure any data source you prefer.

Load Anomaly Detection data and split among devices#

In this step we load the dataset, split it in two and distribute it across the two devices to simulate a federated context. In a real-world setting, the data would already be stored on each device separately. We also change the timestep to start on January 1st 2023.

The dataset is available from the GECCO challenge 2018. It is also available in the Tutorials folder in the OctaiPipe Jupyter instance. If you are running this from the Tutorials folder it should run without any further edits.

The dataset consists of 9 features, such as pH level and temperature as well as a target variable coding whether a significant event is happening or not. Our goal is to define a model which detect anomalies in water quality.

We load both the train and test data here separately and export them to the devices.

[15]:
import os
from octaipipe.helper_funcs.example_utils import load_gecco


# Load dataset
trainpath: str = './waterDataTraining.RDS'
data_train = load_gecco(trainpath)

# Split into two datasets
data_train.set_index('_time', inplace=True)
data_train_0 = data_train.iloc[:int(data_train.shape[0]/2),:].copy().sample(frac=0.2, random_state=42)
data_train_1 = data_train.iloc[int(data_train.shape[0]/2):,:].copy().sample(frac=0.2, random_state=42)

os.makedirs('./configs/datasets', exist_ok=True)
data_train_0.to_csv('./configs/datasets/waterDataTraining_0.csv')
data_train_1.to_csv('./configs/datasets/waterDataTraining_1.csv')

# Load test dataset
testpath: str = './waterDataTesting.RDS'
data_test = load_gecco(testpath)

# Split into two test datasets
data_test.set_index('_time', inplace=True)
data_test_0 = data_test.iloc[:int(data_test.shape[0]/2),:].copy().sample(frac=0.2, random_state=42)
data_test_1 = data_test.iloc[int(data_test.shape[0]/2):,:].copy().sample(frac=0.2, random_state=42)

data_test_0.to_csv('./configs/datasets/waterDataTesting_0.csv')
data_test_1.to_csv('./configs/datasets/waterDataTesting_1.csv')
[ ]:
from write_data import write_data_to_devices

devices: list[str] = ['device-0', 'device-1']  # Enter the two registered devices you want to write to

bucket: str = "test-bucket"
measurement: str = "water-quality"

write_data_to_devices(devices)

Set up FL config#

Next we will set up the Federated Learning configuration file. For federated learning, like with any feature of OctaiPipe, we declare two main things: (1) what to run and (2) how to run it. In this case, federated learning is the what and the configuration file is the how. The OctaiPipe documentation goes through FL in greater detail, but we will still go through how to fill out the config step-by-step in this guide.

The FL config file is a YAML file consisting of 7 main fields: name, infrastructure, input_data_specs, evaluation_data_specs, model_specs and run_specs.

The configuration file we will use for this example can be found in the configs folder in the the Jupyter Tutorials folder. It is also pasted below.

Name#

In config: name

This is the name of the file, and can technically be anything, but for simplicity’s sake we have named the FL config federated_learning.

Infrastructure#

In config: infrastructure

This section specifies overall infrastructure of the FL exeperiment, in particular:

  • device_ids: list of device IDs in registry of devices to run FL on

  • device_groups: device group(s) to run FL on. This can be str of one group or list of many

  • image_name: the octaipipe image to run FL against. Defaults to latest torch image

  • server_image: the image to use for the FL server. Defaults to latest image

  • env: dictionary of environment variable names and values to set on devices

Input Data Specs#

In config: input_data_specs

This is the specifications for the input data of the FL model. To get details on input specs in general, see data layer documantation. The FL configurations differ from the regular OctaiPipe data specs in that they can be specified for each device individually. This is to allow for more heterogenous datasets on the devices.

Under input_data_specs, the user can define diffrent specs per device in the devices list. If a device is not found in the list, the data specs for the default device will be used.

If the same specs should be used for every device, data specs can be defined as for a regular OctaiStep.

Evaluation data specs#

In config: evaluation_data_specs

This works the same way as the input data specs but defines the portion of data on the device to use for evaluating the model each training round.

Model specs#

In config: model_specs

The specifications for the model to use. Details on FL models in OctaiPipe can be found in the custom model documentation.

The fields in model specs are as follows:

  • type: model type, e.g. base_pytorch for OctaiPipe base PyTorch model

  • name: name of model (how it will be saved in DB)

  • model_params: input parameters to initializing the model, e.g. batch_size

  • custom_model: whether to use a custom FL model, specified in Python file_path

    • file_path: local path to model file to use

Run specs#

In config: run_specs

The runtime specifications for the FL step.

  • target_label: the name(s) of the target variable. Can be string or list of strings for multiclass. Leave empty for unsupervised models

  • cycle_id: ID for sequenced data, if not None, splits data for train-val by cycles not rows

  • backend: the backend to use for FL, e.g. pytorch, sklearn, xgboost

YAML config#

Below is a filled out config that will work for this tutorial. Create a folder called ‘configs’ and save the below yaml file as ‘fl_config.yml’. You can choose any filename you’d like, just make sure to reference it correctly in the jupyter cells below.

./configs/fl_config.yml

name: federated_learning

infrastructure:
  device_ids: [demo-device-0, demo-device-1]
  device_group:
  image_name: octaipipe.azurecr.io/octaipipe-all_data_loaders:latest
  server_image: octaipipe.azurecr.io/fl_server:latest
  env: {}

input_data_specs:
  demo-device-0:
  - datastore_type: influxdb
    settings:
      query_template_path: ./configs/influx_query.txt
      query_type: dataframe
      query_config:
        bucket: test-bucket
        measurement: water-quality
        start: '2023-01-01T00:00:00.000Z'
        stop: '2023-02-19T00:00:01.000Z'
        tags: {}
  demo-device-1:
  - datastore_type: influxdb
    settings:
      query_template_path: ./configs/influx_query.txt
      query_type: dataframe
      query_config:
        bucket: test-bucket
        measurement: water-quality
        start: '2023-02-19T00:00:01.000Z'
        stop: '2023-04-08T08:00:00.000Z'
        tags: {}

evaluation_data_specs:
  demo-device-0:
  - datastore_type: influxdb
    settings:
      query_template_path: ./configs/influx_query.txt
      query_type: dataframe
      query_config:
        bucket: test-bucket
        measurement: water-quality
        start: '2023-04-08T08:00:00.000Z'
        stop: '2023-05-26T19:00:00.000Z'
        tags: {}
  demo-device-1:
  - datastore_type: influxdb
    settings:
      query_template_path: ./configs/influx_query.txt
      query_type: dataframe
      query_config:
        bucket: test-bucket
        measurement: water-quality
        start: '2023-05-26T19:00:00.000Z'
        stop: '2023-07-14T06:00:00.000Z'
        tags: {}

model_specs:
  type: base_torch
  name: water_quality_ad
  model_params:
    loss_fn: cross_entropy
    scaling: standard
    metric: f1_score
    epochs: 10
    batch_size: 32

run_specs:
  target_label: event
  cycle_id:
  backend: pytorch

Additionally, create a file ./configs/influx_query.txt with the following contents:

from(bucket:[bucket])
    |> range(start: [start], stop: [stop])
    |> filter(fn:(r) => r._measurement == [measurement])
    |> filter(fn:(r) => [fields])
    |> keep(columns: ["_time", "_field", "_value"])
    |> pivot(rowKey:["_time"], columnKey:["_field"], valueColumn:"_value")
    |> drop(columns: ["_start", "_stop", "_batch",
                      "table", "result", "_measurement",
                      "_result", "id"])

Train base PyTorch FL model#

Next, we run FL training with the configured base OctaiPipe PyTorch model.

Running FL can be easily done with just a few commands. In this example, we also explicitly set the FL Strategy so that we start running training once two devices have connected to the server. This is already the default configuration but we set it here as an example.

[ ]:
import logging
from octaipipe.federated_learning.run_fl import OctaiFL
logging.basicConfig(level=logging.INFO, format='%(message)s')

config_path = './configs/fl_config.yml'
strategy = {'min_available_clients': 2}

fl = OctaiFL(config_path)
fl.strategy.set_config(strategy)
fl.run()

We can view the results of FL by looking at three main registries: experiments, models, and evaluation.

Use the experiment ID from the FL logs to view the experiment and model record, and then use the model ID from the model record to get the evaluation data.

[ ]:
from octaipipe import experiments


exps = experiments.get_experiment_by_id('35a57e30')
exps
[ ]:
from octaipipe import models


mods = models.find_models_by_experiment_id('35a57e30')
mods
[ ]:
from octaipipe_core.client.evaluation_client import EvaluationClient
import pandas as pd


eval_client = EvaluationClient()
evals = eval_client.get_evaluation_model_id('5b386bc2')
pd.DataFrame(evals).sort_values('communicationRound')

Define custom model architecture#

As opposed to many hats, machine learning models are seldom one-size fits all; users will often need to define their own neural network architectures and data processing utils to fit their dataset or solution. In order to enable this, OctaiPipe allows users to define custom models.

A custom model is defined as a class in a Python file. For details, see the custom model documentation.

The custom model we define here implements a custom model architecture using the _build_model and forward methods. It also implements a custom data loading mechanism by adding lagged features and undersampling to make the dataset less imbalanced.

The contents of the custom model file can be found in the Tutorials folder in configs/dnn_ad_model.py

Train custom FL model#

To train the custom model we have built, we edit the FL config file in the model_specs section. The model_specs for the custom model are below:

model_specs:
  type: custom_ad_torch
  name: water_quality_ad_custom
  model_params:
    loss_fn: cross_entropy
    scaling: standard
    metric: f1_score
    epochs: 100
    batch_size: 32
    num_lags: 3
  custom_model:
    file_path: ./configs/dnn_ad_model.py

Note that we have changed the model type to our own custom model type. We have added the custom_model section to specify the local file_path of the Python file containing the model class. We have also added the extra num_lags argument in model_params as this gets handed to the __init__() method of the model class.

To run FL, we generate a new FL config file from the previous one and simply run OctaiFL as before.

[ ]:
import yaml


with open('./configs/fl_config.yml', 'r') as f:
    config = yaml.safe_load(f)

model_specs = {
    'type': 'custom_ad_torch',
    'name': 'water_quality_ad_custom',
    'model_params': {
        'loss_fn': 'cross_entropy',
        'scaling': 'standard',
        'metric': 'f1_score',
        'epochs': 100,
        'batch_size': 32,
        'num_lags': 3
    },
    'custom_model': {
        'file_path': './dnn_ad_model.py'
    }
}

config['model_specs'] = model_specs

with open('./configs/fl_config_custom_model.yml', 'w') as f:
    yaml.safe_dump(config, f)
[ ]:
from octaipipe.federated_learning.run_fl import OctaiFL


config_path: str = './configs/fl_config_custom_model.yml'
strategy = {'min_available_clients': 2}

fl = OctaiFL(config_path, strategy)
fl.run()

Now, let’s again look at the experiment, model and evaluation in the registry.

[ ]:
from octaipipe import experiments


exps = experiments.get_experiment_by_id('cc35e5a7')
exps
[ ]:
from octaipipe import models


mods = models.find_models_by_experiment_id('cc35e5a7')
mods
[ ]:
from octaipipe_core.client.evaluation_client import EvaluationClient
import pandas as pd


eval_client = EvaluationClient()
evals = eval_client.get_evaluation_model_id('c4292d2b')
pd.DataFrame(evals).sort_values('communicationRound')

Results for this differ slightly every time given how the network initializes, but this solution would normally result in an f1-score of around 0.03-0.04. Comparing this to the submissions of the GECCO challenge 2018, we place somewhere around 10th. The aim here is not to win the competition, but rather to show how very simple adjustments to model architecture and data processing can improve model performance.

If you want to see if you can do better, try to edit the architecture or data manipulation in configs/dnn_ad_model.py.

Hint: Swap out the downsampling to focal loss criterion or consider using convolutional layers instead of fully connected layers.