Tutorial - Adversarial Fortification for FL XGBoost#

This tutorial goes through the process of running FL XGBoost in a setting where there are malicious clients, intending to sabotage model performance.

We intentionally set up devices to impede model performance, but show the user how to make use of OctaiPipe’s Adversarial Fortification toolkit to protect against such attacks.

We will go through the following steps:

Getting Garment dataset
Preprocess and partition dataset
Introduce problematic data
Writing datasets to devices
Setting up FL XGBoost
Running FL XGBoost without adversarial fortification
Adding Adversarial Fortification

Getting the dataset#

First, we need to get the dataset we’re using. We will use the Garment dataset from UC Irvine’s Machine Learning Repository.

This is a tabular, industrial time-series dataset predicting employee productivity from features of Garment’s production process.

[ ]:

!pip install ucimlrepo

[1]:

from ucimlrepo import fetch_ucirepo


# fetch dataset
data = fetch_ucirepo(id=597)
X = data['data']['features']
y = data['data']['targets']

Preprocess and partition dataset#

Next, we preprocess the dataset and partition it into 4 parts. The resulting object datasets is a list with dictionaries with train and test split accessible like datasets[0][‘train’].

[2]:

from garment_helper_funcs import preprocess_garment_data, partition_data, send_datasets_to_devices, save_data_locally


data = preprocess_garment_data(X, y)
datasets = partition_data(data)

Introduce problematic data#

In order to mess up model performance, we will now intentionally mess up the data in one of our dataframes. This will simulate a malicious client that gives wrong data and duplicated data in order to throw off the global model.

In this case, the reported data for some teams will have a lower target productivity and higher over time hours worked. The data will also be replicated over and over to make these teams more important for the global model.

[3]:

import pandas as pd


data_0 = datasets[0]['train'].copy()
data_0['targeted_productivity'] = data_0['targeted_productivity']/2
data_0['targeted_productivity_1'] = data_0['targeted_productivity_1']/2
data_0['over_time'] = data_0['over_time']*4
data_0['over_time_1'] = data_0['over_time_1']*4
data_0 = pd.concat([data_0]*5, axis = 0)
data_0 = data_0.reset_index(drop=True)

datasets[0]['train'] = data_0

Save Data locally#

This will write the partitioned data to ./datasets/devices.

[1]:

# Update this with a list of registered devices to use in this tutorial
devices: list[str] = ['device-0', 'device-1', 'device-2', 'device-3']

[ ]:

assert len(devices), "Please update the devices variable with the list of devices you'll be running on"
for idx in range(len(devices)):
    save_data_locally(datasets[idx], idx)

Write data to devices#

The following uses the OctaiPipe data_writing step to load the data split above to the registered devices.

The python code can be found in ./write_data.py it ensures the split data is present, completes the config templates with details about the relevant devices and file paths and starts two data_writing steps, one for writing test and one for train, on each device.

The train and test data will be sent to each of the devices at the paths /tmp/garment_train.csv and /tmp/garment_test.csv respectively. Check the output of the cells to enure there are no issues downloading and sending the datasets. If possible, check the devices themselves for the presence of the test and train CSVs

[ ]:

from garment_helper_funcs import send_datasets_to_devices

send_datasets_to_devices(devices)

Setting up FL XGBoost#

Next we run the FL XGBoost experiment.

We will use the configuration file printed below for all runs.

NOTE: You will need to edit the follwing in this file:

Device IDs in the device_ids list
If you wish to use a specific version of OctaiPipe, change the latest tag in the image names to the one you would like to use
Change the file_path in the input and evaluation data specs to that of your device user, e.g. if your user is linus, your path would be /home/linus/datasets/garment_train.csv. If you have different users for each device, you can add a new device to the devices list and specify the input data specs for that device separately, see data documentation for FL

[ ]:

import yaml

# Display federated learning config
with open("configs/xgboost_adversarial_config.yml", 'r') as file:
    inference_config = yaml.safe_load(file)
print(yaml.dump(inference_config, sort_keys=False))

Running FL XGBoost without adversarial fortification#

First, we will run FL without any adversarial fortification, to see how the model performs when it faces problematic clients.

As adversarial fortification is implemented by default, we need to disable it by updating the FL strategy.

This is done by setting the adv_fort section of the strategy as below

[ ]:

import logging
import os

os.environ['OCTAIPIPE_DEBUG'] = 'true'
logging.basicConfig(level=logging.INFO, format='%(message)s')

# For more verbose logs, uncomment the following line
# logging.basicConfig(level=logging.DEBUG, format='%(message)s')

[ ]:

from octaipipe.federated_learning.run_fl import OctaiFL

federated_learning_config = 'configs/xgboost_adversarial_config.yml'

octaifl = OctaiFL(
    federated_learning_config,
    deployment_name='FL XGBoost Adv Fort',
    deployment_description='FL XGBoost tutorial on Garment dataset without adversarial fortification implementation'
    )

strategy = {
    'min_available_clients': 4,
    'min_fit_clients': 3,
    'min_evaluate_clients': 3,
    'num_rounds': 5,
    'num_local_rounds': 5,
    'adv_fort': {
        'gain_factor': False,
        'eta': False,
        'config_check': False
    }
}

octaifl.strategy.set_strategy_values(strategy)
octaifl.strategy.get_strategy()

We can now run OctaiFL for XGBoost with no adversarial fortification

[ ]:

octaifl.run()

Adding Adversarial Fortification#

This wouldn’t be a very good tutorial on adversarial fortification if we didn’t implement any adversarial fortification though.

Therefore, we will re-run FL from the previous run, but add back in the adversarial fortification

[ ]:

from octaipipe.federated_learning.run_fl import OctaiFL

federated_learning_config = 'configs/xgboost_adversarial_config.yml'

octaifl = OctaiFL(
    federated_learning_config,
    deployment_name='FL XGBoost Adv Fort',
    deployment_description='FL XGBoost tutorial on Garment dataset with adversarial fortification implementation'
    )

strategy = {
    'min_available_clients': 4,
    'min_fit_clients': 3,
    'min_evaluate_clients': 3,
    'num_rounds': 5,
    'num_local_rounds': 5,
    'adv_fort': {
        'gain_factor': 1,
        'eta': True,
        'config_check': True
    }
}

octaifl.strategy.set_strategy_values(strategy)
octaifl.strategy.get_strategy()

And again, we run OctaiFL for XGBoost, but this time woth the adversarial fortification

[ ]:

octaifl.run()