Tutorial - Running FL XGBoost with OctaiPipe#

In this tutorial I will take you through a standard Federated Learning deployment using XGBoost tree models. The data used is a small subset of the HIGGS dataset, which can be downloaded here. The data in this tutorial is static (i.e. it does not update throughout training).

We will go through the following steps:

  1. Download and split data

  2. Send data to devices

  3. Run federated learning

Note on transferring tutorial data to target devices: Step 2 is optional and automates data transfer to the target devices over SSH. If you have SSH access to the devices, you can use it to send the split data produced in Step 1 directly. If you prefer to add the data manually, or do not have SSH access, transfer the split data chunks to each device yourself: place the train and test chunks produced in Step 1 (found in the ./datasets/devices directory) at /tmp/higgs_train_data.csv and /tmp/higgs_test_data.csv on each device respectively, as sketched below.
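
For example, a manual transfer for device 0 might look like the following (a sketch only; substitute your own user and device IP):

scp datasets/devices/train_data_0.csv <user>@<device-ip>:/tmp/higgs_train_data.csv
scp datasets/devices/test_data_0.csv <user>@<device-ip>:/tmp/higgs_test_data.csv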

Step 1 - Download and split data#

Here the HIGGS dataset is split into n chunks, where n is the number of target devices.

[ ]:
# Update this with the number of target devices you'll be running on
number_of_devices: int = 0
[ ]:
# Download the dataset and split it into chunks, one per device
import pandas as pd
import numpy as np

assert number_of_devices, "Please update the number_of_devices variable with the number of devices you'll be running on"
! mkdir -p datasets/devices/
! wget -c https://octaipipe.blob.core.windows.net/higgs-dataset/higgs_data.tar.gz -P /tmp/
! tar -xvf /tmp/higgs_data.tar.gz -C datasets

train_data = pd.read_csv('datasets/higgs_train_data.csv')
test_data = pd.read_csv('datasets/higgs_test_data.csv')

train_chunks = np.array_split(train_data, number_of_devices)
test_chunks = np.array_split(test_data, number_of_devices)

# write all chunks to ./datasets/devices
for i in range(number_of_devices):
    train_chunks[i].to_csv(f'datasets/devices/train_data_{i}.csv', index=False)
    test_chunks[i].to_csv(f'datasets/devices/test_data_{i}.csv', index=False)

print('Data has been downloaded and split into chunks for each device.\nContents of ./datasets/devices/:')
! ls -ltr datasets/devices/
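
Optionally, you can sanity-check the split before moving on. The sketch below (assuming the variables from the previous cell are still in scope) confirms that the per-device train chunks add back up to the full training set:

[ ]:
# Optional sanity check: chunk row counts should sum to the full training set
chunk_rows = sum(
    len(pd.read_csv(f'datasets/devices/train_data_{i}.csv'))
    for i in range(number_of_devices)
)
assert chunk_rows == len(train_data), 'Chunks do not add up to the full training set'
print(f'{number_of_devices} train chunks, {chunk_rows} rows in total')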

Step 2 - Send data to devices#

This is only possible if you have SSH access to the target devices.

You will need to define the devices you will use for this tutorial, along with their credentials, as we will be sending data over an SSH connection.

These details are not required by OctaiPipe, but if you'd like to speed up sending the train and test datasets to your devices for this tutorial, fill out the devices dictionary with the credentials of each device you'd like to use.

The devices dictionary below should use the following format:

devices = {
    <device_id>: {
        'ip': <device's external IP>,
        'user': <user to log into device>,
        'password': <password for logging into device>
    }, ...
}
  • The device_id should correspond to the device ID on the devices page of the front end; it is the device ID given when registering the devices. (If you have not registered any devices yet, you will need to do so.)

  • The device’s ip can be retrieved by running curl ifconfig.me on the device.

  • The user and password are the details used to log into the device

Device connections are checked in the cell below; if any fail to connect, please investigate before proceeding.

For this you will need to install sshpass: apt-get -y install sshpass

[ ]:
# replace with real values (can define any number of devices)
devices = {
    'test-device-0': {
        'ip': 'XXX.XXX.XXX.XX',
        'user': 'octaipipe',
        'password': 'password'
    },
    'test-device-1': {
        'ip': 'XXX.XXX.XXX.XX',
        'user': 'octaipipe',
        'password': 'password'
    }
}

online = 'online'
for device, creds in devices.items():
    devices[device]['ssh_command'] = (
        f"sshpass -p {creds['password']} ssh -T -o StrictHostKeyChecking=no "
        f"{creds['user']}@{creds['ip']}"
    )
    devices[device]['scp_command'] = f"sshpass -p {creds['password']} scp -o StrictHostKeyChecking=no"
    # Run a simple echo over SSH to check the connection
    result = ! {devices[device]['ssh_command']} echo {online}
    print(device, '\t', result)
print(f'\nIf any devices are not "{online}" you must troubleshoot the connection before proceeding.')

The train and test data will be sent to each of the devices at the paths /tmp/higgs_train_data.csv and /tmp/higgs_test_data.csv respectively. Check the output of the cells to ensure there are no issues downloading and sending the datasets.

[ ]:
import os


def send_df_as_csv_file(device_index, creds):
    train_destination_path = '/tmp/higgs_train_data.csv'
    test_destination_path = '/tmp/higgs_test_data.csv'

    train_chunk = f'./datasets/devices/train_data_{device_index}.csv'
    test_chunk = f'./datasets/devices/test_data_{device_index}.csv'

    assert os.path.exists(train_chunk), f"File {train_chunk} does not exist. Ensure you have done Step 1"
    assert os.path.exists(test_chunk), f"File {test_chunk} does not exist. Ensure you have done Step 1"

    full_train_dest_path = f"{creds['user']}@{creds['ip']}:{train_destination_path}"
    full_test_dest_path = f"{creds['user']}@{creds['ip']}:{test_destination_path}"

    destinations = [full_train_dest_path, full_test_dest_path]
    files = [train_chunk, test_chunk]

    for file_path, destination in zip(files, destinations):
        result = ! {creds['scp_command']} {file_path} {destination}
        if len(result) != 0:
            print(f'Potential issue sending file: {result}')
        else:
            print(f'Successfully sent file to {destination}.')

for idx, (device, creds) in enumerate(devices.items()):
    print(f"Sending train and test files to {device}. May take time depending on your connection ...")
    send_df_as_csv_file(idx, creds)

Step 3 - Running FL#

Now the devices are ready to take part in an FL experiment.

NOTE: before you run, make sure that the devices section of the config file is updated to be a list of the device IDs you want to take part in the experiment (a sketch for doing this programmatically follows the list below).#

  • There is an example_xgboost_federated_learning.yml file in configs/ to help you familiarise yourself with how the federated learning config looks.

  • There is also an xgboost_federated_learning.yml, which will be used to run the experiment. Feel free to play with this file and try out different configurations of input/output specs, model params, and strategies.
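
As a minimal sketch of the note above (assuming the config stores the participating devices under a top-level devices key, as in the example file), you could update the list programmatically rather than editing the file by hand:

[ ]:
import yaml

# Sketch: overwrite the devices list in the FL config.
# Replace the IDs below with those of your registered devices.
config_path = 'configs/xgboost_federated_learning.yml'
with open(config_path, 'r') as file:
    fl_config = yaml.safe_load(file)

fl_config['devices'] = ['test-device-0', 'test-device-1']

with open(config_path, 'w') as file:
    yaml.safe_dump(fl_config, file, sort_keys=False)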

[ ]:
import yaml
# Display the example federated learning config
with open("configs/example_xgboost_federated_learning.yml", 'r') as file:
    example_fl_config = yaml.safe_load(file)
print(yaml.dump(example_fl_config, sort_keys=False))
[ ]:
import logging
import os

os.environ['OCTAIPIPE_DEBUG'] = 'true'
logging.basicConfig(level=logging.INFO, format='%(message)s')

# For more verbose logs, uncomment the following line
# logging.basicConfig(level=logging.DEBUG, format='%(message)s')

Set up the OctaiFL context by passing the config file, deployment name, and description to OctaiFL.

[3]:
from octaipipe.federated_learning.run_fl import OctaiFL

federated_learning_config = 'configs/xgboost_federated_learning.yml'

octaifl = OctaiFL(
    federated_learning_config,
    deployment_name='FL XGBoost tutorial deployment',
    deployment_description='Deployment part of FL XGBoost tutorial'
    )

You can check the current strategy by running the cell below and update it based on your requirements (see the docs for more on strategy settings).

[ ]:
octaifl.strategy.get_config()

FL XGBoost in OctaiPipe can perform multiple training rounds on each device before model aggregation. To set this, I will set the num_local_rounds option in the strategy to 2.

If I wanted to reduce the impact of imbalanced dataset sizes across the devices, I would set normalized_learning_rate to True in the same way. In this tutorial, the datasets sent to the devices are all the same size.

[ ]:
strategy = {
    'num_rounds': 20,
    'num_local_rounds': 2
}
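# To mitigate imbalanced dataset sizes you could also add
# 'normalized_learning_rate': True to the dict above; it is not needed
# here, as the tutorial chunks are all the same size.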

octaifl.strategy.set_config(strategy)
octaifl.strategy.get_config()

Finally, we can run the experiment:

[ ]:
octaifl.run()

Checking the processes#

There are now two processes running: the server and the clients.

To dig deeper, you can explore the FL server Kubernetes deployment by finding its details via kubectl -n colab get all

You can also log into a device to get the client deployment logs by running docker logs -f <id of container>
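
For example, on the device you might list the running containers to find the client's ID, then follow its logs (a sketch; container names depend on your deployment):

docker ps
docker logs -f <id of container>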