Differential Privacy (DP) Implementation with OctaiPipe

Introduction

Federated Learning (FL) preserves privacy by training models without moving raw data off edge devices. However, the model updates these devices send can still reveal patterns in the underlying data, allowing attackers to infer sensitive information (for example, through attribute and property inference attacks). Differential privacy adds carefully calibrated noise to these updates, limiting what can be learned about any individual contribution.

Compared to methods such as anonymization or cryptographic techniques that obfuscate the original source, differential privacy (DP) shifts the focus from ‘how to anonymize data’ to ‘how to measure the loss of privacy when the data is released’. By adding calibrated noise, DP provides a provable upper bound on the influence a single individual can have on the output of the algorithm, and this bound holds regardless of what other information an attacker has.
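For reference, the standard formal definition behind this bound is sketched below in LaTeX. Here M is the training mechanism, D and D' are any two datasets that differ in one individual's data, and S is any set of possible outputs. The term delta is a small failure probability (delta = 0 gives pure epsilon-DP); this section only names epsilon, so the presence of delta in OctaiPipe's own accounting is an assumption.

```latex
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[\mathcal{M}(D') \in S] + \delta
```

A smaller epsilon forces the output distributions with and without any one individual to be closer, which is why epsilon is read as the "privacy loss".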

OctaiPipe has implemented Differential Privacy (DP) as a method of protecting sensitive data during federated training of neural network models.

Privacy budget

The most important parameter in DP is the privacy budget, epsilon, which measures privacy loss. It also controls the privacy-utility trade-off: lower values of epsilon give stronger privacy but are likely to reduce model utility/performance. If epsilon is close to or below 1 (achieved by adding more noise), the privacy loss remains small even in worst-case scenarios. Values of epsilon between 2 and 10 are not optimal, but they are still better than having no DP, as they offer some privacy protection. Values above 10 offer little protection and approach the risk of sharing the exact data, potentially exposing sensitive information.
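To make these ranges concrete, the sketch below computes e^epsilon, the largest factor by which one individual's data can change the probability of any output under pure epsilon-DP, and labels epsilon using the rough bands given above. The function names and band labels are illustrative assumptions, not part of OctaiPipe, and the band boundaries are approximate.

```python
import math


def likelihood_ratio_bound(epsilon: float) -> float:
    """Largest factor by which one individual's data can change the
    probability of any output under pure epsilon-DP (delta = 0)."""
    return math.exp(epsilon)


def privacy_band(epsilon: float) -> str:
    """Rough bands from the guidance in this section (boundaries are approximate)."""
    if epsilon <= 1.0:
        return "small privacy loss"
    if epsilon <= 10.0:
        return "some protection"
    return "close to sharing the data"


for eps in (0.5, 1.0, 5.0, 10.0, 20.0):
    ratio = likelihood_ratio_bound(eps)
    print(f"epsilon={eps:>5}: ratio <= {ratio:,.2f} ({privacy_band(eps)})")
```

The ratio grows exponentially: it is about 2.7 at epsilon = 1 but about 22,000 at epsilon = 10, which is why values above 10 give little meaningful protection.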

Types of Differential Privacy

The use of DP in training neural networks (NNs) is well researched and can be divided into two categories:

- Central/Global Differential Privacy: Clients clip their model weights (described below) and send them to the server, which aggregates them and then adds noise. Privacy depends on trusting that the server has not been compromised, since it sees the clients' un-noised updates. Model utility is high, while privacy is lower than with Local DP.

- Local Differential Privacy: Clients add noise to their local model weights before sending them to the server, so the server never sees un-noised updates. Model utility is lower, but privacy is higher than with Global DP.
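The utility gap between the two categories can be estimated with a back-of-the-envelope calculation. The sketch below assumes both modes use the same noise multiplier and the same L2 clipping norm for every client, and reports the per-coordinate standard deviation of the noise left in the averaged update. The function name and the "local"/"global" labels are hypothetical, not OctaiPipe API.

```python
import math


def noise_std_in_average(noise_multiplier: float, clip_norm: float,
                         num_clients: int, mode: str) -> float:
    """Per-coordinate std of the DP noise that remains in the averaged update."""
    # Std of a single Gaussian release whose sensitivity is clip_norm
    release_std = noise_multiplier * clip_norm
    if mode == "local":
        # Every client adds its own noise; averaging n independent draws
        # shrinks the std by sqrt(n)
        return release_std / math.sqrt(num_clients)
    if mode == "global":
        # The server adds one noise draw to the sum, which is then divided by n
        return release_std / num_clients
    raise ValueError(f"unknown mode: {mode!r}")


for mode in ("local", "global"):
    print(mode, noise_std_in_average(1.0, 1.0, 100, mode))
```

With 100 clients, Local DP leaves roughly 10 times more noise in the average than Global DP. That extra noise is the price of not having to trust the server.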

OctaiPipe supports both Global and Local DP during federated training of neural networks. Users choose one of the two and configure OctaiPipe accordingly.

Running FL with DP in OctaiPipe

The snippet below shows the code needed to run Local DP during FL with OctaiPipe. It passes the configuration file ‘./configs/config_localDP_flat.yml’ as config_path; the contents of this file are explained in the next section.

from octaipipe.federated_learning.run_fl import OctaiFL

# YAML config that enables Local DP with fixed ('flat') clipping
config_path = './configs/config_localDP_flat.yml'

octaifl = OctaiFL(
    config_path,
    deployment_name='test weights',
    deployment_description='Testing DP with tabular dataset'
)

# With Local DP the clients add the noise, so the server uses plain federated averaging
strategy = {
    'experiment_description': 'Testing DP',
    'strategy_name': 'fed_avg',
}
octaifl.strategy.set_strategy_values(strategy)
octaifl.run()

Note that with Local DP, the server strategy should be set to federated averaging (fed_avg) for the global model to converge, since the clients have already added the noise.

For Global DP, change the strategy to ‘global_dp’ as follows:

strategy = {
    'experiment_description': 'Testing DP',
    'strategy_name': 'global_dp',
    'sigma': 1.0,  # noise added by the server to the aggregated model
}

The parameter ‘sigma’ sets the amount of noise the server adds to the aggregated model as part of Global DP; larger values give stronger privacy at the cost of model utility.
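A minimal sketch of the server-side Global DP step is shown below, assuming a DP-FedAvg-style scheme: the clients' updates arrive already clipped to an L2 norm of at most clip_norm, the server sums them, adds Gaussian noise with standard deviation sigma × clip_norm, and takes an unweighted average. The function name, the unweighted average and the exact noise scale are assumptions for illustration; OctaiPipe's internal formula is not documented here.

```python
import numpy as np


def global_dp_aggregate(clipped_updates, clip_norm, sigma, rng):
    """Server-side Global DP: sum the clipped client updates, add Gaussian
    noise scaled to one client's maximum contribution, then average."""
    updates = np.stack(clipped_updates)
    total = updates.sum(axis=0)
    # Each update has L2 norm <= clip_norm, so one client can move the sum by
    # at most clip_norm; the noise std is sigma times that sensitivity.
    noise = rng.normal(loc=0.0, scale=sigma * clip_norm, size=total.shape)
    return (total + noise) / len(clipped_updates)


rng = np.random.default_rng(seed=0)
updates = [np.array([0.6, 0.8]), np.array([-0.6, 0.8]), np.array([0.0, 0.5])]
print(global_dp_aggregate(updates, clip_norm=1.0, sigma=1.0, rng=rng))
```

Noise is added once to the sum rather than to each update, which is what keeps Global DP's utility cost lower than Local DP's for the same sigma.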

OctaiPipe calculates the resulting epsilon value to track the privacy budget spent during training. In practice, epsilon values between roughly 0.75 and 4 give good privacy protection with a moderate loss of model utility. Note that the relationship between sigma and epsilon, and the resulting utility, depend on the training setup (for example, epsilon accumulates over training rounds), the data distribution and the model complexity.
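To illustrate how epsilon accumulates over rounds, the sketch below uses Rényi DP (RDP) composition for the Gaussian mechanism: each round with noise multiplier z costs alpha / (2 z^2) at Rényi order alpha, costs add up across rounds, and the total is converted to (epsilon, delta)-DP via epsilon = RDP + ln(1/delta) / (alpha - 1), minimised over a grid of orders. It assumes every client takes part in every round and models no client or batch subsampling, so it will not reproduce the epsilon OctaiPipe reports. The function name and the choice of delta are hypothetical.

```python
import math


def gaussian_epsilon(noise_multiplier: float, rounds: int, delta: float) -> float:
    """Upper bound on epsilon after `rounds` Gaussian releases (every client
    takes part in every round), using Renyi DP composition."""
    best = math.inf
    # Search a grid of Renyi orders alpha in (1, 101); any order gives a valid bound
    for step in range(1, 1000):
        alpha = 1.0 + step / 10.0
        # RDP of the Gaussian mechanism with sensitivity 1 and noise std
        # noise_multiplier, composed over all rounds by simple addition
        rdp = rounds * alpha / (2.0 * noise_multiplier ** 2)
        # Convert (alpha, rdp)-RDP into (epsilon, delta)-DP
        best = min(best, rdp + math.log(1.0 / delta) / (alpha - 1.0))
    return best


for rounds in (1, 10, 50):
    print(rounds, round(gaussian_epsilon(5.0, rounds, delta=1e-5), 2))
```

The printed values grow with the number of rounds for a fixed noise multiplier, so a longer training run needs more noise to stay within the same budget.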

Configuring DP

With Local DP, each client performs two operations: clipping the model weights and adding noise. With Global DP, the client only clips the model weights, and the server adds the noise.
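The two client-side operations can be sketched as follows. This simplified sketch treats a client's whole model update as a single vector, clips it to an L2 norm of at most max_norm, and, for Local DP, adds Gaussian noise with standard deviation noise_multiplier × max_norm. The helper names are hypothetical. The parameter names used later (max_grad_norm, noise_multiplier) suggest that OctaiPipe may clip per-example gradients during local training, as in DP-SGD, rather than the final update, so treat this as an illustration of the idea only.

```python
import numpy as np


def clip_by_l2_norm(update: np.ndarray, max_norm: float) -> np.ndarray:
    """Scale the update down so its L2 norm is at most max_norm."""
    norm = np.linalg.norm(update)
    if norm <= max_norm:
        return update
    return update * (max_norm / norm)


def client_dp_step(update, max_norm, noise_multiplier, rng):
    """Clip, then (for Local DP) add Gaussian noise with std noise_multiplier * max_norm."""
    clipped = clip_by_l2_norm(update, max_norm)
    if noise_multiplier == 0.0:
        # Global DP: the client only clips; the server adds the noise
        return clipped
    noise = rng.normal(loc=0.0, scale=noise_multiplier * max_norm, size=clipped.shape)
    return clipped + noise


rng = np.random.default_rng(seed=0)
update = np.array([3.0, 4.0])  # L2 norm 5.0
print(client_dp_step(update, max_norm=1.0, noise_multiplier=0.0, rng=rng))
print(client_dp_step(update, max_norm=1.0, noise_multiplier=1.0, rng=rng))
```

Clipping first bounds how much any single client can contribute, which is what lets a fixed amount of noise hide that contribution.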

There are two ways to configure clipping:

- Fixed clipping: A fixed threshold is set on the magnitude (norm) of client updates, and any update exceeding this threshold is scaled down so that its magnitude equals the threshold.

- Adaptive clipping: The clipping threshold is adjusted dynamically during training based on the observed distribution of update magnitudes.

Note that choosing between fixed and adaptive clipping requires prior knowledge of the privacy requirements, data distribution and model complexity.

Below is an example of how to set the configuration for local DP with fixed clipping.

model_specs:
    type: base_torch
    name: demo_torch_model
    model_params:
        loss_fn: mse
        scaling:
        metric: rmse
        epochs: 10
        batch_size: 32
        differential_privacy:
            type: 'flat'
            settings:
                noise_multiplier: 5.0
                max_grad_norm: 1.20

Note that noise_multiplier is equivalent to sigma as used for Global DP: it controls the amount of noise added. The parameter ‘max_grad_norm’ sets the clipping threshold, that is, the maximum allowed magnitude (norm) of the clipped values.
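The parameter names match DP-SGD as implemented in libraries such as Opacus, where clipping and noise are applied to per-example gradients at every optimizer step. Assuming OctaiPipe follows that scheme, one noisy step for a batch of B examples with gradients g_i, clipping threshold C = max_grad_norm and noise multiplier z = noise_multiplier can be written as:

```latex
\bar{g}_i = g_i \cdot \min\!\left(1, \frac{C}{\lVert g_i \rVert_2}\right),
\qquad
\tilde{g} = \frac{1}{B}\left(\sum_{i=1}^{B} \bar{g}_i + \mathcal{N}\!\left(0,\; z^{2} C^{2} \mathbf{I}\right)\right)
```

With the example values above (z = 5.0, C = 1.20), the noise added to the summed clipped gradients has a standard deviation of z × C = 6.0 per coordinate. Raising max_grad_norm therefore also raises the absolute noise level, even if noise_multiplier is unchanged.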

Suggested parameter ranges (note that these strongly depend on the model and dataset):

noise_multiplier: 2.5 to 15
max_grad_norm: 1.0 to 5.0

For Global DP with fixed clipping, set noise_multiplier to 0.0, since the clients only clip and the server adds the noise.
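Before launching a run, a fixed-clipping configuration can be sanity-checked against the suggested ranges and the Global DP rule above. The sketch below takes a dict mirroring the settings block of the YAML file; check_flat_settings is a hypothetical helper, not part of OctaiPipe, and the ranges are the suggestions from the text rather than hard limits.

```python
SUGGESTED_FLAT_RANGES = {
    "noise_multiplier": (2.5, 15.0),
    "max_grad_norm": (1.0, 5.0),
}


def check_flat_settings(settings: dict, global_dp: bool = False) -> list:
    """Return warnings for values outside the suggested ranges."""
    warnings = []
    for name, (low, high) in SUGGESTED_FLAT_RANGES.items():
        value = settings[name]
        if name == "noise_multiplier" and global_dp:
            # Global DP: the server adds the noise, so the clients use 0.0
            if value != 0.0:
                warnings.append(f"noise_multiplier should be 0.0 for Global DP, got {value}")
            continue
        if not low <= value <= high:
            warnings.append(f"{name}={value} is outside the suggested range [{low}, {high}]")
    return warnings


print(check_flat_settings({"noise_multiplier": 5.0, "max_grad_norm": 1.20}))
print(check_flat_settings({"noise_multiplier": 0.0, "max_grad_norm": 8.0}, global_dp=True))
```

Returning warnings instead of raising keeps the check advisory, which matches the text's framing of these values as model- and dataset-dependent suggestions.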

For adaptive clipping, the differential_privacy parameters should be changed to the following:

differential_privacy:
    type: 'adapt'
    settings:
        noise_multiplier: 5.0
        target_unclipped_quantile: 1.0
        clipbound_learning_rate: 0.01
        max_clipbound: 10
        min_clipbound: 0.1
        unclipped_num_std: 10
        max_grad_norm: 25.0
        loss_reduction: "mean"

The most important parameters are noise_multiplier (the same as with fixed clipping) and max_grad_norm, the maximum threshold supplied in the config; the optimizer then chooses the actual threshold during training by looking at the distribution of the updates and the other parameters.

There are two constraints on these parameters: (1) max_clipbound must be larger than min_clipbound, and (2) noise_multiplier must be less than 2 * unclipped_num_std.
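The text does not give the rule used to adapt the threshold. The parameter names (target_unclipped_quantile, clipbound_learning_rate, min_clipbound, max_clipbound, unclipped_num_std) match Opacus's AdaClipDPOptimizer, which follows the geometric update from Andrew et al., "Differentially Private Learning with Adaptive Clipping". Assuming OctaiPipe uses the same rule, one update step looks like the sketch below. update_clipbound is a hypothetical helper, and in the real optimizer the unclipped fraction comes from a count that is itself noised (with std unclipped_num_std).

```python
import math


def update_clipbound(clipbound: float, unclipped_frac: float,
                     target_unclipped_quantile: float, clipbound_learning_rate: float,
                     min_clipbound: float, max_clipbound: float) -> float:
    """One geometric update of the clipping bound."""
    # Fewer values unclipped than targeted -> grow the bound; more -> shrink it
    error = unclipped_frac - target_unclipped_quantile
    new_bound = clipbound * math.exp(-clipbound_learning_rate * error)
    # Keep the bound inside [min_clipbound, max_clipbound]
    return min(max(new_bound, min_clipbound), max_clipbound)


bound = 1.0
for frac in (0.2, 0.5, 0.9):
    bound = update_clipbound(bound, frac, target_unclipped_quantile=1.0,
                             clipbound_learning_rate=0.01,
                             min_clipbound=0.1, max_clipbound=10)
    print(round(bound, 4))
```

With target_unclipped_quantile set to 1.0, as in the example config, the bound keeps growing (up to max_clipbound) until essentially nothing is clipped. A small clipbound_learning_rate makes it move slowly, which keeps a single noisy count from changing the threshold abruptly.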
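The second constraint can be explained by how the noise budget is split, assuming the Andrew et al. scheme used by Opacus. Part of the privacy budget is spent on the noisy count of unclipped values, and the noise multiplier left for the clipped updates is (z^-2 - (2 σ_b)^-2)^(-1/2), where z = noise_multiplier and σ_b = unclipped_num_std. This expression is only defined when z < 2 σ_b. The helpers below are hypothetical and check both documented constraints.

```python
def check_adaptive_settings(settings: dict) -> None:
    """Raise ValueError if either documented constraint is violated."""
    if not settings["max_clipbound"] > settings["min_clipbound"]:
        raise ValueError("max_clipbound must be larger than min_clipbound")
    if not settings["noise_multiplier"] < 2 * settings["unclipped_num_std"]:
        raise ValueError("noise_multiplier must be less than 2 * unclipped_num_std")


def effective_noise_multiplier(noise_multiplier: float, unclipped_num_std: float) -> float:
    """Noise multiplier left for the clipped updates after part of the budget
    is spent on the noisy unclipped count (assumes the Andrew et al. split)."""
    return (noise_multiplier ** -2 - (2 * unclipped_num_std) ** -2) ** -0.5


settings = {"noise_multiplier": 5.0, "unclipped_num_std": 10,
            "max_clipbound": 10, "min_clipbound": 0.1}
check_adaptive_settings(settings)
print(round(effective_noise_multiplier(5.0, 10), 3))
```

With the example config, the clipped updates receive slightly more noise than noise_multiplier alone (about 5.16 instead of 5.0). Under this assumption, a noise_multiplier of exactly 0.0 would make the formula undefined (division by zero), which is consistent with the advice to use a small value such as 0.0001 for Global DP with adaptive clipping.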

Suggested ranges for tuning the main parameters (note that these strongly depend on the model and dataset):

loss_reduction: either "mean" or "sum"
clipbound_learning_rate: keep this fairly low (0.01 to 0.05)
max_grad_norm: 5.0 to 25.0
noise_multiplier: 2.5 to 15

For Global DP with adaptive clipping, set noise_multiplier to 0.0001 or a similarly small number.