Tutorial - Using MLOps policies#

Policy How-to#

Policies are the foundation of OctaiPipe’s MLOps functionality, defining how tasks like monitoring, retraining, and more are executed. With the ability to define multiple policies within a deployment, OctaiPipe provides maximum flexibility. This tutorial introduces the concept of policies and demonstrates their usage by way of an example.

Generally speaking, there are three types of policy, categorised by where the resulting action takes place:

  • Global Policy: a policy whose actions may affect many devices or cloud components, i.e. one applied globally across the device fleet. Global retraining is a typical example of a Global Policy: drift on the edge devices is monitored from the cloud via OctaiPipe, and retraining starts on the edge devices when the trigger rule is satisfied.

  • Cloud Policy: as the name suggests, this policy type is not applied to edge devices. For example, one may create a monitoring process that watches the health of the Kubernetes cluster and sends a notification when certain conditions are met.

  • Local Policy: a policy that acts on a single edge device. It is usually triggered by a monitoring job on the device, although this is not always the case.

A policy comprises three components:

  • Observation: an Observation is a piece of monitored data or a metric that reflects a specific condition or state within the system. Observations are the building blocks of a policy, the raw information on which the policy is based. OctaiPipe currently provides the following observations:

    • global_drift_percentage_observation: indicates the percentage of drift detected across the global dataset or model, used to assess whether significant changes have occurred that might impact model performance.

    • p_value_edge_observation: provides the statistical p-value of data on edge devices, used to determine the significance of observed changes or patterns locally on the device.

  • Trigger: a trigger defines the criteria for executing the specified actions. Triggers are composed using Python syntax, i.e. Python arithmetic, comparison, logical, or bitwise operators, alongside Observations. A valid trigger expression must be valid Python syntax and evaluate to a result that can be interpreted as a boolean. Examples of trigger expressions:

    • Always True Triggers: These expressions always evaluate to True and will continually execute the associated actions.

      • True

      • True or False

      • 1

      • 2*4

    • Condition-based Triggers: These use Observations or specific logic to trigger actions when certain conditions are met.

      • global_drift_percentage_observation > 0.03: This triggers the action(s) if the global_drift_percentage_observation exceeds 0.03.

      • p_value_edge_observation <= 0.05: Triggers when the p_value_edge_observation is below or equal to 0.05.

    • Complex Logic: Combine multiple conditions using Python’s logical and bitwise operators.

      • (global_drift_percentage_observation > 0.03) and (p_value_edge_observation <= 0.05)

      • (global_drift_percentage_observation > 0.03) | (p_value_edge_observation <= 0.05) (bitwise OR; the parentheses are required because | binds more tightly than the comparison operators)

      • (global_drift_percentage_observation ^ p_value_edge_observation) > p_value_edge_observation (bitwise XOR)

Triggers are highly customizable and allow for both simple and advanced conditions.
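To make the semantics above concrete, a trigger expression can be evaluated against a set of observation values roughly as follows. This is a minimal sketch, not OctaiPipe's actual evaluation engine: the function name `evaluate_trigger` and the plain-dict representation of observations are assumptions for illustration.

```python
# Minimal sketch of trigger evaluation -- NOT OctaiPipe's actual engine.
# Assumption: observation values are available as a plain dict of floats.

def evaluate_trigger(expression: str, observations: dict) -> bool:
    """Evaluate a trigger expression against observation values.

    The expression is evaluated as Python with only the observation
    names in scope, and the result is coerced to a boolean, so
    always-true triggers such as `1` or `2*4` also fire.
    """
    result = eval(expression, {"__builtins__": {}}, dict(observations))
    return bool(result)

obs = {
    "global_drift_percentage_observation": 0.05,
    "p_value_edge_observation": 0.02,
}

evaluate_trigger("global_drift_percentage_observation > 0.03", obs)  # True
evaluate_trigger("2*4", obs)                                         # True (truthy)
evaluate_trigger(
    "(global_drift_percentage_observation > 0.03)"
    " and (p_value_edge_observation <= 0.05)",
    obs,
)  # True
```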

  • Actions: actions define the tasks to be executed when the associated trigger evaluates to True. Multiple actions can be specified, and they are executed sequentially in the order they are defined. Certain actions may require additional arguments to function properly. OctaiPipe provides the following built-in actions:

    • global_retrain: Performs global retraining of the model.

    • local_retrain: Performs local retraining on an edge device.

    • send_notification: Sends a notification with customisable content.

Here is an example of a full policy definition in a deployment config:

policies:
  - name: policy1
    observations:
        - global_drift_percentage_observation
    trigger: global_drift_percentage_observation > 0.03
    actions:
        - global_retrain:
            global_retrain_config_path: ./configs/federated_learning_run.yml
        - notification:
            subject: 'Global Drift'
            message: 'Global Drift detected, retrain performed and latest model available'
            status: 'Success'
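Since the policy block is plain YAML, you can sanity-check it before deploying. The snippet below is a sketch that assumes PyYAML is installed; the checks reflect the structure of the example above, not an official OctaiPipe schema.

```python
# Sketch: load a deployment config and sanity-check its policy block.
# Assumes PyYAML (pip install pyyaml); the keys checked are taken from
# the example above, not from an official OctaiPipe schema.
import yaml

config_text = """
policies:
  - name: policy1
    observations:
      - global_drift_percentage_observation
    trigger: global_drift_percentage_observation > 0.03
    actions:
      - global_retrain:
          global_retrain_config_path: ./configs/federated_learning_run.yml
"""

config = yaml.safe_load(config_text)
for policy in config["policies"]:
    # Every policy needs observations, a trigger, and at least one action.
    assert policy.get("observations"), f"{policy['name']}: no observations"
    assert policy.get("trigger"), f"{policy['name']}: no trigger"
    assert policy.get("actions"), f"{policy['name']}: no actions"
```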

Running a global policy#

In this section, we walk through the complete process of creating, deploying, and tearing down a global policy, using the built-in OctaiPipe global policy action (Global Retraining) as an example. A Global Retraining policy can be broken down into the constituent parts below:

  • Observation: the proportion of devices that have experienced drift since the last retraining

  • Trigger: the proportion of devices that suffered drift exceeds a set threshold

  • Actions: global retraining and a notification
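Put together, this breakdown corresponds to a policy block along the lines of the earlier example; the policy name and the 0.03 threshold below are illustrative, not prescribed values.

```yaml
policies:
  - name: global_retraining_policy           # illustrative name
    observations:
      - global_drift_percentage_observation  # proportion of drifted devices
    trigger: global_drift_percentage_observation > 0.03  # drift threshold
    actions:
      - global_retrain:
          global_retrain_config_path: ./configs/federated_learning_run.yml
      - notification:
          subject: 'Global Drift'
          message: 'Global drift detected, retrain performed'
          status: 'Success'
```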

Add a policy to a deployment#

A policy is defined within a regular OctaiPipe deployment config, as you will see from this deployment config, which defines a policy for a model inference deployment. If you have a working model_inference deployment from a previous tutorial, you can use that; otherwise, follow the steps in the “End-to-end deployment tutorial”.

Deploy a policy#

Once you have a deployment config with your policy appended, you can deploy the step using deploy_to_edge, passing the configuration file you have created:

[ ]:
from octaipipe.deployment import deploy_to_edge

deployment_id = deploy_to_edge('./policy_deployment.yml')

print(f'Deployment Id is : {deployment_id}')

Tear down a policy#

A policy is torn down alongside its deployment; it cannot be torn down on its own. To tear down the deployment, run the cell below:

[ ]:
from octaipipe.deployment import down_deployment

down_deployment(deployment_id=deployment_id)