OctaiKube#

OctaiKube is a library that serves as an interface between the OctaiPipe library and the Kubeflow service. This document describes the modules it provides for interacting with the OctaiPipe ML pipelines package.

Some useful guides:

Ad-Hoc Code running in KubeFlow
- Modifying function to run in Kubeflow

OctaiKube Interface Functions#

class octaikube.get_available_steps#

Returns OctaiPipe steps available to use in your workflow

Returns: list of step names which can be used to identify steps in all related routines
Return type: list

Example

>>> import octaikube as ock
>>> ock.get_available_steps()
['preprocessing', 'feature_engineering', 'model_training']

class octaikube.get_def_step_config(step_name: str, folder='configs/step_defs')#

Writes out a config definition file for one of the OctaiPipe steps. The file is not a usable config or an example; it is a definition that specifies the config fields and the expected types of their values.

Parameters:

step_name (str) – OctaiPipe step that you would like to get the config definition for
folder (str, optional) – folder where the config definition should be saved. Defaults to 'configs/step_defs'.

Returns:

confirmation of the success with the path to the config

Return type:

str

class octaikube.get_example_step_config(step_name: str, folder='configs/step_configs')#

Write out an example of a config file for a given step

Parameters:

step_name (str) – name of an octaipipe step
folder (str, optional) – path to the folder where the config is saved. Defaults to 'configs/step_configs'.

Returns:

path to the written config

Return type:

str

class octaikube.get_component(step_name: str)#

Loads the kfp component definition from the package data. If no direct match is found among the native pipeline components, the custom pipeline step database is checked for step_name. If a custom pipeline step is found, the base component is used; otherwise an error is raised, since the step could not run in Kubeflow anyway.

Parameters: step_name (str) – name of an octaipipe step
Returns: dict representation of the component definition (to be used with kfp.components.load_component_from_text after converting to str)
Return type: dict
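The lookup order described above (native components first, then custom steps mapped to a base component, otherwise an error) can be sketched as follows. This is only an illustration of the fallback logic: the dictionaries, the custom step name, and the function name resolve_component are hypothetical and are not part of octaikube.

```python
# Hypothetical stand-ins for the package data and the custom step database
NATIVE_COMPONENTS = {"preprocessing": {"name": "preprocessing"}}
CUSTOM_STEPS = {"my_custom_step"}
BASE_COMPONENT = {"name": "base"}


def resolve_component(step_name):
    """Return a component definition dict, following the documented lookup order."""
    # 1. Direct match among the native pipeline components
    if step_name in NATIVE_COMPONENTS:
        return NATIVE_COMPONENTS[step_name]
    # 2. Custom pipeline steps fall back to the base component
    if step_name in CUSTOM_STEPS:
        return BASE_COMPONENT
    # 3. Unknown steps cannot run in Kubeflow, so fail early
    raise ValueError(f"No component found for step '{step_name}'")
```

With these stand-ins, resolve_component("my_custom_step") returns the base component, and an unknown step name raises ValueError.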

OctaiKube Adhoc Functions#

class octaikube.adhoc.run_in_kubeflow(experiment_name)#

Decorator that runs a function in Kubeflow. For details on how to use it, see Ad-Hoc Code running in KubeFlow.

Parameters: experiment_name (str) – name of the Kubeflow experiment to attach the run to. Note that if no experiment with the given name exists, a new experiment will be created.
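Because the decorator takes an argument, it is applied with parentheses, as in @decorator("name"). The sketch below shows that pattern with a local stand-in called run_in_experiment. The stand-in is hypothetical: it only tags the result with the experiment name and runs the function in-process, and it does not reflect how octaikube submits work to Kubeflow.

```python
import functools


def run_in_experiment(experiment_name):
    """Hypothetical stand-in for a decorator that takes an experiment name."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # A real implementation would submit func to Kubeflow here;
            # this stand-in simply calls it locally and tags the result.
            result = func(*args, **kwargs)
            return {"experiment": experiment_name, "result": result}
        return wrapper
    return decorator


@run_in_experiment("demo-experiment")
def add(a, b):
    return a + b
```

Calling add(1, 2) on the decorated function returns the tagged result rather than the bare sum, which shows that the wrapper, not the original function, handles the call.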

OctaiKube Utils Functions#

class octaikube.utils.run_pipeline(name: str, exp_name: str, steps: list, image_name: Optional[str] = None, env_vars: dict = {}, automl: bool = False)#

Put together several pipeline steps to run in kubeflow

Parameters:

name (str) – name of the pipeline
exp_name (str) – name of the experiment in Kubeflow
steps (list) – list of tuples of the form (kfp component, config path)
image_name (str, optional) – image to use for training. Usually like: stable<client-name>octaipipe.azurecr.io/master:latest
env_vars (dict, optional) – key-values of environment variables to set for the containers, e.g. {'INFLUX_ORG': 'my_org'}
automl (bool, optional) – Whether pipeline being built is autoML pipeline or not. This affects whether _get_pipeline() or _get_automl_pipeline() is run. Defaults to False.

Returns:

kfp run

Return type:

object

class octaikube.utils.create_pipeline_component(step_name: str, env_vars: dict, image_name: str)#

Generates the kfp component. The component definition is loaded from the package data.

Parameters:

step_name (str) – name for which there is an available component in the package data
env_vars (dict) – Environment variables to set for containers as dict to add to component string
image_name (str, optional) – image to use for training. Usually like: stable<client-name>octaipipe.azurecr.io/master:latest

Returns:

kfp factory function of the component

Return type:

function

class octaikube.utils.get_required_modules(func)#

Parses the function's source code to find all modules imported inside the function, compares them with the list of Python standard library modules, and returns the modules that are not part of the standard library. Supported import patterns include:

import numpy as np
import pandas
import matplotlib.pyplot as plt
from octaipipe import pipeline

Parameters: func (function) – function object to be parsed
Returns: names of the modules that were imported
Return type: list of strings

class octaikube.utils.upload_config_to_blob(step_name: str, config_path: str)#

To run on Kubeflow, the config must be stored somewhere in the cloud, because it cannot be added to the container. This function uploads the config to blob storage and returns a path that octaistep understands.

Parameters:

step_name (str) – name of an octaipipe step
config_path (str) – path to the config that will be used by the step

Returns:

string of the form ‘blob {container: <str>, :file_name: <str>}’ which is recognised by octaistep

Return type:

str

OctaiKube AutoML Functions#

class octaikube.automl.run_automl(config_path: str, name: str, exp_name: str, env_vars: dict = {}, image_name: Optional[str] = None)#

Function to run AutoML pipeline in Kubeflow. Makes use of the existing OctaiKube run_pipeline and get_output functions. Returns best model and information on how to retrieve it.

Parameters:

config_path (str) – Path to the config for the AutoML run. An example config can be obtained with octaikube.get_example_step_config('automl').
name (str) – name of the pipeline
exp_name (str) – name of the experiment in Kubeflow
env_vars (dict, optional) – key-values of environment variables to set for the containers, e.g. {'INFLUX_ORG': 'my_org'}
image_name (str, optional) – image to use for training. Usually like: stable<client-name>octaipipe.azurecr.io/master:1.2.2

Returns:

Tuple of three elements: the output of the run_pipeline function (used to get outputs), plus the config_path and exp_name (used to update the AutoML database).

Return type:

output (tuple)

class octaikube.automl.get_automl_outputs(automl_run_output: tuple)#

Gets relevant output for the AutoML pipeline using the OctaiKube functions. If outputs are not available, prints timeout message from get_outputs function and returns None.

Parameters: automl_run_output (tuple) – Output from run_pipeline function to get output and register AutoML run.
Returns: Model, location and metric from the AutoML run.
Return type: automl_output (tuple)

class octaikube.automl.get_automl_runs(model_id: Optional[Union[str, list]] = None, model_name: Optional[Union[str, list]] = None, exp_name: Optional[Union[str, list]] = None, this_namespace: bool = True)#

This function gets the previous autoML runs from the database. To return data for a specific model_id or multiple model_ids, the model_id argument can be a single ID or a list of IDs. The same applies for experiments and model names.

If two arguments are provided, the intersection of these will be returned. For example, if a specific model name and experiment name are provided, only models with this name and experiment will be returned. Subsetting on model ID will not subset on any other input as model IDs are unique across all other fields.

Parameters:

model_id (str or list) – The model ID or list of IDs to return records for. Defaults to None, which returns all records.
model_name (str or list) – The model name or list of names to return records for. Defaults to None, which returns all records.
exp_name (str or list) – The Kubeflow experiment name or list of names to return records for. Defaults to None, which returns all records.
this_namespace (bool) – Whether to get models for current namespace only. If True, it uses NAMESPACE environment variable. Defaults to True.

Returns:

data frame with records in database

Return type:

automl_records (pd.DataFrame)
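The intersection behaviour of the name and experiment filters can be illustrated on plain Python records. This is a sketch only: the records, field values, and the function filter_runs are hypothetical, the real function reads from the AutoML database and returns a pd.DataFrame, and the model_id and this_namespace arguments are left out here.

```python
# Hypothetical records standing in for rows of the AutoML database
RUNS = [
    {"model_id": "m1", "model_name": "rf", "exp_name": "exp_a"},
    {"model_id": "m2", "model_name": "rf", "exp_name": "exp_b"},
    {"model_id": "m3", "model_name": "xgb", "exp_name": "exp_a"},
]


def _as_set(value):
    """Normalise None, a single string, or a list of strings to a set (or None)."""
    if value is None:
        return None
    return {value} if isinstance(value, str) else set(value)


def filter_runs(records, model_name=None, exp_name=None):
    """Keep records matching every filter that was provided (None means no filter)."""
    filters = {"model_name": _as_set(model_name), "exp_name": _as_set(exp_name)}
    return [
        record
        for record in records
        if all(
            allowed is None or record[field] in allowed
            for field, allowed in filters.items()
        )
    ]
```

Here filter_runs(RUNS, model_name="rf", exp_name="exp_a") keeps only the record that satisfies both conditions, while passing a list such as exp_name=["exp_a", "exp_b"] keeps records from either experiment.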