Model Evaluation Step#

To run locally: model_evaluation

This step evaluates the predictions made by a model. It first reads the model identifier from the configuration file to instantiate and load the model object, then loads the chosen evaluation technique and evaluates the model's performance on the test data. The evaluation results are then written out. The user can specify whether to trigger an inference container build at the end of the step.
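
If you are running the step outside a full pipeline deployment, a minimal sketch looks like the following. It assumes a run_step_local entry point on the octaipipe package that takes the step name and a config path; both the function name and the config path are assumptions here, so check the OctaiPipe Steps documentation for the exact local-run API.

import octaipipe

# Run the model evaluation step locally against its YAML config.
# The entry point and config path are illustrative assumptions.
octaipipe.run_step_local('model_evaluation',
                         config_path='./configs/model_evaluation.yml')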

Three evaluation techniques are available:

RULEvaluator#

In config: rul

A class to evaluate remaining useful life (RUL) estimation. The metrics used for evaluation are those common in the RUL literature, so that comparisons with published results can be made. The evaluator takes a boolean argument that specifies whether the evaluation metrics should be computed on all the predictions or just the final prediction. For some datasets, such as the NASA turbofan dataset, the evaluation is made on the final prediction of the test set.

The available metrics for this Evaluator are:

  • RMSE: the root mean squared error for a set of predictions.

  • Score: an asymmetric function that penalises late RUL predictions more heavily than early ones, so early predictions are favoured (see the sketch below).

There is also an option to plot a graph of actual vs. predicted values. Note that matplotlib is required for the plot to work; it is not installed by default with OctaiPipe.
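
For reference, the two metrics can be sketched as below, using the asymmetric scoring function common in the RUL literature (the PHM08/NASA turbofan score, Saxena et al. 2008). OctaiPipe's exact implementation may differ in constants or aggregation, so treat this as an illustration of the idea rather than the step's internal code.

import numpy as np

def rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    # Root mean squared error over a set of RUL predictions.
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))

def asymmetric_score(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    # d > 0 means a late prediction; lateness is penalised more
    # heavily (exp(d / 10)) than earliness (exp(-d / 13)).
    d = y_pred - y_true
    return float(np.sum(np.where(d < 0,
                                 np.exp(-d / 13.0) - 1.0,
                                 np.exp(d / 10.0) - 1.0)))

y_true = np.array([112.0, 98.0, 69.0])
y_pred = np.array([105.0, 104.0, 70.0])
print(rmse(y_true, y_pred), asymmetric_score(y_true, y_pred))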

The eval_specs for rul look like the following:

eval_specs:
   technique: rul
   eval_final_only: True
   plot_pred: False

Classification Evaluator#

In config: classification

The classification evaluator is used to evaluate classifiers, for example, logistic regression or a support vector classifier. The eval_specs for classification look like the following:

eval_specs:
    technique: classification
    metric: balanced_accuracy # see options below
    params:
      adjusted: true

The available metrics (for the metric field) for this Evaluator are:

The params field is for any additional arguments to be passed to the metric function along with y_true and y_pred. For example, balanced_accuracy can take the argument adjusted, in which case params would look like the config above.
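
To make the mapping concrete, the sketch below shows how such a config plausibly translates into a metric call. It assumes the metric names map onto the scikit-learn functions of the same name, which this page does not confirm; the point being illustrated is how params is forwarded alongside y_true and y_pred.

from sklearn.metrics import balanced_accuracy_score

# Values as they would be read from eval_specs (assumed mapping).
params = {'adjusted': True}

y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

# Extra keyword arguments from `params` are passed along with
# y_true and y_pred.
score = balanced_accuracy_score(y_true, y_pred, **params)
print(score)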

Anomaly Detection Evaluation#

In config: anomaly_detection

TBD

The following is an example of a config file together with descriptions of its parts.

Step config example#

name: model_evaluation

input_data_specs:
  datastore_type: influxdb
  query_type: dataframe
  query_template_path: ./configs/data/influx_query.txt
  query_values:
    start: "2020-05-20T13:30:00.000Z"
    stop: "2020-05-20T13:35:00.000Z"
    bucket: sensors-raw
    measurement: cat
    tags: {}
  data_converter: {}


output_data_specs:
  - datastore_type: influxdb
    settings:
      bucket: test-bucket-1
      measurement: testv1


model_specs:
  name: model_eval_test0
  type: ridge_reg
  version: "1.0"

eval_specs:
  technique: rul # rul, anomaly_detection, classification.
  eval_final_only: True
  plot_pred: False # option to make a plot of actual vs predicted (rul only)

run_specs:
  save_results: false
  target_label: accel_x
  deployment_container:
    build: false
    gh_user: my-gh-user
    repo: my-gh-repo
    workflow_id: 01234567
  onnx_pred: false

Input and Output Data Specs#

input_data_specs and output_data_specs follow a standard format for all the pipeline steps; see Octaipipe Steps.

Model Specs#

This section specifies the model to be evaluated on the evaluation data.

It includes the identifier, name, and model_load_specs of the trained model. The model_load_specs field specifies the name of the model file as well as the name of the container in blob storage.
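
As an illustration only, a model_specs block with explicit load specs might look like the sketch below. The model_load_specs key names (file_name, container) are hypothetical stand-ins for the "name of the model file" and "name of the container in blob storage" described above; they are not confirmed by this page.

model_specs:
  name: model_eval_test0
  type: ridge_reg
  version: "1.0"
  model_load_specs:
    file_name: model_eval_test0_1.0.pkl  # assumed key: model file name
    container: models                    # assumed key: blob storage container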

Eval Specs#

eval_specs:
    technique: rul # rul, classification, anomaly_detection
    eval_final_only: True

  • technique: specifies the technique used to evaluate the model's performance. The current release provides the three options stated above: rul, classification, anomaly_detection.

  • eval_final_only: whether the evaluation metrics should be computed on all the predictions or just the final prediction. For some datasets, such as the NASA turbofan dataset, the evaluation is made on the final prediction of the test set.

Run Specs#

run_specs:
    save_results: True
    target_label: EDDY_BOTTOM_PSD1
    deployment_container:
        build: True
        gh_user: my-gh-user
        repo: my-gh-repo
        workflow_id: 01234567

Specifies the configuration needed for the evaluation run.

save_results#

If set to True, the evaluation results will be saved to an Azure Storage Account.

target_label#

Name of the target variable. This column is removed from the input data to form the output set (the labels) for supervised learning. Not required for unsupervised learning such as clustering.
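
Conceptually this is the usual feature/label split. A minimal pandas sketch of what happens to the column (an illustration, not OctaiPipe's internal code):

import pandas as pd

df = pd.DataFrame({'accel_x': [0.1, 0.2], 'accel_y': [1.0, 1.1]})

# The target_label column becomes the labels; the rest stay as features.
y = df.pop('accel_x')  # output set for supervised learning
X = df                 # remaining input features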

deployment_container#

Specifications of the deployment container build on GitHub.

  • build: boolean of whether to trigger a container build (see the sketch after this list).

  • gh_user: GitHub username.

  • repo: GitHub repository.

  • workflow_id: the workflow ID of the GitHub Action that builds the container.
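
For context, triggering the build with these values corresponds to GitHub's workflow-dispatch REST endpoint. The sketch below makes that call directly with requests; it illustrates the mechanism rather than OctaiPipe's internal code, and the token handling and ref branch are assumptions.

import os
import requests

# Values from run_specs.deployment_container
gh_user, repo, workflow_id = 'my-gh-user', 'my-gh-repo', '01234567'

# POST /repos/{owner}/{repo}/actions/workflows/{workflow_id}/dispatches
resp = requests.post(
    f'https://api.github.com/repos/{gh_user}/{repo}'
    f'/actions/workflows/{workflow_id}/dispatches',
    headers={'Authorization': f"Bearer {os.environ['GITHUB_TOKEN']}",
             'Accept': 'application/vnd.github+json'},
    json={'ref': 'main'},  # branch to run the workflow on (assumed)
)
resp.raise_for_status()  # success returns 204 No Content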