Model Evaluation Step#
To run locally: model_evaluation
This step evaluates the predictions made by a model. It first reads the model ID from the configuration file in order to instantiate and load the model object. It then loads the evaluation technique and evaluates the model's performance on the test data, outputting the evaluation results. The user can specify whether to trigger an inference container build at the end of the step.
Three evaluation techniques are available:
RULEvaluator#
In config: rul
A class to evaluate remaining useful life (RUL) estimation. The metrics used for evaluation are those found in the RUL literature, so that comparisons can be made with published results. The evaluator takes a boolean argument specifying whether the evaluation metrics should be computed on all predictions or only on the final prediction. For some datasets, such as the NASA turbofan dataset, evaluation is conventionally made on the final prediction of the test set.
The available metrics for this Evaluator are:
RMSE: the root mean squared error over a set of predictions.
Score: an asymmetric scoring function that penalises late RUL predictions more heavily than early ones.
There is also an option to plot a graph of actual vs. predicted values. Note that matplotlib is required for the plot to work and is not installed by default with Octaipipe.
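For reference, a minimal sketch of these two metrics is given below, assuming the asymmetric scoring function from the NASA C-MAPSS / PHM08 benchmark (the exact constants Octaipipe uses may differ):

import numpy as np

def rmse(y_true, y_pred):
    # root mean squared error over a set of RUL predictions
    return float(np.sqrt(np.mean((np.asarray(y_pred) - np.asarray(y_true)) ** 2)))

def nasa_score(y_true, y_pred):
    # asymmetric PHM08 / C-MAPSS score: late predictions (predicted RUL above
    # the true RUL, d > 0) are penalised more heavily than early ones (d < 0)
    d = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return float(np.sum(np.where(d < 0, np.exp(-d / 13.0) - 1, np.exp(d / 10.0) - 1)))

Lower values are better for both metrics.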
The eval_specs for rul look like the following:
eval_specs:
  technique: rul
  eval_final_only: True
  plot_pred: False
Classification Evaluator#
In config: classification
The classification evaluator is used to evaluate classifiers, for example logistic regression or a support vector classifier. The eval_specs for classification look like the following:
eval_specs:
  technique: classification
  metric: balanced_accuracy # see options below
  params:
    adjusted: true
The available metrics (for the metric field) for this Evaluator are:
accuracy: proportion of correct predictions (see sklearn documentation)
balanced_accuracy: accuracy on imbalanced datasets, defined as the average of recall on each class (see sklearn documentation)
recall: the ratio TP / (TP + FN) (see sklearn documentation)
precision: the ratio TP / (TP + FP) (see sklearn documentation)
f1_score: harmonic mean of recall and precision (see sklearn documentation)
fbeta: weighted harmonic mean of recall and precision (see sklearn documentation)
roc_auc: area under the receiver operating characteristic curve (see sklearn documentation)
The params field is for any additional arguments to be passed to the metric function along with y_true and y_pred. For example, balanced_accuracy can take the argument adjusted, in which case params would look something like the config above.
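The metric names mirror those of scikit-learn. As an illustration (the mapping to sklearn.metrics is an assumption for clarity, not a statement about Octaipipe's internals), the balanced_accuracy example above corresponds to a call like:

from sklearn.metrics import balanced_accuracy_score

# hypothetical test labels and model predictions
y_true = [0, 1, 0, 0, 1, 0]
y_pred = [0, 1, 0, 0, 0, 1]

# entries under params are passed as keyword arguments alongside y_true and y_pred
score = balanced_accuracy_score(y_true, y_pred, adjusted=True)
print(score)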
Anomaly Detection Evaluation#
In config: anomaly_detection
TBD
The following is an example of a config file together with descriptions of its parts.
Step config example#
name: model_evaluation

input_data_specs:
  datastore_type: influxdb
  query_type: dataframe
  query_template_path: ./configs/data/influx_query.txt
  query_values:
    start: "2020-05-20T13:30:00.000Z"
    stop: "2020-05-20T13:35:00.000Z"
    bucket: sensors-raw
    measurement: cat
    tags: {}
  data_converter: {}


output_data_specs:
  - datastore_type: influxdb
    settings:
      bucket: test-bucket-1
      measurement: testv1


model_specs:
  name: model_eval_test0
  type: ridge_reg
  version: "1.0"

eval_specs:
  technique: rul # rul, anomaly_detection, classification.
  eval_final_only: True
  plot_pred: False # option to make a plot of actual vs predicted (rul only)

run_specs:
  save_results: false
  target_label: accel_x
  deployment_container:
    build: false
    gh_user: my-gh-user
    repo: my-gh-repo
    workflow_id: 01234567
  onnx_pred: false
Input and Output Data Specs#
input_data_specs and output_data_specs follow a standard format for all the pipeline steps; see Octaipipe Steps.
Model Specs#
This section specifies the model to be evaluated on the evaluation data. It includes the identifier, name and model_load_specs of the trained model. The model_load_specs field specifies the name of the model file as well as the name of the container in blob storage.
Eval Specs#
eval_specs:
  technique: rul # rul, anomaly_detection, classification
  eval_final_only: True
technique: specifies the technique to be used to evaluate the model's performance. The available options are those described above: rul, classification and anomaly_detection.
eval_final_only: whether the evaluation metrics should be computed on all the predictions or just the final prediction. For some datasets, such as the NASA turbofan dataset, the evaluation is made on the final prediction of the test set.
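As a rough illustration of the difference (a sketch assuming per-unit run-to-failure predictions, not Octaipipe's internal code), evaluating only the final prediction amounts to keeping the last row per unit before computing the metrics:

import pandas as pd

# hypothetical RUL predictions for two engine units over several cycles
preds = pd.DataFrame({
    "unit":   [1, 1, 1, 2, 2],
    "y_true": [30, 20, 10, 25, 15],
    "y_pred": [28, 22, 12, 20, 18],
})

# eval_final_only: False -> metrics computed over every prediction
all_preds = preds

# eval_final_only: True -> metrics computed on the last prediction of each unit
final_preds = preds.groupby("unit").tail(1)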
Run Specs#
run_specs:
  save_results: True
  target_label: EDDY_BOTTOM_PSD1
  deployment_container:
    build: True
    gh_user: my-gh-user
    repo: my-gh-repo
    workflow_id: 01234567
Specifies the configuration needed for the model run.
save_results#
If set to True, the trained model will be saved to an Azure Storage Account.
target_label#
Name of the target variable: this column is removed from the input data to form the output set for supervised learning. Not required for unsupervised learning such as clustering.
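For illustration, the split implied by target_label looks roughly like the following (hypothetical dataframe and column names, not Octaipipe's internal code):

import pandas as pd

# hypothetical evaluation data pulled from the input datastore
df = pd.DataFrame({
    "accel_x": [0.10, 0.12, 0.09],
    "accel_y": [1.01, 0.98, 1.05],
    "temp":    [20.3, 20.1, 19.8],
})

target_label = "accel_x"
y_true = df[target_label]            # ground truth the predictions are scored against
X = df.drop(columns=[target_label])  # remaining columns are fed to the model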
deployment_container#
Specifications of the deployment container build on GitHub.
build: boolean specifying whether to trigger a container build.
gh_user: GitHub username.
repo: GitHub repository.
workflow_id: workflow id of the GitHub Action.
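For context, triggering a container build through a GitHub Actions workflow typically means sending a workflow_dispatch request to the GitHub REST API. A minimal sketch of such a call is shown below; the endpoint and token handling are illustrative assumptions, not a description of Octaipipe's exact mechanism:

import os
import requests

gh_user = "my-gh-user"
repo = "my-gh-repo"
workflow_id = "01234567"

# workflow_dispatch endpoint of the GitHub REST API
url = (
    f"https://api.github.com/repos/{gh_user}/{repo}"
    f"/actions/workflows/{workflow_id}/dispatches"
)

resp = requests.post(
    url,
    headers={
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
    },
    json={"ref": "main"},  # branch on which to run the workflow
)
resp.raise_for_status()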