Model Evaluation Step#
To run locally: model_evaluation
This step evaluates the predictions made by a model. It first reads the model ID from the configuration file in order to instantiate and load the model object. It then loads the evaluation technique and evaluates the model's performance on the test data, outputting the evaluation results. The user can specify whether to trigger an inference container build at the end of the step.
Three evaluation techniques are available:
RULEvaluator#
In config: rul
A class to evaluate remaining useful life (RUL) estimation. The metrics used for evaluation are those found in the RUL literature, so that comparisons can be made with published results. The evaluator takes a boolean argument specifying whether the evaluation metrics should be computed on all predictions or only on the final prediction. For some datasets, such as the NASA turbofan dataset, evaluation is conventionally made on the final prediction of the test set.
The available metrics for this Evaluator are:
RMSE: the root mean squared error over a set of predictions.
Score: an asymmetric scoring function that penalises late RUL predictions more heavily than early ones.
There is also an option to plot a graph of actual vs. predicted values. Note that matplotlib is required for the plot to work and is not installed by default with Octaipipe.
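For reference, a minimal sketch of these two metrics is given below, assuming the asymmetric scoring function from the NASA C-MAPSS / PHM08 benchmark (the exact constants Octaipipe uses may differ):

import numpy as np

def rmse(y_true, y_pred):
    # root mean squared error over a set of RUL predictions
    return float(np.sqrt(np.mean((np.asarray(y_pred) - np.asarray(y_true)) ** 2)))

def nasa_score(y_true, y_pred):
    # asymmetric PHM08 / C-MAPSS score: late predictions (predicted RUL above
    # the true RUL, d > 0) are penalised more heavily than early ones (d < 0)
    d = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return float(np.sum(np.where(d < 0, np.exp(-d / 13.0) - 1, np.exp(d / 10.0) - 1)))

Lower values are better for both metrics.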
The eval_specs for rul look like the following:
eval_specs:
  technique: rul
  eval_final_only: True
  plot_pred: False
Classification Evaluator#
In config: classification
The classification evaluator is used to evaluate classifiers, for example logistic regression or a support vector classifier. The eval_specs for classification look like the following:
eval_specs:
  technique: classification
  metric: balanced_accuracy # see options below
  params:
    adjusted: true
The available metrics (for the metric field) for this Evaluator are:
accuracy: proportion of correct predictions (see sklearn documentation)
balanced_accuracy: accuracy on imbalanced datasets, defined as the average of recall on each class (see sklearn documentation)
recall: the ratio TP / (TP + FN) (see sklearn documentation)
precision: the ratio TP / (TP + FP) (see sklearn documentation)
f1_score: harmonic mean of recall and precision (see sklearn documentation)
fbeta: weighted harmonic mean of recall and precision (see sklearn documentation)
roc_auc: area under the receiver operating characteristic curve (see sklearn documentation)
The params field is for any additional arguments to be passed to the metric function along with y_true and y_pred. For example, balanced_accuracy can take the argument adjusted, in which case params would look something like the config above.
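The metric names mirror those of scikit-learn. As an illustration (the mapping to sklearn.metrics is an assumption for clarity, not a statement about Octaipipe's internals), the balanced_accuracy example above corresponds to a call like:

from sklearn.metrics import balanced_accuracy_score

# hypothetical test labels and model predictions
y_true = [0, 1, 0, 0, 1, 0]
y_pred = [0, 1, 0, 0, 0, 1]

# entries under params are passed as keyword arguments alongside y_true and y_pred
score = balanced_accuracy_score(y_true, y_pred, adjusted=True)
print(score)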
Anomaly Detection Evaluation#
In config: anomaly_detection
TBD
The following is an example of a config file together with descriptions of its parts.
Step config example#
name: model_evaluation

input_data_specs:
  datastore_type: influxdb
  query_type: dataframe
  query_template_path: ./configs/data/influx_query.txt
  query_values:
    start: "2020-05-20T13:30:00.000Z"
    stop: "2020-05-20T13:35:00.000Z"
    bucket: sensors-raw
    measurement: cat
    tags: {}
  data_converter: {}


output_data_specs:
  - datastore_type: influxdb
    settings:
      bucket: test-bucket-1
      measurement: testv1


model_specs:
  name: model_eval_test0
  type: ridge_reg
  version: "1.0"

eval_specs:
  technique: rul # rul, anomaly_detection, classification.
  eval_final_only: True
  plot_pred: False # option to make a plot of actual vs predicted (rul only)

run_specs:
  save_results: false
  target_label: accel_x
  deployment_container:
    build: false
    gh_user: my-gh-user
    repo: my-gh-repo
    workflow_id: 01234567
  onnx_pred: false
Input and Output Data Specs#
input_data_specs and output_data_specs follow a standard format for all the pipeline steps; see Octaipipe Steps.
Model Specs#
This section specifies the model to be evaluated on the evaluation data. It includes the identifier, name and model_load_specs of the trained model. The model_load_specs field specifies the name of the model file as well as the name of the container in blob storage.
Eval Specs#
eval_specs:
  technique: rul # rul, anomaly_detection, classification
  eval_final_only: True
technique: specifies the technique to be used to evaluate the model's performance. The available options are those described above: rul, classification and anomaly_detection.
eval_final_only: whether the evaluation metrics should be computed on all the predictions or just the final prediction. For some datasets, such as the NASA turbofan dataset, the evaluation is made on the final prediction of the test set.
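As a rough illustration of the difference (a sketch assuming per-unit run-to-failure predictions, not Octaipipe's internal code), evaluating only the final prediction amounts to keeping the last row per unit before computing the metrics:

import pandas as pd

# hypothetical RUL predictions for two engine units over several cycles
preds = pd.DataFrame({
    "unit":   [1, 1, 1, 2, 2],
    "y_true": [30, 20, 10, 25, 15],
    "y_pred": [28, 22, 12, 20, 18],
})

# eval_final_only: False -> metrics computed over every prediction
all_preds = preds

# eval_final_only: True -> metrics computed on the last prediction of each unit
final_preds = preds.groupby("unit").tail(1)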
Run Specs#
run_specs:
  save_results: True
  target_label: EDDY_BOTTOM_PSD1
  deployment_container:
    build: True
    gh_user: my-gh-user
    repo: my-gh-repo
    workflow_id: 01234567
Specifies the configuration needed for the model run.
save_results#
If set to True, the trained model will be saved to an Azure Storage Account.
target_label#
Name of the target variable: this column is removed from the input data to form the output set for supervised learning. Not required for unsupervised learning such as clustering.
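For illustration, the split implied by target_label looks roughly like the following (hypothetical dataframe and column names, not Octaipipe's internal code):

import pandas as pd

# hypothetical evaluation data pulled from the input datastore
df = pd.DataFrame({
    "accel_x": [0.10, 0.12, 0.09],
    "accel_y": [1.01, 0.98, 1.05],
    "temp":    [20.3, 20.1, 19.8],
})

target_label = "accel_x"
y_true = df[target_label]            # ground truth the predictions are scored against
X = df.drop(columns=[target_label])  # remaining columns are fed to the model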
deployment_container#
Specifications of the deployment container build on GitHub.
build: boolean specifying whether to trigger a container build.
gh_user: GitHub username.
repo: GitHub repository.
workflow_id: workflow id of the GitHub Action.
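For context, triggering a container build through a GitHub Actions workflow typically means sending a workflow_dispatch request to the GitHub REST API. A minimal sketch of such a call is shown below; the endpoint and token handling are illustrative assumptions, not a description of Octaipipe's exact mechanism:

import os
import requests

gh_user = "my-gh-user"
repo = "my-gh-repo"
workflow_id = "01234567"

# workflow_dispatch endpoint of the GitHub REST API
url = (
    f"https://api.github.com/repos/{gh_user}/{repo}"
    f"/actions/workflows/{workflow_id}/dispatches"
)

resp = requests.post(
    url,
    headers={
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
    },
    json={"ref": "main"},  # branch on which to run the workflow
)
resp.raise_for_status()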