Feature Selection Step

To run locally: feature_selection

This step loads imputed raw data, computes a fitness metric for every feature in it, and returns the subset of the data to be used for training, together with the list of fit features.

Note: This step will be generalised and made available in a future release. Until then, this page will not be updated.

The following is an example of a config file together with descriptions of its parts.

Step config example

  • Alternatively, you can load a filled-in example config and change the values needed to adapt it to your problem.

name: feature_selection

input_data_specs:
  default:
  - datastore_type: influxdb
    settings:
      query_type: influx  # influx/dataframe/stream/stream_dataframe/csv
      query_template_path: ./configs/data/influx_query.txt
      query_config:
        start: "2020-05-20T13:30:00.000Z"
        stop: "2020-05-20T13:35:00.000Z"
        bucket: sensors-out
        measurement: cat
        tags: {}
      data_converter:
        name: influx_flat
        args: {}

output_data_specs:
  default:
  - datastore_type: influxdb
    settings:
      bucket: test-bucket-1
      measurement: testv1-fe

run_specs:
  input_sensors:
    - "Load_Cell_Mid"
    - "Eddy_Top"
  data_filename: /Users/ngcs/Downloads/211106_220102_imputed.parquet
  max_RUL: 1000
  min_metric_val: 0.1
  last_n_strokes: 10000
  cycle_key:
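The run_specs fields are not yet documented on this page, so their exact effect is an open question. The sketch below shows one plausible reading, stated as assumptions: input_sensors selects which columns to score, last_n_strokes keeps only the most recent N rows (one row per stroke), and max_RUL caps the RUL target, as in a piecewise-linear RUL label. The function prepare_inputs is hypothetical, and the data is held in plain Python lists rather than the parquet file named by data_filename.

```python
def prepare_inputs(data: dict[str, list[float]], rul: list[float], run_specs: dict):
    """Hypothetical preprocessing driven by run_specs; the field semantics are assumed."""
    n = run_specs["last_n_strokes"]
    # Assumption: last_n_strokes keeps the most recent n rows; n larger than the data keeps all rows.
    start = max(len(rul) - n, 0)
    # Keep only the sensors listed in input_sensors, restricted to the same rows.
    features = {name: data[name][start:] for name in run_specs["input_sensors"]}
    # Assumption: max_RUL caps the RUL target.
    capped_rul = [min(value, run_specs["max_RUL"]) for value in rul[start:]]
    return features, capped_rul


# Values mirror the example config, except last_n_strokes is shrunk to 4 so its effect is visible.
run_specs = {"input_sensors": ["Load_Cell_Mid", "Eddy_Top"], "max_RUL": 1000, "last_n_strokes": 4}
data = {
    "Load_Cell_Mid": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Eddy_Top": [5.0, 4.0, 3.0, 2.0, 1.0],
    "Other": [0.0, 0.0, 0.0, 0.0, 0.0],
}
rul = [1500, 1200, 900, 600, 300]
features, target = prepare_inputs(data, rul, run_specs)
print(target)  # [1000, 900, 600, 300]
```

The column "Other" is dropped because it is not listed in input_sensors.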

Input and Output Data Specs

input_data_specs and output_data_specs follow a standard format shared by all pipeline steps; see OctaiPipe Steps.

Configs Description

The table below describes the fields of the configuration file above, apart from input_data_specs and output_data_specs, which are explained on the main OctaiPipe Steps page.

| Level 1   | Level 2 | Level 3 | Type/Options | Description |
|-----------|---------|---------|--------------|-------------|
| run_specs |         |         |              |             |
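Until the field types are documented, a minimal sanity check of a run_specs mapping may help catch typos in a config. The sketch below infers each type only from the example values in the config above (for instance, that max_RUL is an integer), so these types are assumptions, not documented constraints. cycle_key is left out because the example gives it no value. EXPECTED_TYPES and check_run_specs are hypothetical names, and the mapping is assumed to be already parsed from YAML into a Python dict.

```python
# Types inferred from the example config values only; not confirmed by OctaiPipe.
EXPECTED_TYPES = {
    "input_sensors": list,
    "data_filename": str,
    "max_RUL": int,
    "min_metric_val": (int, float),
    "last_n_strokes": int,
}


def check_run_specs(run_specs: dict) -> list[str]:
    """Return a list of problems found in a parsed run_specs mapping (empty if none)."""
    problems = []
    for key, expected in EXPECTED_TYPES.items():
        if key not in run_specs:
            problems.append(f"missing run_specs.{key}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(run_specs[key], bool) or not isinstance(run_specs[key], expected):
            problems.append(f"run_specs.{key} has unexpected type {type(run_specs[key]).__name__}")
    sensors = run_specs.get("input_sensors")
    if isinstance(sensors, list) and not all(isinstance(s, str) for s in sensors):
        problems.append("run_specs.input_sensors must contain only strings")
    return problems


# The run_specs section of the example config, as a parsed dict (an empty YAML value becomes None).
example = {
    "input_sensors": ["Load_Cell_Mid", "Eddy_Top"],
    "data_filename": "/Users/ngcs/Downloads/211106_220102_imputed.parquet",
    "max_RUL": 1000,
    "min_metric_val": 0.1,
    "last_n_strokes": 10000,
    "cycle_key": None,
}
print(check_run_specs(example))  # []
```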

Step Outputs

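The step returns the subset of the input data that contains the fit features, together with the list of those features, for use in training. Output data is written to the datastore configured in output_data_specs; in the example config above, this is the InfluxDB bucket test-bucket-1, measurement testv1-fe.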