Feature Selection Step

To run locally: feature_selection

The feature selection step loads raw data that has already been imputed, computes fitness metrics for every feature in that data, and returns a subset of the data, together with the list of fit features, for training.
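
As a rough illustration of threshold-based selection, the sketch below scores each feature and keeps those whose score reaches min_metric_val. The metric used here (absolute Pearson correlation with a target series) is an assumption chosen for illustration; this page does not state which fitness metrics the step actually computes. The function names and sample values are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences.

    Returns 0.0 when either sequence is constant, so constant
    features get the lowest possible score.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def select_fit_features(data, target, min_metric_val):
    """Score every feature and keep those at or above the threshold.

    data: dict mapping feature name -> list of values
    target: list of target values (hypothetical, e.g. remaining useful life)
    Returns (subset of data, list of fit feature names, all scores).
    """
    # ASSUMPTION: absolute correlation stands in for the step's real metric.
    metrics = {name: abs(pearson(vals, target)) for name, vals in data.items()}
    fit = [name for name, score in metrics.items() if score >= min_metric_val]
    subset = {name: data[name] for name in fit}
    return subset, fit, metrics


# Toy data, made up for this example; sensor names come from the config below.
data = {
    "Load_Cell_Mid": [1.0, 2.0, 3.0, 4.0],
    "Eddy_Top": [5.0, 5.0, 5.0, 5.0],  # constant, so it scores 0.0
}
target = [40.0, 30.0, 20.0, 10.0]

subset, fit_features, metrics = select_fit_features(data, target, min_metric_val=0.1)
print(fit_features)
```

Returning the scores alongside the subset keeps the selection auditable: a feature that was dropped can be traced back to its metric value.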

Note: This step will be generalised and made available in a future release. Until then, this page will not be updated.

The following is an example of a config file together with descriptions of its parts.

Step config example

Alternatively, you can load a filled-in example config and change the values that need adapting to your problem.

name: feature_selection

input_data_specs:
  default:
  - datastore_type: influxdb
    settings:
      query_type: influx  # influx/dataframe/stream/stream_dataframe/csv
      query_template_path: ./configs/data/influx_query.txt
      query_config:
        start: "2020-05-20T13:30:00.000Z"
        stop: "2020-05-20T13:35:00.000Z"
        bucket: sensors-out
        measurement: cat 
        tags: {}
      data_converter: 
        name: influx_flat
        args: {}

output_data_specs:
  default:
  - datastore_type: influxdb
    settings:
      bucket: test-bucket-1
      measurement: testv1-fe

run_specs:
  input_sensors:
    - "Load_Cell_Mid"
    - "Eddy_Top"
  data_filename: /Users/ngcs/Downloads/211106_220102_imputed.parquet
  max_RUL: 1000
  min_metric_val: 0.1
  last_n_strokes: 10000
  cycle_key:
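
To make the run_specs block easier to check before a run, the sketch below maps it onto a small Python dataclass. This is not part of OctaiPipe: RunSpecs and parse_run_specs are hypothetical names, and the validation rules are assumptions. The dictionary mirrors what a YAML loader would typically return for the run_specs section above; an empty value such as cycle_key loads as None.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RunSpecs:
    """Hypothetical typed view of the run_specs section."""
    input_sensors: List[str]
    data_filename: str
    max_RUL: int
    min_metric_val: float
    last_n_strokes: int
    cycle_key: Optional[str] = None  # left empty in the example config


def parse_run_specs(raw: dict) -> RunSpecs:
    """Build RunSpecs from a plain dict and apply basic sanity checks.

    A missing or misspelled key raises TypeError from the dataclass
    constructor. The checks below are ASSUMPTIONS, not documented rules.
    """
    specs = RunSpecs(**raw)
    if not specs.input_sensors:
        raise ValueError("input_sensors must list at least one sensor")
    if specs.max_RUL <= 0:
        raise ValueError("max_RUL must be positive")
    if specs.last_n_strokes <= 0:
        raise ValueError("last_n_strokes must be positive")
    return specs


# Same values as the run_specs section of the example config.
raw_run_specs = {
    "input_sensors": ["Load_Cell_Mid", "Eddy_Top"],
    "data_filename": "/Users/ngcs/Downloads/211106_220102_imputed.parquet",
    "max_RUL": 1000,
    "min_metric_val": 0.1,
    "last_n_strokes": 10000,
    "cycle_key": None,
}

specs = parse_run_specs(raw_run_specs)
print(specs.input_sensors, specs.min_metric_val)
```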

Input and Output Data Specs

input_data_specs and output_data_specs follow the standard format used by all pipeline steps; see OctaiPipe Steps.

Configs Description

Use this table to describe in detail each of the fields in the configuration file provided above, apart from input_data_specs and output_data_specs, which are explained on the main OctaiPipe Steps page.

Level 1	Level 2	Level 3	Type/Options	Description
run_specs

Step Outputs

In this section, provide a description of the expected outputs of the step, both local and in the blob to help users investigate their results.