Data Writing Step

Introduction

To make it easy to get data onto a device for experimentation and tutorials, OctaiPipe offers a data writing step.

The data writing step has two main functionalities:

Get a single dataset onto a device, e.g. read sample data from a SQL table and save it in a CSV on the device
Write data in intervals to a device, simulating live data

The data is loaded from one of three sources and then written to the destination defined in output_data_specs.

Data sources

The data writing step has three sources of data:

Data downloaded from a link
Data from a local file (can be uploaded through deployment)
Data from input data specs

These are checked in the order listed: if a link is provided, data is not read from a file or from input data specs. Input data specs are only used if both the link and the filepath are null.
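The precedence rule can be sketched in Python. This is a minimal illustration, assuming the link and filepath for a device have already been looked up; `choose_source` is a hypothetical name, not part of OctaiPipe's API.

```python
from typing import Optional, Tuple


# Hypothetical helper: mirrors the documented precedence
# (link first, then file, then input_data_specs). Not OctaiPipe's code.
def choose_source(link: Optional[str], filepath: Optional[str]) -> Tuple[str, Optional[str]]:
    """Return (source_kind, location) for one device."""
    if link is not None:
        return ("link", link)  # a link wins over everything else
    if filepath is not None:
        return ("file", filepath)  # a file is used only when no link is given
    return ("input_data_specs", None)  # fall back to input_data_specs


print(choose_source("https://link-to-csv.org/my-csv.csv", "./configs/my-data.csv"))
# ('link', 'https://link-to-csv.org/my-csv.csv')
print(choose_source(None, "./configs/my-data.csv"))
# ('file', './configs/my-data.csv')
print(choose_source(None, None))
# ('input_data_specs', None)
```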

Let’s go through these three sources individually:

Downloaded from a link

This downloads a CSV file from a link and writes it to the location specified by output_data_specs.

NOTE: The link must point to a CSV file with comma-separated values. The data is not processed on the device; it is assumed to be ready to write.

A link can be provided per device to put different data on each device, or the same data on all devices.
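As a rough sketch of what downloading from a link involves, the standard-library code below fetches a CSV from a URL and parses it without any cleaning. `download_csv` is a hypothetical helper; OctaiPipe's own download code may differ (for example, it may use pandas rather than the csv module).

```python
import csv
import io
import urllib.request
from typing import Dict, List


def download_csv(url: str) -> List[Dict[str, str]]:
    """Download a comma-separated CSV and return its rows as dicts.

    No cleaning or type conversion is done, matching the note that the
    data must already be ready to write. All values stay strings.
    """
    with urllib.request.urlopen(url) as response:
        text = response.read().decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))


# Example (placeholder link from the config below, so left commented out):
# rows = download_csv("https://link-to-csv.org/my-csv.csv")
```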

Read from a file

This reads data from a local file and writes it to the location specified by output_data_specs.

NOTE: As with the link, this must be a CSV file with comma-separated values.

A filepath can be provided per device to put different data on each device, or the same data on all devices.

The file(s) can also be transferred to the device using OctaiPipe’s deployment mechanism. A file in the configs folder of the workspace is placed in the configs folder on the device, so a file at ./configs/data.csv in the workspace can be referenced as ./configs/data.csv in the data writing step.

From input data specs

This uses input_data_specs, following the same pattern as in the Data Loading and Writing Utilities.

Input data specs are optional. Use them if, for example, you want to read from a SQL database and write to InfluxDB on your device.

Configuring and running the step

Configuring this step is similar to configuring other OctaiPipe pipeline steps: it uses a YAML config file, and the step can be run on edge devices using deploy_to_edge.

An example config file is shown below:

  name: data_writing

  # Input data specs are optional (ignored when from_link or from_file is set)
  input_data_specs:
    default:
    - datastore_type: influxdb
      settings:
        query_type: dataframe
        query_template_path: ./configs/influx_query.txt
        query_config:
          start: "2024-04-11T16:00:00.000Z"
          stop: "2024-04-11T16:30:00.000Z"
          bucket: test-bucket
          measurement: test-measurement
    feda-test-1:
    - datastore_type: influxdb
      settings:
        query_type: dataframe
        query_template_path: ./configs/influx_query.txt
        query_config:
          start: "2024-04-11T16:00:00.000Z"
          stop: "2024-04-11T16:30:00.000Z"
          bucket: test-bucket
          measurement: test-measurement

  output_data_specs:
    default:
    - datastore_type: influxdb
      settings:
        bucket: test-bucket
        measurement: live-model-predictions

  data_feeding_specs:
    from_link: # Link to download from
      default: 'https://link-to-csv.org/my-csv.csv'
      device-1: 'https://link-to-csv.org/my-csv-dev-1.csv'
    from_file: # Filepath to CSV (ignored here, since from_link is set for every device)
      default: './configs/my-data.csv'
      device-1: './configs/my-data-dev-1.csv'
    write_once: false # Whether to write all at once or simulate live data
    chunk_size: 10 # Number of rows to write at once
    interval: 10 # Seconds to sleep between writes
    index_cols: ['_time'] # columns to use as index if any
    exclude_cols: ['x_log_3'] # columns to exclude if any

  run_specs: {}

The key parts of the config above are:

`input_data_specs`

Where to read data from if from_link and from_file are both None. These specs are ignored if a link or file is provided.

`data_feeding_specs`

Contains all configuration specific to the data writing step.

`from_link`

A dictionary where each key is a device ID and each value is the link to download from for that device. The key default provides a link for any device ID not listed.

This defaults to None.

`from_file`

A dictionary where each key is a device ID and each value is the filepath to read from for that device. The key default provides a filepath for any device ID not listed.

This defaults to None.
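Both from_link and from_file map a device ID to its entry in the same way. The sketch below assumes that a device ID missing from the dictionary, with no default key, yields None; `value_for_device` is a hypothetical helper, not an OctaiPipe function.

```python
from typing import Dict, Optional


def value_for_device(mapping: Optional[Dict[str, str]], device_id: str) -> Optional[str]:
    """Pick the entry for device_id, falling back to the 'default' key.

    Returns None when the mapping itself is None (the documented default)
    or when neither the device ID nor 'default' is present (an assumption).
    """
    if mapping is None:
        return None
    return mapping.get(device_id, mapping.get("default"))


from_link = {
    "default": "https://link-to-csv.org/my-csv.csv",
    "device-1": "https://link-to-csv.org/my-csv-dev-1.csv",
}
print(value_for_device(from_link, "device-1"))  # https://link-to-csv.org/my-csv-dev-1.csv
print(value_for_device(from_link, "device-2"))  # https://link-to-csv.org/my-csv.csv
print(value_for_device(None, "device-1"))       # None
```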

`write_once`

Whether to write all data at once (true) or in intervals to mimic live data (false). When simulating live data, rows are written in order from the top of the dataset down.

This defaults to True, meaning data is written once only.

`chunk_size`

Number of rows to write at a time when write_once is false. Defaults to 10.

`interval`

Time in seconds to wait between writes when write_once is false. Defaults to 10.
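The sketch below shows how write_once, chunk_size and interval could interact. It is an assumption-based illustration, not OctaiPipe's implementation: `feed` and its `write` and `sleep` callbacks are hypothetical, and it assumes the step pauses between writes (not before the first one) and stops once every row has been written.

```python
import time
from typing import Callable, List, Sequence, TypeVar

Row = TypeVar("Row")


def feed(
    rows: Sequence[Row],
    write: Callable[[List[Row]], None],
    write_once: bool = True,
    chunk_size: int = 10,
    interval: float = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Write rows once, or in top-down chunks with a pause between writes."""
    if write_once:
        write(list(rows))  # everything in a single write
        return
    for start in range(0, len(rows), chunk_size):
        if start > 0:
            sleep(interval)  # wait between writes, not before the first one
        write(list(rows[start:start + chunk_size]))


batches: List[List[int]] = []
feed(list(range(25)), batches.append, write_once=False, chunk_size=10, interval=0)
print([len(b) for b in batches])  # [10, 10, 5]
```

Passing `sleep` as a parameter lets the timing be tested without actually waiting.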

`index_cols`

List of columns in the data to use as the index when using from_file or from_link.

`exclude_cols`

List of columns in the data to exclude when using from_file or from_link.
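A plain-Python sketch of how index_cols and exclude_cols could be applied to rows read from a CSV. `split_columns` is a hypothetical helper; if the step works on pandas DataFrames, the rough equivalent would be `df.drop(columns=exclude_cols).set_index(index_cols)`.

```python
from typing import Dict, List, Sequence, Tuple


def split_columns(
    rows: List[Dict[str, str]],
    index_cols: Sequence[str] = (),
    exclude_cols: Sequence[str] = (),
) -> List[Tuple[Tuple[str, ...], Dict[str, str]]]:
    """Drop excluded columns and separate index columns from the rest.

    Each result item is (index_values, remaining_columns).
    """
    result = []
    for row in rows:
        # Remove excluded columns first, then pull out the index columns.
        kept = {k: v for k, v in row.items() if k not in exclude_cols}
        index = tuple(kept.pop(col) for col in index_cols)
        result.append((index, kept))
    return result


rows = [{"_time": "2024-04-11T16:00:00Z", "x": "1.0", "x_log_3": "0.0"}]
print(split_columns(rows, index_cols=["_time"], exclude_cols=["x_log_3"]))
# [(('2024-04-11T16:00:00Z',), {'x': '1.0'})]
```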

Notes on data formatting

In general, the data must already be in the format you want it saved in. For example, if the dataset contains a column you do not want to save, remove it before running the step, since the step does no other formatting (when reading from a link or file, you can instead list the column in exclude_cols).

If you are writing to an InfluxDB database with write_once set to True, the data must have a column called _time containing the timestamps. If you write continuously to InfluxDB (write_once set to False), the timestamps are set by the data writing step.
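The `_time` rule could be checked as below. This is a hedged sketch: `prepare_influx_rows` is hypothetical, and it assumes that in continuous mode the step overwrites `_time` with the current UTC time in ISO 8601 format, which the actual step may handle differently.

```python
from datetime import datetime, timezone
from typing import Dict, List


def prepare_influx_rows(rows: List[Dict[str, str]], write_once: bool) -> List[Dict[str, str]]:
    """Check or add the _time column before writing to InfluxDB.

    write_once=True: every row must already carry a _time value.
    write_once=False: _time is stamped with the current UTC time (assumed behaviour).
    """
    if write_once:
        missing = [i for i, row in enumerate(rows) if "_time" not in row]
        if missing:
            raise ValueError(f"rows {missing} have no _time column")
        return rows
    now = datetime.now(timezone.utc).isoformat()
    return [{**row, "_time": now} for row in rows]
```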
