Add csv-import plugin to builtins #1051

Open
6 tasks
bvickers7 opened this issue Oct 17, 2024 · 3 comments
Labels
good-first-issue This issue is a good one for someone looking to get involved with this project/initiative.

Comments

@bvickers7

What
A builtin plugin that imports observations from a CSV file. Each row in the CSV is mapped to an input in the manifest.

Why
Inputs can contain many data points. For data sources that do not have a dedicated importer plugin, a generic CSV import utility would save time by replacing the manual copying of data points into the inputs.

Context

  • The csv-lookup builtin plugin exists. This is different from that. csv-lookup adds data to existing inputs by matching existing values in the inputs to rows in the CSV file. csv-import should generate a list of inputs, where each row is an input.
  • Some discussion around this feature has already happened in Impact Framework Project Updates 2024-06-05 #780 (comment)
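
To make the "each row is an input" behaviour concrete, here is a minimal TypeScript sketch. It is not the proposed implementation: it assumes a comma-delimited file with a header row and no quoted fields, and the names `Input`, `parseCell` and `csvToInputs` are hypothetical, not part of the Impact Framework API.

```typescript
// Hypothetical helper types and functions for illustration only.
type Input = Record<string, string | number>;

// Convert a cell to a number when it looks numeric; otherwise keep the trimmed string.
function parseCell(cell: string): string | number {
  const trimmed = cell.trim();
  if (trimmed !== '' && !Number.isNaN(Number(trimmed))) {
    return Number(trimmed);
  }
  return trimmed;
}

// Assumption: comma-delimited text, first non-empty line is the header, no quoted fields.
function csvToInputs(csv: string): Input[] {
  const lines = csv.split(/\r?\n/).filter(line => line.trim() !== '');
  const [headerLine, ...rows] = lines;
  if (headerLine === undefined) {
    return [];
  }
  const headers = headerLine.split(',').map(header => header.trim());

  // Each CSV row becomes one input object keyed by the header names.
  return rows.map(row => {
    const cells = row.split(',');
    const input: Input = {};
    headers.forEach((header, index) => {
      input[header] = parseCell(cells[index] ?? '');
    });
    return input;
  });
}
```

A real plugin would more likely rely on an established CSV parser to handle quoting and alternative delimiters; the point here is only the shape of the result, one input per row.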

Prerequisites/resources
None

SoW (scope of work)

  • new directory in builtins with implementation
  • test cases added
  • README in new directory documenting plugin usage

Acceptance criteria

  • Scenario 1: Import from local file

Given I have a CSV file with the following data at /path/to/my/file.csv

timestamp duration cpu-util energy
2023-07-06T00:00 1 20 5
2023-07-06T00:01 1 30 10
2023-07-06T00:02 1 40 15

When I add the following to my manifest:

  plugins:
    data-import:
      method: CSVImport
      path: 'builtin'
      config:
        filepath: /path/to/my/file.csv
        output: '*'

Then the plugin has the following output:

      outputs:
        - timestamp: 2023-07-06T00:00
          duration: 1
          cpu-util: 20
          energy: 5
        - timestamp: 2023-07-06T00:01
          duration: 1
          cpu-util: 30
          energy: 10
        - timestamp: 2023-07-06T00:02
          duration: 1
          cpu-util: 40
          energy: 15
  • Scenario 2: Import from a URL

Given I have the same data from scenario 1 publicly available at https://mywebsite.com/file.csv
When I add the same block from scenario 1 to my manifest, except with filepath: https://mywebsite.com/file.csv
Then the plugin has the same output as scenario 1

  • Scenario 3: Selecting and Renaming Columns

Given I have the same file from scenario 1
When I add the following block to my manifest:

  plugins:
    data-import:
      method: CSVImport
      path: 'builtin'
      config:
        filepath: /path/to/my/file.csv
        output: [ ['timestamp'], ['duration'], ['cpu-util', 'cpu/utilization'] ]

Then the plugin has the following output:

      outputs:
        - timestamp: 2023-07-06T00:00
          duration: 1
          cpu/utilization: 20
        - timestamp: 2023-07-06T00:01
          duration: 1
          cpu/utilization: 30
        - timestamp: 2023-07-06T00:02
          duration: 1
          cpu/utilization: 40

This format for the output config was selected to be consistent with the csv-lookup plugin. I don't think it is the simplest way to structure this config, and I would like to discuss whether the added complexity is worth the consistency between plugins.

It should accept the following:

  • '*' indicating all columns should be selected
  • 'foo' indicating that only column 'foo' should be selected
  • ['foo', 'bar'] indicating that only column 'foo' should be selected and output as 'bar'
  • [ ['foo', 'bar'], ['bat'] ] indicating that the 'foo' and 'bat' columns should be selected with 'foo' output as 'bar'
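
The four accepted forms listed above could be normalized into one internal shape before any columns are selected. The following TypeScript sketch shows one way to do that. All names here (`Row`, `OutputConfig`, `ColumnMapping`, `toMapping`, `normalizeOutput`, `selectColumns`) are hypothetical, and throwing on a missing column is an assumption rather than agreed behaviour.

```typescript
// Hypothetical types and helpers for illustration; not taken from the Impact Framework codebase.
type Row = Record<string, string | number>;
type OutputConfig = string | string[] | string[][];
type ColumnMapping = [string, string]; // [source column, output name]

// ['foo'] keeps the name; ['foo', 'bar'] renames 'foo' to 'bar'.
function toMapping(entry: string[]): ColumnMapping {
  const [source, target] = entry;
  if (source === undefined) {
    throw new Error('Empty column entry in output config');
  }
  return [source, target ?? source];
}

// Returns null for '*', meaning "keep every column".
function normalizeOutput(output: OutputConfig): ColumnMapping[] | null {
  if (output === '*') {
    return null;
  }
  if (typeof output === 'string') {
    return [[output, output]];
  }
  const entries: Array<string | string[]> = output;
  // A flat array such as ['foo', 'bar'] is a single select-and-rename pair.
  if (entries.every(entry => typeof entry === 'string')) {
    return [toMapping(entries as string[])];
  }
  // A nested array such as [['foo', 'bar'], ['bat']] holds one entry per column.
  return entries.map(entry => toMapping(typeof entry === 'string' ? [entry] : entry));
}

function selectColumns(row: Row, output: OutputConfig): Row {
  const mappings = normalizeOutput(output);
  if (mappings === null) {
    return {...row};
  }
  const result: Row = {};
  for (const [source, target] of mappings) {
    const value = row[source];
    if (value === undefined) {
      // Assumption: a missing column is treated as an error rather than skipped.
      throw new Error(`Column '${source}' not found in CSV row`);
    }
    result[target] = value;
  }
  return result;
}
```

With the first row from scenario 1, calling `selectColumns` with the scenario 3 `output` value would keep `timestamp` and `duration` and emit `cpu-util` under the key `cpu/utilization`.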

@zanete added the good-first-issue label Oct 21, 2024
@jmcook1186
Contributor

Hi @bvickers7 seems like a useful utility plugin - do you want to work on it? We typically ask open source contributors to host their plugins on their own Github repositories and share them via the Explorer to control our maintenance burden.

@bvickers7
Author

Yes, I am interested in working on this. I proposed this as a builtin plugin because it seemed generic enough to warrant that, but I can develop it in a separate repo and share it via the Explorer instead.

@jmcook1186
Contributor

Hi @bvickers7 brilliant thanks, assigning you to the issue now, let us know if you need any support. Cheers!

Projects
Status: In Refinement
Development

No branches or pull requests

3 participants