Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pluggable helper function support #11

Open
aaronsteers opened this issue Apr 1, 2022 · 4 comments
Open

Pluggable helper function support #11

aaronsteers opened this issue Apr 1, 2022 · 4 comments

Comments

@aaronsteers
Copy link
Contributor

aaronsteers commented Apr 1, 2022

In a recent office hours conversation with @visch, we discussed the process of adding custom plugins and how this (currently) requires the entire repo to be forked - since python code needs to be altered in order to provide new shortcut functions.

Then, I realized, Python is flexible enough to accept the following:

{
    "python_imports": {
        // from copy import deepcopy as "fn_copy"
        "copy": {
             "fn_copy": "deepcopy"
        }
    },
    "stream_maps": {
        {
            "my_stream": {
                // takes all fields from the record and puts them in a single object column:
                "record_data": "fn_copy(record)",
                "__else__": None
            }
        }
   }
}

Obviously there are security implications and this could only be feasible in environments when the config can be trusted.

Also, there's the prerequisite of needing to install the related libraries, but there are plenty of ways to get python libraries installed and accessible for the virtual environment.

If using meltano for instance, a user could simply alter the pip_url from pip_url: meltano-map-transform to pip_url: meltano-map-transform my_custom_lib and then the mapper would have access to the installed library.

Another "install" option without actually building a full pip-installable module would be to set PYTHONPATH to something that includes a local python script file.

Cc @tayloramurphy, @edgarrmondragon

@visch
Copy link
Member

visch commented Apr 1, 2022

Would solve my problem, I just wonder about the implications here. Does this encourage bad practices? Would this become unwieldy? Imagine 100's - 1000's of these random functions people write. Maybe it's fine?

I like it though!

@edgarrmondragon
Copy link
Member

I worry a bit about the security holes we may open with such a feature and the ecosystem might miss out on useful mapping functions that never get contributed upstream. That doesn't mean we can hope to catalog all possible mappings, though 😅.

I think forking the repo to add new functions is not too bad if

  1. we point contributors to the file they need to update and potentially make that file only consist of available mapping functions
  2. we make it easy for the contributor to add a unit test for the new function
  3. the release cadence of the SDK is quick

These changes are refactoring that would be arguably easier to implement than the python_imports config and encourage users to contribute to an expanding library of mapping functions 😄

@visch @aaronsteers @tayloramurphy wdyt?

@visch
Copy link
Member

visch commented Apr 6, 2022

Maybe changes like https://gitlab.com/vischous/singer-sdk/-/merge_requests/1/diffs#5419fd77f507b41e5e0fe17b6fbf7375fbf2c3eb are an exception, but this one adds a dependency of pywin32 = "303" , this won't work on linux.

My guess is we don't want the SDK dependency tree going crazy due to mappings that folks would like. For me I try to do as much as I can in SQL so that leaves things that I'll probably pull dependencies into for Python.

For folks who do all mapping in Python this makes a lot of sense as most of the mappings are probably standard library things.

@visch
Copy link
Member

visch commented Apr 6, 2022

I guess with extras we could make it work in the SDK to put mappers (even with crazy dependencies) into the sdk. Tough call I guess either way, one way every mapper ends up in the SDK for all to enjoy.

This seems similar to the tradeoff ansible made a while back with their plugins. They were all in one repo, then switched (after years and years) to seperate repos. I don't know what makes the most sense here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants