Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unable to load raw JSON fields as JSON without flattening #162

Open
melgazar9 opened this issue Mar 19, 2024 · 1 comment
Open

Unable to load raw JSON fields as JSON without flattening #162

melgazar9 opened this issue Mar 19, 2024 · 1 comment

Comments

@melgazar9
Copy link

When I run meltano tap-coingecko target-jsonl I am able to load raw json fields with th.AnyType() or th.CustomType({"anyOf": [{"type": "object"}, {"type": "array"}, {}]}). However when using target-snowflake (target-bigquery fails as well), I get the following errors:

When using th.AnyType() defined as my schema:

File "/Users/melgazar9/scripts/github/personal/tap-coingecko/.meltano/loaders/target-snowflake/venv/lib/python3.9/site-packages/target_snowflake/flattening.py", line 56, in flatten_schema cmd_type=loader name=target-snowflake run_id=aee098d1-bc00-4de0-a72a-3f2c15829c16 state_id=tmp stdio=stderr
2024-03-19T17:29:22.444742Z [info     ]     value_type = list(v.values())[0][0]['type'] cmd_type=loader name=target-snowflake run_id=aee098d1-bc00-4de0-a72a-3f2c15829c16 state_id=tmp stdio=stderr
2024-03-19T17:29:22.444851Z [info     ] TypeError: string indices must be integers

when using th.CustomType({"anyOf": [{"type": "object"}, {"type": "array"}, {}]}) defined as my schema:

File "/Users/melgazar9/scripts/github/personal/tap-coingecko/.meltano/loaders/target-snowflake/venv/lib/python3.9/site-packages/target_snowflake/__init__.py", line 143, in persist_lines cmd_type=loader name=target-snowflake run_id=a3c3f352-de37-4b5c-a0d3-b3bcd20ea8c5 state_id=tmp stdio=stderr
2024-03-19T17:32:18.471088Z [info     ]     stream_utils.adjust_timestamps_in_record(o['record'], schemas[stream]) cmd_type=loader name=target-snowflake run_id=a3c3f352-de37-4b5c-a0d3-b3bcd20ea8c5 state_id=tmp stdio=stderr
2024-03-19T17:32:18.471148Z [info     ]   File "/Users/melgazar9/scripts/github/personal/tap-coingecko/.meltano/loaders/target-snowflake/venv/lib/python3.9/site-packages/target_snowflake/stream_utils.py", line 65, in adjust_timestamps_in_record cmd_type=loader name=target-snowflake run_id=a3c3f352-de37-4b5c-a0d3-b3bcd20ea8c5 state_id=tmp stdio=stderr
2024-03-19T17:32:18.471682Z [info     ]     if 'string' in type_dict['type'] and type_dict.get('format', None) in {'date-time', 'time', 'date'}: cmd_type=loader name=target-snowflake run_id=a3c3f352-de37-4b5c-a0d3-b3bcd20ea8c5 state_id=tmp stdio=stderr
2024-03-19T17:32:18.471795Z [info     ] KeyError: 'type'

An example json output is

{"id": "zzz", "symbol": "zzz", "name": "GoSleep ZZZ", "platforms": {"arbitrum-one": "0x7a2c1b8e26c48a5b73816b7ec826fd4053f5f34b", "binance-smart-chain": "0x0b9bdcc696efa768cafe0e675525eaf42e32d108"}}

Apologies if this was answered before. I am somewhat new to meltano and could not find a solution to this.

The same issue happens when running target-bigquery so I'm not sure this is target-snowflake specific.

@melgazar9
Copy link
Author

melgazar9 commented Apr 3, 2024

Update: I am able to load most JSON data by running the below code block.

SCHEMA = <schema dictionary>
json_id_fields = [<list of expected json fields>]

CUSTOM_JSON_SCHEMA = {'additionalProperties': True, 'description': 'Custom JSON typing.', 'type': ['object', 'null']}

for field in json_id_fields:
    SCHEMA["properties"][field] = CUSTOM_JSON_SCHEMA

In both jsonl and snowflake, the above method works for simple jsons such as {'hi': 2, 'bye': 1}. However it somehow works in snowflake but fails in jsonl for more complicated json structures / arrays such as this: [{"description": "Test."}, {"hi": 2, "bye": 39}]

Lastly, I was able to get it to work in both jsonl and snowflake using the below snippet, but this fails for other loaders such as bigquery.

th.ArrayType(th.CustomType({"anyOf": [{"type": "object"}, {"type": "array"}, {"type": "null"}, {}, [{}]]}))

I posted on meltanos slack channel as well and it seems to be a common misunderstanding. I'm not able to find a source that shows a universal JSON schema type that works across loaders. What's the recommended approach for schema typing json fields?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: No status
Development

No branches or pull requests

1 participant