When I run meltano tap-coingecko target-jsonl, I am able to load raw JSON fields with th.AnyType() or th.CustomType({"anyOf": [{"type": "object"}, {"type": "array"}, {}]}). However, when using target-snowflake (target-bigquery fails as well), I get the following errors:
When using th.AnyType() defined as my schema:
File "/Users/melgazar9/scripts/github/personal/tap-coingecko/.meltano/loaders/target-snowflake/venv/lib/python3.9/site-packages/target_snowflake/flattening.py", line 56, in flatten_schema cmd_type=loader name=target-snowflake run_id=aee098d1-bc00-4de0-a72a-3f2c15829c16 state_id=tmp stdio=stderr
2024-03-19T17:29:22.444742Z [info ] value_type = list(v.values())[0][0]['type'] cmd_type=loader name=target-snowflake run_id=aee098d1-bc00-4de0-a72a-3f2c15829c16 state_id=tmp stdio=stderr
2024-03-19T17:29:22.444851Z [info ] TypeError: string indices must be integers
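For reference, this TypeError can be reproduced outside the loader. The line in the traceback, value_type = list(v.values())[0][0]['type'], assumes the property schema is something like {'anyOf': [ ...list of sub-schemas... ]}. A minimal sketch, under the assumption (not verified against the SDK source) that th.AnyType() instead emits a flat schema of roughly the shape {'type': 'object'}:

```python
# Minimal reproduction of the flattener's failure mode (illustrative only).
# target-snowflake's flatten_schema does roughly:
#     value_type = list(v.values())[0][0]['type']
# which expects list(v.values())[0] to be a LIST of sub-schemas.

# The shape the flattener expects:
v_ok = {"anyOf": [{"type": "object"}, {"type": "array"}]}
assert list(v_ok.values())[0][0]["type"] == "object"

# A hypothetical flat schema, as a permissive type wrapper might emit:
v_bad = {"type": "object"}
try:
    list(v_bad.values())[0][0]["type"]  # 'object'[0] is 'o', then 'o'['type'] fails
except TypeError:
    print("TypeError, as seen in the target-snowflake traceback")
```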
When using th.CustomType({"anyOf": [{"type": "object"}, {"type": "array"}, {}]}) defined as my schema:
File "/Users/melgazar9/scripts/github/personal/tap-coingecko/.meltano/loaders/target-snowflake/venv/lib/python3.9/site-packages/target_snowflake/__init__.py", line 143, in persist_lines cmd_type=loader name=target-snowflake run_id=a3c3f352-de37-4b5c-a0d3-b3bcd20ea8c5 state_id=tmp stdio=stderr
2024-03-19T17:32:18.471088Z [info ] stream_utils.adjust_timestamps_in_record(o['record'], schemas[stream]) cmd_type=loader name=target-snowflake run_id=a3c3f352-de37-4b5c-a0d3-b3bcd20ea8c5 state_id=tmp stdio=stderr
2024-03-19T17:32:18.471148Z [info ] File "/Users/melgazar9/scripts/github/personal/tap-coingecko/.meltano/loaders/target-snowflake/venv/lib/python3.9/site-packages/target_snowflake/stream_utils.py", line 65, in adjust_timestamps_in_record cmd_type=loader name=target-snowflake run_id=a3c3f352-de37-4b5c-a0d3-b3bcd20ea8c5 state_id=tmp stdio=stderr
2024-03-19T17:32:18.471682Z [info ] if 'string' in type_dict['type'] and type_dict.get('format', None) in {'date-time', 'time', 'date'}: cmd_type=loader name=target-snowflake run_id=a3c3f352-de37-4b5c-a0d3-b3bcd20ea8c5 state_id=tmp stdio=stderr
2024-03-19T17:32:18.471795Z [info ] KeyError: 'type'
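The KeyError has a similarly mechanical cause: the line in the traceback reads type_dict['type'] directly, and a schema declared only via anyOf has an anyOf key but no top-level type key, so the lookup fails before any timestamp handling happens. A minimal sketch:

```python
# Minimal reproduction (illustrative): the loader does roughly
#     if 'string' in type_dict['type'] and type_dict.get('format') in {...}:
# With an anyOf-only schema there is no top-level 'type' key.

type_dict = {"anyOf": [{"type": "object"}, {"type": "array"}, {}]}
try:
    "string" in type_dict["type"]
except KeyError as exc:
    print(repr(exc))  # KeyError('type')
```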
Update: I am able to load most JSON data by running the below code block.
SCHEMA = <schema dictionary>
json_id_fields = [<list of expected json fields>]
CUSTOM_JSON_SCHEMA = {'additionalProperties': True, 'description': 'Custom JSON typing.', 'type': ['object', 'null']}
for field in json_id_fields:
    SCHEMA["properties"][field] = CUSTOM_JSON_SCHEMA
In both jsonl and snowflake, the above method works for simple JSON objects such as {'hi': 2, 'bye': 1}. However, for more complicated structures, such as a top-level array like [{"description": "Test."}, {"hi": 2, "bye": 39}], it somehow works in snowflake but fails in jsonl.
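One plausible explanation (an assumption, not verified against target-jsonl's source) is that the patched schema declares only 'object' and 'null', so a record whose field is a top-level array falls outside the declared types. A stdlib-only sketch with a simplified type check, also showing a hypothetical widened declaration that admits arrays:

```python
# Map Python values to JSON Schema type names (simplified; ignores
# integer-vs-number nuances since only object/array matter here).
def json_type(value):
    if value is None:
        return "null"
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        return "array"
    if isinstance(value, str):
        return "string"
    if isinstance(value, (int, float)):
        return "number"
    raise TypeError(f"unsupported: {type(value)!r}")

# The workaround schema from above: objects only.
narrow = {"additionalProperties": True, "type": ["object", "null"]}
# A widened variant that also admits top-level arrays (hypothetical fix).
wide = {"additionalProperties": True, "type": ["object", "array", "null"]}

simple = {"hi": 2, "bye": 1}
complicated = [{"description": "Test."}, {"hi": 2, "bye": 39}]

assert json_type(simple) in narrow["type"]           # passes the narrow schema
assert json_type(complicated) not in narrow["type"]  # array is not declared
assert json_type(complicated) in wide["type"]        # widened schema admits it
```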
Lastly, I was able to get it to work in both jsonl and snowflake using the snippet below, but that approach fails for other loaders such as bigquery.
I posted on Meltano's Slack channel as well, and it seems to be a common point of confusion. I'm not able to find a source that shows a universal JSON schema type that works across loaders. What's the recommended approach for schema-typing JSON fields?
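Not an authoritative answer, but one commonly suggested portable approach is to sidestep loader-specific object/array handling entirely: declare the field as a plain string in the schema, serialize the value with json.dumps before emitting the record, and let the warehouse parse it back (e.g. PARSE_JSON into a VARIANT column in Snowflake). A minimal sketch; the helper and record names are made up for illustration:

```python
import json

# Declare the field as a plain string -- every loader understands strings.
PORTABLE_FIELD_SCHEMA = {"type": ["string", "null"]}

def stringify_json_fields(record, fields):
    """Serialize nested JSON values so they match a string-typed schema."""
    for field in fields:
        value = record.get(field)
        if value is not None and not isinstance(value, str):
            record[field] = json.dumps(value)
    return record

# Hypothetical record containing both shapes that caused trouble above.
record = {
    "id": "bitcoin",
    "simple": {"hi": 2, "bye": 1},
    "complicated": [{"description": "Test."}, {"hi": 2, "bye": 39}],
}
out = stringify_json_fields(record, ["simple", "complicated"])
assert isinstance(out["simple"], str)
assert json.loads(out["complicated"]) == [{"description": "Test."}, {"hi": 2, "bye": 39}]
```

The trade-off is that the nested structure is opaque to the loader, so any flattening or per-key typing has to happen downstream in the warehouse.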
Apologies if this was answered before. I am somewhat new to Meltano and could not find a solution to this.
The same issue happens when running target-bigquery, so I'm not sure this is target-snowflake specific.