Posted to jira@arrow.apache.org by "Tim Loderhose (Jira)" <ji...@apache.org> on 2022/06/15 13:28:00 UTC
[jira] [Created] (ARROW-16834) Handle impossible conversions in csv.ConvertOptions
Tim Loderhose created ARROW-16834:
-------------------------------------
Summary: Handle impossible conversions in csv.ConvertOptions
Key: ARROW-16834
URL: https://issues.apache.org/jira/browse/ARROW-16834
Project: Apache Arrow
Issue Type: Improvement
Components: Python
Affects Versions: 8.0.0
Reporter: Tim Loderhose
`pyarrow.csv.ParseOptions` (https://arrow.apache.org/docs/python/generated/pyarrow.csv.ParseOptions.html#pyarrow.csv.ParseOptions) allows invalid rows to be skipped via its `invalid_row_handler` option.
In `pyarrow.csv.ConvertOptions` (https://arrow.apache.org/docs/python/generated/pyarrow.csv.ConvertOptions.html#pyarrow.csv.ConvertOptions), one can supply a schema through `column_types` to get the correct types in the resulting table.
I have a data source that almost always follows a specific schema, but its data isn't validated beforehand. In practice, a field that fits int16 99.9% of the time can contain an out-of-range value in a few rows, and that single value makes the whole conversion fail.
I'd like to handle such out-of-range values similarly to `invalid_row_handler`, perhaps by allowing failed conversions to be set to NULL, or by supplying a handler that applies a more specific operation.