Posted to issues@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2022/10/20 13:02:00 UTC
[jira] [Created] (ARROW-18107) [C++] Provide more informative error when (CSV/JSON) parsing fails
Joris Van den Bossche created ARROW-18107:
---------------------------------------------
Summary: [C++] Provide more informative error when (CSV/JSON) parsing fails
Key: ARROW-18107
URL: https://issues.apache.org/jira/browse/ARROW-18107
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Reporter: Joris Van den Bossche
Related to ARROW-18106 (and derived from https://stackoverflow.com/questions/74138746/why-i-cant-parse-timestamp-in-pyarrow).
Assume you have the following code to read a JSON file with timestamps. The timestamp strings have a sub-second part, which fails to parse if you declare the column as a second-resolution timestamp:
{code:python}
import io
import pyarrow as pa
from pyarrow import json
s_json = """{"column":"2022-09-05T08:08:46.000"}"""
# Declare "column" as a second-resolution timestamp; the value carries a ".000" fraction
opts = json.ParseOptions(
    explicit_schema=pa.schema([("column", pa.timestamp("s"))]),
    unexpected_field_behavior="ignore",
)
json.read_json(io.BytesIO(s_json.encode()), parse_options=opts)
{code}
gives:
{code}
ArrowInvalid: Failed of conversion of JSON to timestamp[s], couldn't parse:2022-09-05T08:08:46.000
{code}
This error is expected, but I think it could be more informative about why parsing failed. At first sight the value looks like a proper timestamp string, so you might be left wondering why it is rejected (here, because a second-resolution timestamp cannot hold the sub-second part).
(This might not be that straightforward, though, since there can be many reasons why parsing fails.)
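As a rough illustration of what a more informative message could contain, here is a minimal Python sketch of a diagnosis that runs only after a value has already failed to parse. `explain_timestamp_failure` and `UNIT_FRACTION_DIGITS` are hypothetical names, not part of pyarrow, and this is not how Arrow's C++ parser works. The sketch assumes a simplified ISO-8601 layout (no timezone offsets, seconds always present) and checks two of the possible reasons: field values out of range, and more fractional-second digits than the requested unit can hold.

{code:python}
import datetime
import re
from typing import Optional

# Hypothetical table: max fractional-second digits each timestamp unit can hold.
UNIT_FRACTION_DIGITS = {"s": 0, "ms": 3, "us": 6, "ns": 9}

# Assumed simplified layout: YYYY-MM-DD, 'T' or space, hh:mm:ss, optional fraction, optional 'Z'.
_ISO_RE = re.compile(
    r"^(\d{4})-(\d{2})-(\d{2})[T ](\d{2}):(\d{2}):(\d{2})(?:\.(\d+))?Z?$"
)


def explain_timestamp_failure(value: str, unit: str) -> Optional[str]:
    """Return a human-readable reason why `value` may not parse at `unit`, or None."""
    if unit not in UNIT_FRACTION_DIGITS:
        raise ValueError(f"unknown timestamp unit: {unit!r}")

    m = _ISO_RE.match(value)
    if m is None:
        return (f"{value!r} is not in the expected ISO-8601 layout "
                "YYYY-MM-DD[T ]hh:mm:ss[.fraction][Z]")

    # Reject impossible calendar/clock values (e.g. month 13) using the stdlib.
    year, month, day, hour, minute, second = (int(g) for g in m.groups()[:6])
    try:
        datetime.datetime(year, month, day, hour, minute, second)
    except ValueError as exc:
        return f"{value!r} has an out-of-range field: {exc}"

    # A fraction longer than the unit allows is rejected, even if it is all zeros.
    fraction = m.group(7) or ""
    allowed = UNIT_FRACTION_DIGITS[unit]
    if len(fraction) > allowed:
        # Dict order is s, ms, us, ns, so the first match is the coarsest unit that fits.
        finer = [u for u, d in UNIT_FRACTION_DIGITS.items() if d >= len(fraction)]
        hint = (f"consider timestamp[{finer[0]}]" if finer
                else "no timestamp unit holds that many digits")
        return (f"{value!r} has {len(fraction)} fractional-second digit(s), "
                f"but timestamp[{unit}] holds at most {allowed}; {hint}")

    return None


print(explain_timestamp_failure("2022-09-05T08:08:46.000", "s"))
{code}

For the value from the example, this prints a message pointing at the three fractional digits and suggesting `timestamp[ms]`. Running the diagnosis only on the error path is a deliberate choice in this sketch: it keeps the successful-parse path untouched and only pays for the extra checks when an error is about to be raised anyway.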
--
This message was sent by Atlassian Jira
(v8.20.10#820010)