Posted to issues@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2022/10/20 13:02:00 UTC

[jira] [Created] (ARROW-18107) [C++] Provide more informative error when (CSV/JSON) parsing fails

Joris Van den Bossche created ARROW-18107:
---------------------------------------------

             Summary: [C++] Provide more informative error when (CSV/JSON) parsing fails
                 Key: ARROW-18107
                 URL: https://issues.apache.org/jira/browse/ARROW-18107
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++
            Reporter: Joris Van den Bossche


Related to ARROW-18106 (and derived from https://stackoverflow.com/questions/74138746/why-i-cant-parse-timestamp-in-pyarrow). 

Assume you have the following code to read a JSON file with timestamps. The timestamp strings include a sub-second part, so parsing fails if you specify the column as a second-resolution timestamp:

{code:python}
import io

import pyarrow as pa
from pyarrow import json

s_json = """{"column":"2022-09-05T08:08:46.000"}"""

# Request second resolution, although the string carries a sub-second part
opts = json.ParseOptions(
    explicit_schema=pa.schema([("column", pa.timestamp("s"))]),
    unexpected_field_behavior="ignore",
)
json.read_json(io.BytesIO(s_json.encode()), parse_options=opts)
{code}

gives:

{code}
ArrowInvalid: Failed of conversion of JSON to timestamp[s], couldn't parse:2022-09-05T08:08:46.000
{code}

This error is expected, but I think it could be more informative about why parsing failed: at first sight the value looks like a proper timestamp string, so you might be left wondering what is wrong with it.

(this might not be that straightforward, though, since there can be many reasons why the parsing is failing)
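
As a rough illustration of what a more specific message could look like, here is a minimal pure-Python sketch. It is not Arrow's C++ implementation: the helper `explain_timestamp_failure`, the regular expression, and the table of fractional digits allowed per unit are all hypothetical, and the sketch only covers the sub-second case from the example above.

```python
import re

# Assumption for this sketch: each unit accepts at most this many
# fractional-second digits. This table is illustrative, not taken from Arrow.
_MAX_FRACTION_DIGITS = {"s": 0, "ms": 3, "us": 6, "ns": 9}

# Deliberately simple pattern: date, "T" or space, time, optional fraction.
# Time zone suffixes and other ISO 8601 variants are not handled here.
_SIMPLE_TIMESTAMP = re.compile(
    r"^\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:\.(?P<fraction>\d+))?$"
)


def explain_timestamp_failure(value, unit):
    """Return a hint on why `value` does not fit timestamp[unit], or None."""
    allowed = _MAX_FRACTION_DIGITS[unit]
    match = _SIMPLE_TIMESTAMP.match(value)
    if match is None:
        return (
            f"{value!r} does not match the simple "
            "YYYY-MM-DDTHH:MM:SS[.fraction] pattern checked by this sketch"
        )
    fraction = match.group("fraction") or ""
    if len(fraction) > allowed:
        return (
            f"{value!r} has {len(fraction)} fractional-second digit(s), "
            f"but timestamp[{unit}] allows at most {allowed}"
        )
    # No problem detected by the checks implemented here
    return None


print(explain_timestamp_failure("2022-09-05T08:08:46.000", "s"))
```

A hint of this kind could be appended to the existing "couldn't parse" message. Every other failure reason would need its own check, which is where the difficulty noted above comes in.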

--
This message was sent by Atlassian Jira
(v8.20.10#820010)