Posted to jira@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2021/01/05 13:03:00 UTC

[jira] [Commented] (ARROW-10955) [C++] Reading empty json lists results in invalid non-nullable null type

    [ https://issues.apache.org/jira/browse/ARROW-10955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258906#comment-17258906 ] 

Joris Van den Bossche commented on ARROW-10955:
-----------------------------------------------

bq. Is there a manual workaround we can use to make this conversion process still work? Like change the schema of the json table after it's been loaded?

Yes, you should be able to cast the resulting table to a new schema (one without the non-nullable null field) after reading in the JSON.

bq. When can we expect 3.0.0?

The final release is expected sometime in the second half of January.

> [C++] Reading empty json lists results in invalid non-nullable null type
> ------------------------------------------------------------------------
>
>                 Key: ARROW-10955
>                 URL: https://issues.apache.org/jira/browse/ARROW-10955
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Python
>    Affects Versions: 0.17.0, 0.17.1, 1.0.0, 2.0.0
>            Reporter: Peter Goldsborough
>            Assignee: Antoine Pitrou
>            Priority: Major
>              Labels: json, pull-request-available
>             Fix For: 3.0.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> We're using Arrow to convert from JSON to Parquet and occasionally have empty lists in our JSON. Reading such JSON into an Arrow table and writing it to Parquet currently fails. We noticed this issue in our C++ Arrow code, but it also happens from Python.
> Minimal repro:
> input.json:
> {"foo": []}
>  
> convert.py:
> import pyarrow.json
> import pyarrow.parquet
> t = pyarrow.json.read_json("input.json")
> pyarrow.parquet.write_table(t, "out.parquet")
>   
> Produces:
> Traceback (most recent call last):
>   File "repro.py", line 5, in <module>
>     pyarrow.parquet.write_table(t, "out.parquet")
>   File "env/lib/python3.8/site-packages/pyarrow/parquet.py", line 1717, in write_table
>     with ParquetWriter(
>   File "env/lib/python3.8/site-packages/pyarrow/parquet.py", line 554, in __init__
>     self.writer = _parquet.ParquetWriter(
>   File "pyarrow/_parquet.pyx", line 1409, in pyarrow._parquet.ParquetWriter.__cinit__
>   File "pyarrow/error.pxi", line 84, in pyarrow.lib.check_status
> pyarrow.lib.ArrowInvalid: NullType Arrow field must be nullable
>  