Posted to jira@arrow.apache.org by "Peter Goldsborough (Jira)" <ji...@apache.org> on 2020/12/18 05:38:00 UTC

[jira] [Updated] (ARROW-10955) Cannot read empty json lists and write them as parquet

     [ https://issues.apache.org/jira/browse/ARROW-10955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Goldsborough updated ARROW-10955:
---------------------------------------
    Description: 
We're using Arrow to convert JSON to Parquet, and our JSON occasionally contains empty lists. Reading such JSON into an Arrow table and writing it to Parquet currently fails. We noticed this issue in our C++ Arrow code, but it also happens from Python.

Minimal repro:

input.json:

{"foo": []}

 

convert.py:
import pyarrow.json
import pyarrow.parquet

t = pyarrow.json.read_json("input.json")
pyarrow.parquet.write_table(t, "out.parquet")
  

Produces:

Traceback (most recent call last):
  File "repro.py", line 5, in <module>
    pyarrow.parquet.write_table(t, "out.parquet")
  File "env/lib/python3.8/site-packages/pyarrow/parquet.py", line 1717, in write_table
    with ParquetWriter(
  File "env/lib/python3.8/site-packages/pyarrow/parquet.py", line 554, in __init__
    self.writer = _parquet.ParquetWriter(
  File "pyarrow/_parquet.pyx", line 1409, in pyarrow._parquet.ParquetWriter.__cinit__
  File "pyarrow/error.pxi", line 84, in pyarrow.lib.check_status
 pyarrow.lib.ArrowInvalid: NullType Arrow field must be nullable

 

  was:
We're using Arrow to convert from JSON to Parquet and occasionally have empty lists in our json. Reading such JSON into an Arrow table and writing it to Parquet currently fails. We noticed this issue in our C++ Arrow code, but it also happens from Python.

Minimal repro:

input.json:

{"foo": []}

 

convert.py:
import pyarrow.json
import pyarrow.parquet

t = pyarrow.json.read_json("input.json")
pyarrow.parquet.write_table(t, "out.parquet")
 

Produces:

Traceback (most recent call last):
 File "repro.py", line 5, in <module>
 pyarrow.parquet.write_table(t, "out.parquet")
 File "/Users/pgoldsborough/anduril/capacitor/env/lib/python3.8/site-packages/pyarrow/parquet.py", line 1717, in write_table
 with ParquetWriter(
 File "/Users/pgoldsborough/anduril/capacitor/env/lib/python3.8/site-packages/pyarrow/parquet.py", line 554, in __init__
 self.writer = _parquet.ParquetWriter(
 File "pyarrow/_parquet.pyx", line 1409, in pyarrow._parquet.ParquetWriter.__cinit__
 File "pyarrow/error.pxi", line 84, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: NullType Arrow field must be nullable

 


> Cannot read empty json lists and write them as parquet
> ------------------------------------------------------
>
>                 Key: ARROW-10955
>                 URL: https://issues.apache.org/jira/browse/ARROW-10955
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Python
>    Affects Versions: 0.17.0, 0.17.1, 1.0.0, 2.0.0
>         Environment: linux and mac
>            Reporter: Peter Goldsborough
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.3.4#803005)