Posted to jira@arrow.apache.org by "quentin lhoest (Jira)" <ji...@apache.org> on 2021/10/22 14:06:00 UTC

[jira] [Created] (ARROW-14439) [Python][C++] Segfault with read_json when a field is missing

quentin lhoest created ARROW-14439:
--------------------------------------

             Summary: [Python][C++] Segfault with read_json when a field is missing
                 Key: ARROW-14439
                 URL: https://issues.apache.org/jira/browse/ARROW-14439
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++, Python
    Affects Versions: 5.0.0
            Reporter: quentin lhoest


When reading a JSON Lines file with pyarrow.json.read_json, a segfault can occur if a field is missing from one of the records.
This happens in particular when the missing field is supposed to be a list and the block size is small enough.

Here is an example to reproduce:
{code:python}
import io

import pyarrow.json as paj

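# Two JSON Lines records; the second one omits the list field "a".
batch = b'{"a": [], "b": 1}\n{"b": 1}'
# Deliberately small block size, shorter than the first record.
block_size = 12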

paj.read_json(
    io.BytesIO(batch), read_options=paj.ReadOptions(block_size=block_size)
)
{code}
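
Since a segfault takes down the whole interpreter, it can be convenient to run the reproducer in a child process and inspect how it exited. The following is only a sketch of that idea and not part of the original report: the helper names run_isolated and describe_exit are hypothetical, and it assumes a POSIX system, where subprocess reports a negative return code when the child is killed by a signal.

```python
import signal
import subprocess
import sys

# The reproducer from this report, kept as source text so it can be run
# in a separate interpreter (raw string: the "\n" is resolved in the child).
REPRO = r'''
import io
import pyarrow.json as paj

batch = b'{"a": [], "b": 1}\n{"b": 1}'
paj.read_json(io.BytesIO(batch), read_options=paj.ReadOptions(block_size=12))
'''


def run_isolated(source, timeout=60):
    """Run Python source in a child interpreter and return its exit code.

    Hypothetical helper: a crash in the child cannot take down the caller.
    """
    proc = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
        timeout=timeout,
    )
    return proc.returncode


def describe_exit(returncode):
    """Turn a subprocess return code into a short description.

    Assumes POSIX semantics: a negative value -N means "killed by signal N".
    """
    if returncode < 0:
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


if __name__ == "__main__":
    print(describe_exit(run_isolated(REPRO)))
```

On an affected build, the child would be expected to be reported as killed by a signal rather than exiting with a status code. A plain non-zero status instead points to an ordinary Python exception, such as pyarrow not being installed.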




--
This message was sent by Atlassian Jira
(v8.3.4#803005)