Posted to jira@arrow.apache.org by "quentin lhoest (Jira)" <ji...@apache.org> on 2021/10/22 14:06:00 UTC
[jira] [Created] (ARROW-14439) [Python][C++] Segfault with read_json when a field is missing
quentin lhoest created ARROW-14439:
--------------------------------------
Summary: [Python][C++] Segfault with read_json when a field is missing
Key: ARROW-14439
URL: https://issues.apache.org/jira/browse/ARROW-14439
Project: Apache Arrow
Issue Type: Bug
Components: C++, Python
Affects Versions: 5.0.0
Reporter: quentin lhoest
When reading a JSON Lines file, a segfault can occur if a field is missing from one of the rows.
This happens in particular when the missing field is supposed to be a list and the block size is small enough.
Here is an example to reproduce:
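{code:python}
import io
import pyarrow.json as paj

# Two JSON Lines rows; the second row omits the list field "a".
batch = b'{"a": [], "b": 1}\n{"b": 1}'
# A block size smaller than the first row (17 bytes before the newline).
block_size = 12

paj.read_json(
    io.BytesIO(batch), read_options=paj.ReadOptions(block_size=block_size)
)
{code}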
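Since the reproduction kills the interpreter, it can be convenient to run it in a child process and inspect the exit status instead. The following is only a sketch and not part of the original report: `REPRO`, `run_isolated` and `describe` are hypothetical names, and it assumes a POSIX system, where `subprocess` reports death by signal as a negative return code.

```python
import subprocess
import sys

# The reproduction from this report, kept as source text so that it runs
# in a separate interpreter. The raw string keeps "\n" as an escape
# sequence for the child process to interpret.
REPRO = r'''
import io
import pyarrow.json as paj

batch = b'{"a": [], "b": 1}\n{"b": 1}'
paj.read_json(io.BytesIO(batch), read_options=paj.ReadOptions(block_size=12))
'''


def run_isolated(source, timeout=60):
    """Run Python source in a child interpreter and return its exit code."""
    proc = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
        timeout=timeout,
    )
    return proc.returncode


def describe(returncode):
    """Turn a subprocess return code into a short description (POSIX)."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, a negative value -N means the child died from signal N.
        return f"killed by signal {-returncode}"
    return f"exited with code {returncode}"
```

Calling `print(describe(run_isolated(REPRO)))` then reports how the child exited without taking down the calling session. The result depends on the installed pyarrow version and on the platform.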
--
This message was sent by Atlassian Jira
(v8.3.4#803005)