Posted to jira@arrow.apache.org by "quentin lhoest (Jira)" <ji...@apache.org> on 2021/11/02 14:54:00 UTC
[jira] [Resolved] (ARROW-14439) [Python][C++] Segfault with read_json when a field is missing
[ https://issues.apache.org/jira/browse/ARROW-14439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
quentin lhoest resolved ARROW-14439.
------------------------------------
Fix Version/s: 6.0.0
Resolution: Fixed
> [Python][C++] Segfault with read_json when a field is missing
> -------------------------------------------------------------
>
> Key: ARROW-14439
> URL: https://issues.apache.org/jira/browse/ARROW-14439
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, Python
> Affects Versions: 5.0.0
> Reporter: quentin lhoest
> Priority: Major
> Fix For: 6.0.0
>
>
> When reading a JSON Lines file, a segfault can occur if a field is missing from one of the rows.
> This happens in particular when the missing field is expected to be a list and the block size is small enough.
> Here is an example that reproduces it:
> {code:python}
> import io
> import pyarrow.json as paj
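> # Row 1 has "a" as an (empty) list; row 2 omits "a" entirely
> batch = b'{"a": [], "b": 1}\n{"b": 1}'
> # block_size is in bytes; the crash only shows up when it is small enough
> block_size = 12
> paj.read_json(
>     io.BytesIO(batch), read_options=paj.ReadOptions(block_size=block_size)
> )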
> {code}
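A segfault kills the whole Python interpreter instead of raising an exception, so a test for this issue cannot simply wrap `read_json` in `try`/`except`. One way to check whether an installed pyarrow is affected is to run the reproduction above in a child process and inspect its exit status. The sketch below assumes a POSIX system, where `subprocess` reports a child killed by signal N as return code -N. The helper names `is_fixed_version` and `crashed_with_segfault` are hypothetical, and the version check relies only on the ticket's stated Fix Version (6.0.0).

```python
import signal
import subprocess
import sys

# The reproduction from the ticket, as a standalone script for a child interpreter.
REPRO = r"""
import io
import pyarrow.json as paj
batch = b'{"a": [], "b": 1}\n{"b": 1}'
paj.read_json(io.BytesIO(batch), read_options=paj.ReadOptions(block_size=12))
"""


def is_fixed_version(version: str) -> bool:
    """Return True if `version` is at or above 6.0.0, the ticket's Fix Version.

    Only the major component is compared, which is enough to separate the
    affected 5.x line from 6.x and later (including dev builds like 6.0.0.dev1).
    """
    major = int(version.split(".")[0])
    return major >= 6


def crashed_with_segfault(code: str, timeout: float = 60.0) -> bool:
    """Run `code` in a fresh interpreter and report whether it died from SIGSEGV.

    Any other outcome (success, a Python exception, or an ImportError because
    pyarrow is not installed) returns False.
    """
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,  # keep the child's output out of the caller's console
        timeout=timeout,
    )
    # POSIX only: a child terminated by signal N has returncode == -N.
    return result.returncode == -signal.SIGSEGV
```

For example, `crashed_with_segfault(REPRO)` should return `True` if the installed build still exhibits the crash described in this ticket, and `is_fixed_version(pyarrow.__version__)` tells you whether the installed version is at or past the Fix Version. The helper deliberately distinguishes only the segfault; it does not assume what a fixed build returns or raises for this input.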
--
This message was sent by Atlassian Jira
(v8.3.4#803005)