Posted to jira@arrow.apache.org by "quentin lhoest (Jira)" <ji...@apache.org> on 2021/11/02 14:54:00 UTC

[jira] [Commented] (ARROW-14439) [Python][C++] Segfault with read_json when a field is missing

    [ https://issues.apache.org/jira/browse/ARROW-14439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17437408#comment-17437408 ] 

quentin lhoest commented on ARROW-14439:
----------------------------------------

Indeed, it has been fixed in 6.0.0. Thanks a lot!

> [Python][C++] Segfault with read_json when a field is missing
> -------------------------------------------------------------
>
>                 Key: ARROW-14439
>                 URL: https://issues.apache.org/jira/browse/ARROW-14439
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Python
>    Affects Versions: 5.0.0
>            Reporter: quentin lhoest
>            Priority: Major
>
> When reading a JSON Lines file, a segfault can occur if a field is missing from one of the records.
> This happens in particular when the missing field is expected to be a list and the block size is small enough.
> Here is an example that reproduces it:
> {code:python}
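> import io
> import pyarrow.json as paj
> # Two JSON Lines records; the second one omits the list-valued field "a".
> batch = b'{"a": [], "b": 1}\n{"b": 1}'
> # A block size smaller than the input, so it is parsed in several blocks.
> block_size = 12
> paj.read_json(
>     io.BytesIO(batch), read_options=paj.ReadOptions(block_size=block_size)
> )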
> {code}
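
Since the failure mode is a segfault, running the reproducer directly takes down the calling interpreter. The following is a minimal sketch, not part of the original report, of one way to check whether an installed pyarrow is affected: it runs the same snippet in a child Python process and inspects the exit code. The names `REPRO`, `run_reproducer` and `describe` are hypothetical helpers introduced here for illustration, and the "killed by signal" interpretation assumes a POSIX system, where `subprocess` reports a signal-terminated child with a negative return code.

```python
import subprocess
import sys

# The reproducer from the issue, kept as a raw string so that the "\n"
# escape is interpreted by the child interpreter, not by this one.
REPRO = r'''
import io
import pyarrow.json as paj
batch = b'{"a": [], "b": 1}\n{"b": 1}'
paj.read_json(io.BytesIO(batch), read_options=paj.ReadOptions(block_size=12))
'''


def run_reproducer(timeout=60):
    """Hypothetical helper: run the snippet in a child process, return its exit code."""
    proc = subprocess.run(
        [sys.executable, "-c", REPRO], capture_output=True, timeout=timeout
    )
    return proc.returncode


def describe(returncode):
    """Hypothetical helper: turn an exit code into a short description (POSIX assumption)."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, a negative code means the child was terminated by that signal.
        return f"killed by signal {-returncode}"
    # A positive code covers ordinary failures, e.g. pyarrow not being installed.
    return f"exited with error code {returncode}"
```

Calling `describe(run_reproducer())` should report "ok" on a version where the issue is fixed (6.0.0 according to the comment above), while a crash in the child would surface as a signal-terminated process instead of killing the caller.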



--
This message was sent by Atlassian Jira
(v8.3.4#803005)