Posted to issues@arrow.apache.org by "Antoine Pitrou (Jira)" <ji...@apache.org> on 2019/11/29 10:41:00 UTC
[jira] [Resolved] (ARROW-6157) [Python][C++] UnionArray with invalid data passes validation / leads to segfaults
[ https://issues.apache.org/jira/browse/ARROW-6157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Antoine Pitrou resolved ARROW-6157.
-----------------------------------
Resolution: Fixed
Issue resolved by pull request 5892
[https://github.com/apache/arrow/pull/5892]
> [Python][C++] UnionArray with invalid data passes validation / leads to segfaults
> ---------------------------------------------------------------------------------
>
> Key: ARROW-6157
> URL: https://issues.apache.org/jira/browse/ARROW-6157
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, Python
> Reporter: Joris Van den Bossche
> Assignee: Antoine Pitrou
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.0.0
>
> Time Spent: 2h 20m
> Remaining Estimate: 0h
>
> From the Python side, you can create an "invalid" UnionArray:
> {code}
> import pyarrow as pa
> binary = pa.array([b'a', b'b', b'c', b'd'], type='binary')
> int64 = pa.array([1, 2, 3], type='int64')
> types = pa.array([0, 1, 0, 0, 2, 1, 0], type='int8') # <- type id 2 is out of bounds: there are only two children (ids 0 and 1)
> value_offsets = pa.array([0, 0, 2, 1, 1, 2, 3], type='int32')
> a = pa.UnionArray.from_dense(types, value_offsets, [binary, int64])
> {code}
> For example, converting the array to Python objects leads to a segfault:
> {code}
> In [7]: a.to_pylist()
> Segmentation fault (core dumped)
> {code}
> On the other hand, doing an explicit validation does not give an error:
> {code}
> In [8]: a.validate()
> {code}
> Should validation raise an error in this case? (The C++ {{ValidateVisitor}} for UnionArray currently does nothing.)
> If it did, validation could be called from the Python API to avoid creating invalid arrays, and the resulting segfaults, there.
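
To illustrate the kind of check the report asks for, here is a minimal pure-Python sketch of bounds validation for a dense union layout. It is not the code from the pull request and does not use pyarrow; the function name `check_dense_union` and its parameters are hypothetical, and it assumes that each type id indexes directly into the list of children.

```python
from typing import List, Sequence


def check_dense_union(type_ids: Sequence[int],
                      value_offsets: Sequence[int],
                      child_lengths: Sequence[int]) -> List[str]:
    """Return the problems found; an empty list means the layout is consistent.

    Hypothetical helper for illustration only. Assumes type id k refers to
    child number k (no separate type-code mapping).
    """
    problems = []
    if len(type_ids) != len(value_offsets):
        problems.append(
            f"length mismatch: {len(type_ids)} type ids vs "
            f"{len(value_offsets)} offsets")
        return problems
    for i, (tid, off) in enumerate(zip(type_ids, value_offsets)):
        # The type id must select an existing child.
        if not 0 <= tid < len(child_lengths):
            problems.append(
                f"slot {i}: type id {tid} out of range for "
                f"{len(child_lengths)} children")
            continue
        # The offset must point inside the selected child.
        if not 0 <= off < child_lengths[tid]:
            problems.append(
                f"slot {i}: offset {off} out of range for child {tid} "
                f"of length {child_lengths[tid]}")
    return problems


# Data from the report: the children are binary (length 4) and int64 (length 3).
types = [0, 1, 0, 0, 2, 1, 0]
value_offsets = [0, 0, 2, 1, 1, 2, 3]
problems = check_dense_union(types, value_offsets, child_lengths=[4, 3])
for p in problems:
    print(p)
```

On the data from the report, the sketch flags only the slot holding type id 2; every offset is within the bounds of its child.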
--
This message was sent by Atlassian Jira
(v8.3.4#803005)