Posted to issues@arrow.apache.org by "Joris Van den Bossche (JIRA)" <ji...@apache.org> on 2019/08/07 10:31:00 UTC

[jira] [Created] (ARROW-6157) [Python][C++] UnionArray with invalid data passes validation / leads to segfaults

Joris Van den Bossche created ARROW-6157:
--------------------------------------------

             Summary: [Python][C++] UnionArray with invalid data passes validation / leads to segfaults
                 Key: ARROW-6157
                 URL: https://issues.apache.org/jira/browse/ARROW-6157
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++, Python
            Reporter: Joris Van den Bossche


From the Python side, you can create an "invalid" UnionArray:

{code}
import pyarrow as pa

binary = pa.array([b'a', b'b', b'c', b'd'], type='binary')
int64 = pa.array([1, 2, 3], type='int64')
types = pa.array([0, 1, 0, 0, 2, 1, 0], type='int8')   # <- type code 2 is out of bounds: there are only 2 children
value_offsets = pa.array([0, 0, 2, 1, 1, 2, 3], type='int32')

a = pa.UnionArray.from_dense(types, value_offsets, [binary, int64])
{code}

For example, converting it to Python objects segfaults:

{code}
In [7]: a.to_pylist()
Segmentation fault (core dumped)
{code}

However, explicit validation does not raise an error:

{code}
In [8]: a.validate()
{code}

Should validation raise an error in this case? (The C++ {{ValidateVisitor}} for UnionArray currently does nothing.)
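As an illustration only, here is a minimal plain-Python sketch of the checks a dense-union validator could perform: every type code must select an existing child, and every offset must fall inside the selected child. {{validate_dense_union}} is a hypothetical helper (not part of pyarrow or the C++ library), and it assumes type codes map directly to child indices 0..n-1, which is the case when {{from_dense}} is called without explicit type codes.

```python
from typing import List, Sequence


def validate_dense_union(type_ids: Sequence[int],
                         value_offsets: Sequence[int],
                         child_lengths: Sequence[int]) -> List[str]:
    """Return error messages for a dense union layout (empty list if valid).

    Hypothetical helper: assumes type code i refers to child i.
    """
    errors = []
    if len(type_ids) != len(value_offsets):
        errors.append(f"type_ids length {len(type_ids)} != "
                      f"value_offsets length {len(value_offsets)}")
        return errors
    num_children = len(child_lengths)
    for slot, (code, offset) in enumerate(zip(type_ids, value_offsets)):
        # Each type code must select an existing child array.
        if not 0 <= code < num_children:
            errors.append(f"slot {slot}: type code {code} out of bounds "
                          f"for {num_children} children")
            continue
        # The offset must point inside the selected child array.
        if not 0 <= offset < child_lengths[code]:
            errors.append(f"slot {slot}: offset {offset} out of bounds "
                          f"for child {code} of length {child_lengths[code]}")
    return errors


# Layout from the report: binary child of length 4, int64 child of length 3.
errors = validate_dense_union(
    type_ids=[0, 1, 0, 0, 2, 1, 0],
    value_offsets=[0, 0, 2, 1, 1, 2, 3],
    child_lengths=[4, 3],
)
for message in errors:
    print(message)
```

Run against the layout from the report, the sketch flags only slot 4, whose type code 2 has no corresponding child; all other offsets are within bounds of their selected child.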
