Posted to dev@arrow.apache.org by "Joris Van den Bossche (JIRA)" <ji...@apache.org> on 2019/08/07 10:59:00 UTC

[jira] [Created] (ARROW-6158) [Python] possible to create StructArray with type that conflicts with child array's types

Joris Van den Bossche created ARROW-6158:
--------------------------------------------

             Summary: [Python] possible to create StructArray with type that conflicts with child array's types
                 Key: ARROW-6158
                 URL: https://issues.apache.org/jira/browse/ARROW-6158
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
            Reporter: Joris Van den Bossche


Using the Python interface as an example, the following creates a {{StructArray}} whose field types don't match the child array types:

{code}
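import pyarrow as pa

a = pa.array([1, 2, 3], type=pa.int64())
b = pa.array(['a', 'b', 'c'], type=pa.string())
# Field types deliberately disagree with the child arrays (int64, string)
inconsistent_fields = [pa.field('a', pa.int32()), pa.field('b', pa.float64())]

# Note: this rebinds `a` to the resulting StructArray
a = pa.StructArray.from_arrays([a, b], fields=inconsistent_fields)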
{code}

The above runs without error. I couldn't find any operation that fails on the result (e.g. conversion to pandas, slicing), and validation also passes, but the array's type carries field types that are inconsistent with its children:

{code}
In [2]: a
Out[2]: 
<pyarrow.lib.StructArray object at 0x7f450af52eb8>
-- is_valid: all not null
-- child 0 type: int64
  [
    1,
    2,
    3
  ]
-- child 1 type: string
  [
    "a",
    "b",
    "c"
  ]

In [3]: a.type
Out[3]: StructType(struct<a: int32, b: double>)

In [4]: a.to_pandas()
Out[4]: 
array([{'a': 1, 'b': 'a'}, {'a': 2, 'b': 'b'}, {'a': 3, 'b': 'c'}],
      dtype=object)

In [5]: a.validate() 
{code}

Shouldn't this be disallowed somehow? It could be checked in the Python {{from_arrays}} method, but perhaps also in {{StructArray::Make}}, which already checks the number of fields against the number of arrays and that the array lengths are consistent.
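To illustrate the kind of check meant here, below is a minimal sketch of a Python-side guard. The helper names {{find_type_mismatches}} and {{check_fields_match_children}} are hypothetical and not part of pyarrow. The sketch only assumes that each array exposes {{.type}} and each field exposes {{.name}} and {{.type}}, as pyarrow arrays and fields do. It is not how pyarrow implements, or would have to implement, the validation.

```python
def find_type_mismatches(arrays, fields):
    """Return (field name, declared type, child type) for each conflicting pair.

    Hypothetical helper: relies only on `.type` on arrays and on
    `.name` / `.type` on fields, so it works with pyarrow objects
    without importing pyarrow itself.
    """
    if len(arrays) != len(fields):
        raise ValueError(
            f"got {len(arrays)} arrays but {len(fields)} fields"
        )
    mismatches = []
    for arr, field in zip(arrays, fields):
        # Compare with == so that the types' own equality logic is used.
        if not (field.type == arr.type):
            mismatches.append((field.name, field.type, arr.type))
    return mismatches


def check_fields_match_children(arrays, fields):
    """Raise TypeError if any field type conflicts with its child array type."""
    mismatches = find_type_mismatches(arrays, fields)
    if mismatches:
        details = ", ".join(
            f"field '{name}' declared as {declared} but child array is {actual}"
            for name, declared, actual in mismatches
        )
        raise TypeError(f"inconsistent struct field types: {details}")
```

Called on the original child arrays and {{inconsistent_fields}} from the example before {{pa.StructArray.from_arrays}}, {{check_fields_match_children}} would raise {{TypeError}} instead of letting the inconsistent array be built.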

Similarly to the discussion in ARROW-6132, I would also expect {{ValidateArray}} to catch this.


