Posted to dev@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2020/06/09 08:19:00 UTC

[jira] [Created] (ARROW-9078) [C++] Parquet writing of extension type with nested storage type fails

Joris Van den Bossche created ARROW-9078:
--------------------------------------------

             Summary: [C++] Parquet writing of extension type with nested storage type fails
                 Key: ARROW-9078
                 URL: https://issues.apache.org/jira/browse/ARROW-9078
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++
            Reporter: Joris Van den Bossche


A reproducer in Python:

{code:python}
import pyarrow as pa
import pyarrow.parquet as pq


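class MyStructType(pa.PyExtensionType):
    """Extension type whose storage is struct<left: int64, right: int64>."""

    def __init__(self):
        pa.PyExtensionType.__init__(
            self, pa.struct([('left', pa.int64()), ('right', pa.int64())])
        )

    def __reduce__(self):
        # PyExtensionType is serialized with pickle, so it must be reducible
        return MyStructType, ()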


struct_array = pa.StructArray.from_arrays(
    [
        pa.array([0, 1], type="int64", from_pandas=True),
        pa.array([1, 2], type="int64", from_pandas=True),
    ],
    names=["left", "right"],
)

# works: plain struct column
table = pa.table({'a': struct_array})
pq.write_table(table, "test_struct.parquet")

# fails: the same struct array used as the storage of an extension type
mystruct_array = pa.ExtensionArray.from_storage(MyStructType(), struct_array)
table = pa.table({'a': mystruct_array})
pq.write_table(table, "test_struct.parquet")
{code}

Writing the plain StructArray now works (and so does reading it back in).

But when the struct array is the storage array of an ExtensionType, writing fails with the following error:

{code}
ArrowException: Unknown error: data type leaf_count != builder_leaf_count1 2
{code}
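A possible interim workaround, not part of the original report: since writing the plain struct column works, each extension column can be replaced by its storage array before writing. This is a sketch that assumes the pyarrow APIs used below (`pa.BaseExtensionType`, `ExtensionArray.storage`, `pa.chunked_array`, `pa.Table.from_arrays`) behave as in current releases; the helper names `strip_extension_types` and `write_without_extension_types` are hypothetical. The extension type is not recorded in the file, so the column reads back as a plain struct.

{code:python}
import pyarrow as pa
import pyarrow.parquet as pq


def strip_extension_types(table):
    """Hypothetical helper: return a copy of `table` in which every
    extension-typed column is replaced by its storage (for example the
    underlying StructArray), so the Parquet writer only sees built-in types."""
    columns = []
    for column in table.columns:  # each column is a ChunkedArray
        if isinstance(column.type, pa.BaseExtensionType):
            # Unwrap chunk by chunk; passing the storage type explicitly
            # keeps this working for columns with zero chunks.
            column = pa.chunked_array(
                [chunk.storage for chunk in column.chunks],
                type=column.type.storage_type,
            )
        columns.append(column)
    return pa.Table.from_arrays(columns, names=table.column_names)


def write_without_extension_types(table, where, **kwargs):
    """Hypothetical helper: write `table` to Parquet after stripping
    extension types (the extension metadata is dropped)."""
    pq.write_table(strip_extension_types(table), where, **kwargs)
{code}

With the reproducer above, `write_without_extension_types(table, "test_struct.parquet")` would write the same file as the plain struct case; re-wrapping with `pa.ExtensionArray.from_storage` after reading is left to the caller.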



--
This message was sent by Atlassian Jira
(v8.3.4#803005)