Posted to dev@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2020/06/09 08:19:00 UTC
[jira] [Created] (ARROW-9078) [C++] Parquet writing of extension type with nested storage type fails
Joris Van den Bossche created ARROW-9078:
--------------------------------------------
Summary: [C++] Parquet writing of extension type with nested storage type fails
Key: ARROW-9078
URL: https://issues.apache.org/jira/browse/ARROW-9078
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Reporter: Joris Van den Bossche
A reproducer in Python:
{code:python}
import pyarrow as pa
import pyarrow.parquet as pq
class MyStructType(pa.PyExtensionType):
    def __init__(self):
        pa.PyExtensionType.__init__(self, pa.struct([('left', pa.int64()), ('right', pa.int64())]))

    def __reduce__(self):
        return MyStructType, ()

struct_array = pa.StructArray.from_arrays(
    [
        pa.array([0, 1], type="int64", from_pandas=True),
        pa.array([1, 2], type="int64", from_pandas=True),
    ],
    names=["left", "right"],
)
# works: the column is a plain StructArray
table = pa.table({'a': struct_array})
pq.write_table(table, "test_struct.parquet")
# doesn't work: the same struct array wrapped in an extension type
mystruct_array = pa.ExtensionArray.from_storage(MyStructType(), struct_array)
table = pa.table({'a': mystruct_array})
pq.write_table(table, "test_struct.parquet")
{code}
Writing the plain StructArray now works, as does reading it back in.
But when the same struct array is the storage array of an ExtensionType, writing fails with the following error:
{code}
ArrowException: Unknown error: data type leaf_count != builder_leaf_count1 2
{code}