Posted to issues@arrow.apache.org by "Adam Hooper (Jira)" <ji...@apache.org> on 2019/09/15 19:05:00 UTC
[jira] [Commented] (ARROW-6568) pyarrow.parquet crash writing zero-chunk dictionary-type column
[ https://issues.apache.org/jira/browse/ARROW-6568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16930052#comment-16930052 ]
Adam Hooper commented on ARROW-6568:
------------------------------------
My workaround, in my function that wraps `pyarrow.parquet.write_table()`:
{code:python}
if table.num_rows == 0:
    # Workaround for https://issues.apache.org/jira/browse/ARROW-6568
    # If table is zero-length, guarantee it has a RecordBatch so Arrow
    # won't crash when writing a DictionaryArray.
    def empty_array_for_field(field):
        if pyarrow.types.is_dictionary(field.type):
            return pyarrow.DictionaryArray.from_arrays(
                pyarrow.array([], type=field.type.index_type),
                pyarrow.array([], type=field.type.value_type),
            )
        else:
            return pyarrow.array([], type=field.type)

    table = pyarrow.table(
        {field.name: empty_array_for_field(field) for field in table.schema}
    )
# ... and now `table` is safe to use in `pyarrow.parquet.write_table()`.
{code}
> pyarrow.parquet crash writing zero-chunk dictionary-type column
> ---------------------------------------------------------------
>
> Key: ARROW-6568
> URL: https://issues.apache.org/jira/browse/ARROW-6568
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.14.1
> Environment: Pyarrow v0.14.1, manylinux1
> Reporter: Adam Hooper
> Priority: Major
>
> Trying to write a zero-RecordBatch file to parquet:
> {code:python}
> import pyarrow
> import pyarrow.parquet
> table = pyarrow.Table.from_batches([], pyarrow.schema([('A', pyarrow.dictionary(pyarrow.int32(), pyarrow.string()))]))
> pyarrow.parquet.write_table(table, 'x.parquet')
> {code}
> ... I receive an error and Python exits with exit code {{139}}:
> {noformat}
> WARNING: Logging before InitGoogleLogging() is written to STDERR
> F0915 18:37:23.099939 1 table.cc:64] Check failed: (chunks.size()) > (0) cannot construct ChunkedArray from empty vector and omitted type
> *** Check failure stack trace: ***
> {noformat}