Posted to dev@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2020/02/21 15:50:00 UTC
[jira] [Created] (ARROW-7907) [Python] Conversion to pandas of empty table with timestamp type aborts
Joris Van den Bossche created ARROW-7907:
--------------------------------------------
Summary: [Python] Conversion to pandas of empty table with timestamp type aborts
Key: ARROW-7907
URL: https://issues.apache.org/jira/browse/ARROW-7907
Project: Apache Arrow
Issue Type: Improvement
Components: Python
Reporter: Joris Van den Bossche
Fix For: 0.16.1
Creating an empty table:
{code}
In [1]: table = pa.table({'a': pa.array([], type=pa.timestamp('us'))})
In [2]: table['a']
Out[2]:
<pyarrow.lib.ChunkedArray object at 0x7fbb783e8098>
[
[]
]
In [3]: table.to_pandas()
Out[3]:
Empty DataFrame
Columns: [a]
Index: []
{code}
The above works, but the ChunkedArray still has one empty chunk. When filtering data, you can end up with no chunks at all, and then the conversion fails:
{code}
In [4]: table2 = table.slice(0, 0)
In [5]: table2['a']
Out[5]:
<pyarrow.lib.ChunkedArray object at 0x7fbb783aa4a8>
[
]
In [6]: table2.to_pandas()
../src/arrow/table.cc:48: Check failed: (chunks.size()) > (0) cannot construct ChunkedArray from empty vector and omitted type
...
Aborted (core dumped)
{code}
This seems to happen specifically for the timestamp type, and specifically with a non-ns unit (e.g. with us as above, which is the default in Arrow).
I noticed this when reading a Parquet file of the taxi dataset, where the filter I used resulted in an empty batch.