Posted to issues@arrow.apache.org by "Wes McKinney (Jira)" <ji...@apache.org> on 2020/02/24 15:10:02 UTC
[jira] [Updated] (ARROW-7907) [Python] Conversion to pandas of empty table with timestamp type aborts
[ https://issues.apache.org/jira/browse/ARROW-7907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wes McKinney updated ARROW-7907:
--------------------------------
Fix Version/s: 1.0.0
> [Python] Conversion to pandas of empty table with timestamp type aborts
> -----------------------------------------------------------------------
>
> Key: ARROW-7907
> URL: https://issues.apache.org/jira/browse/ARROW-7907
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Reporter: Joris Van den Bossche
> Priority: Major
> Fix For: 1.0.0, 0.16.1
>
>
> Creating an empty table (assuming pyarrow is imported as pa):
> {code}
> In [1]: table = pa.table({'a': pa.array([], type=pa.timestamp('us'))})
> In [2]: table['a']
> Out[2]:
> <pyarrow.lib.ChunkedArray object at 0x7fbb783e8098>
> [
> []
> ]
> In [3]: table.to_pandas()
> Out[3]:
> Empty DataFrame
> Columns: [a]
> Index: []
> {code}
> The above works, but the ChunkedArray still has one empty chunk. When filtering data, you can end up with no chunks at all, and then the conversion fails:
> {code}
> In [4]: table2 = table.slice(0, 0)
> In [5]: table2['a']
> Out[5]:
> <pyarrow.lib.ChunkedArray object at 0x7fbb783aa4a8>
> [
> ]
> In [6]: table2.to_pandas()
> ../src/arrow/table.cc:48: Check failed: (chunks.size()) > (0) cannot construct ChunkedArray from empty vector and omitted type
> ...
> Aborted (core dumped)
> {code}
> This seems to happen specifically for the timestamp type, and specifically with a non-ns unit (e.g. with us as above, which is the default in Arrow).
> I noticed this when reading a Parquet file of the taxi dataset, where the filter I used resulted in an empty batch.