Posted to issues@arrow.apache.org by "Wes McKinney (Jira)" <ji...@apache.org> on 2020/02/24 15:10:02 UTC
[jira] [Updated] (ARROW-7907) [Python] Conversion to pandas of empty table with timestamp type aborts

     [ https://issues.apache.org/jira/browse/ARROW-7907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney updated ARROW-7907:
--------------------------------
    Fix Version/s: 1.0.0

> [Python] Conversion to pandas of empty table with timestamp type aborts
> -----------------------------------------------------------------------
>
>                 Key: ARROW-7907
>                 URL: https://issues.apache.org/jira/browse/ARROW-7907
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Joris Van den Bossche
>            Priority: Major
>             Fix For: 1.0.0, 0.16.1
>
>
> Creating an empty table (after {{import pyarrow as pa}}):
> {code}
> In [1]: table = pa.table({'a': pa.array([], type=pa.timestamp('us'))})
> In [2]: table['a']
> Out[2]:
> <pyarrow.lib.ChunkedArray object at 0x7fbb783e8098>
> [
>   []
> ]
> In [3]: table.to_pandas()
> Out[3]:
> Empty DataFrame
> Columns: [a]
> Index: []
> {code}
> The above works, but the ChunkedArray still has one empty chunk. When filtering data, you can end up with a ChunkedArray that has no chunks at all, and converting that to pandas fails:
> {code}
> In [4]: table2 = table.slice(0, 0)
> In [5]: table2['a']
> Out[5]:
> <pyarrow.lib.ChunkedArray object at 0x7fbb783aa4a8>
> [
> ]
> In [6]: table2.to_pandas()
> ../src/arrow/table.cc:48:  Check failed: (chunks.size()) > (0) cannot construct ChunkedArray from empty vector and omitted type
> ...
> Aborted (core dumped)
> {code}
> This seems to happen only for the timestamp type, and only with a non-ns unit (e.g. with us as above, which is the default in Arrow).
> I noticed this when reading a Parquet file of the taxi dataset, where the filter I used resulted in an empty batch.


