Posted to dev@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2020/02/21 15:50:00 UTC

[jira] [Created] (ARROW-7907) [Python] Conversion to pandas of empty table with timestamp type aborts

Joris Van den Bossche created ARROW-7907:
--------------------------------------------

             Summary: [Python] Conversion to pandas of empty table with timestamp type aborts
                 Key: ARROW-7907
                 URL: https://issues.apache.org/jira/browse/ARROW-7907
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Python
            Reporter: Joris Van den Bossche
             Fix For: 0.16.1


Creating an empty table:

{code}
In [1]: table = pa.table({'a': pa.array([], type=pa.timestamp('us'))})

In [2]: table['a']
Out[2]:
<pyarrow.lib.ChunkedArray object at 0x7fbb783e8098>
[
  []
]

In [3]: table.to_pandas()
Out[3]: 
Empty DataFrame
Columns: [a]
Index: []
{code}

The above works, but the ChunkedArray still has one (empty) chunk. When filtering data, you can actually end up with no chunks at all, and then the conversion fails:

{code}
In [4]: table2 = table.slice(0, 0)

In [5]: table2['a']
Out[5]:
<pyarrow.lib.ChunkedArray object at 0x7fbb783aa4a8>
[

]

In [6]: table2.to_pandas()
../src/arrow/table.cc:48:  Check failed: (chunks.size()) > (0) cannot construct ChunkedArray from empty vector and omitted type
...
Aborted (core dumped)
{code}

This seems to happen specifically for the timestamp type, and specifically with a non-ns unit (e.g. with "us" as above, which is the default in Arrow).

I noticed this when reading a Parquet file from the taxi dataset, where the filter I used resulted in an empty batch.
