Posted to dev@arrow.apache.org by "Florian Jetter (Jira)" <ji...@apache.org> on 2019/08/23 20:41:00 UTC

[jira] [Created] (ARROW-6339) [Python][C++] Rowgroup statistics for pd.NaT array ill defined

Florian Jetter created ARROW-6339:
-------------------------------------

             Summary: [Python][C++] Rowgroup statistics for pd.NaT array ill defined
                 Key: ARROW-6339
                 URL: https://issues.apache.org/jira/browse/ARROW-6339
             Project: Apache Arrow
          Issue Type: Bug
    Affects Versions: 0.14.1
            Reporter: Florian Jetter


When an array is initialised with only NaT values, the row group statistics are corrupt: reading them either returns random values or raises an integer out-of-bounds exception.
{code:python}
import io
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

df = pd.DataFrame({"t": pd.Series([pd.NaT], dtype="datetime64[ns]")})
buf = pa.BufferOutputStream()
pq.write_table(pa.Table.from_pandas(df), buf, version="2.0")
buf = io.BytesIO(buf.getvalue().to_pybytes())
parquet_file = pq.ParquetFile(buf)
# The behaviour is hard to assert on: the returned value is random and the state is ill-defined.
# After a few iterations an exception is raised.
while True:
    parquet_file.metadata.row_group(0).column(0).statistics.max
{code}
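
Until the statistics are well defined, a reader can avoid touching min/max for a column chunk that reports none. The following is only a minimal sketch of such a guard, not a fix for the underlying bug. `safe_max` is a hypothetical helper name, and the sketch assumes the statistics object exposes a `has_min_max` flag alongside `max`. Whether the affected version sets that flag correctly for an all-NaT column is not confirmed by this report. The stand-in objects below are plain namespaces, not real pyarrow objects, so the snippet runs without a Parquet file.

```python
from types import SimpleNamespace


def safe_max(statistics):
    """Return statistics.max, or None when no min/max is recorded.

    Hypothetical helper: it assumes the object passed in has a
    `has_min_max` flag and a `max` attribute.
    """
    # The writer may not have stored any statistics at all.
    if statistics is None:
        return None
    # Only read max when min/max is flagged as present.
    if not getattr(statistics, "has_min_max", False):
        return None
    return statistics.max


# Stand-ins for illustration only (assumed shape, not pyarrow objects).
all_null = SimpleNamespace(has_min_max=False, max=None)
with_values = SimpleNamespace(has_min_max=True, max=42)

print(safe_max(all_null))
print(safe_max(with_values))
```

With the reproduction above, the call would be `safe_max(parquet_file.metadata.row_group(0).column(0).statistics)`.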



