Posted to jira@arrow.apache.org by "Abderrahmane Jaidi (Jira)" <ji...@apache.org> on 2021/05/10 21:30:00 UTC

[jira] [Created] (ARROW-12732) read parquet in pyarrow is not idempotent for time period types

Abderrahmane Jaidi created ARROW-12732:
------------------------------------------

             Summary: read parquet in pyarrow is not idempotent for time period types
                 Key: ARROW-12732
                 URL: https://issues.apache.org/jira/browse/ARROW-12732
             Project: Apache Arrow
          Issue Type: Bug
          Components: Parquet, Python
    Affects Versions: 4.0.0, 3.0.0
            Reporter: Abderrahmane Jaidi
         Attachments: period.parquet

When reading the attached Parquet file, which has a period-type column, with "read_table", the column comes back as "int64" on the first read. After calling "to_pandas" on the resulting pyarrow table, every subsequent "read_table" call on the same file in the same *Python session* returns "ArrowPeriodType" instead, so the result of a read depends on what ran earlier in the session.
{code:python}
import pyarrow
import pyarrow.parquet


pq_table = pyarrow.parquet.read_table("s3://my-bucket/my-prefix/period.parquet")
print(pq_table.schema.types)
# Out[1]: [DataType(int64)]

print(pq_table.to_pandas())
# Out[2]:
# col
# 0 2010-01

pq_table = pyarrow.parquet.read_table("s3://my-bucket/my-prefix/period.parquet")
print(pq_table.schema.types)
# Out[3]: [ArrowPeriodType(DataType(int64))]

pq_table = pyarrow.parquet.read_table("s3://my-bucket/my-prefix/period.parquet")
print(pq_table.schema.types)
# Out[4]: [ArrowPeriodType(DataType(int64))]{code}
 


