Posted to jira@arrow.apache.org by "Abderrahmane Jaidi (Jira)" <ji...@apache.org> on 2021/05/10 21:30:00 UTC
[jira] [Created] (ARROW-12732) read parquet in pyarrow is not idempotent for time period types
Abderrahmane Jaidi created ARROW-12732:
------------------------------------------
Summary: read parquet in pyarrow is not idempotent for time period types
Key: ARROW-12732
URL: https://issues.apache.org/jira/browse/ARROW-12732
Project: Apache Arrow
Issue Type: Bug
Components: Parquet, Python
Affects Versions: 4.0.0, 3.0.0
Reporter: Abderrahmane Jaidi
Attachments: period.parquet
When reading the attached Parquet file, which contains a period-type column, "read_table" returns the column as "int64" on the first read. After "to_pandas" is called on the resulting pyarrow table, subsequent "read_table" calls on the same file in the same *Python session* return "ArrowPeriodType" instead.
{code:python}
import pyarrow
import pyarrow.parquet
pq_table = pyarrow.parquet.read_table("s3://my-bucket/my-prefix/period.parquet")
print(pq_table.schema.types)
# Out[1]: [DataType(int64)]
print(pq_table.to_pandas())
# Out[2]:
#        col
# 0  2010-01
pq_table = pyarrow.parquet.read_table("s3://my-bucket/my-prefix/period.parquet")
print(pq_table.schema.types)
# Out[3]: [ArrowPeriodType(DataType(int64))]
pq_table = pyarrow.parquet.read_table("s3://my-bucket/my-prefix/period.parquet")
print(pq_table.schema.types)
# Out[4]: [ArrowPeriodType(DataType(int64))]{code}
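To check this behaviour without reading the printed output by eye, the repeated reads above could be wrapped in a small helper. This is only a rough sketch: the names "schema_types_per_read", "is_idempotent" and "read_types_with_pyarrow" are made up for illustration and are not part of pyarrow, and the path passed in is assumed to point at a local copy of the attached file.

```python
from typing import Callable, List, Sequence


def schema_types_per_read(
    read_types: Callable[[], Sequence[str]], reads: int = 3
) -> List[List[str]]:
    """Call the reader `reads` times and record the column types seen each time."""
    return [list(read_types()) for _ in range(reads)]


def is_idempotent(observations: List[List[str]]) -> bool:
    """True if every read reported the same column types as the first one."""
    return all(obs == observations[0] for obs in observations)


def read_types_with_pyarrow(path: str) -> List[str]:
    """Hypothetical reader: one read_table call followed by to_pandas."""
    # Imported lazily so the two helpers above can be used without pyarrow.
    import pyarrow.parquet as pq

    table = pq.read_table(path)
    types = [str(t) for t in table.schema.types]
    # Same call sequence as in the report: convert to pandas after reading.
    table.to_pandas()
    return types
```

In a fresh Python session, something like "is_idempotent(schema_types_per_read(lambda: read_types_with_pyarrow("period.parquet")))" would be expected to come back False on the affected versions if the behaviour matches what is described above.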