Posted to jira@arrow.apache.org by "Abderrahmane Jaidi (Jira)" <ji...@apache.org> on 2021/05/12 07:58:00 UTC
[jira] [Commented] (ARROW-12732) [Python] read parquet in pyarrow is not idempotent for time period types
[ https://issues.apache.org/jira/browse/ARROW-12732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17343095#comment-17343095 ]
Abderrahmane Jaidi commented on ARROW-12732:
--------------------------------------------
Understood. Is there a more straightforward way to import the pandas types into Arrow? Calling a conversion method such as "to_pandas" just to trigger that behavior is not ideal.
> [Python] read parquet in pyarrow is not idempotent for time period types
> ------------------------------------------------------------------------
>
> Key: ARROW-12732
> URL: https://issues.apache.org/jira/browse/ARROW-12732
> Project: Apache Arrow
> Issue Type: Bug
> Components: Parquet, Python
> Affects Versions: 3.0.0, 4.0.0
> Reporter: Abderrahmane Jaidi
> Priority: Major
> Attachments: period.parquet
>
>
> When reading a Parquet file (attached) that has a period-type column with the "read_table" method, the column's type is "int64" on the first read. After "to_pandas" is called on the resulting pyarrow table, subsequent "read_table" calls on the same Parquet file in the same *Python session* return "ArrowPeriodType".
> {code:python}
> import pyarrow
> import pyarrow.parquet
> # First read: the period column comes back as plain int64
> pq_table = pyarrow.parquet.read_table("s3://my-bucket/my-prefix/period.parquet")
> print(pq_table.schema.types)
> # Out[1]: [DataType(int64)]
> print(pq_table.to_pandas())
> # Out[2]:
> #        col
> # 0  2010-01
> # After to_pandas(), re-reading the same file returns the extension type
> pq_table = pyarrow.parquet.read_table("s3://my-bucket/my-prefix/period.parquet")
> print(pq_table.schema.types)
> # Out[3]: [ArrowPeriodType(DataType(int64))]
> # ...and it keeps doing so for the rest of the session
> pq_table = pyarrow.parquet.read_table("s3://my-bucket/my-prefix/period.parquet")
> print(pq_table.schema.types)
> # Out[4]: [ArrowPeriodType(DataType(int64))]
> {code}
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)