Posted to jira@arrow.apache.org by "Abderrahmane Jaidi (Jira)" <ji...@apache.org> on 2021/05/12 07:58:00 UTC

[jira] [Commented] (ARROW-12732) [Python] read parquet in pyarrow is not idempotent for time period types

    [ https://issues.apache.org/jira/browse/ARROW-12732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17343095#comment-17343095 ] 

Abderrahmane Jaidi commented on ARROW-12732:
--------------------------------------------

Understood. Is there a more straightforward way of importing pandas types into Arrow? Having to call a transformation operation such as "to_pandas" just to trigger that behavior is not ideal.

> [Python] read parquet in pyarrow is not idempotent for time period types
> ------------------------------------------------------------------------
>
>                 Key: ARROW-12732
>                 URL: https://issues.apache.org/jira/browse/ARROW-12732
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Parquet, Python
>    Affects Versions: 3.0.0, 4.0.0
>            Reporter: Abderrahmane Jaidi
>            Priority: Major
>         Attachments: period.parquet
>
>
> When a parquet file (attached) with a period type column is read via the "read_table" method, the column type is "int64" on the first read. After "to_pandas" is applied to the pyarrow table, subsequent "read_table" calls on the same parquet file in the same *Python session* return "ArrowPeriodType":
> {code:python}
> import pyarrow
> import pyarrow.parquet
> pq_table = pyarrow.parquet.read_table("s3://my-bucket/my-prefix/period.parquet")
> print(pq_table.schema.types)
> # Out[1]: [DataType(int64)]
> print(pq_table.to_pandas())
> # Out[2]:
> # col
> # 0 2010-01
> pq_table = pyarrow.parquet.read_table("s3://my-bucket/my-prefix/period.parquet")
> print(pq_table.schema.types)
> # Out[3]: [ArrowPeriodType(DataType(int64))]
> pq_table = pyarrow.parquet.read_table("s3://my-bucket/my-prefix/period.parquet")
> print(pq_table.schema.types)
> # Out[4]: [ArrowPeriodType(DataType(int64))]{code}
>  


