Posted to issues@arrow.apache.org by "Joris Van den Bossche (JIRA)" <ji...@apache.org> on 2019/07/09 22:08:00 UTC

[jira] [Comment Edited] (ARROW-5895) [Python] New version stores timestamps as epoch ms instead of ISO timestamp string

    [ https://issues.apache.org/jira/browse/ARROW-5895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881585#comment-16881585 ] 

Joris Van den Bossche edited comment on ARROW-5895 at 7/9/19 10:07 PM:
-----------------------------------------------------------------------

So what changed in 0.14.0 compared to 0.13 is that timestamp columns are now also annotated with the new Parquet LogicalType (e.g. {{TIMESTAMP(unit=MICROS)}}) in addition to the older ConvertedType ({{TIMESTAMP_MILLIS}}/{{TIMESTAMP_MICROS}}). However, there is a compatibility problem: for tz-naive data, the older ConvertedType annotation is omitted (see ARROW-5878). 

Could you try with timezone-aware data to check whether you encounter the same issue? It might be that the S3 Parquet reader does not yet understand the new LogicalTypes; in that case, the missing ConvertedType annotation would lead it to interpret the column as plain integers (as you see in your output).

I don't think there is an option to *not* write those new LogicalTypes, but the omission of the ConvertedType annotation is a bug that should be fixed for 0.14.1.



was (Author: jorisvandenbossche):
So what changed in 0.14.0 compared to 0.13 is that timestamp columns are now also annotated with the new Parquet LogicalType (e.g. {{TIMESTAMP(unit=MICROS)}}) in addition to the older ConvertedType ({{TIMESTAMP_MILLIS}}/{{TIMESTAMP_MICROS}}). However, there is a compatibility problem: for tz-naive data, the older ConvertedType annotation is omitted (see ARROW-5889). 

Could you try with timezone-aware data to check whether you encounter the same issue? It might be that the S3 Parquet reader does not yet understand the new LogicalTypes; in that case, the missing ConvertedType annotation would lead it to interpret the column as plain integers (as you see in your output).

I don't think there is an option to *not* write those new LogicalTypes, but the omission of the ConvertedType annotation is a bug that should be fixed for 0.14.1.


> [Python] New version stores timestamps as epoch ms instead of ISO timestamp string
> ----------------------------------------------------------------------------------
>
>                 Key: ARROW-5895
>                 URL: https://issues.apache.org/jira/browse/ARROW-5895
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.14.0
>         Environment: Linux dev.office.whoop.com 3.10.0-957.21.3.el7.x86_64 #1 SMP Tue Jun 18 16:35:19 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
>            Reporter: John Wilson
>            Priority: Major
>
> Just upgraded from pyarrow 0.13 to 0.14.
> Columns of type TimestampType(timestamp[ns]) are now written as epoch ms values:
> 1561939200507
> Version 0.13 wrote TimestampType(timestamp[ns]) as an ISO string:
> 2019-07-01T00:00:00.507Z
> This broke my implementation. How do I get pyarrow to write ISO strings again in 0.14?
>  
> Here is my table write:
> {code}
> pyarrow.parquet.write_to_dataset(table=tbl, root_path=local_path,
>                                  partition_cols=['env', 'dt'],
>                                  coerce_timestamps='ms',
>                                  allow_truncated_timestamps=True,
>                                  version='2.0',
>                                  compression='SNAPPY')
> {code}
>  
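
For reference, the two values quoted in the report encode the same instant: a Parquet timestamp column with millisecond unit stores a raw INT64 count of milliseconds since the Unix epoch, and a reader that ignores the timestamp annotation shows that integer instead of a date. A stdlib-only sketch of the decoding (the function name {{epoch_ms_to_iso}} is illustrative):

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_ms_to_iso(ms):
    """Render a raw epoch-milliseconds INT64 value as an ISO-8601 UTC string."""
    # Integer timedelta arithmetic avoids float rounding of large values.
    dt = EPOCH + timedelta(milliseconds=ms)
    return dt.isoformat(timespec="milliseconds").replace("+00:00", "Z")


print(epoch_ms_to_iso(1561939200507))  # 2019-07-01T00:00:00.507Z
```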



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)