Posted to dev@arrow.apache.org by "Olaf (Jira)" <ji...@apache.org> on 2020/04/16 13:07:00 UTC

[jira] [Created] (ARROW-8482) critical timestamp bug!

Olaf created ARROW-8482:
---------------------------

             Summary: critical timestamp bug!
                 Key: ARROW-8482
                 URL: https://issues.apache.org/jira/browse/ARROW-8482
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python, R
            Reporter: Olaf


Hello there!

First of all, thanks for making parquet files a reality in *R* and *Python*. This is really great.

I found a very nasty bug when exchanging parquet files between the two platforms: timestamps written from Python come back shifted when read in R. Consider this example.

 

 
{code:python}
import pandas as pd
import pyarrow.parquet as pq
import numpy as np

# Three naive (time-zone-free) timestamps, meant to be read as UTC.
df = pd.DataFrame({'string_time_utc': [pd.to_datetime('2018-02-01 14:00:00.531'),
                                       pd.to_datetime('2018-02-01 14:01:00.456'),
                                       pd.to_datetime('2018-03-05 14:01:02.200')]})

# The same instants as US/Eastern wall-clock time, with the time zone dropped again.
df['timestamp_est'] = (pd.to_datetime(df.string_time_utc)
                       .dt.tz_localize('UTC')
                       .dt.tz_convert('US/Eastern')
                       .dt.tz_localize(None))
df
Out[5]: 
          string_time_utc           timestamp_est
0 2018-02-01 14:00:00.531 2018-02-01 09:00:00.531
1 2018-02-01 14:01:00.456 2018-02-01 09:01:00.456
2 2018-03-05 14:01:02.200 2018-03-05 09:01:02.200
{code}
 

Now I simply write the data frame to disk:

{code:python}
df.to_parquet('myparquet.pq')
{code}
 

And then I use *R* to load it:

{code:r}
> library(arrow)
> test <- read_parquet('myparquet.pq')
> test
# A tibble: 3 x 2
  string_time_utc            timestamp_est             
  <dttm>                     <dttm>                    
1 2018-02-01 09:00:00.530999 2018-02-01 04:00:00.530999
2 2018-02-01 09:01:00.456000 2018-02-01 04:01:00.456000
3 2018-03-05 09:01:02.200000 2018-03-05 04:01:02.200000
{code}
 

 

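As you can see, the timestamps have been converted in the process: both columns come back five hours earlier than what was written (for example, 14:00:00.531 is read as 09:00:00.530999). I first reported this bug for feather, but I think it is still there. This is a very dangerous, silent bug.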
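A five-hour offset is what one would see if the naive values were interpreted as UTC and then displayed in a US/Eastern session. That is only a guess at the cause, not something confirmed in this report. The sketch below reproduces the arithmetic with the Python standard library alone (no Arrow involved); the helper name `display_as_local` and the fixed UTC-5 offset are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

# Naive wall-clock values of the 'string_time_utc' column (no time zone attached).
naive = [
    datetime(2018, 2, 1, 14, 0, 0, 531000),
    datetime(2018, 2, 1, 14, 1, 0, 456000),
    datetime(2018, 3, 5, 14, 1, 2, 200000),
]

# Assumption: the reading session uses US/Eastern, which was at UTC-5
# (standard time) on all three dates. A fixed offset avoids needing a tz database.
EASTERN_STANDARD = timezone(timedelta(hours=-5))


def display_as_local(ts, local_tz):
    """Hypothetical helper: treat a naive timestamp as UTC, then render it
    as naive wall-clock time in local_tz."""
    aware_utc = ts.replace(tzinfo=timezone.utc)
    return aware_utc.astimezone(local_tz).replace(tzinfo=None)


shifted = [display_as_local(ts, EASTERN_STANDARD) for ts in naive]

for before, after in zip(naive, shifted):
    print(before, "->", after)
```

If this guess is right, the stored values would be unchanged and only their interpretation would differ between the two platforms, which would be worth checking before treating the file itself as corrupted.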

 

What do you think?

Thanks



--
This message was sent by Atlassian Jira
(v8.3.4#803005)