Posted to issues@arrow.apache.org by "Olaf (JIRA)" <ji...@apache.org> on 2018/10/17 15:23:00 UTC
[jira] [Created] (ARROW-3543) crazy timestamp bug in feather?
Olaf created ARROW-3543:
---------------------------
Summary: crazy timestamp bug in feather?
Key: ARROW-3543
URL: https://issues.apache.org/jira/browse/ARROW-3543
Project: Apache Arrow
Issue Type: Bug
Reporter: Olaf
Hello the dream team,
Pasting from https://github.com/wesm/feather/issues/351
Thanks for this wonderful package. I was playing with feather and some timestamps and I noticed some dangerous behavior. Maybe it is a bug.
Consider this:
```
import pandas as pd
import feather
import numpy as np
# Naive timestamps that are meant to be read as UTC
df = pd.DataFrame({'string_time_utc': [pd.to_datetime('2018-02-01 14:00:00.531'),
                                       pd.to_datetime('2018-02-01 14:01:00.456'),
                                       pd.to_datetime('2018-03-05 14:01:02.200')]})
# Mark as UTC, convert to US/Eastern, then drop the timezone again (naive local time)
df['timestamp_est'] = (pd.to_datetime(df.string_time_utc)
                         .dt.tz_localize('UTC')
                         .dt.tz_convert('US/Eastern')
                         .dt.tz_localize(None))
df
Out[17]:
          string_time_utc           timestamp_est
0 2018-02-01 14:00:00.531 2018-02-01 09:00:00.531
1 2018-02-01 14:01:00.456 2018-02-01 09:01:00.456
2 2018-03-05 14:01:02.200 2018-03-05 09:01:02.200
```
Here I create the `EST` timestamps corresponding to my original timestamps (which are in `UTC` time), and then drop the timezone so that both columns are timezone-naive.
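To make that step easier to check in isolation, here is a minimal sketch using only one of the values above. The variable names `utc_naive` and `est_naive` are mine, introduced just for this illustration. It shows that the result carries no timezone information at all and is simply the Eastern wall-clock time.

```python
import pandas as pd

# One of the naive timestamps from the example, meant to be read as UTC
utc_naive = pd.Series(pd.to_datetime(["2018-02-01 14:00:00.531"]))

# Same chain as above: mark as UTC, convert to US/Eastern, drop the timezone
est_naive = (
    utc_naive.dt.tz_localize("UTC")
    .dt.tz_convert("US/Eastern")
    .dt.tz_localize(None)
)

# Neither column carries a timezone; only the wall-clock values differ
print(utc_naive.dt.tz, est_naive.dt.tz)
print(est_naive.iloc[0])
```

So whatever reads the file has no way to know that one column is "UTC" and the other is "EST"; that only exists in my head and in the column names.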
Now saving the dataframe to `csv` or to `feather` will generate two completely different results.
```
df.to_csv('P://testing.csv')
df.to_feather('P://testing.feather')
```
Switching to R.
Using the good old `csv` gives me something a bit annoying, but expected. R assumes `UTC` by default and wrongly attaches that timezone to `timestamp_est`. No big deal: I can always use `with_tz` or, even better, import the column as character and parse it as a timestamp while in R.
```
> dataframe <- read_csv('P://testing.csv')
Parsed with column specification:
cols(
X1 = col_integer(),
string_time_utc = col_datetime(format = ""),
timestamp_est = col_datetime(format = "")
)
Warning message:
Missing column names filled in: 'X1' [1]
>
> dataframe %>% mutate(mytimezone = tz(timestamp_est))
# A tibble: 3 x 4
X1 string_time_utc timestamp_est
<int> <dttm> <dttm>
1 0 2018-02-01 14:00:00.530 2018-02-01 09:00:00.530
2 1 2018-02-01 14:01:00.456 2018-02-01 09:01:00.456
3 2 2018-03-05 14:01:02.200 2018-03-05 09:01:02.200
mytimezone
<chr>
1 UTC
2 UTC
3 UTC
```
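The "import as character" idea can also be applied on the Python side before writing the file. This is only a sketch of a possible workaround, not something I have tested against `feather` or R: the names `aware`, `as_text` and `round_trip` are made up for the illustration. The point is that a string with an explicit UTC offset cannot be silently reinterpreted by a reader.

```python
import pandas as pd

# Keep the timezone instead of dropping it (hypothetical variable names)
aware = (
    pd.Series(pd.to_datetime(["2018-02-01 14:00:00.531"]))
    .dt.tz_localize("UTC")
    .dt.tz_convert("US/Eastern")
)

# Serialize as text with an explicit UTC offset, e.g. "...-0500"
as_text = aware.dt.strftime("%Y-%m-%dT%H:%M:%S.%f%z")
print(as_text.iloc[0])

# Parsing the text back with utc=True recovers the original instant
round_trip = pd.to_datetime(as_text, utc=True)
print(round_trip.iloc[0])
```

The cost is that the column is stored as text, so it has to be parsed again on the reading side.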
Now look at what happens with `feather`:
```
> dataframe <- read_feather('P://testing.feather')
>
> dataframe %>% mutate(mytimezone = tz(timestamp_est))
# A tibble: 3 x 3
string_time_utc timestamp_est mytimezone
<dttm> <dttm> <chr>
1 2018-02-01 09:00:00.531 2018-02-01 04:00:00.531 ""
2 2018-02-01 09:01:00.456 2018-02-01 04:01:00.456 ""
3 2018-03-05 09:01:02.200 2018-03-05 04:01:02.200 ""
```
My timestamps have been converted!!! Pure insanity. Both columns come back five hours earlier than what I wrote, and `tz` reports an empty timezone.
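One possible explanation, which is only my guess and not confirmed anywhere here: the naive values are stored as plain offsets from the epoch, and the reader treats them as UTC instants and displays them in the machine's local timezone. The sketch below reproduces the same five-hour shift in pandas under that assumption, with `US/Eastern` standing in for the local timezone. The names `naive`, `epoch_ms` and `shown` are hypothetical.

```python
import pandas as pd

# Naive wall-clock value, as written to the file
naive = pd.Timestamp("2018-02-01 14:00:00.531")

# Milliseconds since the epoch, computed as if the wall-clock value were UTC
epoch_ms = (naive - pd.Timestamp("1970-01-01")) // pd.Timedelta(milliseconds=1)

# Assumed reader behavior: treat the stored number as a UTC instant,
# then display it in the local timezone (US/Eastern in this sketch)
shown = (
    pd.to_datetime(epoch_ms, unit="ms", utc=True)
    .tz_convert("US/Eastern")
    .tz_localize(None)
)

print(naive)  # what was written
print(shown)  # what the assumed reader would display
```

Under that assumption the displayed value is `09:00:00.531` instead of `14:00:00.531`, which matches what I see for `string_time_utc` after `read_feather`.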
Am I missing something here?
Thanks!!
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)