Posted to issues@arrow.apache.org by "Shannon C Lewis (Jira)" <ji...@apache.org> on 2019/09/13 22:19:00 UTC
[jira] [Comment Edited] (ARROW-3543) [R] Time zone adjustment issue when reading Feather file written by Python
[ https://issues.apache.org/jira/browse/ARROW-3543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16929565#comment-16929565 ]
Shannon C Lewis edited comment on ARROW-3543 at 9/13/19 10:18 PM:
------------------------------------------------------------------
Hi Neal,
Sorry for the formatting. Yes, it seems that read_delim_arrow detects the column as POSIXct.
I tried another test and split the datetime field into separate date and time fields.
Now the date field is read as POSIXct and the time field as character.
But the date field comes back one day earlier, with a time of 17:00:00:
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 10 obs. of 7 variables:
$ Date : POSIXct, format: "2019-09-10 17:00:00" "2019-09-10 17:00:00" "2019-09-10 17:00:00" "2019-09-10 17:00:00" ...
$ Time : chr "21:36:22" "21:36:22" "22:43:58" "22:43:58" ...
$ LogLevel : chr "[INFO]" "[INFO]" "[INFO]" "[INFO]" ...
tbl_df:
  Date                Time     LogLevel Type          FunctionName Message                   UserName
1 2019-09-10 17:00:00 21:36:22 [INFO]   CentralLogger              CentralLogger Initialized shannon.lewis
2 2019-09-10 17:00:00 21:36:22 [INFO]   Controller                 Controller.R Initialized  shannon.lewis
Logfile:
Date|Time|LogLevel|Type|FunctionName|Message|UserName
2019-09-11|21:36:22|[INFO]|CentralLogger| |CentralLogger Initialized|shannon.lewis
2019-09-11|21:36:22|[INFO]|Controller| |Controller.R Initialized|shannon.lewis
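
One possible explanation for the one-day shift, sketched in Python with only the standard library: if the date-only value is parsed as midnight UTC and then printed in a local zone seven hours behind UTC, it shows up as 17:00:00 on the previous day. The UTC-7 offset is an assumption chosen to match the output above; the thread does not state the machine's time zone.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the machine's local zone is UTC-7 (not stated in the thread).
local_zone = timezone(timedelta(hours=-7))

# The logfile value "2019-09-11", interpreted as midnight UTC.
parsed = datetime(2019, 9, 11, tzinfo=timezone.utc)

# The same instant rendered in the assumed local zone.
shown = parsed.astimezone(local_zone)
print(shown.strftime("%Y-%m-%d %H:%M:%S"))  # 2019-09-10 17:00:00
```

If that is what happens here, the stored instant is unchanged and only the displayed wall-clock value moves.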
Shannon
> [R] Time zone adjustment issue when reading Feather file written by Python
> --------------------------------------------------------------------------
>
> Key: ARROW-3543
> URL: https://issues.apache.org/jira/browse/ARROW-3543
> Project: Apache Arrow
> Issue Type: Bug
> Components: R
> Reporter: Olaf
> Priority: Major
> Fix For: 1.0.0
>
>
> Hello the dream team,
> Pasting from [https://github.com/wesm/feather/issues/351]
> Thanks for this wonderful package. I was playing with feather and some timestamps and I noticed some dangerous behavior. Maybe it is a bug.
> Consider this
>
> {code:java}
> import pandas as pd
> import feather
> import numpy as np
> df = pd.DataFrame(
>     {'string_time_utc': [pd.to_datetime('2018-02-01 14:00:00.531'),
>                          pd.to_datetime('2018-02-01 14:01:00.456'),
>                          pd.to_datetime('2018-03-05 14:01:02.200')]}
> )
> # Mark the values as UTC, convert to US/Eastern, then drop the zone again
> df['timestamp_est'] = (pd.to_datetime(df.string_time_utc)
>                        .dt.tz_localize('UTC')
>                        .dt.tz_convert('US/Eastern')
>                        .dt.tz_localize(None))
> df
> Out[17]:
>           string_time_utc           timestamp_est
> 0 2018-02-01 14:00:00.531 2018-02-01 09:00:00.531
> 1 2018-02-01 14:01:00.456 2018-02-01 09:01:00.456
> 2 2018-03-05 14:01:02.200 2018-03-05 09:01:02.200
> {code}
> Here I create the corresponding `EST` timestamp of my original timestamps (in `UTC` time).
> Now saving the dataframe to `csv` or to `feather` will generate two completely different results.
>
> {code:java}
> df.to_csv('P://testing.csv')
> df.to_feather('P://testing.feather')
> {code}
> Switching to R.
> Using the good old `csv` gives me something a bit annoying, but expected. R assumes `UTC` by default and wrongly attaches that time zone to `timestamp_est`. No big deal: I can always use `with_tz` or, even better, import the column as character and parse it as a timestamp in R.
>
> {code:java}
> > dataframe <- read_csv('P://testing.csv')
> Parsed with column specification:
> cols(
> X1 = col_integer(),
> string_time_utc = col_datetime(format = ""),
> timestamp_est = col_datetime(format = "")
> )
> Warning message:
> Missing column names filled in: 'X1' [1]
> >
> > dataframe %>% mutate(mytimezone = tz(timestamp_est))
> A tibble: 3 x 4
>      X1 string_time_utc         timestamp_est
>   <int> <dttm>                  <dttm>
> 1     0 2018-02-01 14:00:00.530 2018-02-01 09:00:00.530
> 2     1 2018-02-01 14:01:00.456 2018-02-01 09:01:00.456
> 3     2 2018-03-05 14:01:02.200 2018-03-05 09:01:02.200
>   mytimezone
>   <chr>
> 1 UTC
> 2 UTC
> 3 UTC
> {code}
> {code:java}
> #Now look at what happens with feather:
>
> > dataframe <- read_feather('P://testing.feather')
> >
> > dataframe %>% mutate(mytimezone = tz(timestamp_est))
> A tibble: 3 x 3
>   string_time_utc         timestamp_est           mytimezone
>   <dttm>                  <dttm>                  <chr>
> 1 2018-02-01 09:00:00.531 2018-02-01 04:00:00.531 ""
> 2 2018-02-01 09:01:00.456 2018-02-01 04:01:00.456 ""
> 3 2018-03-05 09:01:02.200 2018-03-05 04:01:02.200 ""
> {code}
> My timestamps have been converted!!! Pure insanity.
> Am I missing something here?
> Thanks!!
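
The five-hour shift in the Feather output above is consistent with zone-less timestamps being treated as UTC instants and then printed in the reader's local zone. The sketch below reproduces that arithmetic with the Python standard library. The helper name `as_displayed` is hypothetical, and the fixed UTC-5 offset is an assumption inferred from the output (it ignores daylight saving time, which does not apply to these dates).

```python
from datetime import datetime, timedelta, timezone

# Assumption: the reader's zone is a fixed UTC-5 (US Eastern standard time, no DST).
EASTERN_STANDARD = timezone(timedelta(hours=-5))


def as_displayed(naive: datetime, display_zone: timezone) -> datetime:
    """Treat a zone-less timestamp as a UTC instant and render it in display_zone."""
    return naive.replace(tzinfo=timezone.utc).astimezone(display_zone)


# First row of the data frame: the UTC column and the derived Eastern column.
naive_values = [
    datetime(2018, 2, 1, 14, 0, 0, 531000),
    datetime(2018, 2, 1, 9, 0, 0, 531000),
]

for value in naive_values:
    shifted = as_displayed(value, EASTERN_STANDARD)
    print(value.time().isoformat(timespec="milliseconds"),
          "->",
          shifted.time().isoformat(timespec="milliseconds"))
# 14:00:00.531 -> 09:00:00.531
# 09:00:00.531 -> 04:00:00.531
```

Under this reading the stored values are unchanged; both columns are shifted only at display time because neither carries a time zone.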