Posted to jira@arrow.apache.org by "Dragoș Moldovan-Grünfeld (Jira)" <ji...@apache.org> on 2022/03/10 18:37:00 UTC
[jira] [Updated] (ARROW-15912) [C++] Is CSV reader's TimestampParser usable elsewhere?
[ https://issues.apache.org/jira/browse/ARROW-15912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dragoș Moldovan-Grünfeld updated ARROW-15912:
---------------------------------------------
Description:
The CSV reader's {{TimestampParser}} is able to cycle through several formats. This sort of functionality would be very useful for some of the lubridate bindings that need to behave in a similar way.
{code:r}
library(arrow)
library(fs)
library(readr)
library(tibble)
tf <- fs::file_temp(ext = "csv")
fs::file_create(tf)
sample_times <- tibble(a = c("09/13/2013", "25/12/1998", "09-13-13", "23_Feb_2022", "09/13/2018"))
write_csv(sample_times, tf)
read_csv_arrow(tf,
  as_data_frame = TRUE,
  timestamp_parsers = c("%m/%d/%Y", "%d/%m/%Y", "%m-%d-%y", "%d_%b_%Y"))
#> # A tibble: 5 × 1
#>   a
#>   <dttm>
#> 1 2013-09-13 01:00:00
#> 2 1998-12-25 00:00:00
#> 3 2013-09-13 01:00:00
#> 4 2022-02-23 00:00:00
#> 5 2018-09-13 01:00:00
{code}
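To make the fallback behaviour in the example above concrete, here is a minimal C++ sketch of "try each format in order, keep the first that parses". It is not Arrow's implementation or API: {{ParseWithFormats}} is a hypothetical helper name, and the sketch assumes a POSIX platform where {{strptime}} is available.

```cpp
#include <time.h>  // strptime (POSIX, assumed available)

#include <ctime>
#include <string>
#include <vector>

// Hypothetical helper (not part of Arrow): try each strptime-style format in
// the order given and stop at the first one that consumes the whole input.
// Returns true and fills *out on success, false if no format matches.
bool ParseWithFormats(const std::string& value,
                      const std::vector<std::string>& formats, std::tm* out) {
  for (const std::string& format : formats) {
    std::tm candidate = {};
    const char* end = strptime(value.c_str(), format.c_str(), &candidate);
    // strptime returns nullptr on failure; also reject partial matches that
    // leave trailing characters unparsed.
    if (end != nullptr && *end == '\0') {
      *out = candidate;
      return true;
    }
  }
  return false;
}
```

With the four formats from the example, a value such as "25/12/1998" would be expected to fail against "%m/%d/%Y" and then be picked up by "%d/%m/%Y". Because the first matching format wins, the order of the formats matters for ambiguous inputs.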
For example, in lubridate, {{ymd()}} cycles through all possible formats that have year-month-day components in the right order (e.g. {{"%Y-%m-%d", "%y-%m-%d", "%Y-%b-%d", "%y-%b-%d", "%Y-%B-%d", "%y-%B-%d"}}, etc.).
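As a rough illustration of where such a list of candidates could come from, the sketch below builds year-month-day formats of the kind listed above for a single separator. This is an assumption about how a binding might enumerate candidates, not a description of what lubridate does internally; {{YmdCandidateFormats}} is a hypothetical name.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: enumerate year-month-day strptime formats for one
// separator by combining year specifiers (%Y, %y) with month specifiers
// (%m, %b, %B). The day is always %d.
std::vector<std::string> YmdCandidateFormats(char separator) {
  const std::vector<std::string> years = {"%Y", "%y"};
  const std::vector<std::string> months = {"%m", "%b", "%B"};
  std::vector<std::string> formats;
  for (const std::string& month : months) {
    for (const std::string& year : years) {
      formats.push_back(year + separator + month + separator + "%d");
    }
  }
  return formats;
}
```

Calling {{YmdCandidateFormats('-')}} yields six formats, starting with "%Y-%m-%d"; a full binding would presumably also need to cover other separators.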
I guess my question is: Can we factor out this CSV reader feature so that it is usable elsewhere?
was:
The {{TimestampParser}} is able to cycle through several formats. This sort of functionality would be very useful for some of the lubridate bindings that need to behave in a similar way.
{code:r}
library(arrow)
library(fs)
library(readr)
library(tibble)
tf <- fs::file_temp(ext = "csv")
fs::file_create(tf)
sample_times <- tibble(a = c("09/13/2013", "25/12/1998", "09-13-13", "23_Feb_2022", "09/13/2018"))
write_csv(sample_times, tf)
read_csv_arrow(tf,
  as_data_frame = TRUE,
  timestamp_parsers = c("%m/%d/%Y", "%d/%m/%Y", "%m-%d-%y", "%d_%b_%Y"))
#> # A tibble: 5 × 1
#>   a
#>   <dttm>
#> 1 2013-09-13 01:00:00
#> 2 1998-12-25 00:00:00
#> 3 2013-09-13 01:00:00
#> 4 2022-02-23 00:00:00
#> 5 2018-09-13 01:00:00
{code}
For example, in lubridate, {{ymd()}} cycles through all possible formats that have year-month-day components in the right order (e.g. {{"%Y-%m-%d", "%y-%m-%d", "%Y-%b-%d", "%y-%b-%d", "%Y-%B-%d", "%y-%B-%d"}}, etc.).
I guess my question is can we factor this CSV reader feature to be usable elsewhere?
> [C++] Is CSV reader's TimestampParser usable elsewhere?
> -------------------------------------------------------
>
> Key: ARROW-15912
> URL: https://issues.apache.org/jira/browse/ARROW-15912
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Dragoș Moldovan-Grünfeld
> Priority: Major