Posted to jira@arrow.apache.org by "Dragoș Moldovan-Grünfeld (Jira)" <ji...@apache.org> on 2022/03/10 18:37:00 UTC

[jira] [Updated] (ARROW-15912) [C++] Is CSV reader's TimestampParser usable elsewhere?

     [ https://issues.apache.org/jira/browse/ARROW-15912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dragoș Moldovan-Grünfeld updated ARROW-15912:
---------------------------------------------
    Description: 
The CSV reader's {{TimestampParser}} is able to cycle through several formats until one matches. This sort of functionality would be very useful for some of the lubridate bindings that need to behave in a similar way.

{code:r}
library(arrow)
library(fs)
library(readr)
library(tibble)

tf <- fs::file_temp(ext = "csv")
fs::file_create(tf)

sample_times <- tibble(a = c("09/13/2013", "25/12/1998", "09-13-13", "23_Feb_2022", "09/13/2018"))
write_csv(sample_times, tf)


read_csv_arrow(tf, 
               as_data_frame = TRUE,
               timestamp_parsers = c("%m/%d/%Y", "%d/%m/%Y", "%m-%d-%y", "%d_%b_%Y"))
#> # A tibble: 5 × 1
#>   a                  
#>   <dttm>             
#> 1 2013-09-13 01:00:00
#> 2 1998-12-25 00:00:00
#> 3 2013-09-13 01:00:00
#> 4 2022-02-23 00:00:00
#> 5 2018-09-13 01:00:00
{code}

For example, in lubridate, {{ymd()}} cycles through all possible formats that have year, month, and day components in that order (e.g. {{"%Y-%m-%d", "%y-%m-%d", "%Y-%b-%d", "%y-%b-%d", "%Y-%B-%d", "%y-%B-%d"}}, etc.).

I guess my question is: Can we factor this CSV reader feature out so that it is usable elsewhere?

  was:
The CSV reader's {{TimestampParser}} is able to cycle through several formats until one matches. This sort of functionality would be very useful for some of the lubridate bindings that need to behave in a similar way.

{code:r}
library(arrow)
library(fs)
library(readr)
library(tibble)

tf <- fs::file_temp(ext = "csv")
fs::file_create(tf)

sample_times <- tibble(a = c("09/13/2013", "25/12/1998", "09-13-13", "23_Feb_2022", "09/13/2018"))
write_csv(sample_times, tf)


read_csv_arrow(tf, 
               as_data_frame = TRUE,
               timestamp_parsers = c("%m/%d/%Y", "%d/%m/%Y", "%m-%d-%y", "%d_%b_%Y"))
#> # A tibble: 5 × 1
#>   a                  
#>   <dttm>             
#> 1 2013-09-13 01:00:00
#> 2 1998-12-25 00:00:00
#> 3 2013-09-13 01:00:00
#> 4 2022-02-23 00:00:00
#> 5 2018-09-13 01:00:00
{code}

For example, in lubridate, {{ymd()}} cycles through all possible formats that have year, month, and day components in that order (e.g. {{"%Y-%m-%d", "%y-%m-%d", "%Y-%b-%d", "%y-%b-%d", "%Y-%B-%d", "%y-%B-%d"}}, etc.).

I guess my question is can we factor this CSV reader feature to be usable elsewhere? 


> [C++] Is CSV reader's TimestampParser usable elsewhere?
> -------------------------------------------------------
>
>                 Key: ARROW-15912
>                 URL: https://issues.apache.org/jira/browse/ARROW-15912
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Dragoș Moldovan-Grünfeld
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.1#820001)