Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/06/20 22:09:38 UTC

[GitHub] [arrow] dragosmg commented on pull request #13196: ARROW-16407: [R] Extend `parse_date_time` to cover hour, dates, and minutes components

dragosmg commented on PR #13196:
URL: https://github.com/apache/arrow/pull/13196#issuecomment-1160827641

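   Benchmark results for `parse_date_time()` implemented with combined formats (formats with and without a separator passed together) versus separate formats (formats with or without a separator tried separately). On 2 million strings, the separate approach had a median time of 5.94s versus 12.25s for the combined approach: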
   ```r
   library(dplyr)
   library(lubridate)
   library(ggplot2)
   library(hrbrthemes)
   # Load the development version of the arrow R package from the PR branch
   devtools::load_all()
   
   test_df <- tibble::tibble(
     a = rep(c("20220614", "2022-06-14"), 1e6)
   )
   
   results <- bench::mark(
     separate = test_df %>% 
       arrow_table() %>% 
       mutate(b = parse_date_time(a, orders = "ymd")) %>% 
       collect(),
     combined = test_df %>% 
       arrow_table() %>% 
       mutate(b = parse_date_time_combined(a, orders = "ymd")) %>% 
       collect(), 
     min_iterations = 20
   )
   
   results
   
   #> # A tibble: 2 × 13
   #>   expression      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result   memory     time       gc      
   #>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list>   <list>     <list>     <list>  
   #> 1 separate      5.93s    5.94s    0.168     15.8MB   0.0720    14     6      1.39m <tibble> <Rprofmem> <bench_tm> <tibble>
   #> 2 combined     12.22s   12.25s    0.0815    16.2MB   0.0439    13     7      2.66m <tibble> <Rprofmem> <bench_tm> <tibble>
   
   ggplot2::autoplot(results) +
     theme_ipsum_rc(grid = "XxY") +
     labs(title = "Comparison of format parsing",
          subtitle = paste0(
            "separate = formats with or without separator are tried separately\n",
            "combined = formats are combined in a single vector and all are passed to `coalesce()`"
          ))
   ```
   
   ![image](https://user-images.githubusercontent.com/13176361/174673234-99592af2-43ed-4646-8890-2c794adf70f2.png)
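
   For readers unfamiliar with the "combined" idea, here is a minimal sketch in plain R (not the arrow implementation from this PR). It assumes the idea is to parse the whole column once per candidate format and then keep the first non-missing result with `coalesce()`. The helper name `parse_ymd_coalesce()` and the two `strptime`-style formats are hypothetical and only for illustration.

   ```r
   library(dplyr)

   x <- c("20220614", "2022-06-14")

   # Hypothetical helper: parse `x` once per format, then keep, for each
   # element, the first attempt that did not produce NA.
   parse_ymd_coalesce <- function(x, formats = c("%Y-%m-%d", "%Y%m%d")) {
     # One full pass over the vector per candidate format
     attempts <- lapply(formats, function(fmt) as.Date(x, format = fmt))
     # coalesce() returns the first non-missing value at each position
     do.call(coalesce, attempts)
   }

   parsed <- parse_ymd_coalesce(x)
   parsed
   ```

   In this sketch every extra format adds another full pass over the data, which may be one reason an approach that passes more formats to `coalesce()` takes longer, though the benchmark above does not isolate the cause.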
   

