You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/02/15 11:16:41 UTC

[GitHub] [arrow] dragosmg opened a new pull request #12429: ARROW-14815 [R] bindings for `lubridate::semester()`

dragosmg opened a new pull request #12429:
URL: https://github.com/apache/arrow/pull/12429


   Once this PR is merged the following code snippet will be valid
   
   ``` r
   suppressPackageStartupMessages(library(dplyr))
   suppressPackageStartupMessages(library(lubridate))
   suppressPackageStartupMessages(library(arrow))
   
   df <- tibble(
     dates = as.Date(c("2021-04-30", "2021-07-31", "2021-10-31", "2021-01-31"))
     )
   df %>% 
     record_batch() %>% 
     mutate(
       semester_without_year = semester(dates),
       semester_with_year = semester(dates, with_year = TRUE)) %>% 
     collect()
   #> # A tibble: 4 × 3
   #>   dates      semester_without_year semester_with_year
   #>   <date>                     <int>              <dbl>
   #> 1 2021-04-30                     1              2021.1
   #> 2 2021-07-31                     2              2021.2
   #> 3 2021-10-31                     2              2021.2
   #> 4 2021-01-31                     1              2021.1
   ```
   
   and identical with the `lubridate` result:
   ``` r
   suppressPackageStartupMessages(library(dplyr))
   suppressPackageStartupMessages(library(lubridate))
   
   df <- tibble(
     dates = as.Date(c("2021-04-30", "2021-07-31", "2021-10-31", "2021-01-31"))
     )
   df %>% 
     mutate(
       semester_without_year = semester(dates),
       semester_with_year = semester(dates, with_year = TRUE))
   #> # A tibble: 4 × 3
   #>   dates      semester_without_year semester_with_year
   #>   <date>                     <int>              <dbl>
   #> 1 2021-04-30                     1              2021.1
   #> 2 2021-07-31                     2              2021.2
   #> 3 2021-10-31                     2              2021.2
   #> 4 2021-01-31                     1              2021.1
   ```
   
   <sup>Created on 2022-02-15 by the [reprex package](https://reprex.tidyverse.org) (v2.0.1)</sup>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] dragosmg commented on a change in pull request #12429: ARROW-14815 [R] bindings for `lubridate::semester()`

Posted by GitBox <gi...@apache.org>.
dragosmg commented on a change in pull request #12429:
URL: https://github.com/apache/arrow/pull/12429#discussion_r807300733



##########
File path: r/R/dplyr-funcs-datetime.R
##########
@@ -147,5 +147,14 @@ register_bindings_datetime <- function() {
   register_binding("pm", function(x) {
     !call_binding("am", x)
   })
-
+  register_binding("semester", function(x, with_year = FALSE) {
+    month <- call_binding("month", x)
+    semester <- call_binding("if_else", month <= 6, 1L, 2L)
+    if (with_year) {
+      year <- call_binding("year", x)
+      return(year + semester / 10)

Review comment:
       Yes, initially I implemented it as a string, as at least that felt like adding a guardrail, but nope, it's numeric.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] dragosmg commented on a change in pull request #12429: ARROW-14815 [R] bindings for `lubridate::semester()`

Posted by GitBox <gi...@apache.org>.
dragosmg commented on a change in pull request #12429:
URL: https://github.com/apache/arrow/pull/12429#discussion_r808184060



##########
File path: r/tests/testthat/test-dplyr-funcs-datetime.R
##########
@@ -711,3 +711,48 @@ test_that("am/pm mirror lubridate", {
   )
 
 })
+
+test_that("semester works with temporal types", {
+  test_df <- tibble(
+    month_as_int = c(1:12, NA),
+    month_as_char_pad = ifelse(month_as_int < 10, paste0("0", month_as_int), month_as_int),
+    dates = as.Date(paste0("2021-", month_as_char_pad, "-15"))
+  )
+
+  # test extraction from dates
+  compare_dplyr_binding(
+     .input %>%
+      mutate(sem_wo_year = semester(dates),
+             sem_w_year = semester(dates, with_year = TRUE)) %>%
+      collect(),
+     test_df
+  )
+})
+
+test_that("semester errors with integers and characters", {
+  test_df <- tibble(
+    month_as_int = c(1:12, NA),
+    month_as_char_pad = ifelse(month_as_int < 10, paste0("0", month_as_int), month_as_int),
+    dates = as.Date(paste0("2021-", month_as_char_pad, "-15"))
+  )
+
+  # extraction from integers should error as we currently do not support setting
+  # month components with month, but this is supported by `lubridate::month()`
+  # this should no longer fail once https://issues.apache.org/jira/browse/ARROW-15701
+  # is addressed

Review comment:
       Done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jonkeane commented on a change in pull request #12429: ARROW-14815 [R] bindings for `lubridate::semester()`

Posted by GitBox <gi...@apache.org>.
jonkeane commented on a change in pull request #12429:
URL: https://github.com/apache/arrow/pull/12429#discussion_r810506760



##########
File path: r/R/dplyr-funcs-datetime.R
##########
@@ -115,6 +115,23 @@ register_bindings_datetime <- function() {
       return(Expression$create("strftime", x, options = list(format = format, locale = locale)))
     }
 
+    if (call_binding("is.integer", x)) {
+      if (inherits(x, "Expression")) {
+        call_binding(
+          "if_else",
+          call_binding_agg("all", call_binding("between", x, 1, 12))$data,
+          x,
+          abort("bla: Values are not in 1:12")
+        )

Review comment:
       I suspect in this part of the branch what you'll need to do is (re) assign the `call_binding` to `x` or return it directly. (thought see the comment below that this probably needs to happen above or we'll need to take into account that `label` argument.

##########
File path: r/R/dplyr-funcs-datetime.R
##########
@@ -115,6 +115,23 @@ register_bindings_datetime <- function() {
       return(Expression$create("strftime", x, options = list(format = format, locale = locale)))
     }
 
+    if (call_binding("is.integer", x)) {
+      if (inherits(x, "Expression")) {
+        call_binding(
+          "if_else",
+          call_binding_agg("all", call_binding("between", x, 1, 12))$data,
+          x,
+          abort("bla: Values are not in 1:12")
+        )
+      } else {
+        if (all(1 <= x & x <= 12)) {
+          return(x)
+        } else {
+          abort("bla2: Values are not in 1:12")
+        }
+      }
+    }

Review comment:
       Thanks for this, this is on the right track. I suspect that we might want this to be a bit higher in the function call, since we can use numerics and labels in lubridate:
   
   ```
   lubridate::month(2, label = TRUE)
   [1] Feb
   12 Levels: Jan < Feb < Mar < Apr < May < Jun < Jul < Aug < Sep < ... < Dec
   ```
   
   Could you also add some tests for `month()` taking various inputs here?
   
   A few inputs that I would start with (though I'm sure there are a bunch that could get us into funny corners!):
   
   ```
   month(1)
   month(1.1)
   month(4L)
   month(4L, label = TRUE)
   ```
   
   You might have already figured this out (in which case ignore this!), but an understanding of this will help you work on these bindings:
   
   We have a few of these helper functions in the package:
   
   * `call_function()` — this is exported and is a way to evaluate a specific arrow C++ compute function. This is helpful for answering questions like "Does this C++ compute function support this input? What does the error look like when it doesn?"
   * `call_binding()` — (which is unexpected) makes it easy to call one of our dplyr bindings, but note that `call_binding()` creates an Arrow Expression, but doesn't actually evaluate it (since evaluating it might be expensive, that is deferred to when the query is actually running). This is how we construct binding calls that rely on other bindings. Running this standalone will _not_ evaluate on the data, but running it alone is good for catching type-checking errors.
   * `arrow_eval()` — this will trigger actual evaluation (and is used inside of our dplyr evaluation code)
   
   So when you're experimenting you might call something like `arrow_eval(call_binding("month", Expression$scalar(1L)), arrow_mask(list()))` to see what happens if we were to call and evaluate our `month` dplyr binding. The `arrow_mask(list())` at the end is because `arrow_eval()` is a tidy-eval style that one needs to provide a data mask.
   
   One last thing: for this binding, since the behavior depends (at least in some cases) on the actual data in the column: we might want to make a (test) helper that wraps `arrow_eval(..., arrow_mask(list()))` that makes it easier to push various kinds of data through one of our dplyr bindings and see the output (either the values or the errors). It's possible something like this exists (honestly, in some ways a dplyr pipeline kind of _is_ that wrapper — just with a lot more) but I'm not aware of a quick helper that makes constructing + triggering the bindings with various inputs easy.

##########
File path: r/tests/testthat/test-dplyr-funcs-datetime.R
##########
@@ -711,3 +711,40 @@ test_that("am/pm mirror lubridate", {
   )
 
 })
+
+test_that("semester works with temporal types and integers", {
+  test_df <- tibble(
+    month_as_int = c(1:12, NA),
+    month_as_char_pad = sprintf("%02i", month_as_int),
+    dates = as.Date(paste0("2021-", month_as_char_pad, "-15"))
+  )
+
+  # semester extraction from dates
+  compare_dplyr_binding(
+     .input %>%
+      mutate(sem_wo_year = semester(dates),
+             sem_w_year = semester(dates, with_year = TRUE)) %>%
+      collect(),
+     test_df
+  )
+  # semester extraction from months as integers
+  compare_dplyr_binding(
+    .input %>%
+      mutate(sem_month_as_int = semester(month_as_int)) %>%
+      collect(),
+    test_df
+  )
+
+  expect_error(
+    call_binding("semester", Expression$scalar(1:13))
+  )

Review comment:
       Could you assert the error that you're getting here?

##########
File path: r/R/dplyr-funcs-datetime.R
##########
@@ -115,6 +115,23 @@ register_bindings_datetime <- function() {
       return(Expression$create("strftime", x, options = list(format = format, locale = locale)))
     }
 
+    if (call_binding("is.integer", x)) {
+      if (inherits(x, "Expression")) {
+        call_binding(
+          "if_else",
+          call_binding_agg("all", call_binding("between", x, 1, 12))$data,

Review comment:
       I suspect that the `$data` isn't doing what you expect here. That's actually getting the _input_ to the `all` binding (not the result).
   
   We also should be careful that we don't want to go through the data more than once — is there a way we can construct a series of binding calls like this that get us to the answer (and then error if they run into a value that won't work?)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot edited a comment on pull request #12429: ARROW-14815 [R] bindings for `lubridate::semester()`

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #12429:
URL: https://github.com/apache/arrow/pull/12429#issuecomment-1048171598


   Benchmark runs are scheduled for baseline = 5680d209fd870f99134e2d7299b47acd90fabb8e and contender = 16f36a57abb9c2b4b54e914e3b2181e83453355b. 16f36a57abb9c2b4b54e914e3b2181e83453355b is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/99142f29b01b47bc963c702b87da051a...f16ecd0390054e1ea26a000a139f9788/)
   [Scheduled] [test-mac-arm](https://conbench.ursa.dev/compare/runs/241f27fc813d4ecb96c234fd1bd99adc...b7aeb10131994fc798bcf7ea16dab145/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/6972f96314454d3f87792b1d654b082f...64de6ec55164418ba85982637a591af3/)
   [Finished :arrow_down:0.34% :arrow_up:0.04%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/6624357494094dda81f375a5a493c3d2...5c967704496a430c9c01b74cf2587be6/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot commented on pull request #12429: ARROW-14815 [R] bindings for `lubridate::semester()`

Posted by GitBox <gi...@apache.org>.
ursabot commented on pull request #12429:
URL: https://github.com/apache/arrow/pull/12429#issuecomment-1048171598


   Benchmark runs are scheduled for baseline = 5680d209fd870f99134e2d7299b47acd90fabb8e and contender = 16f36a57abb9c2b4b54e914e3b2181e83453355b. 16f36a57abb9c2b4b54e914e3b2181e83453355b is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Scheduled] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/99142f29b01b47bc963c702b87da051a...f16ecd0390054e1ea26a000a139f9788/)
   [Scheduled] [test-mac-arm](https://conbench.ursa.dev/compare/runs/241f27fc813d4ecb96c234fd1bd99adc...b7aeb10131994fc798bcf7ea16dab145/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/6972f96314454d3f87792b1d654b082f...64de6ec55164418ba85982637a591af3/)
   [Scheduled] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/6624357494094dda81f375a5a493c3d2...5c967704496a430c9c01b74cf2587be6/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #12429: ARROW-14815 [R] bindings for `lubridate::semester()`

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #12429:
URL: https://github.com/apache/arrow/pull/12429#issuecomment-1040150947






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot edited a comment on pull request #12429: ARROW-14815 [R] bindings for `lubridate::semester()`

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #12429:
URL: https://github.com/apache/arrow/pull/12429#issuecomment-1048171598


   Benchmark runs are scheduled for baseline = 5680d209fd870f99134e2d7299b47acd90fabb8e and contender = 16f36a57abb9c2b4b54e914e3b2181e83453355b. 16f36a57abb9c2b4b54e914e3b2181e83453355b is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/99142f29b01b47bc963c702b87da051a...f16ecd0390054e1ea26a000a139f9788/)
   [Scheduled] [test-mac-arm](https://conbench.ursa.dev/compare/runs/241f27fc813d4ecb96c234fd1bd99adc...b7aeb10131994fc798bcf7ea16dab145/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/6972f96314454d3f87792b1d654b082f...64de6ec55164418ba85982637a591af3/)
   [Scheduled] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/6624357494094dda81f375a5a493c3d2...5c967704496a430c9c01b74cf2587be6/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] dragosmg commented on pull request #12429: ARROW-14815 [R] bindings for `lubridate::semester()`

Posted by GitBox <gi...@apache.org>.
dragosmg commented on pull request #12429:
URL: https://github.com/apache/arrow/pull/12429#issuecomment-1041378890


   Not sure if we should (and if _yes_, at what level) support integer / numeric arguments to `month()`. If the answer is _yes_, I feel that should not happen only inside / for semester. `lubridate::month()` has dual purpose - it can be used both to get and set months. arrow currently supports only extraction and, therefore, the input type needs to be `temporal`.   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] dragosmg commented on a change in pull request #12429: ARROW-14815 [R] bindings for `lubridate::semester()`

Posted by GitBox <gi...@apache.org>.
dragosmg commented on a change in pull request #12429:
URL: https://github.com/apache/arrow/pull/12429#discussion_r811219485



##########
File path: r/R/dplyr-funcs-datetime.R
##########
@@ -115,6 +115,23 @@ register_bindings_datetime <- function() {
       return(Expression$create("strftime", x, options = list(format = format, locale = locale)))
     }
 
+    if (call_binding("is.integer", x)) {
+      if (inherits(x, "Expression")) {
+        call_binding(
+          "if_else",
+          call_binding_agg("all", call_binding("between", x, 1, 12))$data,
+          x,
+          abort("bla: Values are not in 1:12")
+        )
+      } else {
+        if (all(1 <= x & x <= 12)) {
+          return(x)
+        } else {
+          abort("bla2: Values are not in 1:12")
+        }
+      }
+    }

Review comment:
       I have just spent half a day looking at the unit tests for `month()` and trying to figure out how to solve both `semester()` and `month()` in one go. I think solving all the outstanding issues with `month` (integer inputs, factor outputs etc.) is maybe a bit too much to accomplish in this PR. Therefore, I think the best course of action would be to keep the two issues separate - solve `semester()` in this PR and then, at a later date, iron any outstanding aspects of `month()` as part of [ARROW-15701](https://issues.apache.org/jira/browse/ARROW-15701).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] dragosmg commented on a change in pull request #12429: ARROW-14815 [R] bindings for `lubridate::semester()`

Posted by GitBox <gi...@apache.org>.
dragosmg commented on a change in pull request #12429:
URL: https://github.com/apache/arrow/pull/12429#discussion_r807965478



##########
File path: r/tests/testthat/test-dplyr-funcs-datetime.R
##########
@@ -711,3 +711,19 @@ test_that("am/pm mirror lubridate", {
   )
 
 })
+
+test_that("semester", {
+  test_df <- tibble(
+    month = c(1:12, NA),
+    month_char_pad = ifelse(month < 10, paste0("0", month), month),
+    dates = as.Date(paste0("2021-", month_char_pad, "-15"))

Review comment:
       I added those as failing tests. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] dragosmg commented on a change in pull request #12429: ARROW-14815 [R] bindings for `lubridate::semester()`

Posted by GitBox <gi...@apache.org>.
dragosmg commented on a change in pull request #12429:
URL: https://github.com/apache/arrow/pull/12429#discussion_r809093956



##########
File path: r/tests/testthat/test-dplyr-funcs-datetime.R
##########
@@ -711,3 +711,19 @@ test_that("am/pm mirror lubridate", {
   )
 
 })
+
+test_that("semester", {
+  test_df <- tibble(
+    month = c(1:12, NA),
+    month_char_pad = ifelse(month < 10, paste0("0", month), month),
+    dates = as.Date(paste0("2021-", month_char_pad, "-15"))
+  )
+
+  compare_dplyr_binding(
+     .input %>%
+      mutate(sem_wo_year = semester(dates),
+             sem_w_year = semester(dates, with_year = TRUE)) %>%
+      collect(),
+     test_df
+  )

Review comment:
       @jonkeane I decided to implement support for integer inputs in `month()`, but I do not have a working version yet.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] dragosmg commented on a change in pull request #12429: ARROW-14815 [R] bindings for `lubridate::semester()`

Posted by GitBox <gi...@apache.org>.
dragosmg commented on a change in pull request #12429:
URL: https://github.com/apache/arrow/pull/12429#discussion_r807965478



##########
File path: r/tests/testthat/test-dplyr-funcs-datetime.R
##########
@@ -711,3 +711,19 @@ test_that("am/pm mirror lubridate", {
   )
 
 })
+
+test_that("semester", {
+  test_df <- tibble(
+    month = c(1:12, NA),
+    month_char_pad = ifelse(month < 10, paste0("0", month), month),
+    dates = as.Date(paste0("2021-", month_char_pad, "-15"))

Review comment:
       I've added those as failing tests. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jonkeane commented on a change in pull request #12429: ARROW-14815 [R] bindings for `lubridate::semester()`

Posted by GitBox <gi...@apache.org>.
jonkeane commented on a change in pull request #12429:
URL: https://github.com/apache/arrow/pull/12429#discussion_r807278266



##########
File path: r/R/dplyr-funcs-datetime.R
##########
@@ -147,5 +147,14 @@ register_bindings_datetime <- function() {
   register_binding("pm", function(x) {
     !call_binding("am", x)
   })
-
+  register_binding("semester", function(x, with_year = FALSE) {
+    month <- call_binding("month", x)
+    semester <- call_binding("if_else", month <= 6, 1L, 2L)
+    if (with_year) {
+      year <- call_binding("year", x)
+      return(year + semester / 10)

Review comment:
       😱 This looks totally wrong, but it isn't. It's exactly what lubridate's semseter/quarter returns: a numeric where the decimal portion is the quarter or semester number (and not a decimal representation of such, or a string).

##########
File path: r/tests/testthat/test-dplyr-funcs-datetime.R
##########
@@ -711,3 +711,19 @@ test_that("am/pm mirror lubridate", {
   )
 
 })
+
+test_that("semester", {
+  test_df <- tibble(
+    month = c(1:12, NA),
+    month_char_pad = ifelse(month < 10, paste0("0", month), month),
+    dates = as.Date(paste0("2021-", month_char_pad, "-15"))

Review comment:
       ```suggestion
       month_as_int = c(1:12, NA),
       month_as_char_pad = ifelse(month < 10, paste0("0", month), month),
       dates = as.Date(paste0("2021-", month_char_pad, "-15"))
   ```
   
   and then also test `month_as_int` and `month_as_str_pad` below too

##########
File path: r/tests/testthat/test-dplyr-funcs-datetime.R
##########
@@ -711,3 +711,19 @@ test_that("am/pm mirror lubridate", {
   )
 
 })
+
+test_that("semester", {
+  test_df <- tibble(
+    month = c(1:12, NA),
+    month_char_pad = ifelse(month < 10, paste0("0", month), month),
+    dates = as.Date(paste0("2021-", month_char_pad, "-15"))
+  )
+
+  compare_dplyr_binding(
+     .input %>%
+      mutate(sem_wo_year = semester(dates),
+             sem_w_year = semester(dates, with_year = TRUE)) %>%
+      collect(),
+     test_df
+  )

Review comment:
       Would you mind also testing some bad / wrong inputs and confirming that the errors are what we expect? Something like:
   
   ```
   expect_error(call_binding("semester", "not a month"), "...")
   ```
   
   And other inputs like `22`, `-4`, `NA`, etc. that are clearly out of bounds / won't coerce into a month? Alternatively (or even better...) might be to add these tests to the tests for the `month` binding since that's where that coercion happens, it will be used when we make a binding for `quarters` and it appears those tests weren't added (or I can't find them!) when we added `months`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jonkeane closed pull request #12429: ARROW-14815 [R] bindings for `lubridate::semester()`

Posted by GitBox <gi...@apache.org>.
jonkeane closed pull request #12429:
URL: https://github.com/apache/arrow/pull/12429


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jonkeane commented on a change in pull request #12429: ARROW-14815 [R] bindings for `lubridate::semester()`

Posted by GitBox <gi...@apache.org>.
jonkeane commented on a change in pull request #12429:
URL: https://github.com/apache/arrow/pull/12429#discussion_r808112466



##########
File path: r/tests/testthat/test-dplyr-funcs-datetime.R
##########
@@ -711,3 +711,48 @@ test_that("am/pm mirror lubridate", {
   )
 
 })
+
+test_that("semester works with temporal types", {
+  test_df <- tibble(
+    month_as_int = c(1:12, NA),
+    month_as_char_pad = ifelse(month_as_int < 10, paste0("0", month_as_int), month_as_int),
+    dates = as.Date(paste0("2021-", month_as_char_pad, "-15"))
+  )

Review comment:
       I see this repeated down below as well and we don't use all of the inputs here. We should either curate the inputs so that this tibble has all and only the columns that are used in this block (and same with the one below) or we should factor these out + put them up top or somewhere else such that we define it once and then use it in multiple `test_that` blocks.

##########
File path: r/tests/testthat/test-dplyr-funcs-datetime.R
##########
@@ -711,3 +711,48 @@ test_that("am/pm mirror lubridate", {
   )
 
 })
+
+test_that("semester works with temporal types", {
+  test_df <- tibble(
+    month_as_int = c(1:12, NA),
+    month_as_char_pad = ifelse(month_as_int < 10, paste0("0", month_as_int), month_as_int),
+    dates = as.Date(paste0("2021-", month_as_char_pad, "-15"))
+  )
+
+  # test extraction from dates
+  compare_dplyr_binding(
+     .input %>%
+      mutate(sem_wo_year = semester(dates),
+             sem_w_year = semester(dates, with_year = TRUE)) %>%
+      collect(),
+     test_df
+  )
+})
+
+test_that("semester errors with integers and characters", {
+  test_df <- tibble(
+    month_as_int = c(1:12, NA),
+    month_as_char_pad = ifelse(month_as_int < 10, paste0("0", month_as_int), month_as_int),
+    dates = as.Date(paste0("2021-", month_as_char_pad, "-15"))
+  )
+
+  # extraction from integers should error as we currently do not support setting
+  # month components with month, but this is supported by `lubridate::month()`
+  # this should no longer fail once https://issues.apache.org/jira/browse/ARROW-15701
+  # is addressed

Review comment:
       Thanks for adding this in as a comment here! It's minor, but makes it easier to find them later, I try and add a `# TODO: ...` line somewhere in these blocks (ideally on the line that has the jira ticket number)

##########
File path: r/tests/testthat/test-dplyr-funcs-datetime.R
##########
@@ -711,3 +711,48 @@ test_that("am/pm mirror lubridate", {
   )
 
 })
+
+test_that("semester works with temporal types", {
+  test_df <- tibble(
+    month_as_int = c(1:12, NA),
+    month_as_char_pad = ifelse(month_as_int < 10, paste0("0", month_as_int), month_as_int),
+    dates = as.Date(paste0("2021-", month_as_char_pad, "-15"))
+  )
+
+  # test extraction from dates
+  compare_dplyr_binding(
+     .input %>%
+      mutate(sem_wo_year = semester(dates),
+             sem_w_year = semester(dates, with_year = TRUE)) %>%
+      collect(),
+     test_df
+  )
+})
+
+test_that("semester errors with integers and characters", {
+  test_df <- tibble(
+    month_as_int = c(1:12, NA),
+    month_as_char_pad = ifelse(month_as_int < 10, paste0("0", month_as_int), month_as_int),

Review comment:
       ```suggestion
       month_as_char_pad = sprintf("%02i", month_as_int),
   ```
   
   This gets what you need here, yeah?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #12429: ARROW-14815 [R] bindings for `lubridate::semester()`

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #12429:
URL: https://github.com/apache/arrow/pull/12429#issuecomment-1040150947






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] dragosmg commented on a change in pull request #12429: ARROW-14815 [R] bindings for `lubridate::semester()`

Posted by GitBox <gi...@apache.org>.
dragosmg commented on a change in pull request #12429:
URL: https://github.com/apache/arrow/pull/12429#discussion_r807963638



##########
File path: r/tests/testthat/test-dplyr-funcs-datetime.R
##########
@@ -711,3 +711,19 @@ test_that("am/pm mirror lubridate", {
   )
 
 })
+
+test_that("semester", {
+  test_df <- tibble(
+    month = c(1:12, NA),
+    month_char_pad = ifelse(month < 10, paste0("0", month), month),
+    dates = as.Date(paste0("2021-", month_char_pad, "-15"))
+  )
+
+  compare_dplyr_binding(
+     .input %>%
+      mutate(sem_wo_year = semester(dates),
+             sem_w_year = semester(dates, with_year = TRUE)) %>%
+      collect(),
+     test_df
+  )

Review comment:
       I think the direction for this would be for `month` to support integer inputs and then all the tests to happen there. I've added a failing test for integer inputs and I will make a note / add a comment to move the test to the "it works" block once we support integer inputs for month. I have opened [ARROW-15701](https://issues.apache.org/jira/browse/ARROW-15701).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] dragosmg commented on a change in pull request #12429: ARROW-14815 [R] bindings for `lubridate::semester()`

Posted by GitBox <gi...@apache.org>.
dragosmg commented on a change in pull request #12429:
URL: https://github.com/apache/arrow/pull/12429#discussion_r811156508



##########
File path: r/tests/testthat/test-dplyr-funcs-datetime.R
##########
@@ -711,3 +711,40 @@ test_that("am/pm mirror lubridate", {
   )
 
 })
+
+test_that("semester works with temporal types and integers", {
+  test_df <- tibble(
+    month_as_int = c(1:12, NA),
+    month_as_char_pad = sprintf("%02i", month_as_int),
+    dates = as.Date(paste0("2021-", month_as_char_pad, "-15"))
+  )
+
+  # semester extraction from dates
+  compare_dplyr_binding(
+     .input %>%
+      mutate(sem_wo_year = semester(dates),
+             sem_w_year = semester(dates, with_year = TRUE)) %>%
+      collect(),
+     test_df
+  )
+  # semester extraction from months as integers
+  compare_dplyr_binding(
+    .input %>%
+      mutate(sem_month_as_int = semester(month_as_int)) %>%
+      collect(),
+    test_df
+  )
+
+  expect_error(
+    call_binding("semester", Expression$scalar(1:13))
+  )

Review comment:
       Nope. Not yet, I would like to get an error, but it at the moment it's just a placeholder since the function doesn't yet error / work correctly.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] dragosmg commented on a change in pull request #12429: ARROW-14815 [R] bindings for `lubridate::semester()`

Posted by GitBox <gi...@apache.org>.
dragosmg commented on a change in pull request #12429:
URL: https://github.com/apache/arrow/pull/12429#discussion_r807300733



##########
File path: r/R/dplyr-funcs-datetime.R
##########
@@ -147,5 +147,14 @@ register_bindings_datetime <- function() {
   register_binding("pm", function(x) {
     !call_binding("am", x)
   })
-
+  register_binding("semester", function(x, with_year = FALSE) {
+    month <- call_binding("month", x)
+    semester <- call_binding("if_else", month <= 6, 1L, 2L)
+    if (with_year) {
+      year <- call_binding("year", x)
+      return(year + semester / 10)

Review comment:
       Yes, initially I implemented it as a string, as at least that felt like adding some sort of a guardrail, but nope, it's numeric.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jonkeane commented on a change in pull request #12429: ARROW-14815 [R] bindings for `lubridate::semester()`

Posted by GitBox <gi...@apache.org>.
jonkeane commented on a change in pull request #12429:
URL: https://github.com/apache/arrow/pull/12429#discussion_r808102891



##########
File path: r/tests/testthat/test-dplyr-funcs-datetime.R
##########
@@ -711,3 +711,19 @@ test_that("am/pm mirror lubridate", {
   )
 
 })
+
+test_that("semester", {
+  test_df <- tibble(
+    month = c(1:12, NA),
+    month_char_pad = ifelse(month < 10, paste0("0", month), month),
+    dates = as.Date(paste0("2021-", month_char_pad, "-15"))
+  )
+
+  compare_dplyr_binding(
+     .input %>%
+      mutate(sem_wo_year = semester(dates),
+             sem_w_year = semester(dates, with_year = TRUE)) %>%
+      collect(),
+     test_df
+  )

Review comment:
       Yup, totally agree that `month` is the thing that should do this to accept a broader array of inputs + testing there. I also commented on the Jira with some ideas about how to move forward there




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] dragosmg commented on a change in pull request #12429: ARROW-14815 [R] bindings for `lubridate::semester()`

Posted by GitBox <gi...@apache.org>.
dragosmg commented on a change in pull request #12429:
URL: https://github.com/apache/arrow/pull/12429#discussion_r808183192



##########
File path: r/tests/testthat/test-dplyr-funcs-datetime.R
##########
@@ -711,3 +711,48 @@ test_that("am/pm mirror lubridate", {
   )
 
 })
+
+test_that("semester works with temporal types", {
+  test_df <- tibble(
+    month_as_int = c(1:12, NA),
+    month_as_char_pad = ifelse(month_as_int < 10, paste0("0", month_as_int), month_as_int),
+    dates = as.Date(paste0("2021-", month_as_char_pad, "-15"))
+  )
+
+  # test extraction from dates
+  compare_dplyr_binding(
+     .input %>%
+      mutate(sem_wo_year = semester(dates),
+             sem_w_year = semester(dates, with_year = TRUE)) %>%
+      collect(),
+     test_df
+  )
+})
+
+test_that("semester errors with integers and characters", {
+  test_df <- tibble(
+    month_as_int = c(1:12, NA),
+    month_as_char_pad = ifelse(month_as_int < 10, paste0("0", month_as_int), month_as_int),

Review comment:
       Done.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot edited a comment on pull request #12429: ARROW-14815 [R] bindings for `lubridate::semester()`

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #12429:
URL: https://github.com/apache/arrow/pull/12429#issuecomment-1048171598


   Benchmark runs are scheduled for baseline = 5680d209fd870f99134e2d7299b47acd90fabb8e and contender = 16f36a57abb9c2b4b54e914e3b2181e83453355b. 16f36a57abb9c2b4b54e914e3b2181e83453355b is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/99142f29b01b47bc963c702b87da051a...f16ecd0390054e1ea26a000a139f9788/)
   [Finished :arrow_down:0.46% :arrow_up:0.04%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/241f27fc813d4ecb96c234fd1bd99adc...b7aeb10131994fc798bcf7ea16dab145/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/6972f96314454d3f87792b1d654b082f...64de6ec55164418ba85982637a591af3/)
   [Finished :arrow_down:0.34% :arrow_up:0.04%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/6624357494094dda81f375a5a493c3d2...5c967704496a430c9c01b74cf2587be6/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot edited a comment on pull request #12429: ARROW-14815 [R] bindings for `lubridate::semester()`

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #12429:
URL: https://github.com/apache/arrow/pull/12429#issuecomment-1048171598


   Benchmark runs are scheduled for baseline = 5680d209fd870f99134e2d7299b47acd90fabb8e and contender = 16f36a57abb9c2b4b54e914e3b2181e83453355b. 16f36a57abb9c2b4b54e914e3b2181e83453355b is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/99142f29b01b47bc963c702b87da051a...f16ecd0390054e1ea26a000a139f9788/)
   [Finished :arrow_down:0.46% :arrow_up:0.04%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/241f27fc813d4ecb96c234fd1bd99adc...b7aeb10131994fc798bcf7ea16dab145/)
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/6972f96314454d3f87792b1d654b082f...64de6ec55164418ba85982637a591af3/)
   [Finished :arrow_down:0.34% :arrow_up:0.04%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/6624357494094dda81f375a5a493c3d2...5c967704496a430c9c01b74cf2587be6/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org