Posted to jira@arrow.apache.org by "Nicola Crane (Jira)" <ji...@apache.org> on 2022/07/05 08:26:00 UTC

[jira] [Comment Edited] (ARROW-16871) [R] Implement exp() and sqrt() in Arrow dplyr queries

    [ https://issues.apache.org/jira/browse/ARROW-16871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17562468#comment-17562468 ] 

Nicola Crane edited comment on ARROW-16871 at 7/5/22 8:25 AM:
--------------------------------------------------------------

Nice work digging into that, and good find that Arrow doesn't have an {{exp}} function. We should probably open a C++ ticket for implementing {{exp}}, but in the meantime you could take advantage of the fact that {{exp(x) == exp(1)^x}}: you could write a binding for {{exp()}} that works out {{exp(1)}} in R and then uses the {{power_checked}} function in Arrow to compute the value you need.

Here's an example that kinda manages to do that without the bindings:
{code:r}
library(dplyr, warn.conflicts = FALSE)
library(arrow, warn.conflicts = FALSE)

df <- tibble::tibble(
  x = c(1, 2, 3, 4, 5)
)

# calculate exp in R
df_exp <- mutate(
  df,
  exp_x = exp(x),
  exp_x_alt = exp(1) ^ x
)

# same thing
df_exp
#> # A tibble: 5 × 3
#>       x  exp_x exp_x_alt
#>   <dbl>  <dbl>     <dbl>
#> 1     1   2.72      2.72
#> 2     2   7.39      7.39
#> 3     3  20.1      20.1 
#> 4     4  54.6      54.6 
#> 5     5 148.      148.

# same result
arrow_table(df_exp) %>%
  mutate(exp_x_alt_arrow = exp(1)^x) %>% 
  collect()
#> # A tibble: 5 × 4
#>       x  exp_x exp_x_alt exp_x_alt_arrow
#>   <dbl>  <dbl>     <dbl>           <dbl>
#> 1     1   2.72      2.72            2.72
#> 2     2   7.39      7.39            7.39
#> 3     3  20.1      20.1            20.1 
#> 4     4  54.6      54.6            54.6 
#> 5     5 148.      148.            148.

# let's look at the Arrow expression
# Arrow has taken the value of exp(1) from R here so it works even though
# we have no exp function in Arrow
arrow_table(df_exp) %>%
  mutate(exp_x_manual = exp(1)^x)
#> Table (query)
#> x: double
#> exp_x: double
#> exp_x_alt: double
#> exp_x_manual: double (power_checked(2.718281828459045, x))
#> 
#> See $.data for the source Arrow object


{code}
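
As a quick sanity check on the workaround, here is a minimal base-R sketch (no Arrow needed) that compares {{exp(1)^x}} with {{exp(x)}} on a few extra inputs, including negative and non-integer values. The input vector and variable names are made up for illustration, and the comparison uses {{all.equal()}} rather than exact equality, since the two computations are not guaranteed to agree to the last bit.

```r
# Hypothetical test inputs: negative, zero, fractional and larger values
x <- c(-3, -0.5, 0, 0.25, 1, 5, 20)

direct  <- exp(x)    # R's built-in exponential
via_pow <- exp(1)^x  # the workaround: e computed once, then raised to x

# Compare within floating-point tolerance (all.equal's default tolerance)
agrees <- isTRUE(all.equal(direct, via_pow))
print(agrees)

# Largest relative difference between the two results
max_rel_diff <- max(abs(direct - via_pow) / direct)
print(max_rel_diff)
```

This only exercises R's own arithmetic; it says nothing about how Arrow's {{power_checked}} rounds, so treat it as a plausibility check rather than proof.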


was (Author: thisisnic):
Nice work digging into that there - and good find that it doesn't exist. We should probably open a C++ ticket for implementing {{exp}} but in the meantime you could take advantage of the fact that {{exp(x) == exp(1)^x}} so you could write a binding for {{exp()}} which works out exp(1) in R, and then uses the {{power_checked}} function in Arrow to work out the value you need.

Here's an example that kinda manages to do that without the bindings:

{code:r}
library(dplyr, warn.conflicts = FALSE)
library(arrow, warn.conflicts = FALSE)

df <- tibble::tibble(
  x = c(1, 2, 3, 4, 5)
)

# calculate exp in R
df_exp <- mutate(
  df,
  exp_x = exp(x),
  exp_x_alt = exp(1) ^ x
)

# same thing
df_exp
#> # A tibble: 5 × 3
#>       x  exp_x exp_x_alt
#>   <dbl>  <dbl>     <dbl>
#> 1     1   2.72      2.72
#> 2     2   7.39      7.39
#> 3     3  20.1      20.1 
#> 4     4  54.6      54.6 
#> 5     5 148.      148.

# same result
arrow_table(df_exp) %>%
  mutate(exp_x_alt_arrow = exp(1)^x) %>% 
  collect()
#> # A tibble: 5 × 4
#>       x  exp_x exp_x_alt exp_x_alt_arrow
#>   <dbl>  <dbl>     <dbl>           <dbl>
#> 1     1   2.72      2.72            2.72
#> 2     2   7.39      7.39            7.39
#> 3     3  20.1      20.1            20.1 
#> 4     4  54.6      54.6            54.6 
#> 5     5 148.      148.            148.

# let's look at the Arrow expression
# Arrow has taken the value of exp(1) from R here so it works even though
# we have no exp function in Arrow
arrow_table(df_exp) %>%
  mutate(exp_x_manual = exp(1)^x)
#> Table (query)
#> x: double
#> exp_x: double
#> exp_x_alt: double
#> exp_x_manual: double (power_checked(2.718281828459045, x))
#> 
#> See $.data for the source Arrow object


{code}

> [R] Implement exp() and sqrt() in Arrow dplyr queries
> -----------------------------------------------------
>
>                 Key: ARROW-16871
>                 URL: https://issues.apache.org/jira/browse/ARROW-16871
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: R
>    Affects Versions: 8.0.0
>         Environment: Windows 10, R 4.2.0, RStudio 2022.02.3 Build 492
>            Reporter: Chris Higgins
>            Priority: Major
>              Labels: good-first-issue
>
> The change log for v8.0.0 notes that functions like exp(), log() and sqrt() can be used in R. I have been trying to calculate exp(x) on a field in a dataset, but this does not work. I _can_ calculate exp(10) on a constant, and I can raise e to a power manually (2.71828^(x)). Here's some basic reproducible code:
> {code:r}
> library(arrow)
> library(dplyr)
>
> test_df <- data.frame(x = c(1, 2, 3, 4, 5), y = c(1, 2, 3, 4, 5))
> write_dataset(test_df, "./test_ds")
> test_ds <- open_dataset("./test_ds")
> # does not work
> test_ds %>% mutate(z = exp(x)) %>% collect()
> # does work
> test_ds %>% mutate(z = 2.71828^(x)) %>% collect()
> # also works
> test_ds %>% mutate(z = exp(10)) %>% collect()
> {code}
>  
>  


