Posted to issues@arrow.apache.org by "Nicola Crane (Jira)" <ji...@apache.org> on 2022/08/22 08:00:05 UTC

[jira] [Created] (ARROW-17490) [R] Differing results in log bindings

Nicola Crane created ARROW-17490:
------------------------------------

             Summary: [R] Differing results in log bindings
                 Key: ARROW-17490
                 URL: https://issues.apache.org/jira/browse/ARROW-17490
             Project: Apache Arrow
          Issue Type: Improvement
          Components: R
            Reporter: Nicola Crane


We get different results from dplyr and Acero when we call {{log()}} on a column that contains 0, e.g.

{code:r}
``` r
library(arrow)
library(dplyr)

df <- tibble(x = 0:10)

df %>%
  mutate(y = log(x)) %>%
  collect()
#> # A tibble: 11 × 2
#>        x        y
#>    <int>    <dbl>
#>  1     0 -Inf    
#>  2     1    0    
#>  3     2    0.693
#>  4     3    1.10 
#>  5     4    1.39 
#>  6     5    1.61 
#>  7     6    1.79 
#>  8     7    1.95 
#>  9     8    2.08 
#> 10     9    2.20 
#> 11    10    2.30

df %>%
  arrow_table() %>%
  mutate(y = log(x)) %>%
  collect()
#> Error in `collect()`:
#> ! Invalid: logarithm of zero
```

{code}

This is because R defines {{log(0)}} as {{-Inf}}, whereas Acero raises an error. Not sure what the solution is here; do we want to request an Acero option that controls this behaviour?
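
For reference, a minimal sketch of the two behaviours at the compute-function level. It assumes the error above comes from the checked kernel ({{ln_checked}}) and that the unchecked {{ln}} kernel returns {{-Inf}} for zero, as the Arrow compute documentation describes; I have not confirmed which kernel the R binding currently dispatches to. The variable names are illustrative only.

```r
library(arrow)

# Doubles are used here so no implicit integer-to-double cast is involved
x <- Array$create(c(0, 1, 2))

# Unchecked kernel: expected to return -Inf for zero rather than raise
unchecked <- as.vector(call_function("ln", x))

# Checked kernel: expected to raise on zero; capture the message instead of stopping
checked <- tryCatch(
  as.vector(call_function("ln_checked", x)),
  error = function(e) conditionMessage(e)
)

print(unchecked)
print(checked)
```

If this holds, one option would be to have the R binding call the unchecked kernels so that results match base R, rather than adding a new Acero option.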


