Posted to issues@arrow.apache.org by "gongcastro (via GitHub)" <gi...@apache.org> on 2023/05/04 12:27:11 UTC

[GitHub] [arrow] gongcastro opened a new issue, #35431: [R] Error when creating a sequence with`n()`

gongcastro opened a new issue, #35431:
URL: https://github.com/apache/arrow/issues/35431

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   Hi! I wanted to create a variable in a data frame with the cumulative counts of some other variable. 
   
   Without using Arrow, I get what I need:
   
   ```r
   library(dplyr)
   library(tibble)
   
   mtcars |> 
     rownames_to_column("model") |>
     select(model, cyl) |> 
     group_by(cyl) |> 
     mutate(seq_counts = 1:n())
   ```
   
   Which returns:
   
   ```
   # A tibble: 32 × 3
      model               cyl seq_counts
      <chr>             <dbl>      <int>
    1 Mazda RX4             6          1
    2 Mazda RX4 Wag         6          2
    3 Datsun 710            4          1
    4 Hornet 4 Drive        6          3
    5 Hornet Sportabout     8          1
    6 Valiant               6          4
    7 Duster 360            8          2
    8 Merc 240D             4          2
    9 Merc 230              4          3
   10 Merc 280              6          5
   ```
   
   Since Arrow does not support this use of `n()` yet, I'm using `to_duckdb()` to continue the pipeline. (I'm using `mtcars` here as a minimal reproducible example; my actual dataset is much larger, hence the need for Arrow/DuckDB.) But when I run the same code after `to_duckdb()`, I get the following error:
   
   ```r
   mtcars |> 
     rownames_to_column("model") |>
     to_duckdb() |>
     select(model, cyl) |> 
     group_by(cyl) |> 
     mutate(seq_counts = 1:n())
   ```
   
   ```
   Error in `purrr::pmap()`:
   ℹ In index: 3.
   ℹ With name: seq_counts.
   Caused by error in `from:to`:
   ! NA/NaN argument
   Run `rlang::last_trace()` to see where the error occurred.
   Warning message:
   In 1:n() : NAs introduced by coercion
   ```
   I encounter the same error when defining `n()` in a different variable (e.g., `mutate(n_total = n(), seq_counts = 1:n_total)`), and when using `seq()` instead of `:` to make the sequence.
   
   Thanks!
   
   This is my `sessionInfo()`:
   
   ```
   R version 4.2.2 (2022-10-31 ucrt)
   Platform: x86_64-w64-mingw32/x64 (64-bit)
   Running under: Windows 10 x64 (build 22621)
   
   Matrix products: default
   
   locale:
   [1] LC_COLLATE=Spanish_Spain.utf8  LC_CTYPE=Spanish_Spain.utf8
   [3] LC_MONETARY=Spanish_Spain.utf8 LC_NUMERIC=C
   [5] LC_TIME=Spanish_Spain.utf8
   
   attached base packages:
   [1] stats     graphics  grDevices utils     datasets  methods   base
   
   other attached packages:
   [1] arrow_11.0.0.3 tibble_3.2.1   dplyr_1.1.2    devtools_2.4.3 usethis_2.1.5
   
   loaded via a namespace (and not attached):
    [1] pillar_1.9.0      compiler_4.2.2    dbplyr_2.1.1      prettyunits_1.1.1
    [5] remotes_2.4.2     tools_4.2.2       pkgbuild_1.3.1    pkgload_1.3.2
    [9] bit_4.0.5         memoise_2.0.1     lifecycle_1.0.3   pkgconfig_2.0.3
   [13] rlang_1.1.0       cli_3.6.0         DBI_1.1.3         fastmap_1.1.0
   [17] duckdb_0.7.1-1    withr_2.5.0       generics_0.1.3    fs_1.5.2
   [21] vctrs_0.6.2       bit64_4.0.5       tidyselect_1.2.0  glue_1.6.2
   [25] R6_2.5.1          processx_3.8.1    fansi_1.0.3       sessioninfo_1.2.2
   [29] callr_3.7.3       purrr_1.0.1       tzdb_0.3.0        blob_1.2.3
   [33] magrittr_2.0.3    ps_1.7.5          ellipsis_0.3.2    assertthat_0.2.1
   [37] utf8_1.2.2        cachem_1.0.6      crayon_1.5.2
   ```
   
   ### Component(s)
   
   R


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] thisisnic closed issue #35431: [R] Error when creating a sequence with`n()`

Posted by "thisisnic (via GitHub)" <gi...@apache.org>.
thisisnic closed issue #35431: [R] Error when creating a sequence with`n()`
URL: https://github.com/apache/arrow/issues/35431




[GitHub] [arrow] thisisnic commented on issue #35431: [R] Error when creating a sequence with`n()`

Posted by "thisisnic (via GitHub)" <gi...@apache.org>.
thisisnic commented on issue #35431:
URL: https://github.com/apache/arrow/issues/35431#issuecomment-1534913273

   Thanks for reporting this, @gongcastro! Calling `to_duckdb()` converts the object to a virtual DuckDB table, so the error most likely originates outside the Arrow codebase. You might be best off opening an issue on [the DuckDB repo](https://github.com/duckdb/duckdb/issues).
   
   I've pasted a reprex below which shows this error being recreated using just duckdb without arrow:
   
   ``` r
   library(duckdb)
   library(dplyr)
   
   # with dplyr
   mtcars |>
     group_by(am) |>
     mutate(seq_counts = 1:n()) |>
     collect()
   #> # A tibble: 32 × 12
   #> # Groups:   am [2]
   #>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb seq_counts
   #>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>      <int>
   #>  1  21       6  160    110  3.9   2.62  16.5     0     1     4     4          1
   #>  2  21       6  160    110  3.9   2.88  17.0     0     1     4     4          2
   #>  3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1          3
   #>  4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1          1
   #>  5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2          2
   #>  6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1          3
   #>  7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4          4
   #>  8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2          5
   #>  9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2          6
   #> 10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4          7
   #> # ℹ 22 more rows
   
   # with DuckDB
   con <- dbConnect(duckdb::duckdb(), dbdir = ":memory:")
   duckdb::duckdb_register(con, "mtcars", mtcars)
   
   tbl(con, "mtcars") |>
     group_by(am) |>
     mutate(seq_counts = 1:n()) |>
     collect()
   #> Warning in 1:n(): NAs introduced by coercion
   #> Error in `purrr::pmap()`:
   #> ℹ In index: 2.
   #> Caused by error in `from:to`:
   #> ! NA/NaN argument
   #> Backtrace:
   #>      ▆
   #>   1. ├─dplyr::collect(mutate(group_by(tbl(con, "mtcars"), am), seq_counts = 1:n()))
   #>   2. ├─dbplyr:::collect.tbl_sql(...)
   #>   3. │ ├─dbplyr::db_sql_render(x$src$con, x, cte = cte)
   #>   4. │ └─dbplyr:::db_sql_render.DBIConnection(x$src$con, x, cte = cte)
   #>   5. │   ├─dbplyr::sql_render(sql, con = con, ..., cte = cte)
   #>   6. │   └─dbplyr:::sql_render.tbl_lazy(sql, con = con, ..., cte = cte)
   #>   7. │     ├─dbplyr::sql_render(...)
   #>   8. │     └─dbplyr:::sql_render.lazy_query(...)
   #>   9. │       ├─dbplyr::sql_build(query, con = con, ...)
   #>  10. │       └─dbplyr:::sql_build.lazy_select_query(query, con = con, ...)
   #>  11. │         └─dbplyr:::get_select_sql(...)
   #>  12. │           └─dbplyr:::translate_select_sql(con, select)
   #>  13. │             └─purrr::pmap(...)
   #>  14. │               └─purrr:::pmap_("list", .l, .f, ..., .progress = .progress)
   #>  15. │                 ├─purrr:::with_indexed_errors(...)
   #>  16. │                 │ └─base::withCallingHandlers(...)
   #>  17. │                 └─dbplyr (local) .f(...)
   #>  18. │                   └─dbplyr::translate_sql_(...)
   #>  19. │                     └─base::lapply(...)
   #>  20. │                       └─dbplyr (local) FUN(X[[i]], ...)
   #>  21. │                         ├─dbplyr::escape(eval_tidy(x, mask), con = con)
   #>  22. │                         └─rlang::eval_tidy(x, mask)
   #>  23. ├─1:n()
   #>  24. └─base::.handleSimpleError(`<fn>`, "NA/NaN argument", base::quote(from:to))
   #>  25.   └─purrr (local) h(simpleError(msg, call))
   #>  26.     └─cli::cli_abort(c(i = "In index: {i}."), parent = cnd, call = error_call)
   #>  27.       └─rlang::abort(...)
   ```
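   
   The backtrace shows `1:n()` being evaluated by R while dbplyr translates the query, rather than being turned into SQL. As a possible workaround, and only a sketch that assumes a per-group running index is all that is needed, `row_number()` is a dplyr window function that dbplyr can translate. The table registration mirrors the reprex above. Note that without an explicit ordering (for example via `dbplyr::window_order()`), the order in which rows are numbered within each group is not guaranteed.
   
   ```r
   library(duckdb)  # also attaches DBI, which provides dbConnect()
   library(dplyr)   # the dbplyr package must be installed as well
   
   con <- dbConnect(duckdb::duckdb(), dbdir = ":memory:")
   duckdb::duckdb_register(con, "mtcars", mtcars)
   
   # row_number() is translated to a SQL window function,
   # so nothing has to be evaluated locally as with 1:n()
   query <- tbl(con, "mtcars") |>
     group_by(am) |>
     mutate(seq_counts = row_number())
   
   # Inspect the generated SQL before running it
   show_query(query)
   
   result <- collect(query)
   
   dbDisconnect(con, shutdown = TRUE)
   ```
   
   The same `mutate(seq_counts = row_number())` step should also work after `to_duckdb()`, since that returns the same kind of lazy table, though I have not verified it against the original dataset.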

