Posted to jira@arrow.apache.org by "Weston Pace (Jira)" <ji...@apache.org> on 2022/10/11 17:08:00 UTC

[jira] [Commented] (ARROW-17974) [C++] random function can't actually be used

    [ https://issues.apache.org/jira/browse/ARROW-17974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17615975#comment-17615975 ] 

Weston Pace commented on ARROW-17974:
-------------------------------------

ARROW-16286 and ARROW-16290 are probably related.

> [C++] random function can't actually be used
> --------------------------------------------
>
>                 Key: ARROW-17974
>                 URL: https://issues.apache.org/jira/browse/ARROW-17974
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>            Reporter: Neal Richardson
>            Priority: Major
>
> random() is currently implemented as a nullary function. It doesn't let you specify the number of values you want to generate because it's designed to generate however many the given ExecBatch has. The only option RandomOptions takes seems to be an optional seed value. Unfortunately, the result is that the function is not usable, AFAICT.
> Calling the compute function directly, you get 0 values (all examples from R): 
> {code}
> library(arrow)
> call_function("random")
> # Array
> # <double>
> # []
> {code}
> Calling it from within an ExecPlan, it errors because it is not a proper scalar function, despite what the filenames say (scalar_random.cc, etc.):
> {code}
> library(arrow)
> library(dplyr)
> mtcars %>% 
>   arrow_table() %>% 
>   mutate(x = arrow_random()) %>% 
>   collect()
> # Error in `collect()`:
> # ! Invalid: ExecuteScalarExpression cannot Execute non-scalar expression Array[double]
> {code}
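
To illustrate the behavior described in the issue, here is a minimal toy model in plain Python. It is not Arrow's implementation or API: the names toy_random_kernel, toy_call_function and toy_call_in_plan are hypothetical and exist only for this sketch. It assumes, as the report describes, that a nullary kernel takes its output length from the batch it is given, so a direct call with no batch produces zero values.

```python
import random
from typing import List, Optional


def toy_random_kernel(batch_length: int, seed: Optional[int] = None) -> List[float]:
    """Hypothetical nullary kernel: the output size comes only from the batch length."""
    rng = random.Random(seed)  # seed is the only user-facing option, as in the report
    return [rng.random() for _ in range(batch_length)]


def toy_call_function(seed: Optional[int] = None) -> List[float]:
    """Direct call with no arguments: nothing supplies a length, so it defaults to 0."""
    return toy_random_kernel(batch_length=0, seed=seed)


def toy_call_in_plan(num_rows: int, seed: Optional[int] = None) -> List[float]:
    """What the reporter expected inside a plan: one value per input row."""
    return toy_random_kernel(batch_length=num_rows, seed=seed)
```

In this sketch, toy_call_function() always returns an empty list, which mirrors the empty Array from call_function("random"), while toy_call_in_plan(32) would return one value per row of a 32-row table such as mtcars.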


