Posted to issues@arrow.apache.org by "Neal Richardson (Jira)" <ji...@apache.org> on 2022/10/10 13:48:00 UTC

[jira] [Created] (ARROW-17974) [C++] random function can't actually be used

Neal Richardson created ARROW-17974:
---------------------------------------

             Summary: [C++] random function can't actually be used
                 Key: ARROW-17974
                 URL: https://issues.apache.org/jira/browse/ARROW-17974
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++
            Reporter: Neal Richardson


random() is currently implemented as a nullary function. It doesn't let you specify how many values to generate, because it is designed to generate as many values as the length of the ExecBatch it is given. The only option RandomOptions appears to take is an optional seed value. Unfortunately, the result is that the function is not usable, AFAICT.

Calling the compute function directly returns 0 values (all examples are in R):

{code}
library(arrow)
call_function("random")
# Array
# <double>
# []
{code}
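
To illustrate why a direct call can come back empty, here is a minimal toy model in plain C++. It is only a sketch of the behavior described above, not Arrow's actual implementation: ToyBatch, ToyRandomOptions and ToyRandom are hypothetical stand-ins, and the assumption that a direct call effectively supplies a batch of length 0 is inferred from the output shown, not confirmed from the source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical stand-in for an execution batch; NOT Arrow's ExecBatch.
struct ToyBatch {
  int64_t length;  // number of rows the kernel is asked to produce
};

// Hypothetical stand-in for the options struct: an optional seed is the
// only knob, mirroring the description in this issue.
struct ToyRandomOptions {
  bool has_seed;
  uint64_t seed;
};

// A nullary "random" kernel: it has no input columns, so the only size
// information available to it is the batch length.
std::vector<double> ToyRandom(const ToyBatch& batch,
                              const ToyRandomOptions& options) {
  assert(batch.length >= 0);
  // Use the caller's seed if given, otherwise a nondeterministic one.
  std::mt19937_64 gen(options.has_seed ? options.seed
                                       : std::random_device{}());
  std::uniform_real_distribution<double> dist(0.0, 1.0);
  std::vector<double> out;
  out.reserve(static_cast<std::size_t>(batch.length));
  for (int64_t i = 0; i < batch.length; ++i) {
    out.push_back(dist(gen));
  }
  return out;
}
```

In this model, a ToyBatch with length 0 always yields an empty vector, and the caller has no way to ask for, say, 10 values without an extra argument or an additional field in the options.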

Called from within an ExecPlan, it errors because it is not a proper scalar function, despite what the filenames suggest (scalar_random.cc, etc.):

{code}
library(arrow)
library(dplyr)

mtcars %>% 
  arrow_table() %>% 
  mutate(x = arrow_random()) %>% 
  collect()
# Error in `collect()`:
# ! Invalid: ExecuteScalarExpression cannot Execute non-scalar expression Array[double]
{code}




--
This message was sent by Atlassian Jira
(v8.20.10#820010)