Posted to github@arrow.apache.org by "westonpace (via GitHub)" <gi...@apache.org> on 2023/02/27 16:54:00 UTC

[GitHub] [arrow] westonpace commented on issue #34365: [C++] Substrait cast expression fails on input types other than field reference

westonpace commented on issue #34365:
URL: https://github.com/apache/arrow/issues/34365#issuecomment-1446685564

   For most nodes, whether an expression input can be "anything" or "only a direct reference" has been described to me as a "logical" vs. "physical" thing.  You can always convert to a plan where the input to a cast is a direct reference by introducing a project node.  In other words:
   
   ```
   project:
     exprs:
       - add(2, cast(tolower("x"), int32()))
     names:
       - "out"
     input:
       src
   ```
   
   can become
   
   ```
   project:
     exprs:
       - add(2, cast("x_lower", int32()))
     names:
       - "out"
     input:
       project:
         exprs:
           - tolower("x")
         names:
           - "x_lower"
         input:
           src
   ```
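   
   To make the rewrite above concrete, here is a minimal, self-contained sketch of it on a toy expression tree. Everything in it (`Expr`, `Project`, `Lower`, the generated `tmp_N` names) is hypothetical and invented for illustration; none of it is Acero or Substrait API. It only hoists function calls that appear as the first argument of a `cast`, and it builds a single inner project, so a cast nested inside another cast's input would need one extra project layer per level, which this sketch does not attempt. It also ignores the pass-through columns a real inner project would need to forward.
   
   ```cpp
   #include <cassert>
   #include <cstddef>
   #include <string>
   #include <utility>
   #include <vector>
   
   // Toy expression tree (hypothetical; not the real Acero/Substrait types).
   struct Expr {
     enum Kind { kLiteral, kFieldRef, kCall } kind;
     std::string name;        // literal text, field name, or function name
     std::vector<Expr> args;  // only used for kCall
   };
   
   Expr Lit(std::string v) { return Expr{Expr::kLiteral, std::move(v), {}}; }
   Expr Field(std::string n) { return Expr{Expr::kFieldRef, std::move(n), {}}; }
   Expr Call(std::string f, std::vector<Expr> a) {
     return Expr{Expr::kCall, std::move(f), std::move(a)};
   }
   
   // Renders an expression in the same notation as the plans above.
   std::string ToString(const Expr& e) {
     if (e.kind == Expr::kLiteral) return e.name;
     if (e.kind == Expr::kFieldRef) return "\"" + e.name + "\"";
     std::string out = e.name + "(";
     for (std::size_t i = 0; i < e.args.size(); ++i) {
       if (i > 0) out += ", ";
       out += ToString(e.args[i]);
     }
     return out + ")";
   }
   
   // The expressions and output names of one project node.
   struct Project {
     std::vector<Expr> exprs;
     std::vector<std::string> names;
   };
   
   // Returns a copy of `e` in which the first argument of every cast is a
   // field reference. A function call found in that position is moved into
   // `inner` under a generated name and replaced by a reference to it.
   Expr Lower(const Expr& e, Project* inner) {
     assert(inner != nullptr);
     if (e.kind != Expr::kCall) return e;
     Expr out = Call(e.name, {});
     for (const Expr& arg : e.args) out.args.push_back(Lower(arg, inner));
     if (e.name == "cast" && !out.args.empty() &&
         out.args[0].kind == Expr::kCall) {
       std::string tmp = "tmp_" + std::to_string(inner->exprs.size());
       inner->exprs.push_back(out.args[0]);
       inner->names.push_back(tmp);
       out.args[0] = Field(tmp);
     }
     return out;
   }
   ```
   
   Calling `Lower` on `add(2, cast(tolower("x"), int32()))` with an empty `Project` leaves `tolower("x")` in the inner project and returns an outer expression whose cast reads a plain field reference, which mirrors the two-level plan shown above.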
   
   There are other cases where Substrait's existing nodes are "too logical" for Acero and we are a little restrictive, for example in the join node and the aggregate node.  We call out this caveat here: https://arrow.apache.org/docs/dev/cpp/streaming_execution.html#expressions-general
   
   That being said, `cast` is just a function call in Acero and we do have the ability for functions to take other functions as input.  So it does seem like this is one place where we don't have to be quite so restrictive.
   
   In the linked issue you mentioned:
   
   > The Acero cast function looks like it can either take an Array or an individual object of class Datum (so a scalar, array, etc).
   
   This is true for the C++ cast function.  However, this is not true for "expressions".  In other words, a `compute::call` is constructed as follows:
   
   ```
   ARROW_EXPORT
   Expression call(std::string function, std::vector<Expression> arguments,
                   std::shared_ptr<FunctionOptions> options = NULLPTR);
   ```
   
   So it can receive any `compute::Expression` as an argument.  This discrepancy is handled during "expression execution" (`compute::ExecuteScalarExpression`).  During expression evaluation we traverse the AST and convert each of the arguments into an array by executing the sub-expressions.  Finally, these input arrays are passed to the actual function call.
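   
   As a rough illustration of that evaluation order, here is a self-contained toy evaluator. It is a sketch under stated assumptions, not how `compute::ExecuteScalarExpression` is actually implemented: `Node`, `Array`, `Batch`, `CallFunction`, and `Evaluate` are all hypothetical names, arrays are plain `std::vector<double>`, and the only "functions" are an `add` and a truncating `cast_int` made up for the example. The point it shows is that the function itself only ever receives arrays, because every argument expression is executed first.
   
   ```cpp
   #include <cassert>
   #include <cmath>
   #include <cstddef>
   #include <map>
   #include <string>
   #include <utility>
   #include <vector>
   
   using Array = std::vector<double>;           // stand-in for a real array
   using Batch = std::map<std::string, Array>;  // column name -> values
   
   // Toy AST node (hypothetical; not compute::Expression).
   struct Node {
     enum Kind { kLiteral, kFieldRef, kCall } kind;
     double value;            // only used for kLiteral
     std::string name;        // field name or function name
     std::vector<Node> args;  // only used for kCall
   };
   
   Node Lit(double v) { return Node{Node::kLiteral, v, "", {}}; }
   Node Field(std::string n) {
     return Node{Node::kFieldRef, 0.0, std::move(n), {}};
   }
   Node Call(std::string f, std::vector<Node> a) {
     return Node{Node::kCall, 0.0, std::move(f), std::move(a)};
   }
   
   // The "kernel": it knows nothing about expressions, only about arrays.
   Array CallFunction(const std::string& fn, const std::vector<Array>& in) {
     Array out(in.at(0).size());
     for (std::size_t i = 0; i < out.size(); ++i) {
       if (fn == "add") {
         out[i] = in.at(0)[i] + in.at(1)[i];
       } else if (fn == "cast_int") {
         out[i] = std::trunc(in.at(0)[i]);  // made-up truncating cast
       } else {
         assert(false && "unknown function");
       }
     }
     return out;
   }
   
   // Walks the tree bottom-up: each argument is evaluated to an array first,
   // then the arrays are handed to the function.
   Array Evaluate(const Node& e, const Batch& batch, std::size_t length) {
     if (e.kind == Node::kLiteral) return Array(length, e.value);  // broadcast
     if (e.kind == Node::kFieldRef) return batch.at(e.name);
     std::vector<Array> inputs;
     for (const Node& arg : e.args) {
       inputs.push_back(Evaluate(arg, batch, length));
     }
     return CallFunction(e.name, inputs);
   }
   ```
   
   With this shape, `cast_int(add("x", 0.5))` needs no special handling: by the time `cast_int` runs, its input is already an array, whether it came from a field reference or from another call.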
   
   So...I'm not sure why this isn't working.  What is the error?

