You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/08/25 05:16:53 UTC

[GitHub] [arrow] jorgecarleitao edited a comment on pull request #8032: ARROW-9836: [Rust][DataFusion] Improve API for usage of UDFs

jorgecarleitao edited a comment on pull request #8032:
URL: https://github.com/apache/arrow/pull/8032#issuecomment-679631810


   > For built-in functions like `sqrt` I would expect DataFusion to provide convenience functions to create an expression, like we do with `col` and the aggregate functions. I assume we could also do that with the design proposed here?
   > 
   > For example, I would like to be able to write:
   > 
   > ```rust
   > df.select(vec![col("foo"), sqrt(col("bar"))])?
   > ```
   
   This PR does not support this, as it threats every function (built-in or not) equally. To include that case, IMO this PR needs to add a new enum in the logical `Expr`:
   
   * `Expr::ScalarFunction { name: String/Enum, args: Vec<Expr> }` that we logically know its return type based on `name` (e.g. `"sqrt"`), exactly like `Expr::BinaryExpr`. This is mapped to a physical expression during planning. These can be build without access to the registry, as we hard-cod the return type on the logical plan to be consistent with the physical one, like we do for our aggregates, binary expressions, etc.
   * `Expr:ScalarUDF { fun: ScalarFunction, args: Vec<Expr> }`, whose return type is only known after going to the registry to check what the user set its return type to be (as this PR currently does).
   
   `sqrt` would return `Expr::ScalarFunction`, that knows its own return type, and the planner converts it to `ScalarFunction` via a hard-coded map, while `Expr:UDF`'s physical planning is just planning `args` and pass them to `ScalarFunction` like this PR already does.
   
   I.e. at the physical level, built-in and UDFs are indistinguishable, but at the logical plan, one only knows its name (built-in), the other also knows its physical representation `ScalarUDF`.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org