You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/02/09 19:50:28 UTC

[GitHub] [spark] rednaxelafx edited a comment on pull request #31509: [SPARK-34396][SQL] Add a new build-in function delegate

rednaxelafx edited a comment on pull request #31509:
URL: https://github.com/apache/spark/pull/31509#issuecomment-775610985


   Just some random thoughts:
   
   This kind of functionality is certain useful for debugging and could come in handy for writing shorthands, but I have mixed feelings about exposing it on the SQL language extension level.
   i.e. I'd love to see this if we're designing a new language from scratch, but I'm not so sure about adding it to SQL.
   
   - In terms of the IR (intermediate representation) node, it's very common in programming languages or their compilers to feature this kind of expression type. Some example names are `BlockExpression` (e.g. in [System.Linq.Expressions.BlockExpression](https://docs.microsoft.com/en-us/dotnet/api/system.linq.expressions.blockexpression)), or C's [Comma expression](https://en.wikipedia.org/wiki/Comma_operator), or just Seq / Sequence expression etc. I haven't seen the term "delegate expression" used this way, though.
   - While this comes in really handy, it encourages two things:
     - side effects: yes you do want the side effects for debugging, but for regular SQL queries you should probably avoid side effects to have less surprises
     - duplicate notion of projection: the `DelegateExpression` being proposed here is really just a 2-level nested projection embedded on the expression level: 1 level for evaluating all the children expressions, and another level to discard all but the last expression's result. This duplication is somewhat annoying, as is the case for some of Spark's higher-ordered functions -- they duplicate some semantics from the plan level to the expression, but the optimizations were not updated to recognized them efficiently.
   - If we really want to make the most out of the opportunity of adding a new expression infrastructure, we might want to also allow this kind of "block expression" to support "local alias", or more commonly known as just "local variables" or "local bindings" in a programming language. Again, this is something that already exists in the `Project` plan level operator, but it's just clumsy to use in this kind of context.
   
   If we're talking about making some enhancements in the internals of Catalyst and adding a more flexible, more powerful way of representing projections, both on the plan level and on the expression level, that'd be super awesome and I'd help push it forward.
   Otherwise adding a one-off expression as proposed in this PR here doesn't feel very clean. At least I don't feel like the proposal as-is should be a regular built-in function, but maybe just something that's on the side for aiding debugging.
   
   Just my two cents


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org