
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/11/27 15:25:28 UTC

[GitHub] [spark] Kimahriman opened a new pull request #34727: [SPARK-37467][SQL] Consolidate whole stage and non whole stage subexpression elimination

Kimahriman opened a new pull request #34727:
URL: https://github.com/apache/spark/pull/34727

### What changes were proposed in this pull request?

This PR consolidates the code paths for subexpression elimination in whole stage and non-whole stage codegen. Whole stage codegen seemed to be mostly a superset of the non-whole stage subexpression elimination, except that whole stage did not use the codegen context to track subexpressions. Since subexpression values are replaced with empty blocks once they are evaluated, the context should be able to track subexpressions across multiple operators. I'm not sure whether there are corner cases I'm missing, though.
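A rough sketch of the shared-context idea follows. It is a toy Python model, not Spark's actual code: `ToyCodegenContext`, `ExprCode`, `eliminate_common`, and the tuple expression format are hypothetical names made up for illustration. Once a common subexpression has been evaluated, the context returns an empty code block plus the stored value for it. A later operator that shares the same context therefore reuses the earlier result instead of recomputing it.

```python
from collections import Counter
from dataclasses import dataclass, field

# Toy expression trees (hypothetical format):
#   ("col", name), ("lit", value), or (op, left, right) with op in {"add", "mul"}.


@dataclass(frozen=True)
class ExprCode:
    code: str   # code that must run before `value` can be used ("" means nothing to run)
    value: str  # variable name or literal that holds the result


@dataclass
class ToyCodegenContext:
    """Hypothetical stand-in for a codegen context; this is not Spark's CodegenContext API."""
    subexpr_state: dict = field(default_factory=dict)  # expression -> ExprCode("", value)
    counter: int = 0

    def fresh(self) -> str:
        self.counter += 1
        return f"value_{self.counter}"

    def gen_code(self, e) -> ExprCode:
        if e in self.subexpr_state:        # already evaluated by an earlier step or operator:
            return self.subexpr_state[e]   # empty code block, only the stored value
        if e[0] == "col":
            return ExprCode("", e[1])
        if e[0] == "lit":
            return ExprCode("", str(e[1]))
        left, right = self.gen_code(e[1]), self.gen_code(e[2])
        out = self.fresh()
        op = {"add": "+", "mul": "*"}[e[0]]
        lines = [c for c in (left.code, right.code) if c]
        lines.append(f"int {out} = {left.value} {op} {right.value};")
        return ExprCode("\n".join(lines), out)

    def eliminate_common(self, exprs) -> str:
        """Emit code once for each non-leaf subtree that occurs more than once."""
        counts = Counter()

        def visit(e):  # post-order, so children are recorded before their parents
            if e[0] not in ("col", "lit"):
                visit(e[1])
                visit(e[2])
                counts[e] += 1

        for e in exprs:
            visit(e)
        emitted = []
        for e, n in counts.items():  # insertion order: children first
            if n > 1 and e not in self.subexpr_state:
                ev = self.gen_code(e)
                if ev.code:
                    emitted.append(ev.code)
                self.subexpr_state[e] = ExprCode("", ev.value)
        return "\n".join(emitted)


a_plus_b = ("add", ("col", "a"), ("col", "b"))
ctx = ToyCodegenContext()  # one context shared by every operator in the stage

# Operator 1 projects (a + b) * 2 and (a + b) + 1, so a + b is a common subexpression.
op1 = [("mul", a_plus_b, ("lit", 2)), ("add", a_plus_b, ("lit", 1))]
print(ctx.eliminate_common(op1))    # int value_1 = a + b;
print(ctx.gen_code(op1[0]).code)    # int value_2 = value_1 * 2;
print(ctx.gen_code(op1[1]).code)    # int value_3 = value_1 + 1;

# Operator 2, later in the same stage, reuses value_1 instead of recomputing a + b.
print(ctx.gen_code(("add", a_plus_b, ("col", "c"))).code)  # int value_4 = value_1 + c;
```

The reuse in the last line is only valid in this toy because `a` and `b` mean the same thing in both operators. Deciding when that assumption actually holds is exactly the kind of corner case the description is unsure about.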

It shouldn't result in any functionality changes, but there are slight differences in the generated code as a result of this:
- Subexpressions in whole stage codegen now always store their results in mutable state instead of inlining them as local variables, since non-whole stage codegen relies on mutable state to support code splitting
- Non-whole stage codegen now inlines subexpressions that are small enough, the same way whole stage codegen does
- Subexpressions are tracked across multiple physical operators in whole stage codegen. They are still only evaluated within each operator, but if an expression in a later operator happens to match a subexpression from an earlier operator, the earlier result is reused in the later operator.
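The storage and inlining choices in the first two bullets can be sketched in the same toy style. This is a hypothetical illustration, not Spark's generator: `emit_subexpr`, `EmittedSubexpr`, the `max_inline_lines` threshold, and the `eval_<name>(InternalRow i)` method shape are assumptions. The result always goes into a class field (mutable state). The evaluation code is inlined when it is small and moved into its own method otherwise. Keeping the result in a field is what makes the split possible, because a local variable declared inside the helper method would not be visible to the caller.

```python
from dataclasses import dataclass


@dataclass
class EmittedSubexpr:
    fields: list   # class-level mutable state declarations
    methods: list  # helper methods (non-empty only when the code is split out)
    inline: str    # code placed where the subexpression is evaluated


def emit_subexpr(name: str, java_type: str, body: list, result: str,
                 max_inline_lines: int = 3) -> EmittedSubexpr:
    """Hypothetical sketch: always keep the result in a field; inline small bodies, split large ones."""
    field_decl = f"private {java_type} {name};"
    assign = f"{name} = {result};"
    if len(body) <= max_inline_lines:
        # Small enough: evaluate in place, but still write the result to the field.
        return EmittedSubexpr([field_decl], [], "\n".join(body + [assign]))
    # Too large: move the body into its own method. Because the result lives in a
    # field rather than a local variable, the caller can still read it afterwards.
    method = "\n".join(
        [f"private void eval_{name}(InternalRow i) {{"]
        + ["  " + line for line in body + [assign]]
        + ["}"]
    )
    return EmittedSubexpr([field_decl], [method], f"eval_{name}(i);")


small = emit_subexpr("subExprValue_0", "int", ["int t = a + b;"], "t")
print(small.inline)
big = emit_subexpr("subExprValue_1", "long", [f"long x{k} = {k}L;" for k in range(4)], "x3")
print(big.inline)      # eval_subExprValue_1(i);
print(big.methods[0])
```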

### Why are the changes needed?

Currently, there are different code paths to handle subexpression elimination in whole stage and non-whole stage codegen. This makes it harder to add new capabilities to subexpression elimination, since every change has to be made in two independent code paths.

### Does this PR introduce _any_ user-facing change?

No, just slight changes in generated code.

### How was this patch tested?

Existing unit tests.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] Kimahriman edited a comment on pull request #34727: [SPARK-37467][SQL] Consolidate whole stage and non whole stage subexpression elimination

Posted by GitBox <gi...@apache.org>.

Kimahriman edited a comment on pull request #34727:
URL: https://github.com/apache/spark/pull/34727#issuecomment-980643219


   @viirya, I've been playing around with this and haven't thought of any breaking cases, but I'm curious whether you can think of anything or see problems with this approach. I'm mostly trying to consolidate things before playing around with subexpression elimination inside lambda functions.



[GitHub] [spark] AmplabJenkins commented on pull request #34727: [SPARK-37467][SQL] Consolidate whole stage and non whole stage subexpression elimination

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #34727:
URL: https://github.com/apache/spark/pull/34727#issuecomment-980643269


   Can one of the admins verify this patch?



[GitHub] [spark] Kimahriman commented on pull request #34727: [SPARK-37467][SQL] Consolidate whole stage and non whole stage subexpression elimination

Posted by GitBox <gi...@apache.org>.

Kimahriman commented on pull request #34727:
URL: https://github.com/apache/spark/pull/34727#issuecomment-980643219


   @viirya, I've been playing around with this and haven't thought of any breaking cases, but I'm curious whether you can think of anything or see problems with this approach.

