Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/06/10 06:30:23 UTC

[GitHub] [spark] wangyum opened a new pull request, #36829: [SPARK-39438][SQL] Add a threshold to not in line CTE

wangyum opened a new pull request, #36829:
URL: https://github.com/apache/spark/pull/36829

   ### What changes were proposed in this pull request?
   
   This PR adds a threshold for not inlining a CTE: a CTE is left un-inlined when it is referenced at least that many times and `spark.sql.exchange.reuse` is enabled.
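
As a rough illustration only, the rule described above can be sketched as a small predicate. This is not the Spark implementation: `CteInfo`, `should_keep_cte`, and the `threshold` parameter are hypothetical names invented for this sketch, and the actual configuration key and default value proposed by the PR are not stated here. Only `spark.sql.exchange.reuse` comes from the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CteInfo:
    """Hypothetical summary of one CTE definition (not a Spark class)."""
    name: str
    ref_count: int  # number of times the query references this CTE


def should_keep_cte(cte: CteInfo, threshold: int, exchange_reuse_enabled: bool) -> bool:
    """Return True when the CTE should be left un-inlined.

    Mirrors the rule in the description: keep the CTE as a shared
    definition only if exchange reuse is enabled and the CTE is
    referenced at least `threshold` times. Otherwise it is inlined.
    """
    return exchange_reuse_enabled and cte.ref_count >= threshold


if __name__ == "__main__":
    # Assumed values, chosen only for this example.
    heavy = CteInfo(name="year_total", ref_count=6)
    light = CteInfo(name="small_lookup", ref_count=1)
    for cte in (heavy, light):
        keep = should_keep_cte(cte, threshold=3, exchange_reuse_enabled=True)
        print(cte.name, "keep un-inlined" if keep else "inline")
```

With exchange reuse disabled the predicate always returns `False`, so every CTE would be inlined regardless of how often it is referenced.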
   
   ### Why are the changes needed?
   
   1. In some cases a CTE is heavy and referenced many times, and not inlining it can avoid a lot of duplicate work. This is not always the case, however; see, for example, TPC-DS q4.
   2. It can reduce the number of table reads and the pressure on the NameNode, especially for tables with many small files.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Unit test.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #36829: [SPARK-39438][SQL] Add a threshold to not in line CTE

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on PR #36829:
URL: https://github.com/apache/spark/pull/36829#issuecomment-1153343979

   cc @peter-toth @allisonwang-db @maryannxue FYI




[GitHub] [spark] github-actions[bot] commented on pull request #36829: [SPARK-39438][SQL] Add a threshold to not in line CTE

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #36829:
URL: https://github.com/apache/spark/pull/36829#issuecomment-1257322956

   We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!




[GitHub] [spark] github-actions[bot] closed pull request #36829: [SPARK-39438][SQL] Add a threshold to not in line CTE

Posted by GitBox <gi...@apache.org>.
github-actions[bot] closed pull request #36829: [SPARK-39438][SQL] Add a threshold to not in line CTE
URL: https://github.com/apache/spark/pull/36829




[GitHub] [spark] maryannxue commented on pull request #36829: [SPARK-39438][SQL] Add a threshold to not in line CTE

Posted by GitBox <gi...@apache.org>.
maryannxue commented on PR #36829:
URL: https://github.com/apache/spark/pull/36829#issuecomment-1159037835

   There will definitely be regressions too! It's a tradeoff between double scanning and double shuffling. CTEs that are not inlined introduce extra shuffles. It also depends on how efficiently the scan is implemented.

