Posted to reviews@spark.apache.org by "AngersZhuuuu (via GitHub)" <gi...@apache.org> on 2024/03/06 04:11:42 UTC

[PR] [SPARK-47294][SQL] OptimizeSkewInRebalanceRepartitions should support ProjectExec(_,ShuffleQueryStageExec) [spark]

AngersZhuuuu opened a new pull request, #45398:
URL: https://github.com/apache/spark/pull/45398

   ### What changes were proposed in this pull request?
   Currently, OptimizeSkewInRebalanceRepartitions only matches the case of a bare ShuffleQueryStageExec:
   ```
       plan transformUp {
         case stage: ShuffleQueryStageExec if isSupported(stage.shuffle) =>
           tryOptimizeSkewedPartitions(stage)
       }
   ```
   
   This works only when the query is a plain SELECT with a rebalance hint:
   ```
   SELECT /*+ REBALANCE(col) */ * FROM table
   ```
   
   It does not work when the query writes to a table:
   ```
   INSERT INTO t1
   SELECT  /*+ REBALANCE(key1) */
   *
   FROM skewData1
   ```
   This PR adds support for that case.
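   
   As a toy illustration of the matching question (these are not Spark's classes; `Plan`, `Stage`, `Project`, and `SkewSketch` are hypothetical names), the sketch below contrasts applying a rule only at the plan root with applying it bottom-up over the whole tree. In this toy model a root-only match misses a stage wrapped in a projection, while a bottom-up transform reaches it. It makes no claim about how the real rule behaves in any particular Spark release.
   
   ```scala
   // Toy stand-ins for physical plan nodes; these are NOT Spark's classes.
   sealed trait Plan {
     def children: Seq[Plan]
     def withChildren(newChildren: Seq[Plan]): Plan
   
     // Bottom-up rewrite: rewrite the children first, then try the rule on the node itself.
     def transformUp(rule: PartialFunction[Plan, Plan]): Plan = {
       val rewritten = withChildren(children.map(_.transformUp(rule)))
       rule.applyOrElse(rewritten, identity[Plan])
     }
   }
   
   // Leaf node standing in for a shuffle stage; `optimized` records whether the rule fired.
   case class Stage(name: String, optimized: Boolean = false) extends Plan {
     def children: Seq[Plan] = Nil
     def withChildren(newChildren: Seq[Plan]): Plan = this
   }
   
   // Unary node standing in for a projection placed on top of a stage.
   case class Project(child: Plan) extends Plan {
     def children: Seq[Plan] = Seq(child)
     def withChildren(newChildren: Seq[Plan]): Plan = copy(child = newChildren.head)
   }
   
   object SkewSketch {
     // Hypothetical rule: mark every stage it sees as optimized.
     val rule: PartialFunction[Plan, Plan] = { case s: Stage => s.copy(optimized = true) }
   
     // Applies the rule to the root node only.
     def rootOnly(plan: Plan): Plan = rule.applyOrElse(plan, identity[Plan])
   
     // Applies the rule to every node, bottom-up.
     def everywhere(plan: Plan): Plan = plan.transformUp(rule)
   }
   ```
   
   With `Project(Stage("s"))` as input, `SkewSketch.rootOnly` returns the plan unchanged, whereas `SkewSketch.everywhere` returns the same plan with the inner stage marked as optimized.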
   
   ### Why are the changes needed?
   This covers more cases: it avoids skew when writing to a table with the `REBALANCE` hint.
   
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Manual test (MT)
   
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   Before 
   ![image](https://github.com/apache/spark/assets/46485123/cde10d04-8cc0-427a-bc83-c4c51929e0bc)
   
   After
   ![image](https://github.com/apache/spark/assets/46485123/8d1c8a18-77f2-48a2-b388-b9939dd644e0)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47294][SQL] OptimizeSkewInRebalanceRepartitions should support ProjectExec(_,ShuffleQueryStageExec) [spark]

Posted by "ulysses-you (via GitHub)" <gi...@apache.org>.
ulysses-you commented on PR #45398:
URL: https://github.com/apache/spark/pull/45398#issuecomment-1980095300

   @AngersZhuuuu I guess you are changing an outdated codebase... This feature has been supported since https://github.com/apache/spark/pull/34542 (Spark 3.3)




Re: [PR] [SPARK-47294][SQL] OptimizeSkewInRebalanceRepartitions should support ProjectExec(_,ShuffleQueryStageExec) [spark]

Posted by "AngersZhuuuu (via GitHub)" <gi...@apache.org>.
AngersZhuuuu commented on PR #45398:
URL: https://github.com/apache/spark/pull/45398#issuecomment-1980056417

   ping @ulysses-you @yaooqinn 




Re: [PR] [SPARK-47294][SQL] OptimizeSkewInRebalanceRepartitions should support ProjectExec(_,ShuffleQueryStageExec) [spark]

Posted by "AngersZhuuuu (via GitHub)" <gi...@apache.org>.
AngersZhuuuu closed pull request #45398: [SPARK-47294][SQL] OptimizeSkewInRebalanceRepartitions should support ProjectExec(_,ShuffleQueryStageExec)
URL: https://github.com/apache/spark/pull/45398




Re: [PR] [SPARK-47294][SQL] OptimizeSkewInRebalanceRepartitions should support ProjectExec(_,ShuffleQueryStageExec) [spark]

Posted by "AngersZhuuuu (via GitHub)" <gi...@apache.org>.
AngersZhuuuu commented on PR #45398:
URL: https://github.com/apache/spark/pull/45398#issuecomment-1980144655

   > @AngersZhuuuu I guess you are changing an outdated codebase... This feature has been supported since #34542 (Spark 3.3)
   
   Yea...didn't see the change

