Posted to issues@uniffle.apache.org by GitBox <gi...@apache.org> on 2022/09/07 12:39:53 UTC

[GitHub] [incubator-uniffle] leixm commented on pull request #190: [Improvement][AQE] Avoid calling getShuffleResult multiple times

leixm commented on PR #190:
URL: https://github.com/apache/incubator-uniffle/pull/190#issuecomment-1239337208

   ### Environment
   Number of shuffle servers: 5
   Shuffle write size: 48G
   Configuration: --conf spark.sql.shuffle.partitions=5000 --conf spark.sql.adaptive.enabled=true --conf spark.sql.adaptive.shuffle.targetPostShuffleInputSize=64MB
   
   We measure the performance of get_shuffle_result by the following metrics:
   - get_shuffle_result_times: The number of calls of the get_shuffle_result interface
   - get_shuffle_result_cost: Time consumption of get_shuffle_result interface
   - get_shuffle_result_for_multi_part_times: The number of calls of the get_shuffle_result_for_multi_part interface
   - get_shuffle_result_for_multi_part_cost: Time consumption of get_shuffle_result_for_multi_part interface
   ### Test Results
   Before issue_136
   
   | serverId | get_shuffle_result_times | get_shuffle_result_cost(ms) |
   | -------- | ------------------------ | --------------------------- |
   | Server1  | 1000                     | 157614                      |
   | Server2  | 1000                     | 426897                      |
   | Server3  | 1000                     | 269488                      |
   | Server4  | 1000                     | 906758                      |
   | Server5  | 1001                     | 123217                      |
   | sum      | 5001                     | 1883974                     |
   
   
   
   After issue_136
   
   | serverId | get_shuffle_result_for_multi_part_times | get_shuffle_result_for_multi_part_cost(ms) |
   | -------- | --------------------------------------- | ------------------------------------------ |
   | Server1  | 833                                     | 870720                                     |
   | Server2  | 833                                     | 260865                                     |
   | Server3  | 834                                     | 333202                                     |
   | Server4  | 833                                     | 90277                                      |
   | Server5  | 835                                     | 94113                                      |
   | sum      | 4168                                    | 1649177                                    |
   
   ### Summary
   The number of interface requests drops by about 16.7% (5001 to 4168), and the total time drops by about 12.5% (1883974 ms to 1649177 ms). If consecutive partitions were assigned to the same server, the improvement would likely be more pronounced.
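
For anyone who wants to check the percentages, here is a minimal sketch that recomputes them from the per-server rows in the two tables above. The variable and function names (`before`, `after`, `totals`, `reduction_pct`) are illustrative only and are not part of Uniffle.

```python
# Per-server figures copied from the two tables above: (call count, total cost in ms).
before = {
    "Server1": (1000, 157614),
    "Server2": (1000, 426897),
    "Server3": (1000, 269488),
    "Server4": (1000, 906758),
    "Server5": (1001, 123217),
}
after = {
    "Server1": (833, 870720),
    "Server2": (833, 260865),
    "Server3": (834, 333202),
    "Server4": (833, 90277),
    "Server5": (835, 94113),
}


def totals(rows):
    """Sum call counts and costs over all servers."""
    calls = sum(count for count, _ in rows.values())
    cost_ms = sum(cost for _, cost in rows.values())
    return calls, cost_ms


def reduction_pct(old, new):
    """Relative reduction from old to new, as a percentage of old."""
    return (old - new) / old * 100.0


calls_before, cost_before = totals(before)
calls_after, cost_after = totals(after)
call_reduction = reduction_pct(calls_before, calls_after)
cost_reduction = reduction_pct(cost_before, cost_after)

print(f"calls: {calls_before} -> {calls_after} ({call_reduction:.1f}% fewer)")
print(f"cost:  {cost_before} ms -> {cost_after} ms ({cost_reduction:.1f}% less)")
```

The totals match the `sum` rows of both tables, so the percentages follow directly from the measured data.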


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

