Posted to dev@uniffle.apache.org by "zjf2012 (via GitHub)" <gi...@apache.org> on 2023/02/16 09:06:40 UTC

[GitHub] [incubator-uniffle] zjf2012 opened a new issue, #615: [Improvement] Reduce task binary by removing 'partitionToServers' from RssShuffleHandle

zjf2012 opened a new issue, #615:
URL: https://github.com/apache/incubator-uniffle/issues/615

   ### What would you like to be improved?
   
   Both map and reduce tasks reference an RssShuffleHandle that wraps 'partitionToServers', which is usually far larger than the rest of the task binary. For example, for a shuffle with 10,000 partitions, 'partitionToServers' could easily reach 250,000 bytes, assuming each map entry takes 25 bytes.
   
   A large task binary causes long task delays and long task serialization times. We can replace 'partitionToServers' with something smaller, such as a mapping function from partitions to shuffle servers.
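The size estimate above can be checked with a rough sketch. It is only an illustration under stated assumptions: server descriptors are modeled as plain strings, the host names and port are made up, each partition gets a single server, and size is measured with plain Java serialization, which may differ from the serializer Spark actually uses for task binaries. The class and method names are hypothetical and not part of Uniffle.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HandleSizeSketch {

    // Back-of-the-envelope estimate from the issue: entries * bytes per entry.
    public static long estimateBytes(int numPartitions, int bytesPerEntry) {
        return (long) numPartitions * bytesPerEntry;
    }

    // Builds a hypothetical partition -> servers map, assigning one server
    // per partition in round-robin order (an assumption for illustration).
    public static Map<Integer, List<String>> buildMap(int numPartitions, List<String> servers) {
        Map<Integer, List<String>> map = new HashMap<>();
        for (int p = 0; p < numPartitions; p++) {
            List<String> assigned = new ArrayList<>();
            assigned.add(servers.get(p % servers.size()));
            map.put(p, assigned);
        }
        return map;
    }

    // Measures the size of an object under plain Java serialization.
    public static int serializedSize(Object o) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        // Made-up server addresses, used only to populate the map.
        List<String> servers = Arrays.asList("server-0:19999", "server-1:19999", "server-2:19999");
        System.out.println("estimate: " + estimateBytes(10_000, 25) + " bytes");
        System.out.println("measured: " + serializedSize(buildMap(10_000, servers)) + " bytes");
    }
}
```

The measured figure depends on the serializer and on how much state each server descriptor carries, so it should be read as an order-of-magnitude check rather than a prediction of the real task binary size.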
   
   
   ### How should we improve?
   
   We can replace 'partitionToServers' with something else, such as a mapping function that maps a partition ID to its shuffle servers. Each executor would fetch the shuffle servers only once, in its first shuffle task, and cache them for later shuffle tasks with the same shuffle ID.
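A minimal sketch of this idea follows. It is not the implementation from the eventual PR: it assumes the assignment can be expressed as a deterministic function of the partition ID and an ordered server list (round-robin with consecutive replicas here), and it models server descriptors as strings. All class, method, and variable names are hypothetical, and the fetcher stands in for whatever call an executor would make to obtain the server list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class PartitionServerMappingSketch {

    // Per-executor (per-JVM) cache: shuffle ID -> server list, fetched once.
    private static final Map<Integer, List<String>> SERVERS_BY_SHUFFLE = new ConcurrentHashMap<>();

    // Returns the cached server list for a shuffle; the fetcher is invoked
    // only by the first task that asks for this shuffle ID.
    public static List<String> serversFor(int shuffleId, Function<Integer, List<String>> fetcher) {
        return SERVERS_BY_SHUFFLE.computeIfAbsent(shuffleId, fetcher);
    }

    // Mapping function: derives the servers of one partition from the ordered
    // server list instead of shipping a full partition -> servers map.
    // Assumes partitionId >= 0 and replicas <= servers.size().
    public static List<String> serversForPartition(int partitionId, List<String> servers, int replicas) {
        List<String> assigned = new ArrayList<>(replicas);
        for (int r = 0; r < replicas; r++) {
            assigned.add(servers.get((partitionId + r) % servers.size()));
        }
        return assigned;
    }

    public static void main(String[] args) {
        AtomicInteger fetches = new AtomicInteger();
        // Stand-in for a remote lookup; the addresses are made up.
        Function<Integer, List<String>> fetcher = shuffleId -> {
            fetches.incrementAndGet();
            return Arrays.asList("server-0:19999", "server-1:19999", "server-2:19999");
        };
        // Simulate several tasks of the same shuffle running in one executor.
        for (int partition = 0; partition < 5; partition++) {
            List<String> servers = serversFor(7, fetcher);
            System.out.println(partition + " -> " + serversForPartition(partition, servers, 2));
        }
        System.out.println("fetches: " + fetches.get());
    }
}
```

With this shape, a task would only need to carry the shuffle ID and a few assignment parameters, so the serialized handle no longer grows with the number of partitions. The trade-off is that every executor must compute exactly the same assignment as the component that originally allocated the servers.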
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-uniffle] xianjingfeng closed issue #615: [Improvement] Reduce task binary by removing 'partitionToServers' from RssShuffleHandle

Posted by "xianjingfeng (via GitHub)" <gi...@apache.org>.
xianjingfeng closed issue #615: [Improvement] Reduce task binary by removing 'partitionToServers' from RssShuffleHandle
URL: https://github.com/apache/incubator-uniffle/issues/615




[GitHub] [incubator-uniffle] zjf2012 commented on issue #615: [Improvement] Reduce task binary by removing 'partitionToServers' from RssShuffleHandle

Posted by "zjf2012 (via GitHub)" <gi...@apache.org>.
zjf2012 commented on issue #615:
URL: https://github.com/apache/incubator-uniffle/issues/615#issuecomment-1438148870

   @advancedxy The _[PR](https://github.com/apache/incubator-uniffle/pull/637)_ is ready. Please help review.




[GitHub] [incubator-uniffle] zjf2012 commented on issue #615: [Improvement] Reduce task binary by removing 'partitionToServers' from RssShuffleHandle

Posted by "zjf2012 (via GitHub)" <gi...@apache.org>.
zjf2012 commented on issue #615:
URL: https://github.com/apache/incubator-uniffle/issues/615#issuecomment-1434133359

   As tested, the binary size of a vanilla Spark task is about 4 KB, whereas an RSS shuffle task is more than 670 KB.




[GitHub] [incubator-uniffle] advancedxy commented on issue #615: [Improvement] Reduce task binary by removing 'partitionToServers' from RssShuffleHandle

Posted by "advancedxy (via GitHub)" <gi...@apache.org>.
advancedxy commented on issue #615:
URL: https://github.com/apache/incubator-uniffle/issues/615#issuecomment-1434497956

   :+1:, thanks, this is a good optimization. I believe that once the stage recompute work is finished, we will have a communication layer between the executor and the driver, which would make it much easier to reduce the task binary size.

