Posted to dev@uniffle.apache.org by GitBox <gi...@apache.org> on 2023/01/06 10:30:52 UTC

[GitHub] [incubator-uniffle] zuston commented on issue #378: [Improvement] Introduce partition size based strategy to flush single huge partition data to HDFS

zuston commented on issue #378:
URL: https://github.com/apache/incubator-uniffle/issues/378#issuecomment-1373447729

   This issue tracks all optimizations for huge partitions, and every sub-task will be linked to it. The approach for handling huge partitions is to flush their data directly to HDFS and to limit their memory usage. The sub-tasks are as follows:
   
   1. Speed up flushing partition data to HDFS. 
       - [x] https://github.com/apache/incubator-uniffle/pull/396
   2. Introduce a memory usage limit for huge partitions
       - [ ] Record the data size of every partition in `ShuffleTaskInfo`
       - [ ] Introduce a storage selector strategy in `MultipleStorageManager` (so that huge partitions are flushed directly to HDFS) to replace the fallback strategy
       - [ ] Introduce more metrics for monitoring huge partitions, among other things
   3. Support splitting a huge event into multiple smaller events that are flushed concurrently, to speed up flushing
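
A minimal Python sketch of the partition-size-based idea behind sub-tasks 2 and 3, under stated assumptions: data sizes are accumulated per (app, shuffle, partition); a partition whose accumulated size reaches a hypothetical `huge_partition_threshold` is routed to HDFS instead of local disk; and a huge flush event, modeled here simply as a list of block sizes, is split into smaller events that are flushed in parallel. All names (`PartitionSizeTracker`, `PartitionSizeStorageSelector`, `split_event`, `flush_concurrently`, `max_event_bytes`) are hypothetical illustrations, not Uniffle APIs; the actual changes are planned for `ShuffleTaskInfo` and `MultipleStorageManager`.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from enum import Enum


class StorageType(Enum):
    LOCALFILE = "LOCALFILE"
    HDFS = "HDFS"


class PartitionSizeTracker:
    """Accumulates written bytes per (app_id, shuffle_id, partition_id).

    Hypothetical stand-in for recording partition data sizes in ShuffleTaskInfo.
    """

    def __init__(self):
        self._sizes = defaultdict(int)

    def add(self, app_id, shuffle_id, partition_id, num_bytes):
        key = (app_id, shuffle_id, partition_id)
        self._sizes[key] += num_bytes
        return self._sizes[key]

    def size(self, app_id, shuffle_id, partition_id):
        # .get() does not insert a default entry for unseen partitions.
        return self._sizes.get((app_id, shuffle_id, partition_id), 0)


class PartitionSizeStorageSelector:
    """Hypothetical selector: partitions at or above the threshold go to HDFS."""

    def __init__(self, tracker, huge_partition_threshold):
        self.tracker = tracker
        self.threshold = huge_partition_threshold

    def is_huge(self, app_id, shuffle_id, partition_id):
        return self.tracker.size(app_id, shuffle_id, partition_id) >= self.threshold

    def select(self, app_id, shuffle_id, partition_id):
        if self.is_huge(app_id, shuffle_id, partition_id):
            return StorageType.HDFS
        return StorageType.LOCALFILE


def split_event(blocks, max_event_bytes):
    """Split one huge event (a list of block sizes) into smaller events.

    Each resulting event holds at most max_event_bytes, except that a single
    block larger than the limit becomes an event on its own.
    """
    events, current, current_bytes = [], [], 0
    for block in blocks:
        if current and current_bytes + block > max_event_bytes:
            events.append(current)
            current, current_bytes = [], 0
        current.append(block)
        current_bytes += block
    if current:
        events.append(current)
    return events


def flush_concurrently(events, flush_fn, max_workers=4):
    """Flush the smaller events in parallel; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(flush_fn, events))
```

For example, with a threshold of 100 bytes, a partition that has received 60 bytes stays on `LOCALFILE` and switches to `HDFS` once another 50 bytes arrive. Selecting on the accumulated partition size, rather than on the size of a single event, means that once a partition is judged huge, its later data keeps going to HDFS instead of filling server memory.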
   
   
   cc @jerqi @advancedxy I will create issues and PRs for these sub-tasks; feel free to discuss further.

