Posted to dev@uniffle.apache.org by GitBox <gi...@apache.org> on 2023/01/10 02:06:43 UTC

[GitHub] [incubator-uniffle] advancedxy commented on issue #378: [Improvement] Optimize data flushing and memory usage for huge partitions to improve stability

advancedxy commented on issue #378:
URL: https://github.com/apache/incubator-uniffle/issues/378#issuecomment-1376624355

   > This issue tracks all optimizations for huge partitions, and all sub-tasks will be linked to it. The solution for handling huge partitions is to flush them to HDFS directly and limit their memory usage. The subtasks are as follows.
   > 
   > 1. Speed up flushing partition data to HDFS.
   >    
   >    * [x]  [Support writing multi files of single partition to improve speed in HDFS storage #396](https://github.com/apache/incubator-uniffle/pull/396)
   > 2. Introduce the memory usage limitation for huge partitions
   >    
   >    * [x]  [[ISSUE-378][HugePartition][Part-1] Record every partition data size for one app #458](https://github.com/apache/incubator-uniffle/pull/458)
   >    * [ ]  Introduce a storage selector strategy (to support flushing huge partitions to HDFS directly) in MultipleStorageManager to replace the fallback strategy
   >    * [ ]  Introduce more metrics to monitor huge partitions and so on
   > 3. Support splitting a huge event into multiple smaller events that are flushed concurrently, to speed up flushing
   > 
   > cc @jerqi @advancedxy I will create some subtasks of issues and PRs, feel free to discuss more.
   
   Missed this comment. LGTM.

