You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/03/04 06:42:30 UTC

[GitHub] [arrow-datafusion] yjshen commented on a change in pull request #1921: Add timeout to can_grow_directly when waiting for the Condvar.

yjshen commented on a change in pull request #1921:
URL: https://github.com/apache/arrow-datafusion/pull/1921#discussion_r819304060



##########
File path: datafusion/src/execution/memory_manager.rs
##########
@@ -340,7 +341,13 @@ impl MemoryManager {
             } else if current < min_per_rqt {
                 // if we cannot acquire at lease 1/2n memory, just wait for others
                 // to spill instead spill self frequently with limited total mem
-                self.cv.wait(&mut rqt_current_used);
+                let timeout = self
+                    .cv
+                    .wait_for(&mut rqt_current_used, Duration::from_secs(5));

Review comment:
       Is this wait causing infinite wait in your query? If so, is it a possible leak somewhere else that exists, and we should fix it?
   
   The primary reason it's waiting forever here for a notify signal is to avoid repeated self-spilling with little memory share, producing many spill files and degrading performance. 
   
   If a consumer cannot get at least 1/2n of memory among the total, perhaps blocking the thread and yielding computational resources to others is better? Then the huge consumers can progress faster. Either they trigger self-spilling or finish their jobs. 
   
   Working with a minimum memory is not ideal because it will harm the overall query processing throughput. Even if the repeated spilling consumer gets enough memory later, it will need a lot of effort reading spills and merging partial results. 
   
   We are working with an assumption of a batch engine currently. May one day, we can have a customized task scheduler, self-tuning based on CPU memory usages, and even with an adaptive partition size tuning capability.
   
   Nevertheless, if we must add a timeout, please also print a warning log, as this could be a symptom of a potential bug. And yes, I think to make it a configurable wait is necessary. And please choose a longer timeout as default if possible.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org