You are viewing a plain text version of this content. The canonical link for it is here.

Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/07/02 16:13:36 UTC

[GitHub] [arrow-datafusion] richox opened a new issue, #2829: Memory manager triggers unnecessary spills

richox opened a new issue, #2829:
URL: https://github.com/apache/arrow-datafusion/issues/2829

   **Describe the bug**
   
   currently, each requester can allocate at most `(pool_size - tracker_total) / num_requesters` memory. if an execution plan is skewed there will be a lot of memory wasted and unexpected spills.
   
   for example, when we perform sort -> merge joining on two inputs, and one of them is very small:
   ```
   SortMergeJoin {
       left: Sort { // bigger input
       },
       right: Sort { // smaller input
       },
       ...
   }
   ```
   
   the bigger sorter can use only half of the total memory, which will severely triggers more spills and slows down the performance.
   
   ```
    
   **To Reproduce**
   
   in the following case, we have 3 requesters requesting 90 of 100 memory. as the memory is not used up, there would be no spills during the execution. but actually the big requester spills every times.
   ```
       #[tokio::test]
       async fn multiple_skewed_requesters() {
           let config = RuntimeConfig::new()
               .with_memory_manager(MemoryManagerConfig::try_new_limit(100, 1.0).unwrap());
           let runtime = Arc::new(RuntimeEnv::new(config).unwrap());
   
           let big_requester1 = DummyRequester::new(0, runtime.clone());
           runtime.register_requester(&big_requester1.id);
           let small_requester2 = DummyRequester::new(0, runtime.clone());
           runtime.register_requester(&small_requester2.id);
           let small_requester3 = DummyRequester::new(0, runtime.clone());
           runtime.register_requester(&small_requester3.id);
   
           small_requester2.do_with_mem(5).await.unwrap();
           small_requester3.do_with_mem(5).await.unwrap();
   
           big_requester1.do_with_mem(40).await.unwrap();
           big_requester1.do_with_mem(40).await.unwrap();
           assert_eq!(big_requester1.get_spills(), 0); // assertion fails, actually = 2
       }
   ```
   **Expected behavior**
   
   spill only when all memory is used up. the spill strategy can be optimized, for example we can find out the maximum requester and let it spill.
   
   **Additional context**
   Add any other context about the problem here.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] alamb closed issue #2829: Memory manager triggers unnecessary spills

Posted by GitBox <gi...@apache.org>.

alamb closed issue #2829: Memory manager triggers unnecessary spills
URL: https://github.com/apache/arrow-datafusion/issues/2829


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org