Posted to dev@arrow.apache.org by GitBox <gi...@apache.org> on 2022/04/04 03:29:23 UTC

[GitHub] [arrow-datafusion] yjshen commented on a diff in pull request #2132: WIP: Reduce sort memory usage v1

yjshen commented on code in PR #2132:
URL: https://github.com/apache/arrow-datafusion/pull/2132#discussion_r841342087


##########
datafusion/core/src/physical_plan/sorts/sort.rs:
##########
@@ -105,13 +107,21 @@ impl ExternalSorter {
         }
     }
 
-    async fn insert_batch(&self, input: RecordBatch) -> Result<()> {
+    async fn insert_batch(
+        &self,
+        input: RecordBatch,
+        tracking_metrics: &MemTrackingMetrics,
+    ) -> Result<()> {
         if input.num_rows() > 0 {
             let size = batch_byte_size(&input);
             self.try_grow(size).await?;
             self.metrics.mem_used().add(size);
             let mut in_mem_batches = self.in_mem_batches.lock().await;
-            in_mem_batches.push(input);
+            // NB timer records time taken on drop, so there are no
+            // calls to `timer.done()` below.
+            let _timer = tracking_metrics.elapsed_compute().timer();
+            let partial = sort_batch(input, self.schema.clone(), &self.expr)?;

Review Comment:
   Performance would deteriorate significantly without this change:
   
   ```
   Running benchmarks with the following options: DataFusionBenchmarkOpt { query: 1, debug: false, iterations: 3, partitions: 2, batch_size: 4096, path: "/home/yijie/sort_test/tpch-parquet", file_format: "parquet", mem_table: false, output_path: None }
   Query 1 iteration 0 took 4619.9 ms and returned 6001214 rows
   Query 1 iteration 1 took 4561.0 ms and returned 6001214 rows
   Query 1 iteration 2 took 4527.7 ms and returned 6001214 rows
   ```
   The main reason, I think, is random memory access while constructing output batches. Without this per-batch sort, collecting cells from unsorted batches makes the memory access pattern fully random. With this per-batch sort, we access memory linearly for each column in each batch, which gives a much more predictable access pattern and benefits the CPU cache.
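   
   As a rough sketch of what such a per-batch sort amounts to (this is only an illustration, not the PR's actual `sort_batch`; it assumes plain column indices as the sort keys, whereas the real code evaluates `PhysicalSortExpr`s, and it uses arrow's `lexsort_to_indices` and `take` kernels):
   
   ```rust
   // Illustration only: sort a single RecordBatch lexicographically by the
   // given key columns.
   use arrow::compute::{lexsort_to_indices, take, SortColumn, SortOptions};
   use arrow::error::Result;
   use arrow::record_batch::RecordBatch;

   fn sort_single_batch(batch: &RecordBatch, key_columns: &[usize]) -> Result<RecordBatch> {
       // Build the sort key columns in priority order.
       let sort_columns: Vec<SortColumn> = key_columns
           .iter()
           .map(|&i| SortColumn {
               values: batch.column(i).clone(),
               options: Some(SortOptions::default()),
           })
           .collect();

       // Compute the sorted row permutation once for the whole batch ...
       let indices = lexsort_to_indices(&sort_columns, None)?;

       // ... then apply it with one `take` per column. Each `take` walks a
       // single column of a single batch, which is the linear access pattern
       // described above, rather than gathering cells from many unsorted
       // batches at merge time.
       let sorted_columns = batch
           .columns()
           .iter()
           .map(|c| take(c.as_ref(), &indices, None))
           .collect::<Result<Vec<_>>>()?;

       RecordBatch::try_new(batch.schema(), sorted_columns)
   }
   ```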
   
   I think the perf counters confirm the above speculation:
   ```
   sudo perf stat -a -e cache-misses,cache-references,l3_cache_accesses,l3_misses,dTLB-load-misses,dTLB-loads target/release/tpch benchmark datafusion --iterations 3 --path /home/yijie/sort_test/tpch-parquet --format parquet --query 1 --batch-size 4096
   ```
   
   Without this per-batch sort:
   ``` 
   Performance counter stats for 'system wide':
   
        1,340,359,889      cache-misses              #   35.817 % of all cache refs  
        3,742,289,458      cache-references                                          
        1,984,089,839      l3_cache_accesses                                         
          540,429,658      l3_misses                                                 
          303,508,234      dTLB-load-misses          #   49.51% of all dTLB cache accesses
          613,048,439      dTLB-loads                                                
   
         14.222309739 seconds time elapsed
   ```
   
   With this per-batch sort:
   
   ```
    Performance counter stats for 'system wide':
   
        1,059,913,512      cache-misses              #   30.715 % of all cache refs  
        3,450,839,405      cache-references                                          
        1,388,975,765      l3_cache_accesses                                         
          235,570,805      l3_misses                                                 
          239,390,511      dTLB-load-misses          #   51.36% of all dTLB cache accesses
          466,141,655      dTLB-loads                                                
   
          8.675278258 seconds time elapsed
   ```
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org