Posted to github@arrow.apache.org by "tustvold (via GitHub)" <gi...@apache.org> on 2023/04/06 13:28:44 UTC

[GitHub] [arrow-datafusion] tustvold opened a new pull request, #5894: Use interleave in BatchBuilder (~10% faster merge)

tustvold opened a new pull request, #5894:
URL: https://github.com/apache/arrow-datafusion/pull/5894

   _Draft, as this builds on #5886_
   # Which issue does this PR close?
   
   <!--
   We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123.
   -->
   
   Closes #.
   
   # Rationale for this change
   
   <!--
    Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed.
    Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes.  
   -->
   
   Simplifies BatchBuilder by using the upstream interleave kernel.
   
   As an added bonus, this yields some non-trivial performance speedups.
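   
   For illustration, here is a minimal, self-contained sketch (not taken from this PR) of how the interleave kernel is used: it gathers rows from several source arrays according to `(array_index, row_index)` pairs, which is the same shape as `BatchBuilder`'s cursor indices. The import path `arrow::compute::interleave` and the example data below are assumptions for the sake of the sketch, not code from this change.
   
   ```rust
   use arrow::array::{Array, ArrayRef, Int64Array};
   use arrow::compute::interleave;
   use arrow::error::ArrowError;
   
   fn main() -> Result<(), ArrowError> {
       // Two "buffered batches" worth of values for a single column.
       let a = Int64Array::from(vec![1, 2, 3]);
       let b = Int64Array::from(vec![10, 20, 30]);
       let arrays: Vec<&dyn Array> = vec![&a, &b];
   
       // (array_index, row_index) pairs in output order:
       // row 0 of `a`, row 1 of `b`, then row 2 of `a`.
       let indices: Vec<(usize, usize)> = vec![(0, 0), (1, 1), (0, 2)];
   
       // interleave produces a single output array in that order,
       // replacing the manual MutableArrayData bookkeeping.
       let merged: ArrayRef = interleave(&arrays, &indices)?;
       assert_eq!(merged.len(), 3);
       Ok(())
   }
   ```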
   
   
   ```
   merge sorted i64        time:   [8.5328 ms 8.5418 ms 8.5511 ms]
                           change: [-10.592% -10.481% -10.361%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 3 outliers among 100 measurements (3.00%)
     3 (3.00%) high mild
   
   sort merge i64          time:   [8.7169 ms 8.7280 ms 8.7391 ms]
                           change: [-10.569% -10.429% -10.284%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   merge sorted f64        time:   [8.4695 ms 8.4762 ms 8.4838 ms]
                           change: [-10.751% -10.641% -10.524%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 7 outliers among 100 measurements (7.00%)
     6 (6.00%) high mild
     1 (1.00%) high severe
   
   sort merge f64          time:   [8.6876 ms 8.6986 ms 8.7099 ms]
                           change: [-10.503% -10.350% -10.183%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 2 outliers among 100 measurements (2.00%)
     2 (2.00%) high mild
   
   merge sorted utf8 low cardinality
                           time:   [7.2147 ms 7.2184 ms 7.2224 ms]
                           change: [-1.0837% -0.9800% -0.8862%] (p = 0.00 < 0.05)
                           Change within noise threshold.
   Found 2 outliers among 100 measurements (2.00%)
     2 (2.00%) high mild
   
   sort merge utf8 low cardinality
                           time:   [7.8602 ms 7.8836 ms 7.9077 ms]
                           change: [-1.1147% -0.7057% -0.2931%] (p = 0.00 < 0.05)
                           Change within noise threshold.
   
   merge sorted utf8 high cardinality
                           time:   [9.5599 ms 9.5747 ms 9.5925 ms]
                           change: [-11.571% -11.374% -11.192%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 7 outliers among 100 measurements (7.00%)
     4 (4.00%) high mild
     3 (3.00%) high severe
   
   sort merge utf8 high cardinality
                           time:   [10.700 ms 10.753 ms 10.807 ms]
                           change: [-11.555% -10.940% -10.308%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 3 outliers among 100 measurements (3.00%)
     2 (2.00%) high mild
     1 (1.00%) high severe
   
   merge sorted utf8 tuple time:   [15.013 ms 15.029 ms 15.048 ms]
                           change: [-16.753% -16.600% -16.449%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 6 outliers among 100 measurements (6.00%)
     5 (5.00%) high mild
     1 (1.00%) high severe
   
   sort merge utf8 tuple   time:   [18.151 ms 18.260 ms 18.367 ms]
                           change: [-15.472% -14.825% -14.148%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   merge sorted utf8 dictionary
                           time:   [6.1689 ms 6.1744 ms 6.1818 ms]
                           change: [-6.2663% -6.1546% -6.0351%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 6 outliers among 100 measurements (6.00%)
     4 (4.00%) high mild
     2 (2.00%) high severe
   
   sort merge utf8 dictionary
                           time:   [6.3495 ms 6.3580 ms 6.3665 ms]
                           change: [-6.1132% -5.9187% -5.7376%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   merge sorted utf8 dictionary tuple
                           time:   [8.6838 ms 8.6894 ms 8.6951 ms]
                           change: [-6.4096% -6.3163% -6.2247%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 3 outliers among 100 measurements (3.00%)
     3 (3.00%) high mild
   
   sort merge utf8 dictionary tuple
                           time:   [9.9218 ms 10.006 ms 10.090 ms]
                           change: [-6.6577% -5.5463% -4.4579%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   merge sorted mixed dictionary tuple
                           time:   [14.137 ms 14.150 ms 14.164 ms]
                           change: [-7.6257% -7.5182% -7.4068%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 2 outliers among 100 measurements (2.00%)
     2 (2.00%) high severe
   
   sort merge mixed dictionary tuple
                           time:   [15.553 ms 15.635 ms 15.717 ms]
                           change: [-7.4586% -6.7567% -6.0622%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   merge sorted mixed tuple
                           time:   [14.151 ms 14.164 ms 14.180 ms]
                           change: [-19.019% -18.906% -18.786%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high severe
   
   sort merge mixed tuple  time:   [15.717 ms 15.762 ms 15.806 ms]
                           change: [-19.333% -18.977% -18.625%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   ```
   
   # What changes are included in this PR?
   
   <!--
   There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR.
   -->
   
   # Are these changes tested?
   
   <!--
   We typically require tests for all PRs in order to:
   1. Prevent the code from being accidentally broken by subsequent changes
   2. Serve as another way to document the expected behavior of the code
   
   If tests are not included in your PR, please explain why (for example, are they covered by existing tests)?
   -->
   
   # Are there any user-facing changes?
   
   <!--
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   -->
   
   <!--
   If there are any breaking changes to public APIs, please add the `api change` label.
   -->


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5894: Use interleave in BatchBuilder (~10% faster merge)

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #5894:
URL: https://github.com/apache/arrow-datafusion/pull/5894#discussion_r1160635957


##########
datafusion/core/src/physical_plan/sorts/builder.rs:
##########
@@ -89,82 +99,37 @@ impl BatchBuilder {
             return Ok(None);
         }
 
-        // Mapping from stream index to the index of the first buffer from that stream
-        let mut buffer_idx = 0;
-        let mut stream_to_buffer_idx = Vec::with_capacity(self.batches.len());
-
-        for batches in &self.batches {
-            stream_to_buffer_idx.push(buffer_idx);
-            buffer_idx += batches.len();
-        }
-
-        let columns = self
-            .schema
-            .fields()
-            .iter()
-            .enumerate()
-            .map(|(column_idx, field)| {
-                let arrays = self
+        let columns = (0..self.schema.fields.len())
+            .map(|column_idx| {
+                let arrays: Vec<_> = self
                     .batches
                     .iter()
-                    .flat_map(|batch| {
-                        batch.iter().map(|batch| batch.column(column_idx).data())
-                    })
+                    .map(|(_, batch)| batch.column(column_idx).as_ref())
                     .collect();
-
-                let mut array_data = MutableArrayData::new(
-                    arrays,
-                    field.is_nullable(),
-                    self.indices.len(),
-                );
-
-                let first = &self.indices[0];
-                let mut buffer_idx =
-                    stream_to_buffer_idx[first.stream_idx] + first.batch_idx;
-                let mut start_row_idx = first.row_idx;
-                let mut end_row_idx = start_row_idx + 1;
-
-                for row_index in self.indices.iter().skip(1) {
-                    let next_buffer_idx =
-                        stream_to_buffer_idx[row_index.stream_idx] + row_index.batch_idx;
-
-                    if next_buffer_idx == buffer_idx && row_index.row_idx == end_row_idx {
-                        // subsequent row in same batch
-                        end_row_idx += 1;
-                        continue;
-                    }
-
-                    // emit current batch of rows for current buffer
-                    array_data.extend(buffer_idx, start_row_idx, end_row_idx);
-
-                    // start new batch of rows
-                    buffer_idx = next_buffer_idx;
-                    start_row_idx = row_index.row_idx;
-                    end_row_idx = start_row_idx + 1;
-                }
-
-                // emit final batch of rows
-                array_data.extend(buffer_idx, start_row_idx, end_row_idx);
-                make_array(array_data.freeze())
+                Ok(interleave(&arrays, &self.indices)?)

Review Comment:
   👌 





[GitHub] [arrow-datafusion] tustvold merged pull request #5894: Use interleave in BatchBuilder (~10% faster merge)

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold merged PR #5894:
URL: https://github.com/apache/arrow-datafusion/pull/5894

