You are viewing a plain text version of this content. The canonical link for it is here.

Posted to github@arrow.apache.org by "askoa (via GitHub)" <gi...@apache.org> on 2023/02/12 21:51:32 UTC

[GitHub] [arrow-rs] askoa opened a new pull request, #3705: perf: `take_run` improvements

askoa opened a new pull request, #3705:
URL: https://github.com/apache/arrow-rs/pull/3705

   # Which issue does this PR close?
   
   Closes #3701 
   Part of #3520 
   
   # Rationale for this change
   See issue description #3701.
   
   # What changes are included in this PR?
   
   Update the `take_run` function to do the below
    1 Take physical_indices for the given logical indices.
    2 Run encode the physical_indices while keeping track of physical indices that needs to be taken.
    3 take values from run_array.values based on physical indices from the step 2.
    4 Build a new run array using run_ends from step 2 and values from step 3.
   
   The performance of benchmark `primitive_take_run` has improved by approx 15%
   
   <details><summary>Benchmark result</summary>
   
   ```
   Primitive_run_take/(run_array_len:512, physical_array_len:64, take_len:512)
                           time:   [14.263 µs 14.289 µs 14.314 µs]
                           change: [-16.324% -15.022% -13.604%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 5 outliers among 100 measurements (5.00%)
     1 (1.00%) high mild
     4 (4.00%) high severe
   primitive_run_take/(run_array_len:512, physical_array_len:128, take_len:512)
                           time:   [14.299 µs 14.320 µs 14.344 µs]
                           change: [-17.510% -16.083% -14.660%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 6 outliers among 100 measurements (6.00%)
     2 (2.00%) high mild
     4 (4.00%) high severe
   primitive_run_take/(run_array_len:1024, physical_array_len:256, take_len:512)
                           time:   [14.672 µs 14.689 µs 14.706 µs]
                           change: [-17.290% -15.772% -14.314%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 8 outliers among 100 measurements (8.00%)
     2 (2.00%) high mild
     6 (6.00%) high severe
   primitive_run_take/(run_array_len:1024, physical_array_len:256, take_len:1024)
                           time:   [29.943 µs 29.972 µs 30.002 µs]
                           change: [-18.529% -17.130% -15.819%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 8 outliers among 100 measurements (8.00%)
     1 (1.00%) low mild
     7 (7.00%) high severe
   primitive_run_take/(run_array_len:2048, physical_array_len:512, take_len:512)
                           time:   [15.864 µs 15.896 µs 15.928 µs]
                           change: [-15.176% -13.517% -11.892%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 6 outliers among 100 measurements (6.00%)
     1 (1.00%) high mild
     5 (5.00%) high severe
   primitive_run_take/(run_array_len:2048, physical_array_len:512, take_len:1024)
                           time:   [34.206 µs 34.275 µs 34.347 µs]
                           change: [-15.092% -13.950% -12.743%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 6 outliers among 100 measurements (6.00%)
     1 (1.00%) high mild
     5 (5.00%) high severe
   primitive_run_take/(run_array_len:4096, physical_array_len:1024, take_len:512)
                           time:   [18.066 µs 18.094 µs 18.121 µs]
                           change: [-14.902% -13.842% -12.604%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 6 outliers among 100 measurements (6.00%)
     1 (1.00%) high mild
     5 (5.00%) high severe
   primitive_run_take/(run_array_len:4096, physical_array_len:1024, take_len:1024)
                           time:   [37.529 µs 37.571 µs 37.612 µs]
                           change: [-17.016% -15.749% -14.471%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 7 outliers among 100 measurements (7.00%)
     1 (1.00%) low mild
     6 (6.00%) high severe
   
   ```
   
   </details>
   
   # Are there any user-facing changes?
   No
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-rs] askoa commented on a diff in pull request #3705: perf: `take_run` improvements

Posted by "askoa (via GitHub)" <gi...@apache.org>.

askoa commented on code in PR #3705:
URL: https://github.com/apache/arrow-rs/pull/3705#discussion_r1104597119


##########
arrow-select/src/take.rs:
##########
@@ -2166,17 +2174,17 @@ mod tests {
         let run_array = builder.finish();
 
         let take_indices: PrimitiveArray<Int32Type> =
-            vec![2, 7, 10].into_iter().collect();
+            vec![7, 2, 3, 7, 10].into_iter().collect();
 
         let take_out = take_run(&run_array, &take_indices).unwrap();
 
-        assert_eq!(take_out.len(), 3);
+        assert_eq!(take_out.len(), 5);
 
-        assert_eq!(take_out.run_ends().len(), 1);
-        assert_eq!(take_out.run_ends().value(0), 3);
+        assert_eq!(take_out.run_ends().len(), 4);
+        assert_eq!(take_out.run_ends().values(), &[1_i32, 3, 4, 5]);
 
         let take_out_values = as_primitive_array::<Int32Type>(take_out.values());
-        assert_eq!(take_out_values.value(0), 2);
+        assert_eq!(take_out_values.values(), &[2, 2, 2, 2]);

Review Comment:
   Updated the test case



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-rs] tustvold merged pull request #3705: perf: `take_run` improvements

Posted by "tustvold (via GitHub)" <gi...@apache.org>.

tustvold merged PR #3705:
URL: https://github.com/apache/arrow-rs/pull/3705


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-rs] ursabot commented on pull request #3705: perf: `take_run` improvements

Posted by "ursabot (via GitHub)" <gi...@apache.org>.

ursabot commented on PR #3705:
URL: https://github.com/apache/arrow-rs/pull/3705#issuecomment-1428200903

   Benchmark runs are scheduled for baseline = e37e379f158c644fd3bed63dfc9acc23b49aaf4d and contender = d011e6adc6c7ff0ae72784e89cf736112016c9c5. d011e6adc6c7ff0ae72784e89cf736112016c9c5 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/d7b6b295b9a049bbb535193881e670e2...64eec01adb334d458ae3b99308e2a81c/)
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on test-mac-arm] [test-mac-arm](https://conbench.ursa.dev/compare/runs/a7dcdf9b716c478fa96834d789972d9d...3dfc8018a8924c248e793cdcc19a315a/)
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/7e4d2db2e6614243b01a8fb4348505fe...d26cf514779545049c475e15993f8d52/)
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/267d96e42152405ea811bb3441853a2b...0e6442473f294c00ac6b68202dad39d0/)
   Buildkite builds:
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3705: perf: `take_run` improvements

Posted by "tustvold (via GitHub)" <gi...@apache.org>.

tustvold commented on code in PR #3705:
URL: https://github.com/apache/arrow-rs/pull/3705#discussion_r1104510024


##########
arrow-select/src/take.rs:
##########
@@ -2166,17 +2174,17 @@ mod tests {
         let run_array = builder.finish();
 
         let take_indices: PrimitiveArray<Int32Type> =
-            vec![2, 7, 10].into_iter().collect();
+            vec![7, 2, 3, 7, 10].into_iter().collect();
 
         let take_out = take_run(&run_array, &take_indices).unwrap();
 
-        assert_eq!(take_out.len(), 3);
+        assert_eq!(take_out.len(), 5);
 
-        assert_eq!(take_out.run_ends().len(), 1);
-        assert_eq!(take_out.run_ends().value(0), 3);
+        assert_eq!(take_out.run_ends().len(), 4);
+        assert_eq!(take_out.run_ends().values(), &[1_i32, 3, 4, 5]);
 
         let take_out_values = as_primitive_array::<Int32Type>(take_out.values());
-        assert_eq!(take_out_values.value(0), 2);
+        assert_eq!(take_out_values.values(), &[2, 2, 2, 2]);

Review Comment:
   Just an idea but it might be worth testing that non-consecutive logical indices that correspond to the same physical index are correctly combined into a single output run.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-rs] tustvold commented on pull request #3705: perf: `take_run` improvements

Posted by "tustvold (via GitHub)" <gi...@apache.org>.

tustvold commented on PR #3705:
URL: https://github.com/apache/arrow-rs/pull/3705#issuecomment-1427990966

   CI failure is unrelated and should be fixed by merging master into this branch


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-rs] askoa commented on pull request #3705: perf: `take_run` improvements

Posted by "askoa (via GitHub)" <gi...@apache.org>.

askoa commented on PR #3705:
URL: https://github.com/apache/arrow-rs/pull/3705#issuecomment-1428084470

   > I believe this also adds support for taking from arbitrary RunEndEncoded arrays?
   
   Yes, any data type supported by take will work.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org