Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/11/09 10:49:10 UTC

[GitHub] [arrow-datafusion] tustvold opened a new pull request, #4151: Use interleave kernel in SortPreservingMerge

tustvold opened a new pull request, #4151:
URL: https://github.com/apache/arrow-datafusion/pull/4151

   _Creating as a draft because I still need to run benchmarks_
   
   # Which issue does this PR close?
   
   Closes #.
   
   # Rationale for this change
   
   Arrow now has a dedicated `interleave` kernel, which should improve performance and reduce the amount of code in SortPreservingMerge.
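   
   For readers unfamiliar with the idea, here is a minimal sketch of how a merge can be split into "decide the output order" and "gather the rows". It works on plain slices rather than Arrow arrays; the function names `merge_indices` and `interleave` below are illustrative stand-ins written for this example, not the arrow-rs API or the code in this PR. The assumption is that an interleave-style kernel takes a list of inputs plus `(input index, row index)` pairs and copies the rows out in that order.
   
   ```rust
   use std::cmp::Reverse;
   use std::collections::BinaryHeap;
   
   /// Gather rows from several inputs in the order given by `indices`.
   /// Each entry is (input index, row index within that input).
   /// Hypothetical stand-in for an interleave-style kernel.
   fn interleave<T: Clone>(inputs: &[&[T]], indices: &[(usize, usize)]) -> Vec<T> {
       indices
           .iter()
           .map(|&(input, row)| inputs[input][row].clone())
           .collect()
   }
   
   /// K-way merge of already-sorted inputs that records where each output
   /// row comes from instead of copying the values immediately.
   fn merge_indices<T: Ord + Clone>(inputs: &[&[T]]) -> Vec<(usize, usize)> {
       // `Reverse` turns the max-heap into a min-heap keyed on the value,
       // with ties broken by input index and then row index.
       let mut heap = BinaryHeap::new();
       for (input, values) in inputs.iter().enumerate() {
           if let Some(first) = values.first() {
               heap.push(Reverse((first.clone(), input, 0usize)));
           }
       }
   
       let mut indices = Vec::new();
       while let Some(Reverse((_, input, row))) = heap.pop() {
           indices.push((input, row));
           // Advance the cursor of the input we just consumed from.
           if let Some(next) = inputs[input].get(row + 1) {
               heap.push(Reverse((next.clone(), input, row + 1)));
           }
       }
       indices
   }
   
   fn main() {
       let a = [1i64, 4, 9];
       let b = [2i64, 3, 10];
       let inputs: [&[i64]; 2] = [&a, &b];
   
       // Step 1: work out the merged order as (input, row) pairs.
       let indices = merge_indices(&inputs);
       // Step 2: a single gather pass builds the output column.
       let merged = interleave(&inputs, &indices);
       assert_eq!(merged, vec![1, 2, 3, 4, 9, 10]);
   }
   ```
   
   Separating the two steps means the same index list can be reused to gather every column of a batch, which is where a dedicated kernel could plausibly save both work and code.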
   
   # What changes are included in this PR?
   
   # Are these changes tested?
   
   # Are there any user-facing changes?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] tustvold commented on pull request #4151: Use interleave kernel in SortPreservingMerge

Posted by GitBox <gi...@apache.org>.
tustvold commented on PR #4151:
URL: https://github.com/apache/arrow-datafusion/pull/4151#issuecomment-1309748049

   Benchmarks are a bit of a mixed bag: some cases improve, but others get worse
   
   ```
   merge i64               time:   [11.315 ms 11.323 ms 11.332 ms]
                           change: [-4.5706% -4.4894% -4.4047%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 2 outliers among 100 measurements (2.00%)
     2 (2.00%) high mild
   
   merge f64               time:   [11.588 ms 11.593 ms 11.598 ms]
                           change: [-4.0767% -4.0063% -3.9446%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   merge utf8 low cardinality
                           time:   [10.602 ms 10.610 ms 10.619 ms]
                           change: [+4.1663% +4.2692% +4.3838%] (p = 0.00 < 0.05)
                           Performance has regressed.
   Found 3 outliers among 100 measurements (3.00%)
     1 (1.00%) high mild
     2 (2.00%) high severe
   
   merge utf8 high cardinality
                           time:   [12.186 ms 12.194 ms 12.203 ms]
                           change: [-4.0200% -3.8918% -3.7694%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 5 outliers among 100 measurements (5.00%)
     5 (5.00%) high mild
   
   merge utf8 tuple        time:   [17.000 ms 17.033 ms 17.067 ms]
                           change: [-9.8709% -9.6664% -9.4597%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 8 outliers among 100 measurements (8.00%)
     7 (7.00%) high mild
     1 (1.00%) high severe
   
   merge utf8 dictionary   time:   [10.103 ms 10.111 ms 10.121 ms]
                           change: [+1.9130% +2.0123% +2.1263%] (p = 0.00 < 0.05)
                           Performance has regressed.
   Found 9 outliers among 100 measurements (9.00%)
     6 (6.00%) high mild
     3 (3.00%) high severe
   
   merge utf8 dictionary tuple
                           time:   [12.729 ms 12.734 ms 12.740 ms]
                           change: [+2.8633% +2.9415% +3.0230%] (p = 0.00 < 0.05)
                           Performance has regressed.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high mild
   
   merge mixed utf8 dictionary tuple
                           time:   [16.426 ms 16.436 ms 16.446 ms]
                           change: [-5.5555% -5.4858% -5.4102%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 4 outliers among 100 measurements (4.00%)
     3 (3.00%) high mild
     1 (1.00%) high severe
   
   merge mixed tuple       time:   [16.150 ms 16.197 ms 16.252 ms]
                           change: [-13.534% -13.263% -12.960%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 11 outliers among 100 measurements (11.00%)
     4 (4.00%) high mild
     7 (7.00%) high severe
   
   [raphael@raphael-dev core]$ RUSTFLAGS="-C target-cpu=native" cargo bench --bench merge -- --baseline master
       Finished bench [optimized] target(s) in 0.16s
        Running benches/merge.rs (/data/raphael/arrow-datafusion/target/release/deps/merge-401bf7cbb5a80837)
   merge i64               time:   [11.586 ms 11.594 ms 11.602 ms]
                           change: [-2.2860% -2.2037% -2.1148%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   merge f64               time:   [11.797 ms 11.806 ms 11.816 ms]
                           change: [-2.3456% -2.2435% -2.1450%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 6 outliers among 100 measurements (6.00%)
     4 (4.00%) high mild
     2 (2.00%) high severe
   
   merge utf8 low cardinality
                           time:   [10.596 ms 10.602 ms 10.609 ms]
                           change: [+4.1059% +4.1981% +4.2906%] (p = 0.00 < 0.05)
                           Performance has regressed.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high mild
   
   merge utf8 high cardinality
                           time:   [11.924 ms 11.933 ms 11.942 ms]
                           change: [-6.0753% -5.9473% -5.8217%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   merge utf8 tuple        time:   [17.015 ms 17.055 ms 17.107 ms]
                           change: [-9.7828% -9.5451% -9.2870%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 3 outliers among 100 measurements (3.00%)
     1 (1.00%) high mild
     2 (2.00%) high severe
   
   merge utf8 dictionary   time:   [10.443 ms 10.596 ms 10.749 ms]
                           change: [+5.4018% +6.8962% +8.3995%] (p = 0.00 < 0.05)
                           Performance has regressed.
   
   merge utf8 dictionary tuple
                           time:   [12.656 ms 12.665 ms 12.674 ms]
                           change: [+2.2842% +2.3785% +2.4717%] (p = 0.00 < 0.05)
                           Performance has regressed.
   Found 3 outliers among 100 measurements (3.00%)
     3 (3.00%) high mild
   
   merge mixed utf8 dictionary tuple
                           time:   [16.569 ms 16.666 ms 16.771 ms]
                           change: [-4.7103% -4.1593% -3.5856%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 21 outliers among 100 measurements (21.00%)
     21 (21.00%) high severe
   
   merge mixed tuple       time:   [16.165 ms 16.181 ms 16.198 ms]
                           change: [-13.460% -13.347% -13.228%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 6 outliers among 100 measurements (6.00%)
     4 (4.00%) high mild
     2 (2.00%) high severe
   ```


[GitHub] [arrow-datafusion] alamb commented on pull request #4151: Use interleave kernel in SortPreservingMerge

Posted by GitBox <gi...@apache.org>.
alamb commented on PR #4151:
URL: https://github.com/apache/arrow-datafusion/pull/4151#issuecomment-1312458659

   > Benchmarks are a bit of a mixed bag: some cases improve, but others get worse
   
   I am surprised by this. Do you think https://github.com/apache/arrow-rs/pull/2975 (in arrow 27) will help?


[GitHub] [arrow-datafusion] tustvold closed pull request #4151: Use interleave kernel in SortPreservingMerge

Posted by GitBox <gi...@apache.org>.
tustvold closed pull request #4151: Use interleave kernel in SortPreservingMerge
URL: https://github.com/apache/arrow-datafusion/pull/4151

