You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "benibus (via GitHub)" <gi...@apache.org> on 2023/06/20 16:30:10 UTC

[GitHub] [arrow] benibus opened a new pull request, #36179: GH-36176: [C++] Fix regression for single-key Table sorting

benibus opened a new pull request, #36179:
URL: https://github.com/apache/arrow/pull/36179

   <!--
   Thanks for opening a pull request!
   If this is your first pull request you can find detailed information on how 
   to contribute here:
     * [New Contributor's Guide](https://arrow.apache.org/docs/dev/developers/guide/step_by_step/pr_lifecycle.html#reviews-and-merge-of-the-pull-request)
     * [Contributing Overview](https://arrow.apache.org/docs/dev/developers/overview.html)
   
   
   If this is not a [minor PR](https://github.com/apache/arrow/blob/main/CONTRIBUTING.md#Minor-Fixes). Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose
   
   Opening GitHub issues ahead of time contributes to the [Openness](http://theapacheway.com/open/#:~:text=Openness%20allows%20new%20users%20the,must%20happen%20in%20the%20open.) of the Apache Arrow project.
   
   Then could you also rename the pull request title in the following format?
   
       GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}
   
   or
   
       MINOR: [${COMPONENT}] ${SUMMARY}
   
   In the case of PARQUET issues on JIRA the title also supports:
   
       PARQUET-${JIRA_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}
   
   -->
   
   ### Rationale for this change
   
   Fixes a regression introduced in https://github.com/apache/arrow/pull/35727.
   
   ### What changes are included in this PR?
   
   Re-implements a branch in the `Table` sorter that defers to the `ChunkedArray` sorter for single sort keys.
   
   ### Are these changes tested?
   
   Covered by existing tests.
   
   ### Are there any user-facing changes?
   
   No.
   
   <!--
   If there are any breaking changes to public APIs, please uncomment the line below and explain which changes are breaking.
   -->
   <!-- **This PR includes breaking changes to public APIs.** -->
   
   <!--
   Please uncomment the line below (and provide explanation) if the changes fix either (a) a security vulnerability, (b) a bug that caused incorrect or invalid data to be produced, or (c) a bug that causes a crash (even when the API contract is upheld). We use this to highlight fixes to issues that may affect users without their knowledge. For this reason, fixing bugs that cause errors don't count, since those are usually obvious.
   -->
   <!-- **This PR contains a "Critical Fix".** -->


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] benibus commented on pull request #36179: GH-36176: [C++] Fix regression for single-key Table sorting

Posted by "benibus (via GitHub)" <gi...@apache.org>.
benibus commented on PR #36179:
URL: https://github.com/apache/arrow/pull/36179#issuecomment-1599137639

   cc @pitrou 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] conbench-apache-arrow[bot] commented on pull request #36179: GH-36176: [C++] Fix regression for single-key Table sorting

Posted by "conbench-apache-arrow[bot] (via GitHub)" <gi...@apache.org>.
conbench-apache-arrow[bot] commented on PR #36179:
URL: https://github.com/apache/arrow/pull/36179#issuecomment-1600470640

   Conbench analyzed the 6 benchmark runs on commit `8a4c19bd`.
   
   There were 7 benchmark results indicating a performance regression:
   
   - Commit Run on `ursa-thinkcentre-m75q` at [2023-06-21 07:09:25Z](http://conbench.ursa.dev/compare/runs/0d91536465bf4fa98a13e118f2410cdf...b00a3de70a9b4dff9a61819dd85e626e/)
     - [params=1048576/0, source=cpp-micro, suite=arrow-acero-aggregate-benchmark](http://conbench.ursa.dev/compare/benchmarks/064925e0be1d78f58000f335dd1a90e4...06492a24fb887b93800099f8bc7b5bfc)
     - [source=java-micro, suite=arrow.memory.util.ByteFunctionHelpersBenchmarks](http://conbench.ursa.dev/benchmark-results/06492b13fa487a948000d98361a30030)
   - and 5 more (see the report linked below)
   
   The [full Conbench report](https://github.com/apache/arrow/runs/14431431375) has more details.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou merged pull request #36179: GH-36176: [C++] Fix regression for single-key Table sorting

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou merged PR #36179:
URL: https://github.com/apache/arrow/pull/36179


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on pull request #36179: GH-36176: [C++] Fix regression for single-key Table sorting

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on PR #36179:
URL: https://github.com/apache/arrow/pull/36179#issuecomment-1599260606

   Thanks for the quick fix @benibus !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] benibus commented on pull request #36179: GH-36176: [C++] Fix regression for single-key Table sorting

Posted by "benibus (via GitHub)" <gi...@apache.org>.
benibus commented on PR #36179:
URL: https://github.com/apache/arrow/pull/36179#issuecomment-1599134933

   Local benchmark results (approx. best-case)
   
   Before
   ```
   -------------------------------------------------------------------------------------------------------
   Benchmark                                             Time             CPU   Iterations UserCounters...
   -------------------------------------------------------------------------------------------------------
   TableSortIndicesInt64Narrow/1048576/100/1/32  367381664 ns    367357156 ns            2 chunks=32 columns=1 items_per_second=2.85438M/s null_percent=1
   TableSortIndicesInt64Narrow/1048576/4/1/32    281608431 ns    281605789 ns            2 chunks=32 columns=1 items_per_second=3.72356M/s null_percent=25
   TableSortIndicesInt64Narrow/1048576/0/1/32    354414823 ns    354413272 ns            2 chunks=32 columns=1 items_per_second=2.95863M/s null_percent=0
   TableSortIndicesInt64Narrow/1048576/100/1/4   315257284 ns    315253535 ns            2 chunks=4 columns=1 items_per_second=3.32614M/s null_percent=1
   TableSortIndicesInt64Narrow/1048576/4/1/4     240298213 ns    240291188 ns            3 chunks=4 columns=1 items_per_second=4.36377M/s null_percent=25
   TableSortIndicesInt64Narrow/1048576/0/1/4     311795614 ns    311790481 ns            2 chunks=4 columns=1 items_per_second=3.36308M/s null_percent=0
   TableSortIndicesInt64Narrow/1048576/100/1/1   282068938 ns    282067950 ns            2 chunks=1 columns=1 items_per_second=3.71746M/s null_percent=1
   TableSortIndicesInt64Narrow/1048576/4/1/1     210222675 ns    210216608 ns            3 chunks=1 columns=1 items_per_second=4.98807M/s null_percent=25
   TableSortIndicesInt64Narrow/1048576/0/1/1     281105329 ns    281101193 ns            2 chunks=1 columns=1 items_per_second=3.73024M/s null_percent=0
   ```
   After
   ```
   -------------------------------------------------------------------------------------------------------
   Benchmark                                             Time             CPU   Iterations UserCounters...
   -------------------------------------------------------------------------------------------------------
   TableSortIndicesInt64Narrow/1048576/100/1/32  148802109 ns    148797635 ns            4 chunks=32 columns=1 items_per_second=7.04699M/s null_percent=1
   TableSortIndicesInt64Narrow/1048576/4/1/32    125102993 ns    125100621 ns            6 chunks=32 columns=1 items_per_second=8.38186M/s null_percent=25
   TableSortIndicesInt64Narrow/1048576/0/1/32    147291350 ns    147287710 ns            5 chunks=32 columns=1 items_per_second=7.11924M/s null_percent=0
   TableSortIndicesInt64Narrow/1048576/100/1/4    72243393 ns     72241902 ns           10 chunks=4 columns=1 items_per_second=14.5148M/s null_percent=1
   TableSortIndicesInt64Narrow/1048576/4/1/4      66780395 ns     66779462 ns           11 chunks=4 columns=1 items_per_second=15.7021M/s null_percent=25
   TableSortIndicesInt64Narrow/1048576/0/1/4      70065128 ns     70063740 ns           10 chunks=4 columns=1 items_per_second=14.966M/s null_percent=0
   TableSortIndicesInt64Narrow/1048576/100/1/1    19336900 ns     19335762 ns           36 chunks=1 columns=1 items_per_second=54.2299M/s null_percent=1
   TableSortIndicesInt64Narrow/1048576/4/1/1      25458573 ns     25457697 ns           27 chunks=1 columns=1 items_per_second=41.189M/s null_percent=25
   TableSortIndicesInt64Narrow/1048576/0/1/1      17580368 ns     17577897 ns           40 chunks=1 columns=1 items_per_second=59.6531M/s null_percent=0
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on pull request #36179: GH-36176: [C++] Fix regression for single-key Table sorting

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on PR #36179:
URL: https://github.com/apache/arrow/pull/36179#issuecomment-1599176618

   This fixes the regression here.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org