You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "benibus (via GitHub)" <gi...@apache.org> on 2023/06/20 16:30:10 UTC
[GitHub] [arrow] benibus opened a new pull request, #36179: GH-36176: [C++] Fix regression for single-key Table sorting
benibus opened a new pull request, #36179:
URL: https://github.com/apache/arrow/pull/36179
<!--
Thanks for opening a pull request!
If this is your first pull request you can find detailed information on how
to contribute here:
* [New Contributor's Guide](https://arrow.apache.org/docs/dev/developers/guide/step_by_step/pr_lifecycle.html#reviews-and-merge-of-the-pull-request)
* [Contributing Overview](https://arrow.apache.org/docs/dev/developers/overview.html)
If this is not a [minor PR](https://github.com/apache/arrow/blob/main/CONTRIBUTING.md#Minor-Fixes). Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose
Opening GitHub issues ahead of time contributes to the [Openness](http://theapacheway.com/open/#:~:text=Openness%20allows%20new%20users%20the,must%20happen%20in%20the%20open.) of the Apache Arrow project.
Then could you also rename the pull request title in the following format?
GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}
or
MINOR: [${COMPONENT}] ${SUMMARY}
In the case of PARQUET issues on JIRA the title also supports:
PARQUET-${JIRA_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}
-->
### Rationale for this change
Fixes a regression introduced in https://github.com/apache/arrow/pull/35727.
### What changes are included in this PR?
Re-implements a branch in the `Table` sorter that defers to the `ChunkedArray` sorter for single sort keys.
### Are these changes tested?
Covered by existing tests.
### Are there any user-facing changes?
No.
<!--
If there are any breaking changes to public APIs, please uncomment the line below and explain which changes are breaking.
-->
<!-- **This PR includes breaking changes to public APIs.** -->
<!--
Please uncomment the line below (and provide explanation) if the changes fix either (a) a security vulnerability, (b) a bug that caused incorrect or invalid data to be produced, or (c) a bug that causes a crash (even when the API contract is upheld). We use this to highlight fixes to issues that may affect users without their knowledge. For this reason, fixing bugs that cause errors don't count, since those are usually obvious.
-->
<!-- **This PR contains a "Critical Fix".** -->
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] benibus commented on pull request #36179: GH-36176: [C++] Fix regression for single-key Table sorting
Posted by "benibus (via GitHub)" <gi...@apache.org>.
benibus commented on PR #36179:
URL: https://github.com/apache/arrow/pull/36179#issuecomment-1599137639
cc @pitrou
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] conbench-apache-arrow[bot] commented on pull request #36179: GH-36176: [C++] Fix regression for single-key Table sorting
Posted by "conbench-apache-arrow[bot] (via GitHub)" <gi...@apache.org>.
conbench-apache-arrow[bot] commented on PR #36179:
URL: https://github.com/apache/arrow/pull/36179#issuecomment-1600470640
Conbench analyzed the 6 benchmark runs on commit `8a4c19bd`.
There were 7 benchmark results indicating a performance regression:
- Commit Run on `ursa-thinkcentre-m75q` at [2023-06-21 07:09:25Z](http://conbench.ursa.dev/compare/runs/0d91536465bf4fa98a13e118f2410cdf...b00a3de70a9b4dff9a61819dd85e626e/)
- [params=1048576/0, source=cpp-micro, suite=arrow-acero-aggregate-benchmark](http://conbench.ursa.dev/compare/benchmarks/064925e0be1d78f58000f335dd1a90e4...06492a24fb887b93800099f8bc7b5bfc)
- [source=java-micro, suite=arrow.memory.util.ByteFunctionHelpersBenchmarks](http://conbench.ursa.dev/benchmark-results/06492b13fa487a948000d98361a30030)
- and 5 more (see the report linked below)
The [full Conbench report](https://github.com/apache/arrow/runs/14431431375) has more details.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou merged pull request #36179: GH-36176: [C++] Fix regression for single-key Table sorting
Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou merged PR #36179:
URL: https://github.com/apache/arrow/pull/36179
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou commented on pull request #36179: GH-36176: [C++] Fix regression for single-key Table sorting
Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on PR #36179:
URL: https://github.com/apache/arrow/pull/36179#issuecomment-1599260606
Thanks for the quick fix @benibus !
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] benibus commented on pull request #36179: GH-36176: [C++] Fix regression for single-key Table sorting
Posted by "benibus (via GitHub)" <gi...@apache.org>.
benibus commented on PR #36179:
URL: https://github.com/apache/arrow/pull/36179#issuecomment-1599134933
Local benchmark results (approx. best-case)
Before
```
-------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
-------------------------------------------------------------------------------------------------------
TableSortIndicesInt64Narrow/1048576/100/1/32 367381664 ns 367357156 ns 2 chunks=32 columns=1 items_per_second=2.85438M/s null_percent=1
TableSortIndicesInt64Narrow/1048576/4/1/32 281608431 ns 281605789 ns 2 chunks=32 columns=1 items_per_second=3.72356M/s null_percent=25
TableSortIndicesInt64Narrow/1048576/0/1/32 354414823 ns 354413272 ns 2 chunks=32 columns=1 items_per_second=2.95863M/s null_percent=0
TableSortIndicesInt64Narrow/1048576/100/1/4 315257284 ns 315253535 ns 2 chunks=4 columns=1 items_per_second=3.32614M/s null_percent=1
TableSortIndicesInt64Narrow/1048576/4/1/4 240298213 ns 240291188 ns 3 chunks=4 columns=1 items_per_second=4.36377M/s null_percent=25
TableSortIndicesInt64Narrow/1048576/0/1/4 311795614 ns 311790481 ns 2 chunks=4 columns=1 items_per_second=3.36308M/s null_percent=0
TableSortIndicesInt64Narrow/1048576/100/1/1 282068938 ns 282067950 ns 2 chunks=1 columns=1 items_per_second=3.71746M/s null_percent=1
TableSortIndicesInt64Narrow/1048576/4/1/1 210222675 ns 210216608 ns 3 chunks=1 columns=1 items_per_second=4.98807M/s null_percent=25
TableSortIndicesInt64Narrow/1048576/0/1/1 281105329 ns 281101193 ns 2 chunks=1 columns=1 items_per_second=3.73024M/s null_percent=0
```
After
```
-------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
-------------------------------------------------------------------------------------------------------
TableSortIndicesInt64Narrow/1048576/100/1/32 148802109 ns 148797635 ns 4 chunks=32 columns=1 items_per_second=7.04699M/s null_percent=1
TableSortIndicesInt64Narrow/1048576/4/1/32 125102993 ns 125100621 ns 6 chunks=32 columns=1 items_per_second=8.38186M/s null_percent=25
TableSortIndicesInt64Narrow/1048576/0/1/32 147291350 ns 147287710 ns 5 chunks=32 columns=1 items_per_second=7.11924M/s null_percent=0
TableSortIndicesInt64Narrow/1048576/100/1/4 72243393 ns 72241902 ns 10 chunks=4 columns=1 items_per_second=14.5148M/s null_percent=1
TableSortIndicesInt64Narrow/1048576/4/1/4 66780395 ns 66779462 ns 11 chunks=4 columns=1 items_per_second=15.7021M/s null_percent=25
TableSortIndicesInt64Narrow/1048576/0/1/4 70065128 ns 70063740 ns 10 chunks=4 columns=1 items_per_second=14.966M/s null_percent=0
TableSortIndicesInt64Narrow/1048576/100/1/1 19336900 ns 19335762 ns 36 chunks=1 columns=1 items_per_second=54.2299M/s null_percent=1
TableSortIndicesInt64Narrow/1048576/4/1/1 25458573 ns 25457697 ns 27 chunks=1 columns=1 items_per_second=41.189M/s null_percent=25
TableSortIndicesInt64Narrow/1048576/0/1/1 17580368 ns 17577897 ns 40 chunks=1 columns=1 items_per_second=59.6531M/s null_percent=0
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou commented on pull request #36179: GH-36176: [C++] Fix regression for single-key Table sorting
Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on PR #36179:
URL: https://github.com/apache/arrow/pull/36179#issuecomment-1599176618
This fixes the regression here.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org