Posted to github@arrow.apache.org by "jaylmiller (via GitHub)" <gi...@apache.org> on 2023/03/14 15:33:07 UTC
[GitHub] [arrow-datafusion] jaylmiller commented on pull request #5292: use row encoding for SortExec
jaylmiller commented on PR #5292:
URL: https://github.com/apache/arrow-datafusion/pull/5292#issuecomment-1468327766
The code is finished and ready for review, but I'm not yet 100% confident in the benchmark results.
Sort micro-benchmarks look pretty good: significant improvements in the cases where row encoding is actually used, and minor regressions, mostly within error bars, in the cases without row encoding. More experienced contributors will know better how significant these regressions actually are. I'm reposting the results here:
```
group                                                   main-sort                          rows-sort
-----                                                   ---------                          ---------
sort f64                                                1.00  10.8±0.23ms       ? ?/sec    1.04  11.2±0.93ms       ? ?/sec
sort f64 preserve partitioning                          1.00  4.0±0.27ms        ? ?/sec    1.04  4.1±0.28ms        ? ?/sec
sort i64                                                1.00  9.5±0.55ms        ? ?/sec    1.09  10.3±0.74ms       ? ?/sec
sort i64 preserve partitioning                          1.00  3.3±0.10ms        ? ?/sec    1.06  3.5±0.13ms        ? ?/sec
sort mixed tuple                                        1.28  28.3±3.35ms       ? ?/sec    1.00  22.2±1.60ms       ? ?/sec
sort mixed tuple preserve partitioning                  1.00  3.6±0.17ms        ? ?/sec    1.15  4.1±1.09ms        ? ?/sec
sort mixed utf8 dictionary tuple                        2.84  52.7±8.27ms       ? ?/sec    1.00  18.6±1.29ms       ? ?/sec
sort mixed utf8 dictionary tuple preserve partitioning  1.02  4.2±0.92ms        ? ?/sec    1.00  4.1±0.55ms        ? ?/sec
sort utf8 dictionary                                    1.00  3.7±0.21ms        ? ?/sec    1.04  3.9±0.33ms        ? ?/sec
sort utf8 dictionary preserve partitioning              1.00  1487.2±1444.67µs  ? ?/sec    1.01  1502.8±315.79µs   ? ?/sec
sort utf8 dictionary tuple                              3.26  57.0±11.35ms      ? ?/sec    1.00  17.5±2.08ms       ? ?/sec
sort utf8 dictionary tuple preserve partitioning        1.13  4.1±1.08ms        ? ?/sec    1.00  3.6±0.52ms        ? ?/sec
sort utf8 high cardinality                              1.01  28.0±3.70ms       ? ?/sec    1.00  27.6±3.81ms       ? ?/sec
sort utf8 high cardinality preserve partitioning        1.00  11.1±1.48ms       ? ?/sec    1.21  13.5±3.38ms       ? ?/sec
sort utf8 low cardinality                               1.00  15.3±5.08ms       ? ?/sec    1.10  16.9±6.20ms       ? ?/sec
sort utf8 low cardinality preserve partitioning         1.03  8.1±2.21ms        ? ?/sec    1.00  7.8±1.75ms        ? ?/sec
sort utf8 tuple                                         1.96  56.8±8.36ms       ? ?/sec    1.00  29.0±4.82ms       ? ?/sec
sort utf8 tuple preserve partitioning                   1.02  6.7±0.95ms        ? ?/sec    1.00  6.5±0.46ms        ? ?/sec
```
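For readers unfamiliar with the idea, here is a rough, std-only sketch of why row encoding tends to help on the multi-column ("tuple") cases: each row is turned into a single byte string whose byte-wise order matches the tuple order, so the sort only needs one cheap byte comparison per pair. This is a simplified illustration, not the row format that arrow or this PR actually uses; the names `encode_i64`, `encode_str`, `encode_row`, and `sort_by_row_encoding` are hypothetical.

```rust
/// Encode an i64 so that unsigned byte-wise comparison matches signed order:
/// flip the sign bit, then write the value big-endian.
fn encode_i64(v: i64, out: &mut Vec<u8>) {
    out.extend_from_slice(&((v as u64) ^ (1u64 << 63)).to_be_bytes());
}

/// Encode a string so that byte-wise comparison matches string order.
/// A 0x00 byte is escaped as 0x00 0xFF and the value ends with 0x00 0x00,
/// so a shorter string always sorts before any of its extensions.
/// (Assumed scheme for this sketch only.)
fn encode_str(s: &str, out: &mut Vec<u8>) {
    for &b in s.as_bytes() {
        out.push(b);
        if b == 0 {
            out.push(0xFF);
        }
    }
    out.extend_from_slice(&[0, 0]);
}

/// Build one comparable byte key for an (i64, string) row.
fn encode_row(a: i64, b: &str) -> Vec<u8> {
    let mut key = Vec::with_capacity(8 + b.len() + 2);
    encode_i64(a, &mut key);
    encode_str(b, &mut key);
    key
}

/// Sort rows by comparing only their encoded keys.
fn sort_by_row_encoding(rows: &[(i64, String)]) -> Vec<(i64, String)> {
    let mut keyed: Vec<(Vec<u8>, usize)> = rows
        .iter()
        .enumerate()
        .map(|(i, (a, b))| (encode_row(*a, b), i))
        .collect();
    // Vec<u8> ordering is plain lexicographic byte comparison.
    keyed.sort_by(|x, y| x.0.cmp(&y.0));
    keyed.into_iter().map(|(_, i)| rows[i].clone()).collect()
}
```

Calling `sort_by_row_encoding(&rows)` should give the same order as sorting the `(i64, String)` tuples directly. The encoding step has a cost of its own, which is one plausible reason single-column sorts would see little benefit from it.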
In summary, I'd like to get an opinion on these micro-benchmark results. Ideally, we could then also run the e2e benchmark comparisons (#5561) on `tpch` and `parquet` to get a bit more data on whether this change is worth merging.