Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/12/19 07:54:08 UTC
[GitHub] [arrow] Dandandan edited a comment on pull request #8961: ARROW-10885: [Rust][DataFusion] Optimize hash join build vs probe order based on number of rows
Dandandan edited a comment on pull request #8961:
URL: https://github.com/apache/arrow/pull/8961#issuecomment-748437288
I tested this change with the other PR, https://github.com/apache/arrow/pull/8965, merged in; that PR improves the join implementation.
That PR alone makes the join ~20-50x faster regardless of this one. On top of that, reordering the build and probe sides gives a further ~15% reduction in time for the following query (6001214 vs 1499999 rows):
```
select
    l_shipmode,
    sum(case
        when o_orderpriority = '1-URGENT'
            or o_orderpriority = '2-HIGH'
        then 1
        else 0
    end) as high_line_count,
    sum(case
        when o_orderpriority <> '1-URGENT'
            and o_orderpriority <> '2-HIGH'
        then 1
        else 0
    end) as low_line_count
from
    lineitem
join
    orders
on
    l_orderkey = o_orderkey
group by
    l_shipmode
order by
    l_shipmode;
```
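For readers unfamiliar with the idea, here is a minimal sketch of the build-side choice, not the code in this PR. It assumes the left input is the default build side and that row counts may be unknown; `BuildSide` and `choose_build_side` are hypothetical names used only for illustration.

```rust
/// Which join input is used to build the hash table.
/// Hypothetical type for illustration; not a DataFusion API.
#[derive(Debug, PartialEq, Clone, Copy)]
enum BuildSide {
    Left,
    Right,
}

/// Pick the input with fewer rows as the build side, so the hash table
/// stays small and the larger input is only streamed through the probe.
/// Assumption: when either row count is unknown, or the counts are equal,
/// keep the default (left) build side.
fn choose_build_side(left_rows: Option<usize>, right_rows: Option<usize>) -> BuildSide {
    match (left_rows, right_rows) {
        (Some(left), Some(right)) if right < left => BuildSide::Right,
        _ => BuildSide::Left,
    }
}
```

With the row counts from the query above, `choose_build_side(Some(6001214), Some(1499999))` returns `BuildSide::Right`, meaning the hash table would be built on the smaller input and probed with the larger one.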