Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/12/19 07:54:08 UTC

[GitHub] [arrow] Dandandan edited a comment on pull request #8961: ARROW-10885: [Rust][DataFusion] Optimize hash join build vs probe order based on number of rows

Dandandan edited a comment on pull request #8961:
URL: https://github.com/apache/arrow/pull/8961#issuecomment-748437288


   I tested this after merging the other PR, https://github.com/apache/arrow/pull/8965, which improves the join implementation.
   
   That PR makes the join ~20-50x faster regardless of this PR. On top of that, reordering the build and probe sides gives a further ~15% reduction in time for the following query (6001214 vs 1499999 rows):
   
   ```
   select
                   l_shipmode,
                   sum(case
                       when o_orderpriority = '1-URGENT'
                           or o_orderpriority = '2-HIGH'
                           then 1
                       else 0
                   end) as high_line_count,
                   sum(case
                       when o_orderpriority <> '1-URGENT'
                           and o_orderpriority <> '2-HIGH'
                           then 1
                       else 0
                   end) as low_line_count
               from
                   lineitem
               join
                   orders
               on
                   l_orderkey = o_orderkey
               group by
                   l_shipmode
               order by
                   l_shipmode;
   ```
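
To illustrate why the row counts matter, here is a minimal sketch of the idea in plain Rust. It is not the DataFusion implementation: `hash_join` and its `(key, payload)` row shape are hypothetical names chosen for this example. The hash table is built on whichever input has fewer rows (here that would be orders, 1499999 rows) and the larger input (lineitem, 6001214 rows) is streamed past it as the probe side.

```rust
use std::collections::HashMap;

/// Hypothetical inner join of two slices of (key, payload) rows on `key`.
/// The hash table is built on the input with fewer rows; the other input
/// is used as the probe side. Output columns stay in (left, right) order.
fn hash_join<'a>(
    left: &'a [(u64, &'a str)],
    right: &'a [(u64, &'a str)],
) -> Vec<(u64, &'a str, &'a str)> {
    // Swap sides when the left input is larger, so that `build` is
    // always the smaller of the two inputs.
    let swap = left.len() > right.len();
    let (build, probe) = if swap { (right, left) } else { (left, right) };

    // Build phase: one hash table entry per distinct key on the small side.
    let mut table: HashMap<u64, Vec<&'a str>> = HashMap::with_capacity(build.len());
    for &(key, payload) in build {
        table.entry(key).or_default().push(payload);
    }

    // Probe phase: stream the large side and look up each key.
    let mut out = Vec::new();
    for &(key, probe_payload) in probe {
        if let Some(matches) = table.get(&key) {
            for &build_payload in matches {
                if swap {
                    // probe side is `left`, build side is `right`
                    out.push((key, probe_payload, build_payload));
                } else {
                    // build side is `left`, probe side is `right`
                    out.push((key, build_payload, probe_payload));
                }
            }
        }
    }
    out
}
```

In this sketch the hash table holds only the smaller input, so it uses less memory and takes fewer insertions to build, while the output column order is the same whichever side ends up as the build side.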

