Posted to reviews@impala.apache.org by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org> on 2017/03/30 01:59:08 UTC
[Impala-ASF-CR] IMPALA-4883: Union Codegen
Taras Bobrovytsky has uploaded a new patch set (#3).
Change subject: IMPALA-4883: Union Codegen
......................................................................
IMPALA-4883: Union Codegen
For each non-passthrough child of the Union node, codegen the loop that
does per-row tuple materialization.
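The loop being codegen'd evaluates each output column of a child row through a generic expression interface and copies the results into a new tuple. Codegen replaces that generic loop with one specialized to a particular child's expressions. Passthrough children, whose rows can presumably be forwarded without copying, do not go through this loop at all. The C++ sketch below is only an analogy under simplifying assumptions: `Row`, `Tuple`, `Expr`, `SlotRef`, `Doubled`, `MaterializeInterpreted` and `MaterializeSpecialized` are hypothetical names, not Impala's classes, and compile-time templates stand in for the LLVM IR that Impala generates at runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins: a child "row" is a vector of int64 slot
// values and a materialized output "tuple" is a vector of int64 values.
using Row = std::vector<int64_t>;
using Tuple = std::vector<int64_t>;

// Generic expression interface: one virtual call per evaluated value.
struct Expr {
  virtual ~Expr() = default;
  virtual int64_t Eval(const Row& row) const = 0;
};

// Reads one slot of the child row.
struct SlotRef : Expr {
  explicit SlotRef(size_t idx) : idx_(idx) {}
  int64_t Eval(const Row& row) const override {
    assert(idx_ < row.size());
    return row[idx_];
  }
  size_t idx_;
};

// Computes a value from one slot (a stand-in for a computed expression such
// as fnv_hash() in the benchmark query).
struct Doubled : Expr {
  explicit Doubled(size_t idx) : idx_(idx) {}
  int64_t Eval(const Row& row) const override {
    assert(idx_ < row.size());
    return row[idx_] * 2;
  }
  size_t idx_;
};

// Interpreted materialization: for every row, walk a runtime-sized list of
// expressions and copy each result into a new output tuple.
std::vector<Tuple> MaterializeInterpreted(
    const std::vector<Row>& rows,
    const std::vector<std::unique_ptr<Expr>>& exprs) {
  std::vector<Tuple> out;
  out.reserve(rows.size());
  for (const Row& row : rows) {
    Tuple tuple;
    tuple.reserve(exprs.size());
    for (const auto& expr : exprs) tuple.push_back(expr->Eval(row));
    out.push_back(std::move(tuple));
  }
  return out;
}

// Specialized materialization: the evaluators are fixed at compile time, so the
// compiler can inline them and unroll the per-column loop. Templates play the
// role here that runtime-generated LLVM IR plays in Impala.
template <typename... Evaluators>
std::vector<Tuple> MaterializeSpecialized(const std::vector<Row>& rows,
                                          Evaluators... evals) {
  std::vector<Tuple> out;
  out.reserve(rows.size());
  for (const Row& row : rows) out.push_back(Tuple{evals(row)...});
  return out;
}
```

Both functions produce the same tuples. The difference is that in `MaterializeSpecialized` the compiler sees the exact evaluator for every column and can inline it, removing the per-value virtual dispatch, which is the kind of per-child specialization that codegen makes possible.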
Testing:
Ran test_queries.py locally in exhaustive mode.
Benchmark:
Ran a benchmark against a local 10 GB TPC-DS dataset, using an unpartitioned
store_sales table.
SELECT
  COUNT(c),
  COUNT(ss_customer_sk),
  COUNT(ss_cdemo_sk),
  COUNT(ss_hdemo_sk),
  COUNT(ss_addr_sk),
  COUNT(ss_store_sk),
  COUNT(ss_promo_sk),
  COUNT(ss_ticket_number),
  COUNT(ss_quantity),
  COUNT(ss_wholesale_cost),
  COUNT(ss_list_price),
  COUNT(ss_sales_price),
  COUNT(ss_ext_discount_amt),
  COUNT(ss_ext_sales_price),
  COUNT(ss_ext_wholesale_cost),
  COUNT(ss_ext_list_price),
  COUNT(ss_ext_tax),
  COUNT(ss_coupon_amt),
  COUNT(ss_net_paid),
  COUNT(ss_net_paid_inc_tax),
  COUNT(ss_net_profit),
  COUNT(ss_sold_date_sk)
FROM (
  select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned
) t
Before: 39s704ms
Operator         #Hosts   Avg Time   Max Time    #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE          1  194.504us  194.504us        1           1   28.00 KB        -1.00 B  FINALIZE
12:EXCHANGE           1   17.284us   17.284us        3           1          0        -1.00 B  UNPARTITIONED
11:AGGREGATE          3    2s202ms    2s934ms        3           1  115.00 KB       10.00 MB
00:UNION              3   32s514ms   34s926ms  288.01M     288.01M    3.08 MB              0
|--02:SCAN HDFS       3  158.373ms  216.085ms   28.80M      28.80M  489.71 MB        1.88 GB  tpcds_10_parquet.store_sales
|--03:SCAN HDFS       3  167.002ms  171.738ms   28.80M      28.80M  489.74 MB        1.88 GB  tpcds_10_parquet.store_sales
|--04:SCAN HDFS       3  125.331ms  145.496ms   28.80M      28.80M  489.57 MB        1.88 GB  tpcds_10_parquet.store_sales
|--05:SCAN HDFS       3  148.478ms  194.311ms   28.80M      28.80M  489.69 MB        1.88 GB  tpcds_10_parquet.store_sales
|--06:SCAN HDFS       3  143.995ms  162.781ms   28.80M      28.80M  489.57 MB        1.88 GB  tpcds_10_parquet.store_sales
|--07:SCAN HDFS       3  169.731ms  250.201ms   28.80M      28.80M  489.58 MB        1.88 GB  tpcds_10_parquet.store_sales
|--08:SCAN HDFS       3  164.110ms  254.374ms   28.80M      28.80M  489.61 MB        1.88 GB  tpcds_10_parquet.store_sales
|--09:SCAN HDFS       3  135.631ms  162.117ms   28.80M      28.80M  489.63 MB        1.88 GB  tpcds_10_parquet.store_sales
|--10:SCAN HDFS       3  138.736ms  167.778ms   28.80M      28.80M  489.67 MB        1.88 GB  tpcds_10_parquet.store_sales
01:SCAN HDFS          3  202.015ms  248.728ms   28.80M      28.80M  489.68 MB        1.88 GB  tpcds_10_parquet.store_sales
After: 20s664ms
Operator         #Hosts   Avg Time   Max Time    #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE          1  167.757us  167.757us        1           1   28.00 KB        -1.00 B  FINALIZE
12:EXCHANGE           1   16.592us   16.592us        3           1          0        -1.00 B  UNPARTITIONED
11:AGGREGATE          3    2s924ms    3s715ms        3           1  115.00 KB       10.00 MB
00:UNION              3    4s971ms    6s082ms  288.01M     288.01M    3.08 MB              0
|--02:SCAN HDFS       3    1s189ms    1s588ms   28.80M      28.80M  483.82 MB        1.88 GB  tpcds_10_parquet.store_sales
|--03:SCAN HDFS       3    1s117ms    1s157ms   28.80M      28.80M  484.85 MB        1.88 GB  tpcds_10_parquet.store_sales
|--04:SCAN HDFS       3    1s226ms    1s454ms   28.80M      28.80M  483.00 MB        1.88 GB  tpcds_10_parquet.store_sales
|--05:SCAN HDFS       3    1s141ms    1s380ms   28.80M      28.80M  486.90 MB        1.88 GB  tpcds_10_parquet.store_sales
|--06:SCAN HDFS       3    1s180ms    1s292ms   28.80M      28.80M  488.22 MB        1.88 GB  tpcds_10_parquet.store_sales
|--07:SCAN HDFS       3    1s227ms    1s391ms   28.80M      28.80M  481.99 MB        1.88 GB  tpcds_10_parquet.store_sales
|--08:SCAN HDFS       3    1s169ms    1s277ms   28.80M      28.80M  483.67 MB        1.88 GB  tpcds_10_parquet.store_sales
|--09:SCAN HDFS       3    1s192ms    1s366ms   28.80M      28.80M  484.92 MB        1.88 GB  tpcds_10_parquet.store_sales
|--10:SCAN HDFS       3    1s189ms    1s298ms   28.80M      28.80M  487.83 MB        1.88 GB  tpcds_10_parquet.store_sales
01:SCAN HDFS          3    1s239ms    1s314ms   28.80M      28.80M  481.33 MB        1.88 GB  tpcds_10_parquet.store_sales
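The headline timings can be turned into speedup ratios. This is a minimal C++ sketch, assuming the pretty-printed duration format shown in the summaries (`39s704ms`, `158.373ms`, `194.504us`). `ParseDurationMs` and `Speedup` are hypothetical helpers written for this illustration, not part of Impala.

```cpp
#include <cassert>
#include <cctype>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts a duration such as "39s704ms", "158.373ms"
// or "194.504us" into milliseconds. Only the units h, m, s, ms, us and ns
// are recognized; anything else throws std::invalid_argument.
double ParseDurationMs(const std::string& text) {
  double total_ms = 0.0;
  size_t i = 0;
  while (i < text.size()) {
    // Numeric part: digits with an optional decimal point.
    size_t start = i;
    while (i < text.size() &&
           (std::isdigit(static_cast<unsigned char>(text[i])) || text[i] == '.')) {
      ++i;
    }
    if (start == i) throw std::invalid_argument("expected a number in: " + text);
    double value = std::stod(text.substr(start, i - start));
    // Unit part: the run of letters that follows the number.
    size_t unit_start = i;
    while (i < text.size() && std::isalpha(static_cast<unsigned char>(text[i]))) ++i;
    std::string unit = text.substr(unit_start, i - unit_start);
    if (unit == "h") total_ms += value * 3600000.0;
    else if (unit == "m") total_ms += value * 60000.0;
    else if (unit == "s") total_ms += value * 1000.0;
    else if (unit == "ms") total_ms += value;
    else if (unit == "us") total_ms += value / 1000.0;
    else if (unit == "ns") total_ms += value / 1000000.0;
    else throw std::invalid_argument("unknown unit '" + unit + "' in: " + text);
  }
  return total_ms;
}

// Ratio of the "before" time to the "after" time; values above 1 mean
// the "after" run was faster.
double Speedup(const std::string& before, const std::string& after) {
  double after_ms = ParseDurationMs(after);
  assert(after_ms > 0.0);
  return ParseDurationMs(before) / after_ms;
}
```

Applied to the figures above, this gives about 1.92x for the end-to-end time (39s704ms before, 20s664ms after) and about 6.54x for the average time of the 00:UNION operator (32s514ms before, 4s971ms after).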
Change-Id: Ib4107d27582ff5416172810364a6e76d3d93c439
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/exec/CMakeLists.txt
A be/src/exec/union-node-ir.cc
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
6 files changed, 150 insertions(+), 41 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/6459/3
--
To view, visit http://gerrit.cloudera.org:8080/6459
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib4107d27582ff5416172810364a6e76d3d93c439
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>