Posted to reviews@impala.apache.org by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org> on 2017/03/30 01:59:08 UTC

[Impala-ASF-CR] IMPALA-4883: Union Codegen

Taras Bobrovytsky has uploaded a new patch set (#3).

Change subject: IMPALA-4883: Union Codegen
......................................................................

IMPALA-4883: Union Codegen

For each non-passthrough child of the Union node, codegen the loop that
performs per-row tuple materialization.
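
As a rough illustration of the kind of loop described above, here is a
minimal sketch of per-row tuple materialization. It is not the code in
union-node-ir.cc: Row, Expr and MaterializeBatch are hypothetical
stand-ins, and std::function only imitates the indirect per-expression
call that a codegen'd loop would be expected to replace with inlined code.

```cpp
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only; not Impala's classes.
using Row = std::vector<int64_t>;
using Expr = std::function<int64_t(const Row&)>;

// Evaluates every expression against every input row and appends one
// output tuple per input row. In the interpreted path each e(row) is an
// indirect call; codegen aims to specialize this loop for the known
// expressions so those calls can be inlined.
std::vector<Row> MaterializeBatch(const std::vector<Expr>& exprs,
                                  const std::vector<Row>& input) {
  std::vector<Row> output;
  output.reserve(input.size());
  for (const Row& row : input) {
    Row tuple;
    tuple.reserve(exprs.size());
    for (const Expr& e : exprs) tuple.push_back(e(row));
    output.push_back(std::move(tuple));
  }
  return output;
}
```

The inner loop over exprs runs once per row, which is why its per-call
overhead would dominate on inputs with hundreds of millions of rows such
as the benchmark below.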

Testing:
Ran the test_queries.py tests locally in exhaustive mode.

Benchmark:
Ran the following query locally against a 10 GB TPC-DS dataset, using an
unpartitioned copy of the store_sales table.

SELECT
  COUNT(c),
  COUNT(ss_customer_sk),
  COUNT(ss_cdemo_sk),
  COUNT(ss_hdemo_sk),
  COUNT(ss_addr_sk),
  COUNT(ss_store_sk),
  COUNT(ss_promo_sk),
  COUNT(ss_ticket_number),
  COUNT(ss_quantity),
  COUNT(ss_wholesale_cost),
  COUNT(ss_list_price),
  COUNT(ss_sales_price),
  COUNT(ss_ext_discount_amt),
  COUNT(ss_ext_sales_price),
  COUNT(ss_ext_wholesale_cost),
  COUNT(ss_ext_list_price),
  COUNT(ss_ext_tax),
  COUNT(ss_coupon_amt),
  COUNT(ss_net_paid),
  COUNT(ss_net_paid_inc_tax),
  COUNT(ss_net_profit),
  COUNT(ss_sold_date_sk)
FROM (
  select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select fnv_hash(ss_sold_time_sk) c, * from tpcds_10_parquet.store_sales_unpartitioned
) t

Before: 39s704ms
Operator          #Hosts   Avg Time   Max Time    #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE           1  194.504us  194.504us        1           1   28.00 KB        -1.00 B  FINALIZE
12:EXCHANGE            1   17.284us   17.284us        3           1          0        -1.00 B  UNPARTITIONED
11:AGGREGATE           3    2s202ms    2s934ms        3           1  115.00 KB       10.00 MB
00:UNION               3   32s514ms   34s926ms  288.01M     288.01M    3.08 MB              0
|--02:SCAN HDFS        3  158.373ms  216.085ms   28.80M      28.80M  489.71 MB        1.88 GB  tpcds_10_parquet.store_sales
|--03:SCAN HDFS        3  167.002ms  171.738ms   28.80M      28.80M  489.74 MB        1.88 GB  tpcds_10_parquet.store_sales
|--04:SCAN HDFS        3  125.331ms  145.496ms   28.80M      28.80M  489.57 MB        1.88 GB  tpcds_10_parquet.store_sales
|--05:SCAN HDFS        3  148.478ms  194.311ms   28.80M      28.80M  489.69 MB        1.88 GB  tpcds_10_parquet.store_sales
|--06:SCAN HDFS        3  143.995ms  162.781ms   28.80M      28.80M  489.57 MB        1.88 GB  tpcds_10_parquet.store_sales
|--07:SCAN HDFS        3  169.731ms  250.201ms   28.80M      28.80M  489.58 MB        1.88 GB  tpcds_10_parquet.store_sales
|--08:SCAN HDFS        3  164.110ms  254.374ms   28.80M      28.80M  489.61 MB        1.88 GB  tpcds_10_parquet.store_sales
|--09:SCAN HDFS        3  135.631ms  162.117ms   28.80M      28.80M  489.63 MB        1.88 GB  tpcds_10_parquet.store_sales
|--10:SCAN HDFS        3  138.736ms  167.778ms   28.80M      28.80M  489.67 MB        1.88 GB  tpcds_10_parquet.store_sales
01:SCAN HDFS           3  202.015ms  248.728ms   28.80M      28.80M  489.68 MB        1.88 GB  tpcds_10_parquet.store_sales

After: 20s664ms
Operator          #Hosts   Avg Time   Max Time    #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE           1  167.757us  167.757us        1           1   28.00 KB        -1.00 B  FINALIZE
12:EXCHANGE            1   16.592us   16.592us        3           1          0        -1.00 B  UNPARTITIONED
11:AGGREGATE           3    2s924ms    3s715ms        3           1  115.00 KB       10.00 MB
00:UNION               3    4s971ms    6s082ms  288.01M     288.01M    3.08 MB              0
|--02:SCAN HDFS        3    1s189ms    1s588ms   28.80M      28.80M  483.82 MB        1.88 GB  tpcds_10_parquet.store_sales
|--03:SCAN HDFS        3    1s117ms    1s157ms   28.80M      28.80M  484.85 MB        1.88 GB  tpcds_10_parquet.store_sales
|--04:SCAN HDFS        3    1s226ms    1s454ms   28.80M      28.80M  483.00 MB        1.88 GB  tpcds_10_parquet.store_sales
|--05:SCAN HDFS        3    1s141ms    1s380ms   28.80M      28.80M  486.90 MB        1.88 GB  tpcds_10_parquet.store_sales
|--06:SCAN HDFS        3    1s180ms    1s292ms   28.80M      28.80M  488.22 MB        1.88 GB  tpcds_10_parquet.store_sales
|--07:SCAN HDFS        3    1s227ms    1s391ms   28.80M      28.80M  481.99 MB        1.88 GB  tpcds_10_parquet.store_sales
|--08:SCAN HDFS        3    1s169ms    1s277ms   28.80M      28.80M  483.67 MB        1.88 GB  tpcds_10_parquet.store_sales
|--09:SCAN HDFS        3    1s192ms    1s366ms   28.80M      28.80M  484.92 MB        1.88 GB  tpcds_10_parquet.store_sales
|--10:SCAN HDFS        3    1s189ms    1s298ms   28.80M      28.80M  487.83 MB        1.88 GB  tpcds_10_parquet.store_sales
01:SCAN HDFS           3    1s239ms    1s314ms   28.80M      28.80M  481.33 MB        1.88 GB  tpcds_10_parquet.store_sales

Change-Id: Ib4107d27582ff5416172810364a6e76d3d93c439
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/exec/CMakeLists.txt
A be/src/exec/union-node-ir.cc
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
6 files changed, 150 insertions(+), 41 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/6459/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6459
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib4107d27582ff5416172810364a6e76d3d93c439
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>