Posted to reviews@impala.apache.org by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org> on 2017/01/27 23:34:09 UTC
[Impala-ASF-CR] IMPALA-3586 (Part 1): Implement Union Pass Through
Taras Bobrovytsky has uploaded a new change for review.
http://gerrit.cloudera.org:8080/5816
Change subject: IMPALA-3586 (Part 1): Implement Union Pass Through
......................................................................
IMPALA-3586 (Part 1): Implement Union Pass Through
The union node acts as a pass-through operator and forwards row batches
from its children without materializing them. This is done when the
child's tuple layout is identical to the union node's tuple layout
and no functions need to be applied to the child row batches.
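As an illustration only, the eligibility rule described above can be sketched as a small predicate. The types and names below (SlotLayout, ChildInfo, CanPassThrough) are hypothetical and are not taken from the patch; the actual change may represent layouts and expressions differently.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified description of one slot inside a tuple.
struct SlotLayout {
  int byte_offset;
  int byte_size;
  bool nullable;
  bool operator==(const SlotLayout& other) const {
    return byte_offset == other.byte_offset && byte_size == other.byte_size &&
           nullable == other.nullable;
  }
};

// Hypothetical summary of what the union node knows about one child.
struct ChildInfo {
  std::vector<SlotLayout> slots;  // tuple layout produced by the child
  bool needs_expr_eval;           // true if any function must be applied to its rows
};

// A child can be passed through only if nothing has to be computed on its
// rows and its tuple layout matches the union's layout slot for slot.
bool CanPassThrough(const ChildInfo& child,
                    const std::vector<SlotLayout>& union_slots) {
  return !child.needs_expr_eval && child.slots == union_slots;
}
```

A child that fails either condition would have to be materialized into the union's own tuple layout instead.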
Testing:
Verified that existing tests cover the cases where none, some, or all
children of the union node can be passed through.
Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
---
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/runtime/descriptors.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
19 files changed, 417 insertions(+), 15 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/1
--
To view, visit http://gerrit.cloudera.org:8080/5816
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Taras Bobrovytsky has uploaded a new patch set (#9).
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
IMPALA-3586: Implement union passthrough
The union node acts as a pass-through operator and forwards row batches
from its children without materializing them. This is done when the
child's tuple layout is identical to the union node's tuple layout
and no functions need to be applied to the child row batches.
A new query option DISABLE_UNION_PASSTHROUGH was added in this patch
as a precaution and for testing purposes.
Testing:
- Added new planner and end-to-end tests that cover the new
functionality.
- Updated existing tests to reflect the new behavior.
Perf:
Ran a benchmark on a local 10 GB TPC-DS dataset, using an unpartitioned
version of the store_sales table. Total time for the following query
improved by over 2x (43s164ms before, 20s660ms after):
SELECT
COUNT(ss_sold_time_sk),
COUNT(ss_item_sk),
COUNT(ss_customer_sk),
COUNT(ss_cdemo_sk),
COUNT(ss_hdemo_sk),
COUNT(ss_addr_sk),
COUNT(ss_store_sk),
COUNT(ss_promo_sk),
COUNT(ss_ticket_number),
COUNT(ss_quantity),
COUNT(ss_wholesale_cost),
COUNT(ss_list_price),
COUNT(ss_sales_price),
COUNT(ss_ext_discount_amt),
COUNT(ss_ext_sales_price),
COUNT(ss_ext_wholesale_cost),
COUNT(ss_ext_list_price),
COUNT(ss_ext_tax),
COUNT(ss_coupon_amt),
COUNT(ss_net_paid),
COUNT(ss_net_paid_inc_tax),
COUNT(ss_net_profit),
COUNT(ss_sold_date_sk)
FROM (
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
) t
Before:
Total Time: 43s164ms
Summary:
Operator         #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem   Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE     1       224.721us  224.721us  1        1           28.00 KB   -1.00 B        FINALIZE
12:EXCHANGE      1       24.578us   24.578us   3        1           0          -1.00 B        UNPARTITIONED
11:AGGREGATE     3       2s402ms    3s060ms    3        1           119.00 KB  10.00 MB
00:UNION         3       35s380ms   37s846ms   288.01M  288.01M     3.08 MB    0
|--02:SCAN HDFS  3       184.197ms  219.931ms  28.80M   28.80M      535.03 MB  1.88 GB        store_sales_unpartitioned
|--03:SCAN HDFS  3       131.956ms  153.401ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--04:SCAN HDFS  3       178.456ms  247.721ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--05:SCAN HDFS  3       189.398ms  242.251ms  28.80M   28.80M      535.01 MB  1.88 GB        store_sales_unpartitioned
|--06:SCAN HDFS  3       122.786ms  156.528ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--07:SCAN HDFS  3       147.467ms  183.391ms  28.80M   28.80M      535.13 MB  1.88 GB        store_sales_unpartitioned
|--08:SCAN HDFS  3       147.502ms  186.273ms  28.80M   28.80M      535.01 MB  1.88 GB        store_sales_unpartitioned
|--09:SCAN HDFS  3       130.086ms  154.682ms  28.80M   28.80M      535.04 MB  1.88 GB        store_sales_unpartitioned
|--10:SCAN HDFS  3       122.701ms  161.056ms  28.80M   28.80M      534.89 MB  1.88 GB        store_sales_unpartitioned
01:SCAN HDFS     3       287.863ms  330.436ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
After:
Total Time: 20s660ms
Summary:
Operator         #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem   Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE     1       187.474us  187.474us  1        1           28.00 KB   -1.00 B        FINALIZE
12:EXCHANGE      1       15.238us   15.238us   3        1           0          -1.00 B        UNPARTITIONED
11:AGGREGATE     3       2s958ms    3s749ms    3        1           3.08 MB    10.00 MB
00:UNION         3       211.510ms  224.667ms  288.01M  288.01M     0          0
|--02:SCAN HDFS  3       1s637ms    1s734ms    28.80M   28.80M      528.68 MB  1.88 GB        store_sales_unpartitioned
|--03:SCAN HDFS  3       1s697ms    1s708ms    28.80M   28.80M      528.48 MB  1.88 GB        store_sales_unpartitioned
|--04:SCAN HDFS  3       1s681ms    1s748ms    28.80M   28.80M      529.34 MB  1.88 GB        store_sales_unpartitioned
|--05:SCAN HDFS  3       1s665ms    1s756ms    28.80M   28.80M      533.81 MB  1.88 GB        store_sales_unpartitioned
|--06:SCAN HDFS  3       1s675ms    1s800ms    28.80M   28.80M      530.70 MB  1.88 GB        store_sales_unpartitioned
|--07:SCAN HDFS  3       1s677ms    1s759ms    28.80M   28.80M      525.95 MB  1.88 GB        store_sales_unpartitioned
|--08:SCAN HDFS  3       1s621ms    1s790ms    28.80M   28.80M      534.64 MB  1.88 GB        store_sales_unpartitioned
|--09:SCAN HDFS  3       1s684ms    1s743ms    28.80M   28.80M      528.55 MB  1.88 GB        store_sales_unpartitioned
|--10:SCAN HDFS  3       1s528ms    1s771ms    28.80M   28.80M      533.70 MB  1.88 GB        store_sales_unpartitioned
01:SCAN HDFS     3       1s853ms    2s149ms    28.80M   28.80M      526.53 MB  1.88 GB        store_sales_unpartitioned
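The "over 2x" figure can be checked directly from the Before and After summaries. The following is a minimal sketch of that arithmetic; the helper names ToMillis and Speedup are made up for this example.

```cpp
#include <cassert>

// Convert a duration printed as "<seconds>s<millis>ms" into milliseconds.
constexpr double ToMillis(int seconds, int millis) {
  return seconds * 1000.0 + millis;
}

// Ratio of the time before the change to the time after it.
constexpr double Speedup(double before_ms, double after_ms) {
  return before_ms / after_ms;
}
```

With the numbers reported for this patch set, total time goes from 43s164ms to 20s660ms, which is roughly 2.09x. The average time attributed to 00:UNION goes from 35s380ms to 211.510ms, which is roughly 167x.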
Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
---
M be/src/exec/exchange-node.cc
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/row-batch.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
M tests/query_test/test_queries.py
33 files changed, 726 insertions(+), 54 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/9
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Taras Bobrovytsky has uploaded a new patch set (#17).
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
IMPALA-3586: Implement union passthrough
The union node acts as a pass-through operator and forwards row batches
from its children without materializing them. This is done when the
child's tuple layout is identical to the union node's tuple layout
and no functions need to be applied to the child row batches.
Removed operand reordering in the FE because it's simpler and safer to
handle all passthrough children before non-passthrough children in the
BE. The recent improvements to memory management allowed us to remove
this requirement.
A new query option DISABLE_UNION_PASSTHROUGH was added in this patch
as a precaution and for testing purposes.
Testing:
- Added new planner and end-to-end tests that cover the new
functionality.
- Updated existing tests to reflect the new behavior.
Perf:
Ran a benchmark on a local 10 GB TPC-DS dataset, using an unpartitioned
version of the store_sales table. Total time for the following query
improved by over 2x (43s164ms before, 19s139ms after):
SELECT
COUNT(ss_sold_time_sk),
COUNT(ss_item_sk),
COUNT(ss_customer_sk),
COUNT(ss_cdemo_sk),
COUNT(ss_hdemo_sk),
COUNT(ss_addr_sk),
COUNT(ss_store_sk),
COUNT(ss_promo_sk),
COUNT(ss_ticket_number),
COUNT(ss_quantity),
COUNT(ss_wholesale_cost),
COUNT(ss_list_price),
COUNT(ss_sales_price),
COUNT(ss_ext_discount_amt),
COUNT(ss_ext_sales_price),
COUNT(ss_ext_wholesale_cost),
COUNT(ss_ext_list_price),
COUNT(ss_ext_tax),
COUNT(ss_coupon_amt),
COUNT(ss_net_paid),
COUNT(ss_net_paid_inc_tax),
COUNT(ss_net_profit),
COUNT(ss_sold_date_sk)
FROM (
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
) t
Before:
Total Time: 43s164ms
Summary:
Operator         #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem   Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE     1       224.721us  224.721us  1        1           28.00 KB   -1.00 B        FINALIZE
12:EXCHANGE      1       24.578us   24.578us   3        1           0          -1.00 B        UNPARTITIONED
11:AGGREGATE     3       2s402ms    3s060ms    3        1           119.00 KB  10.00 MB
00:UNION         3       35s380ms   37s846ms   288.01M  288.01M     3.08 MB    0
|--02:SCAN HDFS  3       184.197ms  219.931ms  28.80M   28.80M      535.03 MB  1.88 GB        store_sales_unpartitioned
|--03:SCAN HDFS  3       131.956ms  153.401ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--04:SCAN HDFS  3       178.456ms  247.721ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--05:SCAN HDFS  3       189.398ms  242.251ms  28.80M   28.80M      535.01 MB  1.88 GB        store_sales_unpartitioned
|--06:SCAN HDFS  3       122.786ms  156.528ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--07:SCAN HDFS  3       147.467ms  183.391ms  28.80M   28.80M      535.13 MB  1.88 GB        store_sales_unpartitioned
|--08:SCAN HDFS  3       147.502ms  186.273ms  28.80M   28.80M      535.01 MB  1.88 GB        store_sales_unpartitioned
|--09:SCAN HDFS  3       130.086ms  154.682ms  28.80M   28.80M      535.04 MB  1.88 GB        store_sales_unpartitioned
|--10:SCAN HDFS  3       122.701ms  161.056ms  28.80M   28.80M      534.89 MB  1.88 GB        store_sales_unpartitioned
01:SCAN HDFS     3       287.863ms  330.436ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
After:
Total Time: 19s139ms
Summary:
Operator         #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem   Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE     1       166.241us  166.241us  1        1           28.00 KB   -1.00 B        FINALIZE
12:EXCHANGE      1       71.695us   71.695us   3        1           0          -1.00 B        UNPARTITIONED
11:AGGREGATE     3       2s971ms    3s809ms    3        1           3.08 MB    10.00 MB
00:UNION         3       207.956ms  222.846ms  288.01M  288.01M     0          0
|--02:SCAN HDFS  3       1s533ms    1s535ms    28.80M   28.80M      532.28 MB  1.88 GB        store_sales_unpartitioned
|--03:SCAN HDFS  3       1s554ms    1s669ms    28.80M   28.80M      525.73 MB  1.88 GB        store_sales_unpartitioned
|--04:SCAN HDFS  3       1s568ms    1s716ms    28.80M   28.80M      525.03 MB  1.88 GB        store_sales_unpartitioned
|--05:SCAN HDFS  3       1s503ms    1s617ms    28.80M   28.80M      527.43 MB  1.88 GB        store_sales_unpartitioned
|--06:SCAN HDFS  3       1s560ms    1s634ms    28.80M   28.80M      528.52 MB  1.88 GB        store_sales_unpartitioned
|--07:SCAN HDFS  3       1s489ms    1s643ms    28.80M   28.80M      534.81 MB  1.88 GB        store_sales_unpartitioned
|--08:SCAN HDFS  3       1s534ms    1s581ms    28.80M   28.80M      528.10 MB  1.88 GB        store_sales_unpartitioned
|--09:SCAN HDFS  3       1s558ms    1s674ms    28.80M   28.80M      526.77 MB  1.88 GB        store_sales_unpartitioned
|--10:SCAN HDFS  3       1s504ms    1s692ms    28.80M   28.80M      527.83 MB  1.88 GB        store_sales_unpartitioned
01:SCAN HDFS     3       1s682ms    1s911ms    28.80M   28.80M      526.14 MB  1.88 GB        store_sales_unpartitioned
Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
---
M be/src/exec/exchange-node.cc
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/row-batch.cc
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-planner/queries/PlannerTest/with-clause.test
M testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
M tests/query_test/test_queries.py
32 files changed, 1,469 insertions(+), 760 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/17
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 17
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Hello Alex Behm, Dan Hecht,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/5816
to look at the new patch set (#19).
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
IMPALA-3586: Implement union passthrough
The union node acts as a pass-through operator and forwards row batches
from its children without materializing them. This is done when the
child's tuple layout is identical to the union node's tuple layout
and no functions need to be applied to the child row batches.
Removed operand reordering in the FE because it's simpler and safer to
handle all passthrough children before non-passthrough children in the
BE. The recent improvements to memory management allowed us to remove
this requirement.
Testing:
- Added new planner and end-to-end tests that cover the new
functionality.
- Updated existing tests to reflect the new behavior.
Perf:
Ran a benchmark on a local 10 GB TPC-DS dataset, using an unpartitioned
version of the store_sales table. Total time for the following query
improved by over 2x (43s164ms before, 19s139ms after):
SELECT
COUNT(ss_sold_time_sk),
COUNT(ss_item_sk),
COUNT(ss_customer_sk),
COUNT(ss_cdemo_sk),
COUNT(ss_hdemo_sk),
COUNT(ss_addr_sk),
COUNT(ss_store_sk),
COUNT(ss_promo_sk),
COUNT(ss_ticket_number),
COUNT(ss_quantity),
COUNT(ss_wholesale_cost),
COUNT(ss_list_price),
COUNT(ss_sales_price),
COUNT(ss_ext_discount_amt),
COUNT(ss_ext_sales_price),
COUNT(ss_ext_wholesale_cost),
COUNT(ss_ext_list_price),
COUNT(ss_ext_tax),
COUNT(ss_coupon_amt),
COUNT(ss_net_paid),
COUNT(ss_net_paid_inc_tax),
COUNT(ss_net_profit),
COUNT(ss_sold_date_sk)
FROM (
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
) t
Before:
Total Time: 43s164ms
Summary:
Operator         #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem   Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE     1       224.721us  224.721us  1        1           28.00 KB   -1.00 B        FINALIZE
12:EXCHANGE      1       24.578us   24.578us   3        1           0          -1.00 B        UNPARTITIONED
11:AGGREGATE     3       2s402ms    3s060ms    3        1           119.00 KB  10.00 MB
00:UNION         3       35s380ms   37s846ms   288.01M  288.01M     3.08 MB    0
|--02:SCAN HDFS  3       184.197ms  219.931ms  28.80M   28.80M      535.03 MB  1.88 GB        store_sales_unpartitioned
|--03:SCAN HDFS  3       131.956ms  153.401ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--04:SCAN HDFS  3       178.456ms  247.721ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--05:SCAN HDFS  3       189.398ms  242.251ms  28.80M   28.80M      535.01 MB  1.88 GB        store_sales_unpartitioned
|--06:SCAN HDFS  3       122.786ms  156.528ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--07:SCAN HDFS  3       147.467ms  183.391ms  28.80M   28.80M      535.13 MB  1.88 GB        store_sales_unpartitioned
|--08:SCAN HDFS  3       147.502ms  186.273ms  28.80M   28.80M      535.01 MB  1.88 GB        store_sales_unpartitioned
|--09:SCAN HDFS  3       130.086ms  154.682ms  28.80M   28.80M      535.04 MB  1.88 GB        store_sales_unpartitioned
|--10:SCAN HDFS  3       122.701ms  161.056ms  28.80M   28.80M      534.89 MB  1.88 GB        store_sales_unpartitioned
01:SCAN HDFS     3       287.863ms  330.436ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
After:
Total Time: 19s139ms
Summary:
Operator         #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem   Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE     1       166.241us  166.241us  1        1           28.00 KB   -1.00 B        FINALIZE
12:EXCHANGE      1       71.695us   71.695us   3        1           0          -1.00 B        UNPARTITIONED
11:AGGREGATE     3       2s971ms    3s809ms    3        1           3.08 MB    10.00 MB
00:UNION         3       207.956ms  222.846ms  288.01M  288.01M     0          0
|--02:SCAN HDFS  3       1s533ms    1s535ms    28.80M   28.80M      532.28 MB  1.88 GB        store_sales_unpartitioned
|--03:SCAN HDFS  3       1s554ms    1s669ms    28.80M   28.80M      525.73 MB  1.88 GB        store_sales_unpartitioned
|--04:SCAN HDFS  3       1s568ms    1s716ms    28.80M   28.80M      525.03 MB  1.88 GB        store_sales_unpartitioned
|--05:SCAN HDFS  3       1s503ms    1s617ms    28.80M   28.80M      527.43 MB  1.88 GB        store_sales_unpartitioned
|--06:SCAN HDFS  3       1s560ms    1s634ms    28.80M   28.80M      528.52 MB  1.88 GB        store_sales_unpartitioned
|--07:SCAN HDFS  3       1s489ms    1s643ms    28.80M   28.80M      534.81 MB  1.88 GB        store_sales_unpartitioned
|--08:SCAN HDFS  3       1s534ms    1s581ms    28.80M   28.80M      528.10 MB  1.88 GB        store_sales_unpartitioned
|--09:SCAN HDFS  3       1s558ms    1s674ms    28.80M   28.80M      526.77 MB  1.88 GB        store_sales_unpartitioned
|--10:SCAN HDFS  3       1s504ms    1s692ms    28.80M   28.80M      527.83 MB  1.88 GB        store_sales_unpartitioned
01:SCAN HDFS     3       1s682ms    1s911ms    28.80M   28.80M      526.14 MB  1.88 GB        store_sales_unpartitioned
Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
---
M be/src/exec/exchange-node.cc
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/exprs/slot-ref.cc
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/row-batch.cc
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-planner/queries/PlannerTest/with-clause.test
M testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
M tests/query_test/test_queries.py
33 files changed, 1,461 insertions(+), 764 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/19
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 19
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change.
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
Patch Set 20:
(2 comments)
http://gerrit.cloudera.org:8080/#/c/5816/19/be/src/exec/union-node.cc
File be/src/exec/union-node.cc:
Line 263: (!HasMorePassthrough() && !HasMoreMaterialized() && !HasMoreConst(state));
> Unfortunately that wouldn't work.
We could make it work, but it would require changing how row batches work a little. I agree it's not worth it, so let's leave this alone.
http://gerrit.cloudera.org:8080/#/c/5816/19/be/src/exec/union-node.h
File be/src/exec/union-node.h:
Line 129: return first_materialized_child_idx_ <= child_idx_ && child_idx_ < children_.size();
> Reordered it as you suggested. (By the way, in Python you can actually writ
The condition you have here doesn't tell you "if there are still rows to be returned from children that need materialization". Your condition tells you whether we are currently processing children that need passthrough.
The condition that makes sense for this function is the one I wrote earlier, read the comment to see why:
// We have children that need materialization and haven't processed them all yet.
first_materialized_child_idx_ != children_.size() && child_idx_ < children_.size()
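To make the difference between the two conditions concrete, here is a small standalone sketch. It is not the union node's code: UnionState and both function names are hypothetical, and it assumes that passthrough children come before children that need materialization, that child_idx advances through the children in order, and that first_materialized_child_idx equals the number of children when no child needs materialization.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical snapshot of the union node's progress through its children.
struct UnionState {
  std::size_t num_children;
  std::size_t first_materialized_child_idx;  // == num_children if none (assumption)
  std::size_t child_idx;                     // child currently being processed
};

// The condition from patch set 19: true only while child_idx is already
// inside the range of children that need materialization.
bool CurrentlyInMaterializedRange(const UnionState& s) {
  return s.first_materialized_child_idx <= s.child_idx &&
         s.child_idx < s.num_children;
}

// The condition proposed in the review: true whenever some child needs
// materialization and not all children have been processed yet.
bool HasMoreMaterialized(const UnionState& s) {
  return s.first_materialized_child_idx != s.num_children &&
         s.child_idx < s.num_children;
}
```

The two predicates disagree only while the node is still working on passthrough children and materialized children are yet to come, for example with 3 children, first_materialized_child_idx = 2 and child_idx = 0.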
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 20
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Taras Bobrovytsky has posted comments on this change.
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
Patch Set 14:
(13 comments)
http://gerrit.cloudera.org:8080/#/c/5816/14/be/src/exec/union-node.cc
File be/src/exec/union-node.cc:
Line 138: while (const_expr_list_idx_ < const_expr_lists_.size() && !row_batch->AtCapacity() && !ReachedLimit()) {
> long line
Done
Line 144: if (const_expr_list_idx_ == const_expr_lists_.size()) *eos = true;
> *eos = const_expr_list_idx_ == const_expr_lists_.size();
Done
Line 196: if (child_batch_.get() == NULL) {
> nullptr
Replaced all NULL with nullptr.
Line 210: // There are only 3 ways of getting out of this loop:
> We also break out if we fetch an empty child batch
Fixed. This is no longer the case.
Line 214: RETURN_IF_ERROR(QueryMaintenance(state));
> remove
Done. So we don't need query maintenance at all?
Line 274: if (const_todo_) {
> Not a big deal, but I think we should do the const exprs last because most
Done
Line 275: RETURN_IF_ERROR(GetNextConst(state, row_batch, &done));
> why not pass &const_todo_ directly, and same for the other cases below
Done. Changed the way this is handled.
Line 278: RETURN_IF_ERROR(GetNextPassThrough(state, row_batch, &done));
> It might be simpler (== less error prone) overall to order the operands bas
Done. Changed how this is handled now.
Line 282: child_idx_ = 0;
> The code here and the setting of child_idx_ in Open() is kind of subtle. Th
Done, I'm doing the sorting in the FE. The explain plan should reflect the order now too.
Line 286: RETURN_IF_ERROR(GetNextMaterialized(state, row_batch, &done));
> add a DCHECK here that child_eos_ is true if there were any passthrough chi
Not sure if that makes sense. The DCHECK would only be true on the first call to GetNextMaterialized. What are we trying to check exactly?
Line 290: *eos = ReachedLimit() || (!const_todo_ && !passthrough_todo_ && !materialize_todo_);
> Isn't the last condition the same as !materialized_todo_ since we are going
No, because we might have const_todo_ = true, passthrough_todo_ = true and materialize_todo_ = false at the very beginning. We set up these variables in Open().
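A minimal sketch of the end-of-stream check discussed for line 290, assuming the three flags are plain booleans set up in Open() as described in the reply. The struct and function names are hypothetical and only mirror the variable names quoted above.

```cpp
#include <cassert>

// Hypothetical grouping of the three "work remaining" flags.
struct Todo {
  bool const_todo;        // constant expressions still to be returned
  bool passthrough_todo;  // passthrough children still to be drained
  bool materialize_todo;  // children needing materialization still to be drained
};

// End of stream is reached when the limit is hit or no kind of work remains.
bool IsEos(bool reached_limit, const Todo& todo) {
  return reached_limit ||
         (!todo.const_todo && !todo.passthrough_todo && !todo.materialize_todo);
}
```

With const_todo = true, passthrough_todo = true and materialize_todo = false, checking only !materialize_todo would report end of stream too early, which is the case the reply points out.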
http://gerrit.cloudera.org:8080/#/c/5816/14/be/src/exec/union-node.h
File be/src/exec/union-node.h:
Line 39: /// and expressions don't need to be evaluated. The UnionNode pulls row batches
> Comment says union goes through children sequentially, which makes to maint
Not 100% sure what you mean here, but I updated the comment.
Line 107: Status GetNextConst(RuntimeState* state, RowBatch* row_batch, bool* eos);
> Use a different name than eos, e.g. 'done' to avoid confusion with the real
Done.
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Hello Alex Behm, Dan Hecht,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/5816
to look at the new patch set (#14).
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
IMPALA-3586: Implement union passthrough
The union node acts as a pass-through operator and forwards row batches
from its children without materializing them. This is done in the case
when the child's tuple layout is identical to the union node's tuple layout
and no functions need to be applied to the child row batches.
A new query option DISABLE_UNION_PASSTHROUGH was added in this patch
as a precaution and for testing purposes.
Testing:
- Added new planner and end to end tests that cover the new
functionality.
- Updated existing tests to reflect the new behavior.
Perf:
Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned
version of the store_sales table. There was over a 2x performance
improvement for the following query:
SELECT
COUNT(ss_sold_time_sk),
COUNT(ss_item_sk),
COUNT(ss_customer_sk),
COUNT(ss_cdemo_sk),
COUNT(ss_hdemo_sk),
COUNT(ss_addr_sk),
COUNT(ss_store_sk),
COUNT(ss_promo_sk),
COUNT(ss_ticket_number),
COUNT(ss_quantity),
COUNT(ss_wholesale_cost),
COUNT(ss_list_price),
COUNT(ss_sales_price),
COUNT(ss_ext_discount_amt),
COUNT(ss_ext_sales_price),
COUNT(ss_ext_wholesale_cost),
COUNT(ss_ext_list_price),
COUNT(ss_ext_tax),
COUNT(ss_coupon_amt),
COUNT(ss_net_paid),
COUNT(ss_net_paid_inc_tax),
COUNT(ss_net_profit),
COUNT(ss_sold_date_sk)
FROM (
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
) t
Before:
Total Time: 43s164ms
Summary:
Operator         #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem   Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE     1       224.721us  224.721us  1        1           28.00 KB   -1.00 B        FINALIZE
12:EXCHANGE      1       24.578us   24.578us   3        1           0          -1.00 B        UNPARTITIONED
11:AGGREGATE     3       2s402ms    3s060ms    3        1           119.00 KB  10.00 MB
00:UNION         3       35s380ms   37s846ms   288.01M  288.01M     3.08 MB    0
|--02:SCAN HDFS  3       184.197ms  219.931ms  28.80M   28.80M      535.03 MB  1.88 GB        store_sales_unpartitioned
|--03:SCAN HDFS  3       131.956ms  153.401ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--04:SCAN HDFS  3       178.456ms  247.721ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--05:SCAN HDFS  3       189.398ms  242.251ms  28.80M   28.80M      535.01 MB  1.88 GB        store_sales_unpartitioned
|--06:SCAN HDFS  3       122.786ms  156.528ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--07:SCAN HDFS  3       147.467ms  183.391ms  28.80M   28.80M      535.13 MB  1.88 GB        store_sales_unpartitioned
|--08:SCAN HDFS  3       147.502ms  186.273ms  28.80M   28.80M      535.01 MB  1.88 GB        store_sales_unpartitioned
|--09:SCAN HDFS  3       130.086ms  154.682ms  28.80M   28.80M      535.04 MB  1.88 GB        store_sales_unpartitioned
|--10:SCAN HDFS  3       122.701ms  161.056ms  28.80M   28.80M      534.89 MB  1.88 GB        store_sales_unpartitioned
01:SCAN HDFS     3       287.863ms  330.436ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
After:
Total Time: 20s660ms
Summary:
Operator         #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem   Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE     1       187.474us  187.474us  1        1           28.00 KB   -1.00 B        FINALIZE
12:EXCHANGE      1       15.238us   15.238us   3        1           0          -1.00 B        UNPARTITIONED
11:AGGREGATE     3       2s958ms    3s749ms    3        1           3.08 MB    10.00 MB
00:UNION         3       211.510ms  224.667ms  288.01M  288.01M     0          0
|--02:SCAN HDFS  3       1s637ms    1s734ms    28.80M   28.80M      528.68 MB  1.88 GB        store_sales_unpartitioned
|--03:SCAN HDFS  3       1s697ms    1s708ms    28.80M   28.80M      528.48 MB  1.88 GB        store_sales_unpartitioned
|--04:SCAN HDFS  3       1s681ms    1s748ms    28.80M   28.80M      529.34 MB  1.88 GB        store_sales_unpartitioned
|--05:SCAN HDFS  3       1s665ms    1s756ms    28.80M   28.80M      533.81 MB  1.88 GB        store_sales_unpartitioned
|--06:SCAN HDFS  3       1s675ms    1s800ms    28.80M   28.80M      530.70 MB  1.88 GB        store_sales_unpartitioned
|--07:SCAN HDFS  3       1s677ms    1s759ms    28.80M   28.80M      525.95 MB  1.88 GB        store_sales_unpartitioned
|--08:SCAN HDFS  3       1s621ms    1s790ms    28.80M   28.80M      534.64 MB  1.88 GB        store_sales_unpartitioned
|--09:SCAN HDFS  3       1s684ms    1s743ms    28.80M   28.80M      528.55 MB  1.88 GB        store_sales_unpartitioned
|--10:SCAN HDFS  3       1s528ms    1s771ms    28.80M   28.80M      533.70 MB  1.88 GB        store_sales_unpartitioned
01:SCAN HDFS     3       1s853ms    2s149ms    28.80M   28.80M      526.53 MB  1.88 GB        store_sales_unpartitioned
Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
---
M be/src/exec/exchange-node.cc
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/row-batch.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-planner/queries/PlannerTest/with-clause.test
M testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
M tests/query_test/test_queries.py
35 files changed, 1,292 insertions(+), 607 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/14
--
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Hello Alex Behm, Dan Hecht,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/5816
to look at the new patch set (#17).
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
IMPALA-3586: Implement union passthrough
The union node acts as a pass-through operator and forwards row batches
from its children without materializing them. This is done in the case
when the child's tuple layout is identical to the union node's tuple layout
and no functions need to be applied to the child row batches.
Removed operand reordering in the FE because it's simpler and safer to
handle all passthrough children before non-passthrough children in the
BE. The recent improvements to memory management allowed us to remove
this requirement.
A new query option DISABLE_UNION_PASSTHROUGH was added in this patch
as a precaution and for testing purposes.
Testing:
- Added new planner and end to end tests that cover the new
functionality.
- Updated existing tests to reflect the new behavior.
Perf:
Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned
version of the store_sales table. There was over a 2x performance
improvement for the following query:
SELECT
COUNT(ss_sold_time_sk),
COUNT(ss_item_sk),
COUNT(ss_customer_sk),
COUNT(ss_cdemo_sk),
COUNT(ss_hdemo_sk),
COUNT(ss_addr_sk),
COUNT(ss_store_sk),
COUNT(ss_promo_sk),
COUNT(ss_ticket_number),
COUNT(ss_quantity),
COUNT(ss_wholesale_cost),
COUNT(ss_list_price),
COUNT(ss_sales_price),
COUNT(ss_ext_discount_amt),
COUNT(ss_ext_sales_price),
COUNT(ss_ext_wholesale_cost),
COUNT(ss_ext_list_price),
COUNT(ss_ext_tax),
COUNT(ss_coupon_amt),
COUNT(ss_net_paid),
COUNT(ss_net_paid_inc_tax),
COUNT(ss_net_profit),
COUNT(ss_sold_date_sk)
FROM (
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
) t
Before:
Total Time: 43s164ms
Summary:
Operator         #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem   Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE     1       224.721us  224.721us  1        1           28.00 KB   -1.00 B        FINALIZE
12:EXCHANGE      1       24.578us   24.578us   3        1           0          -1.00 B        UNPARTITIONED
11:AGGREGATE     3       2s402ms    3s060ms    3        1           119.00 KB  10.00 MB
00:UNION         3       35s380ms   37s846ms   288.01M  288.01M     3.08 MB    0
|--02:SCAN HDFS  3       184.197ms  219.931ms  28.80M   28.80M      535.03 MB  1.88 GB        store_sales_unpartitioned
|--03:SCAN HDFS  3       131.956ms  153.401ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--04:SCAN HDFS  3       178.456ms  247.721ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--05:SCAN HDFS  3       189.398ms  242.251ms  28.80M   28.80M      535.01 MB  1.88 GB        store_sales_unpartitioned
|--06:SCAN HDFS  3       122.786ms  156.528ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--07:SCAN HDFS  3       147.467ms  183.391ms  28.80M   28.80M      535.13 MB  1.88 GB        store_sales_unpartitioned
|--08:SCAN HDFS  3       147.502ms  186.273ms  28.80M   28.80M      535.01 MB  1.88 GB        store_sales_unpartitioned
|--09:SCAN HDFS  3       130.086ms  154.682ms  28.80M   28.80M      535.04 MB  1.88 GB        store_sales_unpartitioned
|--10:SCAN HDFS  3       122.701ms  161.056ms  28.80M   28.80M      534.89 MB  1.88 GB        store_sales_unpartitioned
01:SCAN HDFS     3       287.863ms  330.436ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
After:
Total Time: 19s139ms
Summary:
Operator         #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem   Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE     1       166.241us  166.241us  1        1           28.00 KB   -1.00 B        FINALIZE
12:EXCHANGE      1       71.695us   71.695us   3        1           0          -1.00 B        UNPARTITIONED
11:AGGREGATE     3       2s971ms    3s809ms    3        1           3.08 MB    10.00 MB
00:UNION         3       207.956ms  222.846ms  288.01M  288.01M     0          0
|--02:SCAN HDFS  3       1s533ms    1s535ms    28.80M   28.80M      532.28 MB  1.88 GB        store_sales_unpartitioned
|--03:SCAN HDFS  3       1s554ms    1s669ms    28.80M   28.80M      525.73 MB  1.88 GB        store_sales_unpartitioned
|--04:SCAN HDFS  3       1s568ms    1s716ms    28.80M   28.80M      525.03 MB  1.88 GB        store_sales_unpartitioned
|--05:SCAN HDFS  3       1s503ms    1s617ms    28.80M   28.80M      527.43 MB  1.88 GB        store_sales_unpartitioned
|--06:SCAN HDFS  3       1s560ms    1s634ms    28.80M   28.80M      528.52 MB  1.88 GB        store_sales_unpartitioned
|--07:SCAN HDFS  3       1s489ms    1s643ms    28.80M   28.80M      534.81 MB  1.88 GB        store_sales_unpartitioned
|--08:SCAN HDFS  3       1s534ms    1s581ms    28.80M   28.80M      528.10 MB  1.88 GB        store_sales_unpartitioned
|--09:SCAN HDFS  3       1s558ms    1s674ms    28.80M   28.80M      526.77 MB  1.88 GB        store_sales_unpartitioned
|--10:SCAN HDFS  3       1s504ms    1s692ms    28.80M   28.80M      527.83 MB  1.88 GB        store_sales_unpartitioned
01:SCAN HDFS     3       1s682ms    1s911ms    28.80M   28.80M      526.14 MB  1.88 GB        store_sales_unpartitioned
Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
---
M be/src/exec/exchange-node.cc
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/row-batch.cc
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-planner/queries/PlannerTest/with-clause.test
M testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
M tests/query_test/test_queries.py
32 files changed, 1,468 insertions(+), 760 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/17
--
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 17
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged.
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
IMPALA-3586: Implement union passthrough
The union node acts as a pass-through operator and forwards row batches
from its children without materializing them. This is done in the case
when the child's tuple layout is identical to the union node's tuple layout
and no functions need to be applied to the child row batches.
Removed operand reordering in the FE because it's simpler and safer to
handle all passthrough children before non-passthrough children in the
BE. The recent improvements to memory management allowed us to remove
this requirement.
Testing:
- Added new planner and end to end tests that cover the new
functionality.
- Updated existing tests to reflect the new behavior.
Perf:
Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned
version of the store_sales table. There was over a 2x performance
improvement for the following query:
SELECT
COUNT(ss_sold_time_sk),
COUNT(ss_item_sk),
COUNT(ss_customer_sk),
COUNT(ss_cdemo_sk),
COUNT(ss_hdemo_sk),
COUNT(ss_addr_sk),
COUNT(ss_store_sk),
COUNT(ss_promo_sk),
COUNT(ss_ticket_number),
COUNT(ss_quantity),
COUNT(ss_wholesale_cost),
COUNT(ss_list_price),
COUNT(ss_sales_price),
COUNT(ss_ext_discount_amt),
COUNT(ss_ext_sales_price),
COUNT(ss_ext_wholesale_cost),
COUNT(ss_ext_list_price),
COUNT(ss_ext_tax),
COUNT(ss_coupon_amt),
COUNT(ss_net_paid),
COUNT(ss_net_paid_inc_tax),
COUNT(ss_net_profit),
COUNT(ss_sold_date_sk)
FROM (
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
) t
Before:
Total Time: 43s164ms
Summary:
Operator         #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem   Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE     1       224.721us  224.721us  1        1           28.00 KB   -1.00 B        FINALIZE
12:EXCHANGE      1       24.578us   24.578us   3        1           0          -1.00 B        UNPARTITIONED
11:AGGREGATE     3       2s402ms    3s060ms    3        1           119.00 KB  10.00 MB
00:UNION         3       35s380ms   37s846ms   288.01M  288.01M     3.08 MB    0
|--02:SCAN HDFS  3       184.197ms  219.931ms  28.80M   28.80M      535.03 MB  1.88 GB        store_sales_unpartitioned
|--03:SCAN HDFS  3       131.956ms  153.401ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--04:SCAN HDFS  3       178.456ms  247.721ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--05:SCAN HDFS  3       189.398ms  242.251ms  28.80M   28.80M      535.01 MB  1.88 GB        store_sales_unpartitioned
|--06:SCAN HDFS  3       122.786ms  156.528ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--07:SCAN HDFS  3       147.467ms  183.391ms  28.80M   28.80M      535.13 MB  1.88 GB        store_sales_unpartitioned
|--08:SCAN HDFS  3       147.502ms  186.273ms  28.80M   28.80M      535.01 MB  1.88 GB        store_sales_unpartitioned
|--09:SCAN HDFS  3       130.086ms  154.682ms  28.80M   28.80M      535.04 MB  1.88 GB        store_sales_unpartitioned
|--10:SCAN HDFS  3       122.701ms  161.056ms  28.80M   28.80M      534.89 MB  1.88 GB        store_sales_unpartitioned
01:SCAN HDFS     3       287.863ms  330.436ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
After:
Total Time: 19s139ms
Summary:
Operator         #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem   Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE     1       166.241us  166.241us  1        1           28.00 KB   -1.00 B        FINALIZE
12:EXCHANGE      1       71.695us   71.695us   3        1           0          -1.00 B        UNPARTITIONED
11:AGGREGATE     3       2s971ms    3s809ms    3        1           3.08 MB    10.00 MB
00:UNION         3       207.956ms  222.846ms  288.01M  288.01M     0          0
|--02:SCAN HDFS  3       1s533ms    1s535ms    28.80M   28.80M      532.28 MB  1.88 GB        store_sales_unpartitioned
|--03:SCAN HDFS  3       1s554ms    1s669ms    28.80M   28.80M      525.73 MB  1.88 GB        store_sales_unpartitioned
|--04:SCAN HDFS  3       1s568ms    1s716ms    28.80M   28.80M      525.03 MB  1.88 GB        store_sales_unpartitioned
|--05:SCAN HDFS  3       1s503ms    1s617ms    28.80M   28.80M      527.43 MB  1.88 GB        store_sales_unpartitioned
|--06:SCAN HDFS  3       1s560ms    1s634ms    28.80M   28.80M      528.52 MB  1.88 GB        store_sales_unpartitioned
|--07:SCAN HDFS  3       1s489ms    1s643ms    28.80M   28.80M      534.81 MB  1.88 GB        store_sales_unpartitioned
|--08:SCAN HDFS  3       1s534ms    1s581ms    28.80M   28.80M      528.10 MB  1.88 GB        store_sales_unpartitioned
|--09:SCAN HDFS  3       1s558ms    1s674ms    28.80M   28.80M      526.77 MB  1.88 GB        store_sales_unpartitioned
|--10:SCAN HDFS  3       1s504ms    1s692ms    28.80M   28.80M      527.83 MB  1.88 GB        store_sales_unpartitioned
01:SCAN HDFS     3       1s682ms    1s911ms    28.80M   28.80M      526.14 MB  1.88 GB        store_sales_unpartitioned
Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Reviewed-on: http://gerrit.cloudera.org:8080/5816
Reviewed-by: Dan Hecht <dh...@cloudera.com>
Tested-by: Impala Public Jenkins
---
M be/src/exec/exchange-node.cc
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/row-batch.cc
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-planner/queries/PlannerTest/with-clause.test
M testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
M tests/query_test/test_queries.py
32 files changed, 1,461 insertions(+), 764 deletions(-)
Approvals:
Impala Public Jenkins: Verified
Dan Hecht: Looks good to me, approved
--
Gerrit-MessageType: merged
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 22
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-3586 (Part 1): Implement Union Pass Through
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Taras Bobrovytsky has uploaded a new patch set (#3).
Change subject: IMPALA-3586 (Part 1): Implement Union Pass Through
......................................................................
IMPALA-3586 (Part 1): Implement Union Pass Through
The union node acts as a pass-through operator and forwards row batches
from its children without materializing them. This is done in the case
when the child's tuple layout is identical to the union node's tuple layout
and no functions need to be applied to the child row batches.
Testing:
Verified that existing tests cover the case where no/some/all union
children of the union node can be passed through.
Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
---
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/runtime/descriptors.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
19 files changed, 425 insertions(+), 15 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/3
--
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-3586 (Part 1): Implement Union Pass Through
Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.
Change subject: IMPALA-3586 (Part 1): Implement Union Pass Through
......................................................................
Patch Set 4:
(1 comment)
http://gerrit.cloudera.org:8080/#/c/5816/4/testdata/workloads/functional-query/queries/QueryTest/union.test
File testdata/workloads/functional-query/queries/QueryTest/union.test:
Line 1069: # IMPALA-3586: This query caused an issue because the tuple size of the children
I think this may have something to do with count(*) being non-nullable. Maybe the expr in the union node has a nullable slot in that place?
It would be good to understand the root cause so we're sure we fixed the underlying bug.
--
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
Patch Set 21: Verified+1
--
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 21
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change.
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
Patch Set 11:
> (2 comments)
>
> Dan, I don't think multiple row batches are necessary to exercise
> the close on next getnext call. Even if a child returns a single
> batch, that logic will be exercised.
That doesn't seem to be the case looking at PlanFragmentExecutor::ExecInternal(). Besides, if it is the case, that could easily change in the future, so I don't think it's good to rely on that for coverage.
--
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Taras Bobrovytsky has posted comments on this change.
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
Patch Set 19:
(14 comments)
http://gerrit.cloudera.org:8080/#/c/5816/19/be/src/exec/union-node.cc
File be/src/exec/union-node.cc:
Line 111: DCHECK_LT(child_idx_, children_.size());
> this DCHECK should come before any use of child_idx_ since if it's violated
Done
Line 114: DCHECK(child(child_idx_)->row_desc().LayoutEquals(row_batch->row_desc()));
> what do these dchecks have to do with child_eos_? i.e. shouldn't they true
Yes, they should be true. I think the idea was to call this once per child. I actually think it's safer to call this every time because currently it does not get called for the first child (which gets opened in Open()).
Changed it so that it gets called every time.
PS19, Line 132: child_idx_++
> ++child_idx_ per our style.
Done
PS19, Line 147: child_idx_ < children_.size()
> Would make more sense as HasMoreMaterialized()
Done. Even though this does an extra unnecessary check (first_materialized_child_idx_ <= child_idx_), this is cleaner.
PS19, Line 150: children
> children that need materialization
Done
PS19, Line 153: Row
> Child row batch
Done
Line 168: // There are only 3 ways of getting out of this loop:
> it would be helpful to concisely say what this loop is responsible for give
Done
Line 193: COUNTER_SET(rows_returned_counter_, num_rows_returned_);
> should we do: child_batch_.reset() here? we shouldn't ever reference it aga
I see. So the invariant should be: if child_batch_ references something (it has not been reset), then it corresponds to an open materialized child.
Added reset() here. By the way, reset() will be called twice, once here and once in Close().
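A minimal sketch of why the double reset is harmless, assuming child_batch_ behaves like a std::unique_ptr. The RowBatch struct and UnionNodeSketch class below are hypothetical stand-ins for illustration, not the actual Impala classes:

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a row batch; not the Impala RowBatch class.
struct RowBatch {
  int num_rows = 0;
};

// Hypothetical sketch of the invariant: child_batch_ is non-null only while
// a materialized child is open and being consumed.
class UnionNodeSketch {
 public:
  void OpenMaterializedChild() { child_batch_ = std::make_unique<RowBatch>(); }
  // Called once the node is done; the batch is never referenced again.
  void FinishGetNext() { child_batch_.reset(); }
  // Close() resets again; resetting an already-empty unique_ptr is a no-op.
  void Close() { child_batch_.reset(); }
  bool HasChildBatch() const { return child_batch_ != nullptr; }

 private:
  std::unique_ptr<RowBatch> child_batch_;
};
```

With this shape, calling FinishGetNext() followed by Close() leaves the pointer empty both times, so the second reset does no extra work.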
PS19, Line 200: // We end up here iff one of the following is true (or both).
: // 1. We are done consuming all batches from the current child and we need to move on
: // to the next child.
: // 2. The output row batch is at capacity.
> this could be summarized for a quicker read:
Changed the structure along the lines of what you suggested, and cleaned this up.
PS19, Line 204: In other words, the only way to not end up here if we entered the outer loop is if
: // the limit is reached.
> rather than say that in a comment, how about:
Done
Line 263: (!HasMorePassthrough() && !HasMoreMaterialized() && !HasMoreConst(state));
> shouldn't we put lines 254-263 inside a do-while loop, so that we don't ret
Unfortunately that wouldn't work.
For passthrough, we simply forward the child's row batches as they are. Even if they are partially filled, we just forward them. That's kind of the point of this patch.
Maybe we could put materialized and const in a while loop, but that would make the code messier and the benefit would be minimal (up to 1 fewer partially filled row batch per union node per entire cluster).
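To make the difference concrete, here is a rough sketch that contrasts the two paths. It assumes a batch can be modeled as a plain vector of rows; the names Batch, ForwardPassthrough and MaterializeInto are hypothetical and do not appear in the patch:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in: a batch is just a list of row values.
using Batch = std::vector<int>;

// Passthrough: hand the child's batch to the parent unchanged, however full
// it is. Moving the batch means no rows are copied.
Batch ForwardPassthrough(Batch&& child_batch) { return std::move(child_batch); }

// Materialization: copy rows from successive child batches into one output
// batch until it holds 'capacity' rows or the input is exhausted.
Batch MaterializeInto(const std::vector<Batch>& child_batches,
                      std::size_t capacity) {
  Batch out;
  for (const Batch& batch : child_batches) {
    for (int row : batch) {
      if (out.size() == capacity) return out;
      out.push_back(row);
    }
  }
  return out;
}
```

In this model only the materializing path can top up an output batch from several inputs, which is why looping would help there but not for passthrough.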
http://gerrit.cloudera.org:8080/#/c/5816/19/be/src/exec/union-node.h
File be/src/exec/union-node.h:
PS19, Line 115: child_idx
> 'child_idx'
Done
Line 129: return child_idx_ >= first_materialized_child_idx_ && child_idx_ < children_.size();
> this suggestion is okay to ignore, but i find these kind of conditions easi
Reordered it as you suggested. (By the way, in Python you can actually write "w < x <= y < z")
first_materialized_child_idx_ points to the first materialized child, so every child from that index onward is materialized. That means if child_idx_ is greater than or equal to first_materialized_child_idx_, then the current child is materialized.
It's the exact opposite of passthrough (where we check whether child_idx_ is less than first_materialized_child_idx_).
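The two range checks can be sketched side by side. This is a simplified illustration: the ChildCursor struct is hypothetical, and only the comparisons themselves are taken from the discussion above:

```cpp
#include <cassert>

// Hypothetical sketch of the child index bookkeeping.
// Children [0, first_materialized_child_idx) are passed through;
// children [first_materialized_child_idx, num_children) are materialized.
struct ChildCursor {
  int child_idx;
  int first_materialized_child_idx;
  int num_children;

  bool HasMorePassthrough() const {
    return child_idx < first_materialized_child_idx;
  }

  // Written in increasing order so the range reads left to right.
  bool HasMoreMaterialized() const {
    return first_materialized_child_idx <= child_idx &&
           child_idx < num_children;
  }
};
```

At any given child_idx at most one of the two predicates holds, and both are false once child_idx reaches num_children.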
http://gerrit.cloudera.org:8080/#/c/5816/19/be/src/exprs/slot-ref.cc
File be/src/exprs/slot-ref.cc:
Line 94: DCHECK(false);
> this is confusing. either it's an invariant or it's not. it's even more con
Removed.
--
To view, visit http://gerrit.cloudera.org:8080/5816
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 19
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Hello Alex Behm, Dan Hecht,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/5816
to look at the new patch set (#17).
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
IMPALA-3586: Implement union passthrough
The union node acts as a pass-through operator and forwards row batches
from its children without materializing them. This is done in the case
when the child's tuple layout is identical to the union node's tuple layout
and no functions need to be applied to the child row batches.
Removed operand reordering in the FE because it's simpler and safer to
handle all passthrough children before non-passthrough children in the
BE. The recent improvements to memory management allowed us to remove
this requirement.
A new query option DISABLE_UNION_PASSTHROUGH was added in this patch
as a precaution and for testing purposes.
Testing:
- Added new planner and end to end tests that cover the new
functionality.
- Updated existing tests to reflect the new behavior.
Perf:
Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned
version of the store_sales table. There was over a 2x performance
improvement for the following query:
SELECT
COUNT(ss_sold_time_sk),
COUNT(ss_item_sk),
COUNT(ss_customer_sk),
COUNT(ss_cdemo_sk),
COUNT(ss_hdemo_sk),
COUNT(ss_addr_sk),
COUNT(ss_store_sk),
COUNT(ss_promo_sk),
COUNT(ss_ticket_number),
COUNT(ss_quantity),
COUNT(ss_wholesale_cost),
COUNT(ss_list_price),
COUNT(ss_sales_price),
COUNT(ss_ext_discount_amt),
COUNT(ss_ext_sales_price),
COUNT(ss_ext_wholesale_cost),
COUNT(ss_ext_list_price),
COUNT(ss_ext_tax),
COUNT(ss_coupon_amt),
COUNT(ss_net_paid),
COUNT(ss_net_paid_inc_tax),
COUNT(ss_net_profit),
COUNT(ss_sold_date_sk)
FROM (
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
) t
Before:
Total Time: 43s164ms
Summary:
Operator         #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem   Est. Peak Mem  Detail
-----------------------------------------------------------------------------------------------------------------------
13:AGGREGATE     1       224.721us  224.721us  1        1           28.00 KB   -1.00 B        FINALIZE
12:EXCHANGE      1       24.578us   24.578us   3        1           0          -1.00 B        UNPARTITIONED
11:AGGREGATE     3       2s402ms    3s060ms    3        1           119.00 KB  10.00 MB
00:UNION         3       35s380ms   37s846ms   288.01M  288.01M     3.08 MB    0
|--02:SCAN HDFS  3       184.197ms  219.931ms  28.80M   28.80M      535.03 MB  1.88 GB        store_sales_unpartitioned
|--03:SCAN HDFS  3       131.956ms  153.401ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--04:SCAN HDFS  3       178.456ms  247.721ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--05:SCAN HDFS  3       189.398ms  242.251ms  28.80M   28.80M      535.01 MB  1.88 GB        store_sales_unpartitioned
|--06:SCAN HDFS  3       122.786ms  156.528ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--07:SCAN HDFS  3       147.467ms  183.391ms  28.80M   28.80M      535.13 MB  1.88 GB        store_sales_unpartitioned
|--08:SCAN HDFS  3       147.502ms  186.273ms  28.80M   28.80M      535.01 MB  1.88 GB        store_sales_unpartitioned
|--09:SCAN HDFS  3       130.086ms  154.682ms  28.80M   28.80M      535.04 MB  1.88 GB        store_sales_unpartitioned
|--10:SCAN HDFS  3       122.701ms  161.056ms  28.80M   28.80M      534.89 MB  1.88 GB        store_sales_unpartitioned
01:SCAN HDFS     3       287.863ms  330.436ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
After:
Total Time: 19s139ms
Summary:
Operator         #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem   Est. Peak Mem  Detail
-----------------------------------------------------------------------------------------------------------------------
13:AGGREGATE     1       166.241us  166.241us  1        1           28.00 KB   -1.00 B        FINALIZE
12:EXCHANGE      1       71.695us   71.695us   3        1           0          -1.00 B        UNPARTITIONED
11:AGGREGATE     3       2s971ms    3s809ms    3        1           3.08 MB    10.00 MB
00:UNION         3       207.956ms  222.846ms  288.01M  288.01M     0          0
|--02:SCAN HDFS  3       1s533ms    1s535ms    28.80M   28.80M      532.28 MB  1.88 GB        store_sales_unpartitioned
|--03:SCAN HDFS  3       1s554ms    1s669ms    28.80M   28.80M      525.73 MB  1.88 GB        store_sales_unpartitioned
|--04:SCAN HDFS  3       1s568ms    1s716ms    28.80M   28.80M      525.03 MB  1.88 GB        store_sales_unpartitioned
|--05:SCAN HDFS  3       1s503ms    1s617ms    28.80M   28.80M      527.43 MB  1.88 GB        store_sales_unpartitioned
|--06:SCAN HDFS  3       1s560ms    1s634ms    28.80M   28.80M      528.52 MB  1.88 GB        store_sales_unpartitioned
|--07:SCAN HDFS  3       1s489ms    1s643ms    28.80M   28.80M      534.81 MB  1.88 GB        store_sales_unpartitioned
|--08:SCAN HDFS  3       1s534ms    1s581ms    28.80M   28.80M      528.10 MB  1.88 GB        store_sales_unpartitioned
|--09:SCAN HDFS  3       1s558ms    1s674ms    28.80M   28.80M      526.77 MB  1.88 GB        store_sales_unpartitioned
|--10:SCAN HDFS  3       1s504ms    1s692ms    28.80M   28.80M      527.83 MB  1.88 GB        store_sales_unpartitioned
01:SCAN HDFS     3       1s682ms    1s911ms    28.80M   28.80M      526.14 MB  1.88 GB        store_sales_unpartitioned
Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
---
M be/src/exec/exchange-node.cc
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/row-batch.cc
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-planner/queries/PlannerTest/with-clause.test
M testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
M tests/query_test/test_queries.py
32 files changed, 1,469 insertions(+), 760 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/17
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 17
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Hello Alex Behm, Dan Hecht,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/5816
to look at the new patch set (#18).
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
IMPALA-3586: Implement union passthrough
The union node acts as a pass-through operator and forwards row batches
from its children without materializing them. This is done in the case
when the child's tuple layout is identical to the union node's tuple layout
and no functions need to be applied to the child row batches.
Removed operand reordering in the FE because it's simpler and safer to
handle all passthrough children before non-passthrough children in the
BE. The recent improvements to memory management allowed us to remove
this requirement.
Testing:
- Added new planner and end to end tests that cover the new
functionality.
- Updated existing tests to reflect the new behavior.
Perf:
Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned
version of the store_sales table. There was over a 2x performance
improvement for the following query:
SELECT
COUNT(ss_sold_time_sk),
COUNT(ss_item_sk),
COUNT(ss_customer_sk),
COUNT(ss_cdemo_sk),
COUNT(ss_hdemo_sk),
COUNT(ss_addr_sk),
COUNT(ss_store_sk),
COUNT(ss_promo_sk),
COUNT(ss_ticket_number),
COUNT(ss_quantity),
COUNT(ss_wholesale_cost),
COUNT(ss_list_price),
COUNT(ss_sales_price),
COUNT(ss_ext_discount_amt),
COUNT(ss_ext_sales_price),
COUNT(ss_ext_wholesale_cost),
COUNT(ss_ext_list_price),
COUNT(ss_ext_tax),
COUNT(ss_coupon_amt),
COUNT(ss_net_paid),
COUNT(ss_net_paid_inc_tax),
COUNT(ss_net_profit),
COUNT(ss_sold_date_sk)
FROM (
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
) t
Before:
Total Time: 43s164ms
Summary:
Operator         #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem   Est. Peak Mem  Detail
-----------------------------------------------------------------------------------------------------------------------
13:AGGREGATE     1       224.721us  224.721us  1        1           28.00 KB   -1.00 B        FINALIZE
12:EXCHANGE      1       24.578us   24.578us   3        1           0          -1.00 B        UNPARTITIONED
11:AGGREGATE     3       2s402ms    3s060ms    3        1           119.00 KB  10.00 MB
00:UNION         3       35s380ms   37s846ms   288.01M  288.01M     3.08 MB    0
|--02:SCAN HDFS  3       184.197ms  219.931ms  28.80M   28.80M      535.03 MB  1.88 GB        store_sales_unpartitioned
|--03:SCAN HDFS  3       131.956ms  153.401ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--04:SCAN HDFS  3       178.456ms  247.721ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--05:SCAN HDFS  3       189.398ms  242.251ms  28.80M   28.80M      535.01 MB  1.88 GB        store_sales_unpartitioned
|--06:SCAN HDFS  3       122.786ms  156.528ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--07:SCAN HDFS  3       147.467ms  183.391ms  28.80M   28.80M      535.13 MB  1.88 GB        store_sales_unpartitioned
|--08:SCAN HDFS  3       147.502ms  186.273ms  28.80M   28.80M      535.01 MB  1.88 GB        store_sales_unpartitioned
|--09:SCAN HDFS  3       130.086ms  154.682ms  28.80M   28.80M      535.04 MB  1.88 GB        store_sales_unpartitioned
|--10:SCAN HDFS  3       122.701ms  161.056ms  28.80M   28.80M      534.89 MB  1.88 GB        store_sales_unpartitioned
01:SCAN HDFS     3       287.863ms  330.436ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
After:
Total Time: 19s139ms
Summary:
Operator         #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem   Est. Peak Mem  Detail
-----------------------------------------------------------------------------------------------------------------------
13:AGGREGATE     1       166.241us  166.241us  1        1           28.00 KB   -1.00 B        FINALIZE
12:EXCHANGE      1       71.695us   71.695us   3        1           0          -1.00 B        UNPARTITIONED
11:AGGREGATE     3       2s971ms    3s809ms    3        1           3.08 MB    10.00 MB
00:UNION         3       207.956ms  222.846ms  288.01M  288.01M     0          0
|--02:SCAN HDFS  3       1s533ms    1s535ms    28.80M   28.80M      532.28 MB  1.88 GB        store_sales_unpartitioned
|--03:SCAN HDFS  3       1s554ms    1s669ms    28.80M   28.80M      525.73 MB  1.88 GB        store_sales_unpartitioned
|--04:SCAN HDFS  3       1s568ms    1s716ms    28.80M   28.80M      525.03 MB  1.88 GB        store_sales_unpartitioned
|--05:SCAN HDFS  3       1s503ms    1s617ms    28.80M   28.80M      527.43 MB  1.88 GB        store_sales_unpartitioned
|--06:SCAN HDFS  3       1s560ms    1s634ms    28.80M   28.80M      528.52 MB  1.88 GB        store_sales_unpartitioned
|--07:SCAN HDFS  3       1s489ms    1s643ms    28.80M   28.80M      534.81 MB  1.88 GB        store_sales_unpartitioned
|--08:SCAN HDFS  3       1s534ms    1s581ms    28.80M   28.80M      528.10 MB  1.88 GB        store_sales_unpartitioned
|--09:SCAN HDFS  3       1s558ms    1s674ms    28.80M   28.80M      526.77 MB  1.88 GB        store_sales_unpartitioned
|--10:SCAN HDFS  3       1s504ms    1s692ms    28.80M   28.80M      527.83 MB  1.88 GB        store_sales_unpartitioned
01:SCAN HDFS     3       1s682ms    1s911ms    28.80M   28.80M      526.14 MB  1.88 GB        store_sales_unpartitioned
Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
---
M be/src/exec/exchange-node.cc
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/row-batch.cc
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-planner/queries/PlannerTest/with-clause.test
M testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
M tests/query_test/test_queries.py
32 files changed, 1,467 insertions(+), 764 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/18
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 18
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Taras Bobrovytsky has posted comments on this change.
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
Patch Set 17:
(8 comments)
http://gerrit.cloudera.org:8080/#/c/5816/17//COMMIT_MSG
Commit Message:
Line 19: A new query option DISABLE_UNION_PASSTHROUGH was added in this patch
> No more query option.
Done
http://gerrit.cloudera.org:8080/#/c/5816/17/be/src/exec/union-node.h
File be/src/exec/union-node.h:
Line 32: class RuntimeState;
> remove
Done
Line 70: /// evaluating its exprs.
> Just for clarity it might help to spell out what its value is when all child
Done
Line 71: int first_materialized_child_idx_;
> make const and set in c'tor
Done
Line 100: /// call on the child. Sets 'passthrough_todo_' to false when all passthrough children
> update comments to not mention 'todo'
Good catch, done.
http://gerrit.cloudera.org:8080/#/c/5816/17/fe/src/main/java/org/apache/impala/planner/UnionNode.java
File fe/src/main/java/org/apache/impala/planner/UnionNode.java:
Line 77: // Index of the first non-passthrough child; i.e. a child that needs materializing and
> Shrink to: // Index of the first non-passthrough child.
Done
Line 183: * Re-order the union's operands such that the passthrough operands come before the
> I'd say the main purpose of this function is to compute passthrough. The re
Done
Line 190: isChildPassthrough.add(computePassthrough(
> You could rename this computePassthrough() to isChildPassthrough()
Done
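As a rough illustration of the "index of the first non-passthrough child" idea, here is a small sketch. It assumes the passthrough children are ordered before the others and that the index equals the number of children when every child is passthrough; both are assumptions for this example, and FirstMaterializedChildIdx is a hypothetical helper rather than code from the patch:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the index of the first child that is not passthrough, or
// is_passthrough.size() if every child is passthrough (assumed convention).
std::size_t FirstMaterializedChildIdx(const std::vector<bool>& is_passthrough) {
  std::size_t i = 0;
  while (i < is_passthrough.size() && is_passthrough[i]) ++i;
  return i;
}
```

Under these assumptions a single index is enough to split the children into a passthrough prefix and a materialized suffix.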
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 17
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Taras Bobrovytsky has uploaded a new patch set (#8).
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
IMPALA-3586: Implement union passthrough
The union node acts as a pass-through operator and forwards row batches
from its children without materializing them. This is done in the case
when the child's tuple layout is identical to the union node's tuple layout
and no functions need to be applied to the child row batches.
A new query option DISABLE_UNION_PASSTHROUGH was added in this patch
as a precaution and for testing purposes.
Testing:
- Added new planner and end to end tests that cover the new
functionality.
- Updated existing tests to reflect the new behavior.
Perf:
Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned
version of the store_sales table. There was over a 2x performance
improvement for the following query:
SELECT
COUNT(ss_sold_time_sk),
COUNT(ss_item_sk),
COUNT(ss_customer_sk),
COUNT(ss_cdemo_sk),
COUNT(ss_hdemo_sk),
COUNT(ss_addr_sk),
COUNT(ss_store_sk),
COUNT(ss_promo_sk),
COUNT(ss_ticket_number),
COUNT(ss_quantity),
COUNT(ss_wholesale_cost),
COUNT(ss_list_price),
COUNT(ss_sales_price),
COUNT(ss_ext_discount_amt),
COUNT(ss_ext_sales_price),
COUNT(ss_ext_wholesale_cost),
COUNT(ss_ext_list_price),
COUNT(ss_ext_tax),
COUNT(ss_coupon_amt),
COUNT(ss_net_paid),
COUNT(ss_net_paid_inc_tax),
COUNT(ss_net_profit),
COUNT(ss_sold_date_sk)
FROM (
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
) t
Before:
Total Time: 43s164ms
Summary:
Operator         #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem   Est. Peak Mem  Detail
-----------------------------------------------------------------------------------------------------------------------
13:AGGREGATE     1       224.721us  224.721us  1        1           28.00 KB   -1.00 B        FINALIZE
12:EXCHANGE      1       24.578us   24.578us   3        1           0          -1.00 B        UNPARTITIONED
11:AGGREGATE     3       2s402ms    3s060ms    3        1           119.00 KB  10.00 MB
00:UNION         3       35s380ms   37s846ms   288.01M  288.01M     3.08 MB    0
|--02:SCAN HDFS  3       184.197ms  219.931ms  28.80M   28.80M      535.03 MB  1.88 GB        store_sales_unpartitioned
|--03:SCAN HDFS  3       131.956ms  153.401ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--04:SCAN HDFS  3       178.456ms  247.721ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--05:SCAN HDFS  3       189.398ms  242.251ms  28.80M   28.80M      535.01 MB  1.88 GB        store_sales_unpartitioned
|--06:SCAN HDFS  3       122.786ms  156.528ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--07:SCAN HDFS  3       147.467ms  183.391ms  28.80M   28.80M      535.13 MB  1.88 GB        store_sales_unpartitioned
|--08:SCAN HDFS  3       147.502ms  186.273ms  28.80M   28.80M      535.01 MB  1.88 GB        store_sales_unpartitioned
|--09:SCAN HDFS  3       130.086ms  154.682ms  28.80M   28.80M      535.04 MB  1.88 GB        store_sales_unpartitioned
|--10:SCAN HDFS  3       122.701ms  161.056ms  28.80M   28.80M      534.89 MB  1.88 GB        store_sales_unpartitioned
01:SCAN HDFS     3       287.863ms  330.436ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
After:
Total Time: 20s660ms
Summary:
Operator         #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem   Est. Peak Mem  Detail
-----------------------------------------------------------------------------------------------------------------------
13:AGGREGATE     1       187.474us  187.474us  1        1           28.00 KB   -1.00 B        FINALIZE
12:EXCHANGE      1       15.238us   15.238us   3        1           0          -1.00 B        UNPARTITIONED
11:AGGREGATE     3       2s958ms    3s749ms    3        1           3.08 MB    10.00 MB
00:UNION         3       211.510ms  224.667ms  288.01M  288.01M     0          0
|--02:SCAN HDFS  3       1s637ms    1s734ms    28.80M   28.80M      528.68 MB  1.88 GB        store_sales_unpartitioned
|--03:SCAN HDFS  3       1s697ms    1s708ms    28.80M   28.80M      528.48 MB  1.88 GB        store_sales_unpartitioned
|--04:SCAN HDFS  3       1s681ms    1s748ms    28.80M   28.80M      529.34 MB  1.88 GB        store_sales_unpartitioned
|--05:SCAN HDFS  3       1s665ms    1s756ms    28.80M   28.80M      533.81 MB  1.88 GB        store_sales_unpartitioned
|--06:SCAN HDFS  3       1s675ms    1s800ms    28.80M   28.80M      530.70 MB  1.88 GB        store_sales_unpartitioned
|--07:SCAN HDFS  3       1s677ms    1s759ms    28.80M   28.80M      525.95 MB  1.88 GB        store_sales_unpartitioned
|--08:SCAN HDFS  3       1s621ms    1s790ms    28.80M   28.80M      534.64 MB  1.88 GB        store_sales_unpartitioned
|--09:SCAN HDFS  3       1s684ms    1s743ms    28.80M   28.80M      528.55 MB  1.88 GB        store_sales_unpartitioned
|--10:SCAN HDFS  3       1s528ms    1s771ms    28.80M   28.80M      533.70 MB  1.88 GB        store_sales_unpartitioned
01:SCAN HDFS     3       1s853ms    2s149ms    28.80M   28.80M      526.53 MB  1.88 GB        store_sales_unpartitioned
Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
---
M be/src/exec/exchange-node.cc
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/row-batch.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
M tests/query_test/test_queries.py
32 files changed, 729 insertions(+), 52 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/8
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Hello Alex Behm, Dan Hecht,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/5816
to look at the new patch set (#18).
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
IMPALA-3586: Implement union passthrough
The union node acts as a pass-through operator and forwards row batches
from its children without materializing them. This is done in the case
when the child's tuple layout is identical to the union node's tuple layout
and no functions need to be applied to the child row batches.
Removed operand reordering in the FE because it's simpler and safer to
handle all passthrough children before non-passthrough children in the
BE. The recent improvements to memory management allowed us to remove
this requirement.
Testing:
- Added new planner and end to end tests that cover the new
functionality.
- Updated existing tests to reflect the new behavior.
Perf:
Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned
version of the store_sales table. There was over a 2x performance
improvement for the following query:
SELECT
COUNT(ss_sold_time_sk),
COUNT(ss_item_sk),
COUNT(ss_customer_sk),
COUNT(ss_cdemo_sk),
COUNT(ss_hdemo_sk),
COUNT(ss_addr_sk),
COUNT(ss_store_sk),
COUNT(ss_promo_sk),
COUNT(ss_ticket_number),
COUNT(ss_quantity),
COUNT(ss_wholesale_cost),
COUNT(ss_list_price),
COUNT(ss_sales_price),
COUNT(ss_ext_discount_amt),
COUNT(ss_ext_sales_price),
COUNT(ss_ext_wholesale_cost),
COUNT(ss_ext_list_price),
COUNT(ss_ext_tax),
COUNT(ss_coupon_amt),
COUNT(ss_net_paid),
COUNT(ss_net_paid_inc_tax),
COUNT(ss_net_profit),
COUNT(ss_sold_date_sk)
FROM (
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
) t
Before:
Total Time: 43s164ms
Summary:
Operator         #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem   Est. Peak Mem  Detail
-----------------------------------------------------------------------------------------------------------------------
13:AGGREGATE     1       224.721us  224.721us  1        1           28.00 KB   -1.00 B        FINALIZE
12:EXCHANGE      1       24.578us   24.578us   3        1           0          -1.00 B        UNPARTITIONED
11:AGGREGATE     3       2s402ms    3s060ms    3        1           119.00 KB  10.00 MB
00:UNION         3       35s380ms   37s846ms   288.01M  288.01M     3.08 MB    0
|--02:SCAN HDFS  3       184.197ms  219.931ms  28.80M   28.80M      535.03 MB  1.88 GB        store_sales_unpartitioned
|--03:SCAN HDFS  3       131.956ms  153.401ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--04:SCAN HDFS  3       178.456ms  247.721ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--05:SCAN HDFS  3       189.398ms  242.251ms  28.80M   28.80M      535.01 MB  1.88 GB        store_sales_unpartitioned
|--06:SCAN HDFS  3       122.786ms  156.528ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--07:SCAN HDFS  3       147.467ms  183.391ms  28.80M   28.80M      535.13 MB  1.88 GB        store_sales_unpartitioned
|--08:SCAN HDFS  3       147.502ms  186.273ms  28.80M   28.80M      535.01 MB  1.88 GB        store_sales_unpartitioned
|--09:SCAN HDFS  3       130.086ms  154.682ms  28.80M   28.80M      535.04 MB  1.88 GB        store_sales_unpartitioned
|--10:SCAN HDFS  3       122.701ms  161.056ms  28.80M   28.80M      534.89 MB  1.88 GB        store_sales_unpartitioned
01:SCAN HDFS     3       287.863ms  330.436ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
After:
Total Time: 19s139ms
Summary:
Operator          #Hosts   Avg Time   Max Time    #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE           1  166.241us  166.241us        1           1   28.00 KB        -1.00 B  FINALIZE
12:EXCHANGE            1   71.695us   71.695us        3           1          0        -1.00 B  UNPARTITIONED
11:AGGREGATE           3    2s971ms    3s809ms        3           1    3.08 MB       10.00 MB
00:UNION               3  207.956ms  222.846ms  288.01M     288.01M          0              0
|--02:SCAN HDFS        3    1s533ms    1s535ms   28.80M      28.80M  532.28 MB        1.88 GB  store_sales_unpartitioned
|--03:SCAN HDFS        3    1s554ms    1s669ms   28.80M      28.80M  525.73 MB        1.88 GB  store_sales_unpartitioned
|--04:SCAN HDFS        3    1s568ms    1s716ms   28.80M      28.80M  525.03 MB        1.88 GB  store_sales_unpartitioned
|--05:SCAN HDFS        3    1s503ms    1s617ms   28.80M      28.80M  527.43 MB        1.88 GB  store_sales_unpartitioned
|--06:SCAN HDFS        3    1s560ms    1s634ms   28.80M      28.80M  528.52 MB        1.88 GB  store_sales_unpartitioned
|--07:SCAN HDFS        3    1s489ms    1s643ms   28.80M      28.80M  534.81 MB        1.88 GB  store_sales_unpartitioned
|--08:SCAN HDFS        3    1s534ms    1s581ms   28.80M      28.80M  528.10 MB        1.88 GB  store_sales_unpartitioned
|--09:SCAN HDFS        3    1s558ms    1s674ms   28.80M      28.80M  526.77 MB        1.88 GB  store_sales_unpartitioned
|--10:SCAN HDFS        3    1s504ms    1s692ms   28.80M      28.80M  527.83 MB        1.88 GB  store_sales_unpartitioned
01:SCAN HDFS           3    1s682ms    1s911ms   28.80M      28.80M  526.14 MB        1.88 GB  store_sales_unpartitioned
Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
---
M be/src/exec/exchange-node.cc
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/row-batch.cc
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-planner/queries/PlannerTest/with-clause.test
M testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
M tests/query_test/test_queries.py
32 files changed, 1,467 insertions(+), 764 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/18
--
To view, visit http://gerrit.cloudera.org:8080/5816
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 18
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Hello Alex Behm, Dan Hecht,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/5816
to look at the new patch set (#18).
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
IMPALA-3586: Implement union passthrough
The union node acts as a pass-through operator and forwards row batches
from its children without materializing them. This is done in the case
when the child's tuple layout is identical to union node tuple layout
and no functions need to be applied to the child row batches.
Removed operand reordering in the FE because it's simpler and safer to
handle all passthrough children before non-passthrough children in the
BE. The recent improvements to memory management allowed us to remove
this requirement.
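For illustration only, here is a minimal sketch of the behavior described above. It is not the code in this patch: `Child`, `Batch`, `Materialize`, and `RunUnion` are hypothetical stand-ins, and an identity copy takes the place of expression evaluation. It assumes a union that first forwards batches from passthrough children unchanged and then rebuilds batches from the remaining children row by row.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-ins, not Impala's RowBatch or ExecNode classes.
using Row = std::vector<int>;
using Batch = std::vector<Row>;

struct Child {
  std::vector<Batch> batches;  // batches this child will produce
  bool is_passthrough;         // layout matches the union's and no exprs apply
};

struct UnionResult {
  std::vector<Batch> out;
  int forwarded = 0;     // batches handed over without touching rows
  int materialized = 0;  // batches rebuilt row by row
};

// Stand-in for materialization: an identity copy of every row.
Batch Materialize(const Batch& in) {
  Batch out;
  out.reserve(in.size());
  for (const Row& r : in) out.push_back(r);
  return out;
}

UnionResult RunUnion(std::vector<Child> children) {
  UnionResult res;
  // Pass 1: passthrough children. Ownership of each batch moves to the
  // output; no per-row work is done.
  for (Child& c : children) {
    if (!c.is_passthrough) continue;
    for (Batch& b : c.batches) {
      res.out.push_back(std::move(b));
      ++res.forwarded;
    }
  }
  // Pass 2: every other child is materialized into new batches.
  for (Child& c : children) {
    if (c.is_passthrough) continue;
    for (const Batch& b : c.batches) {
      res.out.push_back(Materialize(b));
      ++res.materialized;
    }
  }
  assert(res.out.size() ==
         static_cast<std::size_t>(res.forwarded + res.materialized));
  return res;
}
```

In this sketch the per-row cost is paid only in the second pass, which is the kind of work the passthrough path is meant to avoid.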
Testing:
- Added new planner and end to end tests that cover the new
functionality.
- Updated existing tests to reflect the new behavior.
Perf:
Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned
version of the store_sales table. There was over a 2x performance
improvement for the following query:
SELECT
COUNT(ss_sold_time_sk),
COUNT(ss_item_sk),
COUNT(ss_customer_sk),
COUNT(ss_cdemo_sk),
COUNT(ss_hdemo_sk),
COUNT(ss_addr_sk),
COUNT(ss_store_sk),
COUNT(ss_promo_sk),
COUNT(ss_ticket_number),
COUNT(ss_quantity),
COUNT(ss_wholesale_cost),
COUNT(ss_list_price),
COUNT(ss_sales_price),
COUNT(ss_ext_discount_amt),
COUNT(ss_ext_sales_price),
COUNT(ss_ext_wholesale_cost),
COUNT(ss_ext_list_price),
COUNT(ss_ext_tax),
COUNT(ss_coupon_amt),
COUNT(ss_net_paid),
COUNT(ss_net_paid_inc_tax),
COUNT(ss_net_profit),
COUNT(ss_sold_date_sk)
FROM (
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
) t
Before:
Total Time: 43s164ms
Summary:
Operator          #Hosts   Avg Time   Max Time    #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE           1  224.721us  224.721us        1           1   28.00 KB        -1.00 B  FINALIZE
12:EXCHANGE            1   24.578us   24.578us        3           1          0        -1.00 B  UNPARTITIONED
11:AGGREGATE           3    2s402ms    3s060ms        3           1  119.00 KB       10.00 MB
00:UNION               3   35s380ms   37s846ms  288.01M     288.01M    3.08 MB              0
|--02:SCAN HDFS        3  184.197ms  219.931ms   28.80M      28.80M  535.03 MB        1.88 GB  store_sales_unpartitioned
|--03:SCAN HDFS        3  131.956ms  153.401ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
|--04:SCAN HDFS        3  178.456ms  247.721ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
|--05:SCAN HDFS        3  189.398ms  242.251ms   28.80M      28.80M  535.01 MB        1.88 GB  store_sales_unpartitioned
|--06:SCAN HDFS        3  122.786ms  156.528ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
|--07:SCAN HDFS        3  147.467ms  183.391ms   28.80M      28.80M  535.13 MB        1.88 GB  store_sales_unpartitioned
|--08:SCAN HDFS        3  147.502ms  186.273ms   28.80M      28.80M  535.01 MB        1.88 GB  store_sales_unpartitioned
|--09:SCAN HDFS        3  130.086ms  154.682ms   28.80M      28.80M  535.04 MB        1.88 GB  store_sales_unpartitioned
|--10:SCAN HDFS        3  122.701ms  161.056ms   28.80M      28.80M  534.89 MB        1.88 GB  store_sales_unpartitioned
01:SCAN HDFS           3  287.863ms  330.436ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
After:
Total Time: 19s139ms
Summary:
Operator          #Hosts   Avg Time   Max Time    #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE           1  166.241us  166.241us        1           1   28.00 KB        -1.00 B  FINALIZE
12:EXCHANGE            1   71.695us   71.695us        3           1          0        -1.00 B  UNPARTITIONED
11:AGGREGATE           3    2s971ms    3s809ms        3           1    3.08 MB       10.00 MB
00:UNION               3  207.956ms  222.846ms  288.01M     288.01M          0              0
|--02:SCAN HDFS        3    1s533ms    1s535ms   28.80M      28.80M  532.28 MB        1.88 GB  store_sales_unpartitioned
|--03:SCAN HDFS        3    1s554ms    1s669ms   28.80M      28.80M  525.73 MB        1.88 GB  store_sales_unpartitioned
|--04:SCAN HDFS        3    1s568ms    1s716ms   28.80M      28.80M  525.03 MB        1.88 GB  store_sales_unpartitioned
|--05:SCAN HDFS        3    1s503ms    1s617ms   28.80M      28.80M  527.43 MB        1.88 GB  store_sales_unpartitioned
|--06:SCAN HDFS        3    1s560ms    1s634ms   28.80M      28.80M  528.52 MB        1.88 GB  store_sales_unpartitioned
|--07:SCAN HDFS        3    1s489ms    1s643ms   28.80M      28.80M  534.81 MB        1.88 GB  store_sales_unpartitioned
|--08:SCAN HDFS        3    1s534ms    1s581ms   28.80M      28.80M  528.10 MB        1.88 GB  store_sales_unpartitioned
|--09:SCAN HDFS        3    1s558ms    1s674ms   28.80M      28.80M  526.77 MB        1.88 GB  store_sales_unpartitioned
|--10:SCAN HDFS        3    1s504ms    1s692ms   28.80M      28.80M  527.83 MB        1.88 GB  store_sales_unpartitioned
01:SCAN HDFS           3    1s682ms    1s911ms   28.80M      28.80M  526.14 MB        1.88 GB  store_sales_unpartitioned
Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
---
M be/src/exec/exchange-node.cc
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/row-batch.cc
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-planner/queries/PlannerTest/with-clause.test
M testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
M tests/query_test/test_queries.py
32 files changed, 1,466 insertions(+), 764 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/18
--
To view, visit http://gerrit.cloudera.org:8080/5816
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 18
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Taras Bobrovytsky has uploaded a new patch set (#20).
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
IMPALA-3586: Implement union passthrough
The union node acts as a pass-through operator and forwards row batches
from its children without materializing them. This is done in the case
when the child's tuple layout is identical to union node tuple layout
and no functions need to be applied to the child row batches.
Removed operand reordering in the FE because it's simpler and safer to
handle all passthrough children before non-passthrough children in the
BE. The recent improvements to memory management allowed us to remove
this requirement.
Testing:
- Added new planner and end to end tests that cover the new
functionality.
- Updated existing tests to reflect the new behavior.
Perf:
Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned
version of the store_sales table. There was over a 2x performance
improvement for the following query:
SELECT
COUNT(ss_sold_time_sk),
COUNT(ss_item_sk),
COUNT(ss_customer_sk),
COUNT(ss_cdemo_sk),
COUNT(ss_hdemo_sk),
COUNT(ss_addr_sk),
COUNT(ss_store_sk),
COUNT(ss_promo_sk),
COUNT(ss_ticket_number),
COUNT(ss_quantity),
COUNT(ss_wholesale_cost),
COUNT(ss_list_price),
COUNT(ss_sales_price),
COUNT(ss_ext_discount_amt),
COUNT(ss_ext_sales_price),
COUNT(ss_ext_wholesale_cost),
COUNT(ss_ext_list_price),
COUNT(ss_ext_tax),
COUNT(ss_coupon_amt),
COUNT(ss_net_paid),
COUNT(ss_net_paid_inc_tax),
COUNT(ss_net_profit),
COUNT(ss_sold_date_sk)
FROM (
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
) t
Before:
Total Time: 43s164ms
Summary:
Operator          #Hosts   Avg Time   Max Time    #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE           1  224.721us  224.721us        1           1   28.00 KB        -1.00 B  FINALIZE
12:EXCHANGE            1   24.578us   24.578us        3           1          0        -1.00 B  UNPARTITIONED
11:AGGREGATE           3    2s402ms    3s060ms        3           1  119.00 KB       10.00 MB
00:UNION               3   35s380ms   37s846ms  288.01M     288.01M    3.08 MB              0
|--02:SCAN HDFS        3  184.197ms  219.931ms   28.80M      28.80M  535.03 MB        1.88 GB  store_sales_unpartitioned
|--03:SCAN HDFS        3  131.956ms  153.401ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
|--04:SCAN HDFS        3  178.456ms  247.721ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
|--05:SCAN HDFS        3  189.398ms  242.251ms   28.80M      28.80M  535.01 MB        1.88 GB  store_sales_unpartitioned
|--06:SCAN HDFS        3  122.786ms  156.528ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
|--07:SCAN HDFS        3  147.467ms  183.391ms   28.80M      28.80M  535.13 MB        1.88 GB  store_sales_unpartitioned
|--08:SCAN HDFS        3  147.502ms  186.273ms   28.80M      28.80M  535.01 MB        1.88 GB  store_sales_unpartitioned
|--09:SCAN HDFS        3  130.086ms  154.682ms   28.80M      28.80M  535.04 MB        1.88 GB  store_sales_unpartitioned
|--10:SCAN HDFS        3  122.701ms  161.056ms   28.80M      28.80M  534.89 MB        1.88 GB  store_sales_unpartitioned
01:SCAN HDFS           3  287.863ms  330.436ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
After:
Total Time: 19s139ms
Summary:
Operator          #Hosts   Avg Time   Max Time    #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE           1  166.241us  166.241us        1           1   28.00 KB        -1.00 B  FINALIZE
12:EXCHANGE            1   71.695us   71.695us        3           1          0        -1.00 B  UNPARTITIONED
11:AGGREGATE           3    2s971ms    3s809ms        3           1    3.08 MB       10.00 MB
00:UNION               3  207.956ms  222.846ms  288.01M     288.01M          0              0
|--02:SCAN HDFS        3    1s533ms    1s535ms   28.80M      28.80M  532.28 MB        1.88 GB  store_sales_unpartitioned
|--03:SCAN HDFS        3    1s554ms    1s669ms   28.80M      28.80M  525.73 MB        1.88 GB  store_sales_unpartitioned
|--04:SCAN HDFS        3    1s568ms    1s716ms   28.80M      28.80M  525.03 MB        1.88 GB  store_sales_unpartitioned
|--05:SCAN HDFS        3    1s503ms    1s617ms   28.80M      28.80M  527.43 MB        1.88 GB  store_sales_unpartitioned
|--06:SCAN HDFS        3    1s560ms    1s634ms   28.80M      28.80M  528.52 MB        1.88 GB  store_sales_unpartitioned
|--07:SCAN HDFS        3    1s489ms    1s643ms   28.80M      28.80M  534.81 MB        1.88 GB  store_sales_unpartitioned
|--08:SCAN HDFS        3    1s534ms    1s581ms   28.80M      28.80M  528.10 MB        1.88 GB  store_sales_unpartitioned
|--09:SCAN HDFS        3    1s558ms    1s674ms   28.80M      28.80M  526.77 MB        1.88 GB  store_sales_unpartitioned
|--10:SCAN HDFS        3    1s504ms    1s692ms   28.80M      28.80M  527.83 MB        1.88 GB  store_sales_unpartitioned
01:SCAN HDFS           3    1s682ms    1s911ms   28.80M      28.80M  526.14 MB        1.88 GB  store_sales_unpartitioned
Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
---
M be/src/exec/exchange-node.cc
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/row-batch.cc
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-planner/queries/PlannerTest/with-clause.test
M testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
M tests/query_test/test_queries.py
32 files changed, 1,460 insertions(+), 764 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/20
--
To view, visit http://gerrit.cloudera.org:8080/5816
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 20
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change.
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
Patch Set 21: Code-Review+2
--
To view, visit http://gerrit.cloudera.org:8080/5816
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 21
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Hello Alex Behm, Dan Hecht,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/5816
to look at the new patch set (#13).
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
IMPALA-3586: Implement union passthrough
The union node acts as a pass-through operator and forwards row batches
from its children without materializing them. This is done in the case
when the child's tuple layout is identical to union node tuple layout
and no functions need to be applied to the child row batches.
A new query option DISABLE_UNION_PASSTHROUGH was added in this patch
as a precaution and for testing purposes.
Testing:
- Added new planner and end to end tests that cover the new
functionality.
- Updated existing tests to reflect the new behavior.
Perf:
Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned
version of the store_sales table. There was over a 2x performance
improvement for the following query:
SELECT
COUNT(ss_sold_time_sk),
COUNT(ss_item_sk),
COUNT(ss_customer_sk),
COUNT(ss_cdemo_sk),
COUNT(ss_hdemo_sk),
COUNT(ss_addr_sk),
COUNT(ss_store_sk),
COUNT(ss_promo_sk),
COUNT(ss_ticket_number),
COUNT(ss_quantity),
COUNT(ss_wholesale_cost),
COUNT(ss_list_price),
COUNT(ss_sales_price),
COUNT(ss_ext_discount_amt),
COUNT(ss_ext_sales_price),
COUNT(ss_ext_wholesale_cost),
COUNT(ss_ext_list_price),
COUNT(ss_ext_tax),
COUNT(ss_coupon_amt),
COUNT(ss_net_paid),
COUNT(ss_net_paid_inc_tax),
COUNT(ss_net_profit),
COUNT(ss_sold_date_sk)
FROM (
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
) t
Before:
Total Time: 43s164ms
Summary:
Operator          #Hosts   Avg Time   Max Time    #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE           1  224.721us  224.721us        1           1   28.00 KB        -1.00 B  FINALIZE
12:EXCHANGE            1   24.578us   24.578us        3           1          0        -1.00 B  UNPARTITIONED
11:AGGREGATE           3    2s402ms    3s060ms        3           1  119.00 KB       10.00 MB
00:UNION               3   35s380ms   37s846ms  288.01M     288.01M    3.08 MB              0
|--02:SCAN HDFS        3  184.197ms  219.931ms   28.80M      28.80M  535.03 MB        1.88 GB  store_sales_unpartitioned
|--03:SCAN HDFS        3  131.956ms  153.401ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
|--04:SCAN HDFS        3  178.456ms  247.721ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
|--05:SCAN HDFS        3  189.398ms  242.251ms   28.80M      28.80M  535.01 MB        1.88 GB  store_sales_unpartitioned
|--06:SCAN HDFS        3  122.786ms  156.528ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
|--07:SCAN HDFS        3  147.467ms  183.391ms   28.80M      28.80M  535.13 MB        1.88 GB  store_sales_unpartitioned
|--08:SCAN HDFS        3  147.502ms  186.273ms   28.80M      28.80M  535.01 MB        1.88 GB  store_sales_unpartitioned
|--09:SCAN HDFS        3  130.086ms  154.682ms   28.80M      28.80M  535.04 MB        1.88 GB  store_sales_unpartitioned
|--10:SCAN HDFS        3  122.701ms  161.056ms   28.80M      28.80M  534.89 MB        1.88 GB  store_sales_unpartitioned
01:SCAN HDFS           3  287.863ms  330.436ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
After:
Total Time: 20s660ms
Summary:
Operator          #Hosts   Avg Time   Max Time    #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE           1  187.474us  187.474us        1           1   28.00 KB        -1.00 B  FINALIZE
12:EXCHANGE            1   15.238us   15.238us        3           1          0        -1.00 B  UNPARTITIONED
11:AGGREGATE           3    2s958ms    3s749ms        3           1    3.08 MB       10.00 MB
00:UNION               3  211.510ms  224.667ms  288.01M     288.01M          0              0
|--02:SCAN HDFS        3    1s637ms    1s734ms   28.80M      28.80M  528.68 MB        1.88 GB  store_sales_unpartitioned
|--03:SCAN HDFS        3    1s697ms    1s708ms   28.80M      28.80M  528.48 MB        1.88 GB  store_sales_unpartitioned
|--04:SCAN HDFS        3    1s681ms    1s748ms   28.80M      28.80M  529.34 MB        1.88 GB  store_sales_unpartitioned
|--05:SCAN HDFS        3    1s665ms    1s756ms   28.80M      28.80M  533.81 MB        1.88 GB  store_sales_unpartitioned
|--06:SCAN HDFS        3    1s675ms    1s800ms   28.80M      28.80M  530.70 MB        1.88 GB  store_sales_unpartitioned
|--07:SCAN HDFS        3    1s677ms    1s759ms   28.80M      28.80M  525.95 MB        1.88 GB  store_sales_unpartitioned
|--08:SCAN HDFS        3    1s621ms    1s790ms   28.80M      28.80M  534.64 MB        1.88 GB  store_sales_unpartitioned
|--09:SCAN HDFS        3    1s684ms    1s743ms   28.80M      28.80M  528.55 MB        1.88 GB  store_sales_unpartitioned
|--10:SCAN HDFS        3    1s528ms    1s771ms   28.80M      28.80M  533.70 MB        1.88 GB  store_sales_unpartitioned
01:SCAN HDFS           3    1s853ms    2s149ms   28.80M      28.80M  526.53 MB        1.88 GB  store_sales_unpartitioned
Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
---
M be/src/exec/exchange-node.cc
M be/src/exec/exec-node.h
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/row-batch.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
M tests/query_test/test_queries.py
34 files changed, 740 insertions(+), 62 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/13
--
To view, visit http://gerrit.cloudera.org:8080/5816
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-3586 (Part 1): Implement Union Pass Through
Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.
Change subject: IMPALA-3586 (Part 1): Implement Union Pass Through
......................................................................
Patch Set 5:
(42 comments)
http://gerrit.cloudera.org:8080/#/c/5816/5//COMMIT_MSG
Commit Message:
Line 14: Testing:
Would be nice to get some idea of the perf improvement. The JIRA has an interesting query. A small microbenchmark would also be useful.
http://gerrit.cloudera.org:8080/#/c/5816/5/be/src/exec/union-node.cc
File be/src/exec/union-node.cc:
Line 102
There was a specific reason why Open() was called here. There is an expectation that GetNext() returns quickly after Open() is called. This expectation has to do with our client/server interaction and the query state transitions. The query goes into the FINISHED state once Open() succeeded on the coordinator fragment.
Does test_rows_availability.py succeed?
Line 62: pass_through_children_ = tnode.union_node.pass_through;
move to initializer list in constructor
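As an illustration of this suggestion (a sketch under assumptions, not the actual UnionNode code): `TUnionNodeSketch` and `UnionNodeSketch` below are hypothetical, trimmed-down stand-ins that keep only the field under discussion. Initializing the member in the constructor's initializer list copy-constructs it once and lets it be declared const.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the Thrift struct; only the relevant field.
struct TUnionNodeSketch {
  std::vector<bool> pass_through;
};

class UnionNodeSketch {
 public:
  // The vector is copy-constructed here rather than default-constructed
  // and assigned later in an Init()-style method.
  explicit UnionNodeSketch(const TUnionNodeSketch& tnode)
    : is_child_passthrough_(tnode.pass_through) {}

  std::size_t num_flags() const { return is_child_passthrough_.size(); }

 private:
  // Set once at construction, so it can be const.
  const std::vector<bool> is_child_passthrough_;
};
```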
Line 122: if (child_eos_ && child_idx_ > 0 && !IsInSubplan()) child(child_idx_ - 1)->Close(state);
Needs comment
Line 124: if (child_idx_ < children_.size() && isPassThrough(child_idx_)) {
Add a high-level comment on what is happening here (passthrough).
Line 140: if (child_eos_) {
Add a comment explaining why it's not ok to Close() the child in this passthrough mode even if the child is at eos.
In the materialization case below, we can and do close the child as soon as possible.
Line 141: row_batch->MarkNeedsDeepCopy();
Needs comment
Line 150: if (child_idx_ < children_.size() ||
Might as well reverse this check and move it up, set eos to true and return OK. Easier to see that we're just going to skip the remaining code.
http://gerrit.cloudera.org:8080/#/c/5816/5/be/src/exec/union-node.h
File be/src/exec/union-node.h:
Line 36: /// Node that merges the results of its children by materializing their
update comment
Line 79: std::vector<bool> pass_through_children_;
is_child_passthrough_?
The existing variable name makes it sound like it is a list of children that are pass through, but there is actually an entry for all children.
Move this member out of the "Members that need to be reset()" section.
Line 100: inline bool isPassThrough(int idx) {
idx -> child_idx
const method
Line 101: DCHECK(idx < pass_through_children_.size());
DCHECK_LT
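The union-node.h suggestions above (a per-child flag name, a const accessor taking child_idx, a bounds check, and initialization in the constructor) could be combined roughly as follows. This is only a sketch under assumptions: UnionNodeSketch and IsChildPassthrough are hypothetical names, and plain assert() stands in for Impala's DCHECK_LT so that the snippet compiles on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for the UnionNode members discussed above; not Impala code.
class UnionNodeSketch {
 public:
  // The flags are set once in the initializer list and never reset afterwards.
  explicit UnionNodeSketch(std::vector<bool> is_child_passthrough)
    : is_child_passthrough_(std::move(is_child_passthrough)) {}

  // Returns true if the child at 'child_idx' can be passed through.
  // Const because it only reads state. assert() stands in for DCHECK_LT here.
  bool IsChildPassthrough(int child_idx) const {
    assert(child_idx >= 0);
    assert(static_cast<std::size_t>(child_idx) < is_child_passthrough_.size());
    return is_child_passthrough_[child_idx];
  }

 private:
  // One entry per child, not only for the children that are passthrough.
  const std::vector<bool> is_child_passthrough_;
};
```

Making the member const is one way to express that it does not belong in the "Members that need to be reset()" section.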
http://gerrit.cloudera.org:8080/#/c/5816/5/be/src/runtime/descriptors.cc
File be/src/runtime/descriptors.cc:
Line 132: if (this->type() != other_desc.type()) return false;
can get rid of this->
http://gerrit.cloudera.org:8080/#/c/5816/5/be/src/runtime/descriptors.h
File be/src/runtime/descriptors.h:
Line 554: /// Comparison is done by the contents of the tuple descriptors and not the ids.
I'd prefer to preserve the meaning of these existing functions (IsPrefixOf() and Equals()). We have several interesting DCHECKs that require the ids (and not just the physical layout) to be identical.
If you really need these functions we should give them new names kind of like we have in the FE for this check.
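To illustrate the distinction being asked for, here is a minimal sketch that keeps an id-based Equals() and adds a separately named physical-layout comparison. Everything in it is an assumption made for illustration: TupleDescSketch, SlotDescSketch, LayoutEquals and the fields they compare are hypothetical and far simpler than the real descriptors.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified descriptors; the real classes carry much more state.
struct SlotDescSketch {
  int id;
  int type;          // stand-in for a column type
  int tuple_offset;  // byte offset of the slot within the tuple
  bool is_nullable;
};

struct TupleDescSketch {
  int id;
  int byte_size;
  std::vector<SlotDescSketch> slots;

  // Identity comparison: in this sketch, two descriptors are equal only if
  // their ids match.
  bool Equals(const TupleDescSketch& other) const { return id == other.id; }

  // Physical comparison under its own name: ignores ids and compares only
  // size, slot count, and per-slot type, offset and nullability.
  bool LayoutEquals(const TupleDescSketch& other) const {
    if (byte_size != other.byte_size) return false;
    if (slots.size() != other.slots.size()) return false;
    for (std::size_t i = 0; i < slots.size(); ++i) {
      const SlotDescSketch& a = slots[i];
      const SlotDescSketch& b = other.slots[i];
      if (a.type != b.type) return false;
      if (a.tuple_offset != b.tuple_offset) return false;
      if (a.is_nullable != b.is_nullable) return false;
    }
    return true;
  }
};
```

With two names, callers that rely on id equality are unaffected, and only the passthrough check opts into the weaker layout comparison.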
http://gerrit.cloudera.org:8080/#/c/5816/5/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:
Line 433: // List of booleans that indicates which children can be passed through
Remove "List of booleans" since that's redundant
Line 435: 4: required list<bool> pass_through
is_child_passthrough (good to keep names consistent)
http://gerrit.cloudera.org:8080/#/c/5816/5/fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
File fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java:
Line 222:
Would be good to keep the name of this function and the BE equivalent the same.
Line 223: public boolean hasEqualPhysicalLayout(SlotDescriptor other) {
needs brief comment
Line 224: if (!this.getType().matchesType(other.getType())) return false;
shouldn't the types be equal()?
Line 226: if (this.getByteSize() != other.getByteSize()) return false;
can remove 'this'
http://gerrit.cloudera.org:8080/#/c/5816/5/fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
File fe/src/main/java/org/apache/impala/analysis/UnionStmt.java:
Line 155: // List of output expressions of the Union node. This should be the same as result
List of output expressions produced by the union without the ORDER BY portion (if any). Same as resultExprs_ if there is no ORDER BY.
Line 156: // resultExprs_ if the UnionStmt does not have an Order By. Otherwise resultExprs_
Let's avoid referring to specific plan nodes at this stage and instead try to describe 'semantically' what these exprs contain.
Line 158: private List<Expr> unionNodeResultExprs_ = Lists.newArrayList();
unionResultExprs_
Line 191: for (Expr e: other.unionNodeResultExprs_) unionNodeResultExprs_.add(e.clone());
use Expr.cloneList()
Line 501: for (UnionOperand op: operands_) {
combine with the loop over operands_ in L526
http://gerrit.cloudera.org:8080/#/c/5816/5/fe/src/main/java/org/apache/impala/planner/UnionNode.java
File fe/src/main/java/org/apache/impala/planner/UnionNode.java:
Line 46: * the corresponding result exprs into a new tuple.
update comment to reflect passthrough capability
Line 56: // List of output expressions of the Union node.
List of union result exprs of the originating UnionStmt. Used for determining passthrough-compatibility of children.
No need to new this list.
final
Move this above resultExprLists_ to hopefully minimize confusion by visual separation.
Line 69: protected boolean passThroughEnabled = true;
Seems clearer to make this isInSubplan_ or something and pass that in the constructor as well. The class comment should describe the passthrough capability and why we don't want it inside a subplan.
Line 72: protected List<Boolean> passThrough_ = Lists.newArrayList();
isChildPassthrough (at least something that is consistent across FE/BE and thrift)
Line 76: protected UnionNode(PlanNodeId id, TupleId tupleId) {
remove?
Line 263: TupleDescriptor this_tuple_desc = analyzer.getDescTbl().getTupleDesc(tupleId_);
use FE camel-case style
Line 281: msg.union_node = new TUnionNode(
add Preconditions check that asserts the correct size of passThrough
Line 299: if (!passThrough_.isEmpty()) {
Isn't this always non-empty?
Line 300: List<String> passThroughNodes = Lists.newArrayList();
passThroughNodeIds
Line 308: Joiner.on(", ").join(passThroughNodes) + "\n");
nit: we usually don't add a space after the comma for explain plan stuff
Line 309: }
might be nice to produce "all" instead of listing all plan node ids if all children are passthrough
http://gerrit.cloudera.org:8080/#/c/5816/5/testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
File testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test:
Line 187: | pass through nodes: 01, 02
pass-through-operands:
(to be consistent with our existing "constant-operands")
http://gerrit.cloudera.org:8080/#/c/5816/5/testdata/workloads/functional-planner/queries/PlannerTest/empty.test
File testdata/workloads/functional-planner/queries/PlannerTest/empty.test:
Line 506: # IMPALA-3586: Verify that Union pass through is disabled in subplans.
Might be good to add a separate test for this, since this one is kind of weird for it (unions with only a single operand)
http://gerrit.cloudera.org:8080/#/c/5816/5/testdata/workloads/functional-planner/queries/PlannerTest/union.test
File testdata/workloads/functional-planner/queries/PlannerTest/union.test:
Line 3085: | partitions=4/4 files=4 size=460B
Add a new Kudu planner test (kudu.test) for:
select * from functional.alltypes
union all
select * from functional_kudu.alltypes
The operand with the Kudu scan cannot be passed through.
However if both operands are Kudu scans, then they can be passed through.
http://gerrit.cloudera.org:8080/#/c/5816/5/testdata/workloads/functional-query/queries/QueryTest/union.test
File testdata/workloads/functional-query/queries/QueryTest/union.test:
Line 1069: # IMPALA-3586: This query caused an issue because the tuple size of the children
No need to describe the failure mode of that specific bug you hit during development.
Better to describe what case this test is covering: Input tuples that have non-nullable slots.
I believe that this should now do passthrough right?
Line 1082: # IMPALA-3586: Test the case where no nodes are passed though.
Is this not already covered?
Line 1093: # IMPALA-3586: Test the case where 1 node is passed though, and one is not.
Not already covered?
--
To view, visit http://gerrit.cloudera.org:8080/5816
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Taras Bobrovytsky has uploaded a new patch set (#6).
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
IMPALA-3586: Implement union passthrough
The union node acts as a passthrough operator and forwards row batches
from its children without materializing. This is done in the case
when the child's tuple layout is identical to the union node's tuple layout
and no functions need to be applied to the child row batches.
A new query option DISABLE_UNION_PASSTHROUGH was added in this patch
as a precaution and for testing purposes.
Testing:
- Added new planner and end to end tests that cover the new
functionality.
- Updated existing tests to reflect the new behavior.
- TODO: run a performance benchmark.
Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
---
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/row-batch.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
M tests/query_test/test_queries.py
30 files changed, 683 insertions(+), 49 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/6
--
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Taras Bobrovytsky has uploaded a new patch set (#18).
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
IMPALA-3586: Implement union passthrough
The union node acts as a passthrough operator and forwards row batches
from its children without materializing. This is done in the case
when the child's tuple layout is identical to the union node's tuple layout
and no functions need to be applied to the child row batches.
Removed operand reordering in the FE because it's simpler and safer to
handle all passthrough children before non-passthrough children in the
BE. The recent improvements to memory management allowed us to remove
this requirement.
Testing:
- Added new planner and end to end tests that cover the new
functionality.
- Updated existing tests to reflect the new behavior.
Perf:
Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned
version of the store_sales table. There was over a 2x performance
improvement for the following query:
SELECT
COUNT(ss_sold_time_sk),
COUNT(ss_item_sk),
COUNT(ss_customer_sk),
COUNT(ss_cdemo_sk),
COUNT(ss_hdemo_sk),
COUNT(ss_addr_sk),
COUNT(ss_store_sk),
COUNT(ss_promo_sk),
COUNT(ss_ticket_number),
COUNT(ss_quantity),
COUNT(ss_wholesale_cost),
COUNT(ss_list_price),
COUNT(ss_sales_price),
COUNT(ss_ext_discount_amt),
COUNT(ss_ext_sales_price),
COUNT(ss_ext_wholesale_cost),
COUNT(ss_ext_list_price),
COUNT(ss_ext_tax),
COUNT(ss_coupon_amt),
COUNT(ss_net_paid),
COUNT(ss_net_paid_inc_tax),
COUNT(ss_net_profit),
COUNT(ss_sold_date_sk)
FROM (
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
) t
Before:
Total Time: 43s164ms
Summary:
Operator          #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem   Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE      1       224.721us  224.721us  1        1           28.00 KB   -1.00 B        FINALIZE
12:EXCHANGE       1       24.578us   24.578us   3        1           0          -1.00 B        UNPARTITIONED
11:AGGREGATE      3       2s402ms    3s060ms    3        1           119.00 KB  10.00 MB
00:UNION          3       35s380ms   37s846ms   288.01M  288.01M     3.08 MB    0
|--02:SCAN HDFS   3       184.197ms  219.931ms  28.80M   28.80M      535.03 MB  1.88 GB        store_sales_unpartitioned
|--03:SCAN HDFS   3       131.956ms  153.401ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--04:SCAN HDFS   3       178.456ms  247.721ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--05:SCAN HDFS   3       189.398ms  242.251ms  28.80M   28.80M      535.01 MB  1.88 GB        store_sales_unpartitioned
|--06:SCAN HDFS   3       122.786ms  156.528ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--07:SCAN HDFS   3       147.467ms  183.391ms  28.80M   28.80M      535.13 MB  1.88 GB        store_sales_unpartitioned
|--08:SCAN HDFS   3       147.502ms  186.273ms  28.80M   28.80M      535.01 MB  1.88 GB        store_sales_unpartitioned
|--09:SCAN HDFS   3       130.086ms  154.682ms  28.80M   28.80M      535.04 MB  1.88 GB        store_sales_unpartitioned
|--10:SCAN HDFS   3       122.701ms  161.056ms  28.80M   28.80M      534.89 MB  1.88 GB        store_sales_unpartitioned
01:SCAN HDFS      3       287.863ms  330.436ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
After:
Total Time: 19s139ms
Summary:
Operator          #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem   Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE      1       166.241us  166.241us  1        1           28.00 KB   -1.00 B        FINALIZE
12:EXCHANGE       1       71.695us   71.695us   3        1           0          -1.00 B        UNPARTITIONED
11:AGGREGATE      3       2s971ms    3s809ms    3        1           3.08 MB    10.00 MB
00:UNION          3       207.956ms  222.846ms  288.01M  288.01M     0          0
|--02:SCAN HDFS   3       1s533ms    1s535ms    28.80M   28.80M      532.28 MB  1.88 GB        store_sales_unpartitioned
|--03:SCAN HDFS   3       1s554ms    1s669ms    28.80M   28.80M      525.73 MB  1.88 GB        store_sales_unpartitioned
|--04:SCAN HDFS   3       1s568ms    1s716ms    28.80M   28.80M      525.03 MB  1.88 GB        store_sales_unpartitioned
|--05:SCAN HDFS   3       1s503ms    1s617ms    28.80M   28.80M      527.43 MB  1.88 GB        store_sales_unpartitioned
|--06:SCAN HDFS   3       1s560ms    1s634ms    28.80M   28.80M      528.52 MB  1.88 GB        store_sales_unpartitioned
|--07:SCAN HDFS   3       1s489ms    1s643ms    28.80M   28.80M      534.81 MB  1.88 GB        store_sales_unpartitioned
|--08:SCAN HDFS   3       1s534ms    1s581ms    28.80M   28.80M      528.10 MB  1.88 GB        store_sales_unpartitioned
|--09:SCAN HDFS   3       1s558ms    1s674ms    28.80M   28.80M      526.77 MB  1.88 GB        store_sales_unpartitioned
|--10:SCAN HDFS   3       1s504ms    1s692ms    28.80M   28.80M      527.83 MB  1.88 GB        store_sales_unpartitioned
01:SCAN HDFS      3       1s682ms    1s911ms    28.80M   28.80M      526.14 MB  1.88 GB        store_sales_unpartitioned
Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
---
M be/src/exec/exchange-node.cc
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/exprs/slot-ref.cc
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/row-batch.cc
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-planner/queries/PlannerTest/with-clause.test
M testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
M tests/query_test/test_queries.py
33 files changed, 1,462 insertions(+), 764 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/18
--
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 18
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-3586 (Part 1): Implement Union Pass Through
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Taras Bobrovytsky has uploaded a new patch set (#3).
Change subject: IMPALA-3586 (Part 1): Implement Union Pass Through
......................................................................
IMPALA-3586 (Part 1): Implement Union Pass Through
The union node acts as a passthrough operator and forwards row batches
from its children without materializing. This is done in the case
when the child's tuple layout is identical to the union node's tuple layout
and no functions need to be applied to the child row batches.
Testing:
Verified that existing tests cover the case where no/some/all union
children of the union node can be passed through.
Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
---
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/runtime/descriptors.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
19 files changed, 425 insertions(+), 15 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/3
--
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
Patch Set 9:
(1 comment)
http://gerrit.cloudera.org:8080/#/c/5816/9/be/src/exec/union-node.cc
File be/src/exec/union-node.cc:
Line 148: row_batch->MarkNeedsDeepCopy();
> this doesn't make sense. it only marks the last batch as needing to be copi
The problematic memory is memory that is never attached to any batch and is freed when the child is closed. We don't have a better way to deal with this for now.
--
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Hello Alex Behm, Dan Hecht,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/5816
to look at the new patch set (#18).
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
IMPALA-3586: Implement union passthrough
The union node acts as a passthrough operator and forwards row batches
from its children without materializing. This is done in the case
when the child's tuple layout is identical to the union node's tuple layout
and no functions need to be applied to the child row batches.
Removed operand reordering in the FE because it's simpler and safer to
handle all passthrough children before non-passthrough children in the
BE. The recent improvements to memory management allowed us to remove
this requirement.
Testing:
- Added new planner and end to end tests that cover the new
functionality.
- Updated existing tests to reflect the new behavior.
Perf:
Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned
version of the store_sales table. There was over a 2x performance
improvement for the following query:
SELECT
COUNT(ss_sold_time_sk),
COUNT(ss_item_sk),
COUNT(ss_customer_sk),
COUNT(ss_cdemo_sk),
COUNT(ss_hdemo_sk),
COUNT(ss_addr_sk),
COUNT(ss_store_sk),
COUNT(ss_promo_sk),
COUNT(ss_ticket_number),
COUNT(ss_quantity),
COUNT(ss_wholesale_cost),
COUNT(ss_list_price),
COUNT(ss_sales_price),
COUNT(ss_ext_discount_amt),
COUNT(ss_ext_sales_price),
COUNT(ss_ext_wholesale_cost),
COUNT(ss_ext_list_price),
COUNT(ss_ext_tax),
COUNT(ss_coupon_amt),
COUNT(ss_net_paid),
COUNT(ss_net_paid_inc_tax),
COUNT(ss_net_profit),
COUNT(ss_sold_date_sk)
FROM (
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
) t
Before:
Total Time: 43s164ms
Summary:
Operator          #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem   Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE      1       224.721us  224.721us  1        1           28.00 KB   -1.00 B        FINALIZE
12:EXCHANGE       1       24.578us   24.578us   3        1           0          -1.00 B        UNPARTITIONED
11:AGGREGATE      3       2s402ms    3s060ms    3        1           119.00 KB  10.00 MB
00:UNION          3       35s380ms   37s846ms   288.01M  288.01M     3.08 MB    0
|--02:SCAN HDFS   3       184.197ms  219.931ms  28.80M   28.80M      535.03 MB  1.88 GB        store_sales_unpartitioned
|--03:SCAN HDFS   3       131.956ms  153.401ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--04:SCAN HDFS   3       178.456ms  247.721ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--05:SCAN HDFS   3       189.398ms  242.251ms  28.80M   28.80M      535.01 MB  1.88 GB        store_sales_unpartitioned
|--06:SCAN HDFS   3       122.786ms  156.528ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--07:SCAN HDFS   3       147.467ms  183.391ms  28.80M   28.80M      535.13 MB  1.88 GB        store_sales_unpartitioned
|--08:SCAN HDFS   3       147.502ms  186.273ms  28.80M   28.80M      535.01 MB  1.88 GB        store_sales_unpartitioned
|--09:SCAN HDFS   3       130.086ms  154.682ms  28.80M   28.80M      535.04 MB  1.88 GB        store_sales_unpartitioned
|--10:SCAN HDFS   3       122.701ms  161.056ms  28.80M   28.80M      534.89 MB  1.88 GB        store_sales_unpartitioned
01:SCAN HDFS      3       287.863ms  330.436ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
After:
Total Time: 19s139ms
Summary:
Operator          #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem   Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE      1       166.241us  166.241us  1        1           28.00 KB   -1.00 B        FINALIZE
12:EXCHANGE       1       71.695us   71.695us   3        1           0          -1.00 B        UNPARTITIONED
11:AGGREGATE      3       2s971ms    3s809ms    3        1           3.08 MB    10.00 MB
00:UNION          3       207.956ms  222.846ms  288.01M  288.01M     0          0
|--02:SCAN HDFS   3       1s533ms    1s535ms    28.80M   28.80M      532.28 MB  1.88 GB        store_sales_unpartitioned
|--03:SCAN HDFS   3       1s554ms    1s669ms    28.80M   28.80M      525.73 MB  1.88 GB        store_sales_unpartitioned
|--04:SCAN HDFS   3       1s568ms    1s716ms    28.80M   28.80M      525.03 MB  1.88 GB        store_sales_unpartitioned
|--05:SCAN HDFS   3       1s503ms    1s617ms    28.80M   28.80M      527.43 MB  1.88 GB        store_sales_unpartitioned
|--06:SCAN HDFS   3       1s560ms    1s634ms    28.80M   28.80M      528.52 MB  1.88 GB        store_sales_unpartitioned
|--07:SCAN HDFS   3       1s489ms    1s643ms    28.80M   28.80M      534.81 MB  1.88 GB        store_sales_unpartitioned
|--08:SCAN HDFS   3       1s534ms    1s581ms    28.80M   28.80M      528.10 MB  1.88 GB        store_sales_unpartitioned
|--09:SCAN HDFS   3       1s558ms    1s674ms    28.80M   28.80M      526.77 MB  1.88 GB        store_sales_unpartitioned
|--10:SCAN HDFS   3       1s504ms    1s692ms    28.80M   28.80M      527.83 MB  1.88 GB        store_sales_unpartitioned
01:SCAN HDFS      3       1s682ms    1s911ms    28.80M   28.80M      526.14 MB  1.88 GB        store_sales_unpartitioned
Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
---
M be/src/exec/exchange-node.cc
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/row-batch.cc
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-planner/queries/PlannerTest/with-clause.test
M testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
M tests/query_test/test_queries.py
32 files changed, 1,467 insertions(+), 764 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/18
--
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 18
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
Patch Set 14:
(13 comments)
The new code is much clearer! I think we can still improve it further though. Happy to go over the subtleties in person if you prefer.
http://gerrit.cloudera.org:8080/#/c/5816/14/be/src/exec/union-node.cc
File be/src/exec/union-node.cc:
Line 138: while (const_expr_list_idx_ < const_expr_lists_.size() && !row_batch->AtCapacity() && !ReachedLimit()) {
long line
Line 144: if (const_expr_list_idx_ == const_expr_lists_.size()) *eos = true;
*eos = const_expr_list_idx_ == const_expr_lists_.size();
Line 196: if (child_batch_.get() == NULL) {
nullptr
Line 210: // There are only 3 ways of getting out of this loop:
We also break out if we fetch an empty child batch
Line 214: RETURN_IF_ERROR(QueryMaintenance(state));
remove
Line 274: if (const_todo_) {
Not a big deal, but I think we should do the const exprs last because most of the time this branch will not be taken.
Line 275: RETURN_IF_ERROR(GetNextConst(state, row_batch, &done));
why not pass &const_todo_ directly, and same for the other cases below
Line 278: RETURN_IF_ERROR(GetNextPassThrough(state, row_batch, &done));
It might be simpler (== less error prone) overall to order the operands based on passthrough, or create two lists of child indexes (passthrough and non-passthrough), populated in Init().
Line 282: child_idx_ = 0;
The code here and the setting of child_idx_ in Open() is kind of subtle. There's a baked in assumption that passthrough is done before materialized, but the code that makes that work is spread in different places. I think using two separate child lists, or ordering the operands would help clarify this.
I'm ok with ordering or separating, but I do have a slight preference for ordering because then the evaluation order is clear even at the plan level. Otherwise, you need to understand the union implementation in detail to know how it gets evaluated.
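The "two separate child lists" alternative mentioned above could look roughly like the sketch below. It assumes the per-child passthrough flags are already known at Init() time; ChildOrderSketch and SplitChildren are hypothetical names introduced only for illustration and are not part of the patch.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not Impala code: child indexes split by passthrough flag.
struct ChildOrderSketch {
  std::vector<int> passthrough;   // children whose batches are forwarded as-is
  std::vector<int> materialized;  // children whose rows must be materialized
};

// Partitions the child indexes once, preserving the original relative order
// within each group.
ChildOrderSketch SplitChildren(const std::vector<bool>& is_child_passthrough) {
  ChildOrderSketch order;
  for (std::size_t i = 0; i < is_child_passthrough.size(); ++i) {
    if (is_child_passthrough[i]) {
      order.passthrough.push_back(static_cast<int>(i));
    } else {
      order.materialized.push_back(static_cast<int>(i));
    }
  }
  return order;
}
```

GetNext() could then walk the passthrough list to completion before touching the materialized list, so the "passthrough first" assumption lives in one place rather than being spread across Open() and the child_idx_ resets.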
Line 286: RETURN_IF_ERROR(GetNextMaterialized(state, row_batch, &done));
add a DCHECK here that child_eos_ is true if there were any passthrough children
Line 290: *eos = ReachedLimit() || (!const_todo_ && !passthrough_todo_ && !materialize_todo_);
Isn't the last condition the same as !materialized_todo_ since we are going over the cases strictly in order?
http://gerrit.cloudera.org:8080/#/c/5816/14/be/src/exec/union-node.h
File be/src/exec/union-node.h:
Line 39: /// and expressions don't need to be evaluated. The UnionNode pulls row batches
Comment says union goes through children sequentially, which makes to maintain imo.
Line 107: Status GetNextConst(RuntimeState* state, RowBatch* row_batch, bool* eos);
Use a different name than eos, e.g. 'done', to avoid confusion with the real eos.
We might not need the param at all if we order the children or separate the child lists into passthrough and non-passthrough
--
To view, visit http://gerrit.cloudera.org:8080/5816
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Taras Bobrovytsky has posted comments on this change.
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
Patch Set 16:
(7 comments)
http://gerrit.cloudera.org:8080/#/c/5816/9//COMMIT_MSG
Commit Message:
Line 15: handle all passthrough children before non-passthrough children in the
> I don't think it's unusual to have "feature flags" for disabling invasive c
Done. Removed the flag. Also discussed with Alex in person about testing this patch with the query generator after it is checked in.
http://gerrit.cloudera.org:8080/#/c/5816/16/be/src/exec/union-node.cc
File be/src/exec/union-node.cc:
Line 104: // Figure out which the sources from which we will need to fetch rows. This is being
> Figure out which -> Determine
Not relevant any more.
http://gerrit.cloudera.org:8080/#/c/5816/16/be/src/exec/union-node.h
File be/src/exec/union-node.h:
Line 91: bool const_todo_;
> Rather than having these variables, how about just saving the index of the
Followed Dan's suggestion and removed state.
Line 107: /// call on the child. Sets "passthrough_todo_" to false when all passthrough children
> we typically use single quotes
Done
Line 109: Status GetNextPassThrough(RuntimeState* state, RowBatch* row_batch);
> I liked your original idea of passing a 'has_more' output parameter better.
No longer relevant because those members were removed.
http://gerrit.cloudera.org:8080/#/c/5816/16/fe/src/main/java/org/apache/impala/planner/UnionNode.java
File fe/src/main/java/org/apache/impala/planner/UnionNode.java:
Line 215: isChildPassthrough_.clear();
> PlanNode.init() generally needs to be idempotent because it could be called
After thinking about it some more, I don't think calling init() should have any effect on unionResultExprs_. The order of slots in unionResultExprs_ should not be affected by reordering the children.
Line 221: // Order the children, such that all passthrough children come before the
> move this reordering into a private helper function
Done
--
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 16
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Taras Bobrovytsky has uploaded a new patch set (#7).
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
IMPALA-3586: Implement union passthrough
The union node acts as a pass through operator and forwards row batches
from its children without materializing. This is done in the case
when the child's tuple layout is identical to union node tuple layout
and no functions need to be applied to the child row batches.
A new query option DISABLE_UNION_PASSTHROUGH was added in this patch
as a precaution and for testing purposes.
Testing:
- Added new planner and end to end tests that cover the new
functionality.
- Updated existing tests to reflect the new behavior.
Perf:
Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned
version of the store_sales table. There was over a 2x performance
improvement for the following query:
SELECT
COUNT(ss_sold_time_sk),
COUNT(ss_item_sk),
COUNT(ss_customer_sk),
COUNT(ss_cdemo_sk),
COUNT(ss_hdemo_sk),
COUNT(ss_addr_sk),
COUNT(ss_store_sk),
COUNT(ss_promo_sk),
COUNT(ss_ticket_number),
COUNT(ss_quantity),
COUNT(ss_wholesale_cost),
COUNT(ss_list_price),
COUNT(ss_sales_price),
COUNT(ss_ext_discount_amt),
COUNT(ss_ext_sales_price),
COUNT(ss_ext_wholesale_cost),
COUNT(ss_ext_list_price),
COUNT(ss_ext_tax),
COUNT(ss_coupon_amt),
COUNT(ss_net_paid),
COUNT(ss_net_paid_inc_tax),
COUNT(ss_net_profit),
COUNT(ss_sold_date_sk)
FROM (
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
) t
Before:
Total Time: 43s164ms
Summary:
Operator         #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem   Est. Peak Mem  Detail
-----------------------------------------------------------------------------------------------------------------------
13:AGGREGATE     1       224.721us  224.721us  1        1           28.00 KB   -1.00 B        FINALIZE
12:EXCHANGE      1       24.578us   24.578us   3        1           0          -1.00 B        UNPARTITIONED
11:AGGREGATE     3       2s402ms    3s060ms    3        1           119.00 KB  10.00 MB
00:UNION         3       35s380ms   37s846ms   288.01M  288.01M     3.08 MB    0
|--02:SCAN HDFS  3       184.197ms  219.931ms  28.80M   28.80M      535.03 MB  1.88 GB        store_sales_unpartitioned
|--03:SCAN HDFS  3       131.956ms  153.401ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--04:SCAN HDFS  3       178.456ms  247.721ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--05:SCAN HDFS  3       189.398ms  242.251ms  28.80M   28.80M      535.01 MB  1.88 GB        store_sales_unpartitioned
|--06:SCAN HDFS  3       122.786ms  156.528ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--07:SCAN HDFS  3       147.467ms  183.391ms  28.80M   28.80M      535.13 MB  1.88 GB        store_sales_unpartitioned
|--08:SCAN HDFS  3       147.502ms  186.273ms  28.80M   28.80M      535.01 MB  1.88 GB        store_sales_unpartitioned
|--09:SCAN HDFS  3       130.086ms  154.682ms  28.80M   28.80M      535.04 MB  1.88 GB        store_sales_unpartitioned
|--10:SCAN HDFS  3       122.701ms  161.056ms  28.80M   28.80M      534.89 MB  1.88 GB        store_sales_unpartitioned
01:SCAN HDFS     3       287.863ms  330.436ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
After:
Total Time: 20s660ms
Summary:
Operator         #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem   Est. Peak Mem  Detail
-----------------------------------------------------------------------------------------------------------------------
13:AGGREGATE     1       187.474us  187.474us  1        1           28.00 KB   -1.00 B        FINALIZE
12:EXCHANGE      1       15.238us   15.238us   3        1           0          -1.00 B        UNPARTITIONED
11:AGGREGATE     3       2s958ms    3s749ms    3        1           3.08 MB    10.00 MB
00:UNION         3       211.510ms  224.667ms  288.01M  288.01M     0          0
|--02:SCAN HDFS  3       1s637ms    1s734ms    28.80M   28.80M      528.68 MB  1.88 GB        store_sales_unpartitioned
|--03:SCAN HDFS  3       1s697ms    1s708ms    28.80M   28.80M      528.48 MB  1.88 GB        store_sales_unpartitioned
|--04:SCAN HDFS  3       1s681ms    1s748ms    28.80M   28.80M      529.34 MB  1.88 GB        store_sales_unpartitioned
|--05:SCAN HDFS  3       1s665ms    1s756ms    28.80M   28.80M      533.81 MB  1.88 GB        store_sales_unpartitioned
|--06:SCAN HDFS  3       1s675ms    1s800ms    28.80M   28.80M      530.70 MB  1.88 GB        store_sales_unpartitioned
|--07:SCAN HDFS  3       1s677ms    1s759ms    28.80M   28.80M      525.95 MB  1.88 GB        store_sales_unpartitioned
|--08:SCAN HDFS  3       1s621ms    1s790ms    28.80M   28.80M      534.64 MB  1.88 GB        store_sales_unpartitioned
|--09:SCAN HDFS  3       1s684ms    1s743ms    28.80M   28.80M      528.55 MB  1.88 GB        store_sales_unpartitioned
|--10:SCAN HDFS  3       1s528ms    1s771ms    28.80M   28.80M      533.70 MB  1.88 GB        store_sales_unpartitioned
01:SCAN HDFS     3       1s853ms    2s149ms    28.80M   28.80M      526.53 MB  1.88 GB        store_sales_unpartitioned
Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
---
M be/src/exec/analytic-eval-node.cc
M be/src/exec/exchange-node.cc
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/row-batch.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
M tests/query_test/test_queries.py
33 files changed, 730 insertions(+), 53 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/7
--
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Taras Bobrovytsky has uploaded a new patch set (#6).
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
IMPALA-3586: Implement union passthrough
The union node acts as a pass through operator and forwards row batches
from its children without materializing. This is done in the case
when the child's tuple layout is identical to union node tuple layout
and no functions need to be applied to the child row batches.
A new query option DISABLE_UNION_PASSTHROUGH was added in this patch
as a precaution and for testing purposes.
Testing:
- Added new planner and end to end tests that cover the new
functionality.
- Updated existing tests to reflect the new behavior.
- TODO: run a performance benchmark.
Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
---
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/row-batch.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
M tests/query_test/test_queries.py
30 files changed, 683 insertions(+), 49 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/6
--
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Taras Bobrovytsky has posted comments on this change.
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
Patch Set 13:
(5 comments)
http://gerrit.cloudera.org:8080/#/c/5816/13/be/src/exec/union-node.cc
File be/src/exec/union-node.cc:
Line 156: // It is OK to close the child here because all RowBatches have already been
> ... because all tuple data has been copied, and we will not be calling ...
Changed the code significantly. This comment was removed.
Line 174: child_batch_.reset();
> move this before ++child_idx_ to have them clustered?
Code changed, this is not relevant any more.
http://gerrit.cloudera.org:8080/#/c/5816/13/be/src/exec/union-node.h
File be/src/exec/union-node.h:
Line 67: /// Children that can be passed through, without evaluating and materializing their
> single line
Done
Line 109: inline bool AtEos(int per_fragment_instance_idx) const {
> Maybe it's clearer to make GetNextPassThrough() not set eos, and instead ha
Changed the code completely.
Line 119: DCHECK_LE(child_idx_, children_.size());
> Checking of the child_idx_ is dependent on the aller pattern of advancing t
Changed the code completely
--
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Hello Alex Behm, Dan Hecht,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/5816
to look at the new patch set (#11).
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
IMPALA-3586: Implement union passthrough
The union node acts as a pass through operator and forwards row batches
from its children without materializing. This is done in the case
when the child's tuple layout is identical to union node tuple layout
and no functions need to be applied to the child row batches.
A new query option DISABLE_UNION_PASSTHROUGH was added in this patch
as a precaution and for testing purposes.
Testing:
- Added new planner and end to end tests that cover the new
functionality.
- Updated existing tests to reflect the new behavior.
Perf:
Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned
version of the store_sales table. There was over a 2x performance
improvement for the following query:
SELECT
COUNT(ss_sold_time_sk),
COUNT(ss_item_sk),
COUNT(ss_customer_sk),
COUNT(ss_cdemo_sk),
COUNT(ss_hdemo_sk),
COUNT(ss_addr_sk),
COUNT(ss_store_sk),
COUNT(ss_promo_sk),
COUNT(ss_ticket_number),
COUNT(ss_quantity),
COUNT(ss_wholesale_cost),
COUNT(ss_list_price),
COUNT(ss_sales_price),
COUNT(ss_ext_discount_amt),
COUNT(ss_ext_sales_price),
COUNT(ss_ext_wholesale_cost),
COUNT(ss_ext_list_price),
COUNT(ss_ext_tax),
COUNT(ss_coupon_amt),
COUNT(ss_net_paid),
COUNT(ss_net_paid_inc_tax),
COUNT(ss_net_profit),
COUNT(ss_sold_date_sk)
FROM (
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
) t
Before:
Total Time: 43s164ms
Summary:
Operator         #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem   Est. Peak Mem  Detail
-----------------------------------------------------------------------------------------------------------------------
13:AGGREGATE     1       224.721us  224.721us  1        1           28.00 KB   -1.00 B        FINALIZE
12:EXCHANGE      1       24.578us   24.578us   3        1           0          -1.00 B        UNPARTITIONED
11:AGGREGATE     3       2s402ms    3s060ms    3        1           119.00 KB  10.00 MB
00:UNION         3       35s380ms   37s846ms   288.01M  288.01M     3.08 MB    0
|--02:SCAN HDFS  3       184.197ms  219.931ms  28.80M   28.80M      535.03 MB  1.88 GB        store_sales_unpartitioned
|--03:SCAN HDFS  3       131.956ms  153.401ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--04:SCAN HDFS  3       178.456ms  247.721ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--05:SCAN HDFS  3       189.398ms  242.251ms  28.80M   28.80M      535.01 MB  1.88 GB        store_sales_unpartitioned
|--06:SCAN HDFS  3       122.786ms  156.528ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--07:SCAN HDFS  3       147.467ms  183.391ms  28.80M   28.80M      535.13 MB  1.88 GB        store_sales_unpartitioned
|--08:SCAN HDFS  3       147.502ms  186.273ms  28.80M   28.80M      535.01 MB  1.88 GB        store_sales_unpartitioned
|--09:SCAN HDFS  3       130.086ms  154.682ms  28.80M   28.80M      535.04 MB  1.88 GB        store_sales_unpartitioned
|--10:SCAN HDFS  3       122.701ms  161.056ms  28.80M   28.80M      534.89 MB  1.88 GB        store_sales_unpartitioned
01:SCAN HDFS     3       287.863ms  330.436ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
After:
Total Time: 20s660ms
Summary:
Operator         #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem   Est. Peak Mem  Detail
-----------------------------------------------------------------------------------------------------------------------
13:AGGREGATE     1       187.474us  187.474us  1        1           28.00 KB   -1.00 B        FINALIZE
12:EXCHANGE      1       15.238us   15.238us   3        1           0          -1.00 B        UNPARTITIONED
11:AGGREGATE     3       2s958ms    3s749ms    3        1           3.08 MB    10.00 MB
00:UNION         3       211.510ms  224.667ms  288.01M  288.01M     0          0
|--02:SCAN HDFS  3       1s637ms    1s734ms    28.80M   28.80M      528.68 MB  1.88 GB        store_sales_unpartitioned
|--03:SCAN HDFS  3       1s697ms    1s708ms    28.80M   28.80M      528.48 MB  1.88 GB        store_sales_unpartitioned
|--04:SCAN HDFS  3       1s681ms    1s748ms    28.80M   28.80M      529.34 MB  1.88 GB        store_sales_unpartitioned
|--05:SCAN HDFS  3       1s665ms    1s756ms    28.80M   28.80M      533.81 MB  1.88 GB        store_sales_unpartitioned
|--06:SCAN HDFS  3       1s675ms    1s800ms    28.80M   28.80M      530.70 MB  1.88 GB        store_sales_unpartitioned
|--07:SCAN HDFS  3       1s677ms    1s759ms    28.80M   28.80M      525.95 MB  1.88 GB        store_sales_unpartitioned
|--08:SCAN HDFS  3       1s621ms    1s790ms    28.80M   28.80M      534.64 MB  1.88 GB        store_sales_unpartitioned
|--09:SCAN HDFS  3       1s684ms    1s743ms    28.80M   28.80M      528.55 MB  1.88 GB        store_sales_unpartitioned
|--10:SCAN HDFS  3       1s528ms    1s771ms    28.80M   28.80M      533.70 MB  1.88 GB        store_sales_unpartitioned
01:SCAN HDFS     3       1s853ms    2s149ms    28.80M   28.80M      526.53 MB  1.88 GB        store_sales_unpartitioned
Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
---
M be/src/exec/exchange-node.cc
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/row-batch.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
M tests/query_test/test_queries.py
33 files changed, 718 insertions(+), 54 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/11
--
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change.
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
Patch Set 9:
(11 comments)
http://gerrit.cloudera.org:8080/#/c/5816/9//COMMIT_MSG
Commit Message:
Line 15: as a precaution and for testing purposes.
is this really needed? the more query options we have the larger the test matrix. this one's not so bad since the fallback code is needed anyway (when the layout isn't the same), but still wondering what the cost/benefit of this option is.
http://gerrit.cloudera.org:8080/#/c/5816/9/be/src/exec/exchange-node.cc
File be/src/exec/exchange-node.cc:
PS9, Line 196: IsPrefixOfEquivalentLayout
why is this the right check rather than IsPrefixOf()?
http://gerrit.cloudera.org:8080/#/c/5816/9/be/src/exec/union-node.cc
File be/src/exec/union-node.cc:
PS9, Line 101: Ensures that rows are available for clients to fetch after this Open() has
: // succeeded.
what does this comment mean now that we don't do GetNext() here?
Line 130: // this)
i don't understand this comment given the dcheck on the next line, which is checking that the row batch is empty.
PS9, Line 147: next GetNext() call
is there guaranteed to be another GetNext() call?
Line 148: row_batch->MarkNeedsDeepCopy();
this doesn't make sense. it only marks the last batch as needing to be copied, but why is the last batch special?
i think we should really be using RowBatch::AcquireState() to cheaply generate a row batch that will be unaffected by the state of the child.
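For illustration only, a toy sketch of the ownership-transfer idea behind this suggestion. `ToyBatch` is a hypothetical stand-in and does not reproduce the real `RowBatch` class or its actual `AcquireState()` behavior; the sketch only assumes that "acquiring state" means moving rows and their backing buffers from the child's batch into the output batch without copying row data.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Toy stand-in for a row batch; NOT Impala's RowBatch.
struct ToyBatch {
  std::vector<int> rows;                                     // row payload
  std::vector<std::unique_ptr<std::vector<char>>> buffers;   // memory backing the rows

  // Takes over all rows and buffers of 'src' and leaves 'src' empty.
  // Only ownership moves; no row data is copied.
  void AcquireState(ToyBatch* src) {
    assert(src != nullptr && src != this);
    assert(rows.empty() && buffers.empty());
    rows = std::move(src->rows);
    buffers = std::move(src->buffers);
    // A moved-from vector is valid but unspecified, so reset it explicitly.
    src->rows.clear();
    src->buffers.clear();
  }
};
```

Because the output batch owns everything it references after the transfer, closing or reusing the child's batch afterwards does not affect it, and that holds for every batch rather than only the last one.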
Line 151: return Status::OK();
GetNext() is quite long. how about moving this code into GetNextPassThrough()
http://gerrit.cloudera.org:8080/#/c/5816/9/be/src/exec/union-node.h
File be/src/exec/union-node.h:
PS9, Line 99: isChildPassThrough
IsChildPassThrough
http://gerrit.cloudera.org:8080/#/c/5816/9/be/src/runtime/descriptors.cc
File be/src/runtime/descriptors.cc:
Line 478: }
how about just calling prefix routine rather than duplicating this?
http://gerrit.cloudera.org:8080/#/c/5816/9/be/src/runtime/descriptors.h
File be/src/runtime/descriptors.h:
Line 104: }
do we need these overloads? (i.e. is this now used in stl)? if not, we prefer to avoid operator overloading since it's less explicit, so how about just defining Equals() on this class.
PS9, Line 562: IsPrefixOfEquivalentLayout
this name is hard to understand because the object of "of" should be the other_desc, not "equivalent layout". Also, the "equivalent" seems contrary to "prefix", i.e. this does not test for equality, but it is a test of prefix.
So how about these names:
Equals() // logical equality
LayoutEquals() // physical equality
IsPrefixOf() // logical prefix
LayoutIsPrefixOf() // physical prefix
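For illustration only, a sketch of how the four proposed names could relate to each other. `ToySlot` and `ToyTupleDesc` are hypothetical types, and the assumptions are loud ones: "logical" is modeled as comparing slot names, "physical" as comparing offsets and sizes. The real descriptor classes may define both differently. The equality checks are built on the prefix routines rather than duplicating them.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy slot: a logical identity (name) plus a physical layout (offset, size).
struct ToySlot {
  std::string name;
  int offset;
  int size;
};

// Toy tuple descriptor; NOT Impala's TupleDescriptor.
struct ToyTupleDesc {
  std::vector<ToySlot> slots;

  // Physical prefix: every slot here has the same offset and size as the
  // slot at the same position in 'other'.
  bool LayoutIsPrefixOf(const ToyTupleDesc& other) const {
    if (slots.size() > other.slots.size()) return false;
    for (size_t i = 0; i < slots.size(); ++i) {
      if (slots[i].offset != other.slots[i].offset) return false;
      if (slots[i].size != other.slots[i].size) return false;
    }
    return true;
  }

  // Physical equality: same number of slots and a physical prefix.
  bool LayoutEquals(const ToyTupleDesc& other) const {
    return slots.size() == other.slots.size() && LayoutIsPrefixOf(other);
  }

  // Logical prefix: every slot here has the same name as the slot at the
  // same position in 'other'.
  bool IsPrefixOf(const ToyTupleDesc& other) const {
    if (slots.size() > other.slots.size()) return false;
    for (size_t i = 0; i < slots.size(); ++i) {
      if (slots[i].name != other.slots[i].name) return false;
    }
    return true;
  }

  // Logical equality: same number of slots and a logical prefix.
  bool Equals(const ToyTupleDesc& other) const {
    return slots.size() == other.slots.size() && IsPrefixOf(other);
  }
};
```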
--
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
Patch Set 13:
(5 comments)
http://gerrit.cloudera.org:8080/#/c/5816/13/be/src/exec/union-node.cc
File be/src/exec/union-node.cc:
Line 156: // It is OK to close the child here because all RowBatches have already been
... because all tuple data has been copied, and we will not be calling ...
Line 174: child_batch_.reset();
move this before ++child_idx_ to have them clustered?
http://gerrit.cloudera.org:8080/#/c/5816/13/be/src/exec/union-node.h
File be/src/exec/union-node.h:
Line 67: /// Children that can be passed through, without evaluating and materializing their
single line
Line 109: inline bool AtEos(int per_fragment_instance_idx) const {
Maybe it's clearer to make GetNextPassThrough() not set eos, and instead have all the *eos setting be done directly inside GetNext(). That way each place can do the appropriate checks.
Line 119: DCHECK_LE(child_idx_, children_.size());
Checking of the child_idx_ is dependent on the caller pattern of advancing the child_idx_ and calling into this AtEos(), another argument for maybe inlining the checks into GetNext() to make this subtle point less disconnected.
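For illustration only, a toy sketch of keeping the end-of-stream decision in one place inside GetNext(). `ToyUnion` is hypothetical and works on in-memory vectors; it assumes passthrough children are ordered before materialized ones and uses multiplication by 10 as a stand-in for expression evaluation. It is not the UnionNode code under review.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Batch = std::vector<int>;

// Toy union over in-memory children; NOT Impala's UnionNode.
class ToyUnion {
 public:
  // Children at indexes below 'first_materialized_idx' are passthrough.
  ToyUnion(std::vector<std::vector<Batch>> children, size_t first_materialized_idx)
      : children_(std::move(children)),
        first_materialized_idx_(first_materialized_idx) {
    assert(first_materialized_idx_ <= children_.size());
  }

  // Produces the next output batch. The only place that sets '*eos'.
  void GetNext(Batch* out, bool* eos) {
    out->clear();
    SkipExhaustedChildren();
    if (child_idx_ < children_.size()) {
      const Batch& in = children_[child_idx_][batch_idx_++];
      if (child_idx_ < first_materialized_idx_) {
        // Passthrough: rows are forwarded unchanged. This toy copies them;
        // a real operator would hand the batch over without copying.
        *out = in;
      } else {
        // "Materialize": stands in for evaluating exprs over each row.
        for (int v : in) out->push_back(v * 10);
      }
      SkipExhaustedChildren();
    }
    *eos = child_idx_ == children_.size();
  }

 private:
  // Advances past children that have no batches left.
  void SkipExhaustedChildren() {
    while (child_idx_ < children_.size() &&
           batch_idx_ == children_[child_idx_].size()) {
      ++child_idx_;
      batch_idx_ = 0;
    }
  }

  std::vector<std::vector<Batch>> children_;
  size_t first_materialized_idx_;
  size_t child_idx_ = 0;
  size_t batch_idx_ = 0;
};
```

Since the index only advances in SkipExhaustedChildren() and eos is derived from it in the last line of GetNext(), there is no separate AtEos() helper whose correctness depends on how callers advance the index.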
--
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3586 (Part 1): Implement Union Pass Through
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Taras Bobrovytsky has posted comments on this change.
Change subject: IMPALA-3586 (Part 1): Implement Union Pass Through
......................................................................
Patch Set 3:
(6 comments)
http://gerrit.cloudera.org:8080/#/c/5816/2/be/src/exec/union-node.cc
File be/src/exec/union-node.cc:
PS2, Line 122: IsInSubplan()
> I think we would either pass child_idx_ into isPassThrough() or call it som
Done
Line 123:
> Move this into the Open() branch so we don't execute it unnecessarily.
Done
Line 126: DCHECK(!IsInSubplan());
> We need to think about the implications of this. This could increase resour
Done. Disabled passthrough in subplans. The last row batch of each child is marked with the DeepCopy flag.
http://gerrit.cloudera.org:8080/#/c/5816/2/be/src/runtime/descriptors.cc
File be/src/runtime/descriptors.cc:
PS2, Line 438: Verify
> Verify (caps)
Done
PS2, Line 442: >
> Make these references with & to avoid copying the vector.
Done
PS2, Line 442: const
> Don't need std::, we automatically import it in common/names.h.
Done
--
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Taras Bobrovytsky has posted comments on this change.
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
Patch Set 18:
(5 comments)
http://gerrit.cloudera.org:8080/#/c/5816/18/be/src/exec/union-node.h
File be/src/exec/union-node.h:
Line 61: /// Index of the first non-passthrough child; i.e. a child that needs materializing and
> i.e. a child that needs materialization (remove the "and evaluating its exp
Done
Line 62: /// evaluating its exprs. When all children are materialized, this should be zero. When
> Shrink like this:
Done
http://gerrit.cloudera.org:8080/#/c/5816/18/fe/src/main/java/org/apache/impala/planner/UnionNode.java
File fe/src/main/java/org/apache/impala/planner/UnionNode.java:
Line 180: * children come before the children that need to be materialized and evaluated. Also
> remove "and evaluated"
Done
Line 181: * reorders 'resultExprLists_'. This is done in order to simplify the implementation in
> Instead of 'This' be explicit since the reference could be misunderstood. S
Done
http://gerrit.cloudera.org:8080/#/c/5816/18/tests/query_test/test_queries.py
File tests/query_test/test_queries.py:
Line 80: query_string = ("select count(c) from ( "
> Why not add this at the end of union.test? Seems odd to have a single naked
Done
--
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 18
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Marcel Kornacker (Code Review)" <ge...@cloudera.org>.
Marcel Kornacker has posted comments on this change.
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
Patch Set 9:
(1 comment)
http://gerrit.cloudera.org:8080/#/c/5816/9//COMMIT_MSG
Commit Message:
Line 15: as a precaution and for testing purposes.
> I agree that "blindly" adding query options is bad, but there are so many t
if we're worried about bugs, we should add more tests. i'm not in favor of piling on query options as a work-around for missing test coverage (and i don't think we're talking about "testing to death").
query options make the product harder to use. and we do not guarantee that for every query we're able to run at the moment that the runtime behavior (including memory consumption) will never change in the future. that would be unreasonable.
--
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change.
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
Patch Set 11:
> > (2 comments)
> >
> > Dan, I don't think multiple row batches are necessary to exercise
> > the close on next getnext call. Even if a child returns a single
> > batch, that logic will be exercised.
>
> That doesn't seem to be the case looking at PlanFragmentExecutor::ExecInternal().
> Besides, if it is the case, that could easily change in the
> future, so I don't think it's good to rely on that for coverage.
Hmm, I guess it's true for union-node because we never set *eos=true when returning pass through rows, unless there is a limit. That seems weird -- is that intentional?
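The behaviour discussed above can be illustrated with a toy model. This is a minimal Python sketch, not the UnionNode implementation: ToyChild and ToyUnion are hypothetical names, limits are ignored, and the only facts taken from the thread are that the last passed-through batch of a child is flagged as needing a deep copy, that the child is closed on the following GetNext() call, and that eos is not reported together with passed-through rows.

```python
class ToyChild:
    """Hypothetical child operator that hands out pre-built batches."""

    def __init__(self, batches):
        self._batches = list(batches)
        self.closed = False

    def get_next(self):
        """Return (batch, eos); eos is True once no batches remain."""
        assert not self.closed, "GetNext() after Close()"
        batch = self._batches.pop(0) if self._batches else []
        return batch, not self._batches

    def close(self):
        self.closed = True


class ToyUnion:
    """Hypothetical pass-through union: forwards child batches unchanged."""

    def __init__(self, children):
        self._children = children
        self._idx = 0
        self._to_close = None  # child whose last batch was handed out already

    def get_next(self):
        """Return (batch, needs_deep_copy, eos)."""
        # The previous call returned a child's final batch, so by now the
        # caller has copied it and the child can be closed safely.
        if self._to_close is not None:
            self._to_close.close()
            self._to_close = None
        if self._idx == len(self._children):
            return [], False, True
        child = self._children[self._idx]
        batch, child_eos = child.get_next()
        if child_eos:
            self._to_close = child
            self._idx += 1
        # The final batch of a child is flagged; eos is only reported by a
        # later call that returns no rows.
        return batch, child_eos, False
```

In this model even a child that produces a single batch goes through the close-on-next-call path, because the union reports eos only on a separate, later call.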
--
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Taras Bobrovytsky has uploaded a new patch set (#7).
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
IMPALA-3586: Implement union passthrough
The union node acts as a pass-through operator and forwards row batches
from its children without materializing them. This is done when the
child's tuple layout is identical to the union node's tuple layout
and no functions need to be applied to the child row batches.
A new query option DISABLE_UNION_PASSTHROUGH was added in this patch
as a precaution and for testing purposes.
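As an illustration of the condition described above, here is a minimal Python sketch of a pass-through eligibility check. It is not Impala's code: Slot and can_pass_through are hypothetical names, and modelling "no functions need to be applied" as "every result expression is a bare slot reference in positional order" is an assumption based on the review comments in this thread.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Slot:
    """Hypothetical stand-in for one column of a tuple layout."""
    col_type: str
    byte_offset: int
    nullable: bool


def can_pass_through(child_layout: List[Slot],
                     union_layout: List[Slot],
                     result_exprs: List[Tuple[str, int]],
                     passthrough_disabled: bool = False) -> bool:
    """Decide whether a child's row batches could be forwarded unchanged.

    result_exprs holds (kind, slot_index) pairs, where kind is 'slotref'
    for a bare column reference and 'fn' for anything that must be evaluated.
    """
    if passthrough_disabled:  # models the DISABLE_UNION_PASSTHROUGH option
        return False
    if child_layout != union_layout:  # layouts must be identical
        return False
    if len(result_exprs) != len(union_layout):
        return False
    # Assumption: slot i of the child must feed slot i of the union as-is.
    return all(kind == "slotref" and idx == pos
               for pos, (kind, idx) in enumerate(result_exprs))
```

For example, a child that selects its columns unchanged qualifies, while a child that applies a function to any column, or whose tuple layout differs, has to be materialized.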
Testing:
- Added new planner and end-to-end tests that cover the new
functionality.
- Updated existing tests to reflect the new behavior.
Perf:
Ran a benchmark on a local 10 GB TPC-DS dataset, using an unpartitioned
version of the store_sales table. The following query ran over 2x
faster:
SELECT
  COUNT(ss_sold_time_sk),
  COUNT(ss_item_sk),
  COUNT(ss_customer_sk),
  COUNT(ss_cdemo_sk),
  COUNT(ss_hdemo_sk),
  COUNT(ss_addr_sk),
  COUNT(ss_store_sk),
  COUNT(ss_promo_sk),
  COUNT(ss_ticket_number),
  COUNT(ss_quantity),
  COUNT(ss_wholesale_cost),
  COUNT(ss_list_price),
  COUNT(ss_sales_price),
  COUNT(ss_ext_discount_amt),
  COUNT(ss_ext_sales_price),
  COUNT(ss_ext_wholesale_cost),
  COUNT(ss_ext_list_price),
  COUNT(ss_ext_tax),
  COUNT(ss_coupon_amt),
  COUNT(ss_net_paid),
  COUNT(ss_net_paid_inc_tax),
  COUNT(ss_net_profit),
  COUNT(ss_sold_date_sk)
FROM (
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
) t
Before:
Total Time: 43s164ms
Summary:
Operator         #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem   Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE     1       224.721us  224.721us  1        1           28.00 KB   -1.00 B        FINALIZE
12:EXCHANGE      1       24.578us   24.578us   3        1           0          -1.00 B        UNPARTITIONED
11:AGGREGATE     3       2s402ms    3s060ms    3        1           119.00 KB  10.00 MB
00:UNION         3       35s380ms   37s846ms   288.01M  288.01M     3.08 MB    0
|--02:SCAN HDFS  3       184.197ms  219.931ms  28.80M   28.80M      535.03 MB  1.88 GB        store_sales_unpartitioned
|--03:SCAN HDFS  3       131.956ms  153.401ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--04:SCAN HDFS  3       178.456ms  247.721ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--05:SCAN HDFS  3       189.398ms  242.251ms  28.80M   28.80M      535.01 MB  1.88 GB        store_sales_unpartitioned
|--06:SCAN HDFS  3       122.786ms  156.528ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--07:SCAN HDFS  3       147.467ms  183.391ms  28.80M   28.80M      535.13 MB  1.88 GB        store_sales_unpartitioned
|--08:SCAN HDFS  3       147.502ms  186.273ms  28.80M   28.80M      535.01 MB  1.88 GB        store_sales_unpartitioned
|--09:SCAN HDFS  3       130.086ms  154.682ms  28.80M   28.80M      535.04 MB  1.88 GB        store_sales_unpartitioned
|--10:SCAN HDFS  3       122.701ms  161.056ms  28.80M   28.80M      534.89 MB  1.88 GB        store_sales_unpartitioned
01:SCAN HDFS     3       287.863ms  330.436ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
After:
Total Time: 20s660ms
Summary:
Operator         #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem   Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE     1       187.474us  187.474us  1        1           28.00 KB   -1.00 B        FINALIZE
12:EXCHANGE      1       15.238us   15.238us   3        1           0          -1.00 B        UNPARTITIONED
11:AGGREGATE     3       2s958ms    3s749ms    3        1           3.08 MB    10.00 MB
00:UNION         3       211.510ms  224.667ms  288.01M  288.01M     0          0
|--02:SCAN HDFS  3       1s637ms    1s734ms    28.80M   28.80M      528.68 MB  1.88 GB        store_sales_unpartitioned
|--03:SCAN HDFS  3       1s697ms    1s708ms    28.80M   28.80M      528.48 MB  1.88 GB        store_sales_unpartitioned
|--04:SCAN HDFS  3       1s681ms    1s748ms    28.80M   28.80M      529.34 MB  1.88 GB        store_sales_unpartitioned
|--05:SCAN HDFS  3       1s665ms    1s756ms    28.80M   28.80M      533.81 MB  1.88 GB        store_sales_unpartitioned
|--06:SCAN HDFS  3       1s675ms    1s800ms    28.80M   28.80M      530.70 MB  1.88 GB        store_sales_unpartitioned
|--07:SCAN HDFS  3       1s677ms    1s759ms    28.80M   28.80M      525.95 MB  1.88 GB        store_sales_unpartitioned
|--08:SCAN HDFS  3       1s621ms    1s790ms    28.80M   28.80M      534.64 MB  1.88 GB        store_sales_unpartitioned
|--09:SCAN HDFS  3       1s684ms    1s743ms    28.80M   28.80M      528.55 MB  1.88 GB        store_sales_unpartitioned
|--10:SCAN HDFS  3       1s528ms    1s771ms    28.80M   28.80M      533.70 MB  1.88 GB        store_sales_unpartitioned
01:SCAN HDFS     3       1s853ms    2s149ms    28.80M   28.80M      526.53 MB  1.88 GB        store_sales_unpartitioned
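The speedup can be checked directly from the two profiles above. The following Python sketch is only a convenience for that arithmetic; parse_duration is a hypothetical helper that handles just the duration forms appearing in these summaries (s, ms and us).

```python
import re


def parse_duration(text: str) -> float:
    """Convert a duration such as '43s164ms' or '211.510ms' to seconds."""
    units = {"s": 1.0, "ms": 1e-3, "us": 1e-6}
    total = 0.0
    # 'ms' and 'us' are listed before 's' so that they are matched first.
    for value, unit in re.findall(r"(\d+(?:\.\d+)?)(ms|us|s)", text):
        total += float(value) * units[unit]
    return total


# Total query time before and after the change.
total_speedup = parse_duration("43s164ms") / parse_duration("20s660ms")

# Average time attributed to the 00:UNION operator before and after.
union_speedup = parse_duration("35s380ms") / parse_duration("211.510ms")
```

With the numbers reported here, total_speedup is roughly 2.09 and union_speedup is roughly 167, which matches the "over a 2x" claim and shows that nearly all of the saving comes from the union node itself.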
Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
---
M be/src/exec/analytic-eval-node.cc
M be/src/exec/exchange-node.cc
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/row-batch.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
M tests/query_test/test_queries.py
33 files changed, 742 insertions(+), 53 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/7
--
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Taras Bobrovytsky has posted comments on this change.
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
Patch Set 8:
(7 comments)
http://gerrit.cloudera.org:8080/#/c/5816/8/be/src/exec/union-node.h
File be/src/exec/union-node.h:
Line 67: // Which children can be passed through, without without evaluating and materializing
> duplicate 'without'
Done
http://gerrit.cloudera.org:8080/#/c/5816/8/common/thrift/ImpalaInternalService.thrift
File common/thrift/ImpalaInternalService.thrift:
Line 225: // Indicates whether passthrough should be disabled in union nodes.
> enabled
Done
http://gerrit.cloudera.org:8080/#/c/5816/8/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:
PS8, Line 255: d
> enabled
Done
http://gerrit.cloudera.org:8080/#/c/5816/8/fe/src/main/java/org/apache/impala/planner/UnionNode.java
File fe/src/main/java/org/apache/impala/planner/UnionNode.java:
Line 267: // Check that if the child outputs a single tuple, then it's not nullable. Tuple
> remove (this was moved inside computePassThrough
Done
http://gerrit.cloudera.org:8080/#/c/5816/8/testdata/workloads/functional-query/queries/QueryTest/union.test
File testdata/workloads/functional-query/queries/QueryTest/union.test:
Line 1059: select bigint_col from functional.alltypestiny where bigint_col > 0
> use unqualified table names everywhere (and fix the one above while you are
Done
Line 1124: ---- QUERY
> Move this test into nested-types-subplan.test, otherwise this test will fai
Done
Line 1126: select count(c.c_custkey), COUNT(v.tot_price)
> lowercase count
Done
--
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Taras Bobrovytsky has uploaded a new patch set (#8).
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
IMPALA-3586: Implement union passthrough
The union node acts as a pass-through operator and forwards row batches
from its children without materializing them. This is done when the
child's tuple layout is identical to the union node's tuple layout
and no functions need to be applied to the child row batches.
A new query option DISABLE_UNION_PASSTHROUGH was added in this patch
as a precaution and for testing purposes.
Testing:
- Added new planner and end-to-end tests that cover the new
functionality.
- Updated existing tests to reflect the new behavior.
Perf:
Ran a benchmark on a local 10 GB TPC-DS dataset, using an unpartitioned
version of the store_sales table. The following query ran over 2x
faster:
SELECT
  COUNT(ss_sold_time_sk),
  COUNT(ss_item_sk),
  COUNT(ss_customer_sk),
  COUNT(ss_cdemo_sk),
  COUNT(ss_hdemo_sk),
  COUNT(ss_addr_sk),
  COUNT(ss_store_sk),
  COUNT(ss_promo_sk),
  COUNT(ss_ticket_number),
  COUNT(ss_quantity),
  COUNT(ss_wholesale_cost),
  COUNT(ss_list_price),
  COUNT(ss_sales_price),
  COUNT(ss_ext_discount_amt),
  COUNT(ss_ext_sales_price),
  COUNT(ss_ext_wholesale_cost),
  COUNT(ss_ext_list_price),
  COUNT(ss_ext_tax),
  COUNT(ss_coupon_amt),
  COUNT(ss_net_paid),
  COUNT(ss_net_paid_inc_tax),
  COUNT(ss_net_profit),
  COUNT(ss_sold_date_sk)
FROM (
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
) t
Before:
Total Time: 43s164ms
Summary:
Operator         #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem   Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE     1       224.721us  224.721us  1        1           28.00 KB   -1.00 B        FINALIZE
12:EXCHANGE      1       24.578us   24.578us   3        1           0          -1.00 B        UNPARTITIONED
11:AGGREGATE     3       2s402ms    3s060ms    3        1           119.00 KB  10.00 MB
00:UNION         3       35s380ms   37s846ms   288.01M  288.01M     3.08 MB    0
|--02:SCAN HDFS  3       184.197ms  219.931ms  28.80M   28.80M      535.03 MB  1.88 GB        store_sales_unpartitioned
|--03:SCAN HDFS  3       131.956ms  153.401ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--04:SCAN HDFS  3       178.456ms  247.721ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--05:SCAN HDFS  3       189.398ms  242.251ms  28.80M   28.80M      535.01 MB  1.88 GB        store_sales_unpartitioned
|--06:SCAN HDFS  3       122.786ms  156.528ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--07:SCAN HDFS  3       147.467ms  183.391ms  28.80M   28.80M      535.13 MB  1.88 GB        store_sales_unpartitioned
|--08:SCAN HDFS  3       147.502ms  186.273ms  28.80M   28.80M      535.01 MB  1.88 GB        store_sales_unpartitioned
|--09:SCAN HDFS  3       130.086ms  154.682ms  28.80M   28.80M      535.04 MB  1.88 GB        store_sales_unpartitioned
|--10:SCAN HDFS  3       122.701ms  161.056ms  28.80M   28.80M      534.89 MB  1.88 GB        store_sales_unpartitioned
01:SCAN HDFS     3       287.863ms  330.436ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
After:
Total Time: 20s660ms
Summary:
Operator         #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem   Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE     1       187.474us  187.474us  1        1           28.00 KB   -1.00 B        FINALIZE
12:EXCHANGE      1       15.238us   15.238us   3        1           0          -1.00 B        UNPARTITIONED
11:AGGREGATE     3       2s958ms    3s749ms    3        1           3.08 MB    10.00 MB
00:UNION         3       211.510ms  224.667ms  288.01M  288.01M     0          0
|--02:SCAN HDFS  3       1s637ms    1s734ms    28.80M   28.80M      528.68 MB  1.88 GB        store_sales_unpartitioned
|--03:SCAN HDFS  3       1s697ms    1s708ms    28.80M   28.80M      528.48 MB  1.88 GB        store_sales_unpartitioned
|--04:SCAN HDFS  3       1s681ms    1s748ms    28.80M   28.80M      529.34 MB  1.88 GB        store_sales_unpartitioned
|--05:SCAN HDFS  3       1s665ms    1s756ms    28.80M   28.80M      533.81 MB  1.88 GB        store_sales_unpartitioned
|--06:SCAN HDFS  3       1s675ms    1s800ms    28.80M   28.80M      530.70 MB  1.88 GB        store_sales_unpartitioned
|--07:SCAN HDFS  3       1s677ms    1s759ms    28.80M   28.80M      525.95 MB  1.88 GB        store_sales_unpartitioned
|--08:SCAN HDFS  3       1s621ms    1s790ms    28.80M   28.80M      534.64 MB  1.88 GB        store_sales_unpartitioned
|--09:SCAN HDFS  3       1s684ms    1s743ms    28.80M   28.80M      528.55 MB  1.88 GB        store_sales_unpartitioned
|--10:SCAN HDFS  3       1s528ms    1s771ms    28.80M   28.80M      533.70 MB  1.88 GB        store_sales_unpartitioned
01:SCAN HDFS     3       1s853ms    2s149ms    28.80M   28.80M      526.53 MB  1.88 GB        store_sales_unpartitioned
Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
---
M be/src/exec/exchange-node.cc
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/row-batch.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
M tests/query_test/test_queries.py
32 files changed, 729 insertions(+), 52 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/8
--
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Taras Bobrovytsky has posted comments on this change.
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
Patch Set 10:
(2 comments)
Dan, I don't think multiple row batches are necessary to exercise the close on the next GetNext() call. Even if a child returns a single batch, that logic will be exercised.
http://gerrit.cloudera.org:8080/#/c/5816/10/be/src/exec/union-node.cc
File be/src/exec/union-node.cc:
Line 124: DCHECK(child_idx_ >= children_.size() || !IsChildPassThrough(child_idx_));
> this dcheck isn't helpful given the inverse is used as the if-condition onl
Done
PS10, Line 228: this
> Remove this and clean up the child Close() logic as part of...
Done
--
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3586 (Part 1): Implement Union Pass Through
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Taras Bobrovytsky has uploaded a new patch set (#4).
Change subject: IMPALA-3586 (Part 1): Implement Union Pass Through
......................................................................
IMPALA-3586 (Part 1): Implement Union Pass Through
The union node acts as a pass-through operator and forwards row batches
from its children without materializing them. This is done when the
child's tuple layout is identical to the union node's tuple layout
and no functions need to be applied to the child row batches.
Testing:
Verified that existing tests cover the case where no/some/all union
children of the union node can be passed through.
Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
---
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/runtime/descriptors.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
20 files changed, 435 insertions(+), 15 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/4
--
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
Patch Set 18:
(5 comments)
http://gerrit.cloudera.org:8080/#/c/5816/18/be/src/exec/union-node.h
File be/src/exec/union-node.h:
Line 61: /// Index of the first non-passthrough child; i.e. a child that needs materializing and
i.e. a child that needs materialization (remove the "and evaluating its exprs" part)
Line 62: /// evaluating its exprs. When all children are materialized, this should be zero. When
Shrink like this:
/// evaluating its exprs.
/// 0 when all children are materialized
/// 'children_.size()' when no children are materialized
http://gerrit.cloudera.org:8080/#/c/5816/18/fe/src/main/java/org/apache/impala/planner/UnionNode.java
File fe/src/main/java/org/apache/impala/planner/UnionNode.java:
Line 180: * children come before the children that need to be materialized and evaluated. Also
remove "and evaluated"
Line 181: * reorders 'resultExprLists_'. This is done in order to simplify the implementation in
Instead of 'This' be explicit since the reference could be misunderstood. Say "The children are reordered to simplify ..."
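The reordering and the "first non-passthrough child" index discussed in these comments can be sketched as follows. This is a hedged Python illustration rather than the UnionNode.java code: reorder_children is a hypothetical name, and keeping the relative order within each group (a stable partition) is an assumption.

```python
from typing import List, Sequence, Tuple


def reorder_children(children: Sequence[str],
                     result_expr_lists: Sequence[list],
                     is_passthrough: Sequence[bool]
                     ) -> Tuple[List[str], List[list], int]:
    """Move pass-through children ahead of children needing materialization.

    Returns the reordered children, their result expr lists reordered the
    same way, and the index of the first child that must be materialized.
    """
    # sorted() is stable, and False sorts before True, so pass-through
    # children come first while each group keeps its original order.
    order = sorted(range(len(children)), key=lambda i: not is_passthrough[i])
    new_children = [children[i] for i in order]
    new_exprs = [result_expr_lists[i] for i in order]
    # 0 when every child is materialized, len(children) when none is.
    first_materialized_idx = sum(1 for flag in is_passthrough if flag)
    return new_children, new_exprs, first_materialized_idx
```

With the children partitioned this way, a single integer is enough for the executor to tell the two groups apart: every child index below it is passed through, and every index at or above it is materialized.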
http://gerrit.cloudera.org:8080/#/c/5816/18/tests/query_test/test_queries.py
File tests/query_test/test_queries.py:
Line 80: query_string = ("select count(c) from ( "
Why not add this at the end of union.test? Seems odd to have a single naked test case without a comment here.
--
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 18
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Taras Bobrovytsky has posted comments on this change.
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
Patch Set 7:
(40 comments)
http://gerrit.cloudera.org:8080/#/c/5816/7/be/src/exec/analytic-eval-node.cc
File be/src/exec/analytic-eval-node.cc:
Line 145: DCHECK(child(0)->row_desc().IsPrefixOfEquivalent(row_desc()));
> This should be IsPrefixOf() because we are sanity checking the row descriptors
Done
http://gerrit.cloudera.org:8080/#/c/5816/7/be/src/exec/union-node.cc
File be/src/exec/union-node.cc:
Line 115: // passthrough case, the child can be closed right away because the row batch received
> the child can be closed right away -> the child was already closed?
Done
Line 116: // from the child is copied (more details below).
> accuracy: the row batch wasn't copied
Done
Line 121: if (child_idx_ < children_.size() && isChildPassThrough(child_idx_)) {
> Suggest comment:
Done
Line 122: // If the rows from the current child can be passed through, the physical row layout
> This comment doesn't seem to add anything, I suggest removing it.
Replaced this with your suggestion.
Line 131: // It may be possible that the row batch is not empty, so we save the previous value.
> More details would be helpful. If the batch has rows at this point, I'd ima
Added a DCHECK instead. Tim made this suggestion in patch set 4.
Line 148: // 'needs_deep_copy' lets us safely close the child in the next GetNext() call.
> style: 'needs_deep_copy' is not a visible variable here, suggest just sayin
Done
Line 154:
> DCHECK that child_idx_ is not passthrough here
Done
http://gerrit.cloudera.org:8080/#/c/5816/7/be/src/exec/union-node.h
File be/src/exec/union-node.h:
Line 67: // Which children can be passed through, without being materialized.
> ... without evaluating and materializing their exprs.
Done
http://gerrit.cloudera.org:8080/#/c/5816/7/be/src/runtime/descriptors.h
File be/src/runtime/descriptors.h:
Line 412: /// Return true if the physical layout of this descriptor matches the physical layout
> matches that of other_desc
Done
Line 413: /// of other_desc, but not necessarily ids.
> but not necessarily the id.
Done
Line 565: /// of other_desc, but not necessarily ids.
> but not necessarily the ids
Done
http://gerrit.cloudera.org:8080/#/c/5816/7/be/src/service/query-options.h
File be/src/service/query-options.h:
Line 38: TImpalaQueryOptions::DISABLE_UNION_PASSTHROUGH + 1);\
> I tend to prefer ENABLE_UNION_PASSTHROUGH. To me positive phrasing is a lit
We have both positive and negative like DISABLE_CODEGEN and ENABLE_EXPR_REWRITES. I agree that ENABLE is simpler and easier to think about. (We should rename all DISABLE options.)
http://gerrit.cloudera.org:8080/#/c/5816/7/fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
File fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java:
Line 227: public boolean isEquivalent(SlotDescriptor other) {
> Unfortunately, the term 'equivalent' already has a different meaning in the
Done
http://gerrit.cloudera.org:8080/#/c/5816/7/fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
File fe/src/main/java/org/apache/impala/analysis/UnionStmt.java:
Line 616: public List<Expr> getUnionNodeResultExprs() {
> getUnionResultExprs()
Done
http://gerrit.cloudera.org:8080/#/c/5816/7/fe/src/main/java/org/apache/impala/planner/UnionNode.java
File fe/src/main/java/org/apache/impala/planner/UnionNode.java:
Line 47: * a child has an identical tuple layout as the output of the union node, the
> ... as the output of the union node, and the child only has naked SlotRefs
Done
Line 57: protected final List<Expr> resultExprs_;
> unionResultExprs_ to make distinguish it from the resultExprLists_ and such
Done
Line 73: // If false, no batches from child nodes would be passed through in the backend.
> Comment should describe what this flag is. Also you mean "true" right?
Done
Line 76: // Indicates for which child nodes batches can be passed through in the backend.
> remove "in the backend" (it's clear that execution happens there)
Done
Line 81: protected UnionNode(PlanNodeId id, TupleId tupleId) {
> Is this c'tor still needed? If not, remove.
Yes, it's used when we create a constant node (one with no children).
Line 89: List<Expr> resultExprs, boolean isInSubplan) {
> indentation, unionResultExprs
Done
Line 182: * Compute whether we can pass through rows without materializing for the given child.
> Can combine/simplify like this:
Done. I don't think it's necessary to describe the passthrough conditions here. The implementation makes it clear.
Line 189: Analyzer analyzer, List<TupleId> childTupleIds, List<Expr> childExprList) {
> childExprList -> childResultExprs
Done
Line 190: if (analyzer.getQueryOptions().isDisable_union_passthrough()) return false;
> seems clearer to move this into init() and not execute any of the passthrou
We need to construct a list of booleans indicating whether each child can be passed through. If passthrough is disabled, we would then have to construct a list of false values in init(). I think it's simpler to keep it the way it is.
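As an illustration of this reply, here is a minimal sketch of computing one passthrough flag per child, with the query-option and subplan checks kept inside the per-child computation. It is a standalone C++ model, not the patch's Java code: every type and function name is hypothetical, and the layout conditions are reduced to two precomputed booleans.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-ins for planner state; none of these names
// come from the patch itself.
struct ChildInfo {
  bool layout_matches_union;  // child tuple layout is identical to the union tuple
  bool only_slot_refs;        // child result exprs are plain slot references
};

struct UnionPlanOptions {
  bool disable_union_passthrough;  // models the DISABLE_UNION_PASSTHROUGH option
  bool is_in_subplan;              // passthrough is not attempted inside subplans
};

// Decide passthrough for a single child. The two early returns model the
// checks that the reply prefers to keep in the per-child computation.
bool ComputePassthrough(const UnionPlanOptions& opts, const ChildInfo& child) {
  if (opts.disable_union_passthrough) return false;
  if (opts.is_in_subplan) return false;
  return child.layout_matches_union && child.only_slot_refs;
}

// Build one flag per child. The result always has children.size() entries,
// so no special case is needed when passthrough is disabled.
std::vector<bool> ComputePassthroughFlags(const UnionPlanOptions& opts,
                                          const std::vector<ChildInfo>& children) {
  std::vector<bool> flags;
  flags.reserve(children.size());
  for (const ChildInfo& child : children) {
    flags.push_back(ComputePassthrough(opts, child));
  }
  return flags;
}
```

With this shape, the disabled case yields an all-false list of the right length without any extra code path.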
Line 193: // TODO: Remove this as part of IMPALA-4179.
> Move TODO to the class comment
This TODO seems a little out of place in the class comment. Wouldn't we have to give additional context there for it to make sense?
Line 194: if (isInSubplan_) return false;
> same here, seems easier to move this check into init()
Same as above. It's weird to special case adding a false to the list.
Line 205: // Verify that the union node has one slot for every expression.
> union node -> union tuple descriptor
Done
Line 212: if (resultExprs_.size() != childTupleDescriptor.getSlots().size()) return false;
> I don't think this tricky check is correct because it won't allow passthrou
Created a JIRA for handling advanced passthrough cases.
Line 218: SlotRef unionSlot = resultExprs_.get(i).unwrapSlotRef(false);
> unionSlotRef, childSlotRef
Done
Line 220: if (!unionTupleDescriptor.getSlots().get(i).isMaterialized()) continue;
> move above the unwrapSlotRef() calls
Done
Line 221: Preconditions.checkState(unionSlot.getDesc().getParent().getId().equals(tupleId_));
> Don't think we need this check, but something like this would be good:
Done
Line 223: Preconditions.checkState(!(childSlot instanceof SlotRef));
> No need for this check
Done
Line 262: // Compute which nodes can be passed through.
> which child nodes
Done
Line 266: // Check that if the child outputs a single tuple, then it's not nullable. Tuple
> move into computePassThrough
Done
http://gerrit.cloudera.org:8080/#/c/5816/7/testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
File testdata/workloads/functional-planner/queries/PlannerTest/kudu.test:
Line 329: # IMPALA-3586: The operand with the Kudu scan cannot be passed through because it's not
> because id is not-nullable (primary key)
Done
Line 346: select id from functional_kudu.alltypes
> do select *
With select *, passthrough doesn't get enabled. The layout of the union tuple is different from the layout of the child tuples.
http://gerrit.cloudera.org:8080/#/c/5816/7/testdata/workloads/functional-planner/queries/PlannerTest/union.test
File testdata/workloads/functional-planner/queries/PlannerTest/union.test:
Line 3104: # IMPALA-3678: Both union operands should produce rows with non-nullable slots which
> remove "should"
Done
Line 3124: # IMPALA-3678: The Union operands that contain a join should not be passed through,
> nice
thanks!
Line 3184: select COUNT(c.c_custkey), COUNT(v.tot_price)
> lowercase count for consistency
Done
http://gerrit.cloudera.org:8080/#/c/5816/7/testdata/workloads/functional-query/queries/QueryTest/union.test
File testdata/workloads/functional-query/queries/QueryTest/union.test:
Line 1126: select COUNT(c.c_custkey), COUNT(v.tot_price)
> lowercase count for consistency
Done
--
To view, visit http://gerrit.cloudera.org:8080/5816
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Taras Bobrovytsky has posted comments on this change.
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
Patch Set 16:
Forgot to update planner tests in patch 15.
--
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 16
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
Patch Set 4:
(6 comments)
http://gerrit.cloudera.org:8080/#/c/5816/4/be/src/exec/union-node.cc
File be/src/exec/union-node.cc:
Line 133: row_batch->set_num_rows(limit_ - num_rows_returned_);
> Done. I added saving the number of rows. I am not sure that this case is po
Do we have coverage of that kind of plan shape, though, in case someone else comes along and turns on passthrough in subplans?
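For context, the quoted line trims the output batch when returning all of its rows would exceed the limit. A minimal sketch of that bookkeeping follows. It assumes a hypothetical RowBatch struct that only tracks a row count and a free function in place of the node's members; it is not the patch's code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a row batch: only the row count matters here.
struct RowBatch {
  std::int64_t num_rows = 0;
};

// Trims 'batch' so the running total never exceeds 'limit' and updates
// 'num_rows_returned'. A negative 'limit' is treated as "no limit" in this
// sketch. Returns true once the limit has been reached.
bool ApplyLimit(std::int64_t limit, std::int64_t* num_rows_returned, RowBatch* batch) {
  if (limit >= 0 && *num_rows_returned + batch->num_rows > limit) {
    batch->num_rows = limit - *num_rows_returned;
  }
  *num_rows_returned += batch->num_rows;
  return limit >= 0 && *num_rows_returned >= limit;
}
```

The subtraction is only correct if the batch holds no rows from before this step, which is the assumption the thread discusses guarding with a DCHECK.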
http://gerrit.cloudera.org:8080/#/c/5816/6/be/src/exec/union-node.cc
File be/src/exec/union-node.cc:
Line 103:
I think we need to reset 'child_batch_' here so that it has the correct 'row_desc' - otherwise they can be out of sync after a sequence of Reset() then Open().
Line 114: }
My preference is to remove this code if it's not needed for correctness and we have no specific reason to think that it's important for performance.
Line 153: RETURN_IF_ERROR(
I think this comment over-explains some resource management things that aren't specific to this code - the MarkNeedsDeepCopy() mechanism and what the non-passthrough case does. We could shorten to something like:
// Even though the child is at eos, it's not OK to Close() it here. Once we close the child,
// the row batches that it produced are invalid. Setting 'needs_deep_copy' lets us safely
// close the child in the next GetNext() call.
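The pattern described in the suggested comment can be sketched as a small standalone model: when a passthrough child reaches eos, the batch is flagged and the child is closed only on the next GetNext() call, or in Close() if there is no further call. The classes below are hypothetical miniatures, not Impala's ExecNode or RowBatch, and the flag is a plain boolean rather than the real MarkNeedsDeepCopy() mechanism.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical miniature model of the pattern; not Impala's real classes.
struct Batch {
  bool needs_deep_copy = false;  // caller must copy the rows before calling GetNext() again
};

struct Child {
  explicit Child(int num_batches) : batches_left(num_batches) {}
  int batches_left;  // must start at 1 or more in this sketch
  bool closed = false;

  // Produces one batch and returns true if that was the child's last one.
  bool GetNext(Batch* /*batch*/) {
    assert(!closed);
    assert(batches_left > 0);
    --batches_left;
    return batches_left == 0;
  }
  void Close() { closed = true; }
};

class PassthroughUnion {
 public:
  explicit PassthroughUnion(std::vector<Child> children)
      : children_(std::move(children)) {}

  // Returns true when the last child has reached eos.
  bool GetNext(Batch* batch) {
    // The batch handed out by the previous call was flagged, so the caller
    // no longer depends on that child's memory and it can be closed now.
    if (to_close_ >= 0) {
      children_[static_cast<std::size_t>(to_close_)].Close();
      to_close_ = -1;
    }
    if (child_idx_ == children_.size()) return true;
    if (children_[child_idx_].GetNext(batch)) {
      // The child is at eos, but closing it here would invalidate 'batch'.
      batch->needs_deep_copy = true;
      to_close_ = static_cast<int>(child_idx_);
      ++child_idx_;
    }
    return child_idx_ == children_.size();
  }

  // Closes every child, covering the case where GetNext() is not called again.
  void Close() {
    for (Child& c : children_) c.Close();
  }

  const Child& child(std::size_t i) const { return children_[i]; }

 private:
  std::vector<Child> children_;
  std::size_t child_idx_ = 0;
  int to_close_ = -1;  // index of a child awaiting a deferred Close(), or -1
};
```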
http://gerrit.cloudera.org:8080/#/c/5816/4/fe/src/main/java/org/apache/impala/planner/UnionNode.java
File fe/src/main/java/org/apache/impala/planner/UnionNode.java:
Line 266: analyzer, children_.get(i).getTupleIds(), resultExprLists_.get(i)));
> Not exactly sure what you mean. We are guaranteed that the layout will be i
nullableTupleIds_ in child PlanNode is essentially part of the row layout and this code isn't checking that those are equivalent. I think currently all single-tuple rows have no nullable tuples, but I think it's too subtle to implicitly assume that. Maybe add a precondition check that the input tuple isn't nullable?
http://gerrit.cloudera.org:8080/#/c/5816/4/testdata/workloads/functional-planner/queries/PlannerTest/union.test
File testdata/workloads/functional-planner/queries/PlannerTest/union.test:
Line 3103: ====
> Can you add a couple of tests where there is a table scan on one branch of
Was this addressed?
--
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change.
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
Patch Set 11:
> The same thing can happen in the
> non-passthrough case though (without my patch).
Is that true? Won't we break out of the loops and fall through to the final code that sets *eos=true as long as there is still capacity in the row batch?
--
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Hello Alex Behm, Dan Hecht,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/5816
to look at the new patch set (#16).
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
IMPALA-3586: Implement union passthrough
The union node acts as a pass-through operator and forwards row batches
from its children without materializing them. This is done in the case
when the child's tuple layout is identical to the union node's tuple
layout and no functions need to be applied to the child row batches.
Removed operand reordering in the FE because it's simpler and safer to
handle all passthrough children before non-passthrough children in the
BE. The recent improvements to memory management allowed us to remove
this requirement.
A new query option DISABLE_UNION_PASSTHROUGH was added in this patch
as a precaution and for testing purposes.
Testing:
- Added new planner and end to end tests that cover the new
functionality.
- Updated existing tests to reflect the new behavior.
Perf:
Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned
version of the store_sales table. There was over a 2x performance
improvement for the following query:
SELECT
COUNT(ss_sold_time_sk),
COUNT(ss_item_sk),
COUNT(ss_customer_sk),
COUNT(ss_cdemo_sk),
COUNT(ss_hdemo_sk),
COUNT(ss_addr_sk),
COUNT(ss_store_sk),
COUNT(ss_promo_sk),
COUNT(ss_ticket_number),
COUNT(ss_quantity),
COUNT(ss_wholesale_cost),
COUNT(ss_list_price),
COUNT(ss_sales_price),
COUNT(ss_ext_discount_amt),
COUNT(ss_ext_sales_price),
COUNT(ss_ext_wholesale_cost),
COUNT(ss_ext_list_price),
COUNT(ss_ext_tax),
COUNT(ss_coupon_amt),
COUNT(ss_net_paid),
COUNT(ss_net_paid_inc_tax),
COUNT(ss_net_profit),
COUNT(ss_sold_date_sk)
FROM (
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
) t
Before:
Total Time: 43s164ms
Summary:
Operator         #Hosts   Avg Time   Max Time    #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail
-----------------------------------------------------------------------------------------------------------------------
13:AGGREGATE          1  224.721us  224.721us        1           1   28.00 KB        -1.00 B  FINALIZE
12:EXCHANGE           1   24.578us   24.578us        3           1          0        -1.00 B  UNPARTITIONED
11:AGGREGATE          3    2s402ms    3s060ms        3           1  119.00 KB       10.00 MB
00:UNION              3   35s380ms   37s846ms  288.01M     288.01M    3.08 MB              0
|--02:SCAN HDFS       3  184.197ms  219.931ms   28.80M      28.80M  535.03 MB        1.88 GB  store_sales_unpartitioned
|--03:SCAN HDFS       3  131.956ms  153.401ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
|--04:SCAN HDFS       3  178.456ms  247.721ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
|--05:SCAN HDFS       3  189.398ms  242.251ms   28.80M      28.80M  535.01 MB        1.88 GB  store_sales_unpartitioned
|--06:SCAN HDFS       3  122.786ms  156.528ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
|--07:SCAN HDFS       3  147.467ms  183.391ms   28.80M      28.80M  535.13 MB        1.88 GB  store_sales_unpartitioned
|--08:SCAN HDFS       3  147.502ms  186.273ms   28.80M      28.80M  535.01 MB        1.88 GB  store_sales_unpartitioned
|--09:SCAN HDFS       3  130.086ms  154.682ms   28.80M      28.80M  535.04 MB        1.88 GB  store_sales_unpartitioned
|--10:SCAN HDFS       3  122.701ms  161.056ms   28.80M      28.80M  534.89 MB        1.88 GB  store_sales_unpartitioned
01:SCAN HDFS          3  287.863ms  330.436ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
After:
Total Time: 19s139ms
Summary:
Operator         #Hosts   Avg Time   Max Time    #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail
-----------------------------------------------------------------------------------------------------------------------
13:AGGREGATE          1  166.241us  166.241us        1           1   28.00 KB        -1.00 B  FINALIZE
12:EXCHANGE           1   71.695us   71.695us        3           1          0        -1.00 B  UNPARTITIONED
11:AGGREGATE          3    2s971ms    3s809ms        3           1    3.08 MB       10.00 MB
00:UNION              3  207.956ms  222.846ms  288.01M     288.01M          0              0
|--02:SCAN HDFS       3    1s533ms    1s535ms   28.80M      28.80M  532.28 MB        1.88 GB  store_sales_unpartitioned
|--03:SCAN HDFS       3    1s554ms    1s669ms   28.80M      28.80M  525.73 MB        1.88 GB  store_sales_unpartitioned
|--04:SCAN HDFS       3    1s568ms    1s716ms   28.80M      28.80M  525.03 MB        1.88 GB  store_sales_unpartitioned
|--05:SCAN HDFS       3    1s503ms    1s617ms   28.80M      28.80M  527.43 MB        1.88 GB  store_sales_unpartitioned
|--06:SCAN HDFS       3    1s560ms    1s634ms   28.80M      28.80M  528.52 MB        1.88 GB  store_sales_unpartitioned
|--07:SCAN HDFS       3    1s489ms    1s643ms   28.80M      28.80M  534.81 MB        1.88 GB  store_sales_unpartitioned
|--08:SCAN HDFS       3    1s534ms    1s581ms   28.80M      28.80M  528.10 MB        1.88 GB  store_sales_unpartitioned
|--09:SCAN HDFS       3    1s558ms    1s674ms   28.80M      28.80M  526.77 MB        1.88 GB  store_sales_unpartitioned
|--10:SCAN HDFS       3    1s504ms    1s692ms   28.80M      28.80M  527.83 MB        1.88 GB  store_sales_unpartitioned
01:SCAN HDFS          3    1s682ms    1s911ms   28.80M      28.80M  526.14 MB        1.88 GB  store_sales_unpartitioned
Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
---
M be/src/exec/exchange-node.cc
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/row-batch.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-planner/queries/PlannerTest/with-clause.test
M testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
M tests/query_test/test_queries.py
35 files changed, 1,480 insertions(+), 758 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/16
--
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 16
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Taras Bobrovytsky has uploaded a new patch set (#14).
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
IMPALA-3586: Implement union passthrough
The union node acts as a pass-through operator and forwards row batches
from its children without materializing them. This is done in the case
when the child's tuple layout is identical to the union node's tuple
layout and no functions need to be applied to the child row batches.
Removed operand reordering in the FE because it's simpler and safer to
handle all passthrough children before non-passthrough children in the
BE. The recent improvements to memory management allowed us to remove
this requirement.
A new query option DISABLE_UNION_PASSTHROUGH was added in this patch
as a precaution and for testing purposes.
Testing:
- Added new planner and end to end tests that cover the new
functionality.
- Updated existing tests to reflect the new behavior.
Perf:
Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned
version of the store_sales table. There was over a 2x performance
improvement for the following query:
SELECT
COUNT(ss_sold_time_sk),
COUNT(ss_item_sk),
COUNT(ss_customer_sk),
COUNT(ss_cdemo_sk),
COUNT(ss_hdemo_sk),
COUNT(ss_addr_sk),
COUNT(ss_store_sk),
COUNT(ss_promo_sk),
COUNT(ss_ticket_number),
COUNT(ss_quantity),
COUNT(ss_wholesale_cost),
COUNT(ss_list_price),
COUNT(ss_sales_price),
COUNT(ss_ext_discount_amt),
COUNT(ss_ext_sales_price),
COUNT(ss_ext_wholesale_cost),
COUNT(ss_ext_list_price),
COUNT(ss_ext_tax),
COUNT(ss_coupon_amt),
COUNT(ss_net_paid),
COUNT(ss_net_paid_inc_tax),
COUNT(ss_net_profit),
COUNT(ss_sold_date_sk)
FROM (
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
) t
Before:
Total Time: 43s164ms
Summary:
Operator         #Hosts   Avg Time   Max Time    #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail
-----------------------------------------------------------------------------------------------------------------------
13:AGGREGATE          1  224.721us  224.721us        1           1   28.00 KB        -1.00 B  FINALIZE
12:EXCHANGE           1   24.578us   24.578us        3           1          0        -1.00 B  UNPARTITIONED
11:AGGREGATE          3    2s402ms    3s060ms        3           1  119.00 KB       10.00 MB
00:UNION              3   35s380ms   37s846ms  288.01M     288.01M    3.08 MB              0
|--02:SCAN HDFS       3  184.197ms  219.931ms   28.80M      28.80M  535.03 MB        1.88 GB  store_sales_unpartitioned
|--03:SCAN HDFS       3  131.956ms  153.401ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
|--04:SCAN HDFS       3  178.456ms  247.721ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
|--05:SCAN HDFS       3  189.398ms  242.251ms   28.80M      28.80M  535.01 MB        1.88 GB  store_sales_unpartitioned
|--06:SCAN HDFS       3  122.786ms  156.528ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
|--07:SCAN HDFS       3  147.467ms  183.391ms   28.80M      28.80M  535.13 MB        1.88 GB  store_sales_unpartitioned
|--08:SCAN HDFS       3  147.502ms  186.273ms   28.80M      28.80M  535.01 MB        1.88 GB  store_sales_unpartitioned
|--09:SCAN HDFS       3  130.086ms  154.682ms   28.80M      28.80M  535.04 MB        1.88 GB  store_sales_unpartitioned
|--10:SCAN HDFS       3  122.701ms  161.056ms   28.80M      28.80M  534.89 MB        1.88 GB  store_sales_unpartitioned
01:SCAN HDFS          3  287.863ms  330.436ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
After:
Total Time: 20s660ms
Summary:
Operator         #Hosts   Avg Time   Max Time    #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail
-----------------------------------------------------------------------------------------------------------------------
13:AGGREGATE          1  187.474us  187.474us        1           1   28.00 KB        -1.00 B  FINALIZE
12:EXCHANGE           1   15.238us   15.238us        3           1          0        -1.00 B  UNPARTITIONED
11:AGGREGATE          3    2s958ms    3s749ms        3           1    3.08 MB       10.00 MB
00:UNION              3  211.510ms  224.667ms  288.01M     288.01M          0              0
|--02:SCAN HDFS       3    1s637ms    1s734ms   28.80M      28.80M  528.68 MB        1.88 GB  store_sales_unpartitioned
|--03:SCAN HDFS       3    1s697ms    1s708ms   28.80M      28.80M  528.48 MB        1.88 GB  store_sales_unpartitioned
|--04:SCAN HDFS       3    1s681ms    1s748ms   28.80M      28.80M  529.34 MB        1.88 GB  store_sales_unpartitioned
|--05:SCAN HDFS       3    1s665ms    1s756ms   28.80M      28.80M  533.81 MB        1.88 GB  store_sales_unpartitioned
|--06:SCAN HDFS       3    1s675ms    1s800ms   28.80M      28.80M  530.70 MB        1.88 GB  store_sales_unpartitioned
|--07:SCAN HDFS       3    1s677ms    1s759ms   28.80M      28.80M  525.95 MB        1.88 GB  store_sales_unpartitioned
|--08:SCAN HDFS       3    1s621ms    1s790ms   28.80M      28.80M  534.64 MB        1.88 GB  store_sales_unpartitioned
|--09:SCAN HDFS       3    1s684ms    1s743ms   28.80M      28.80M  528.55 MB        1.88 GB  store_sales_unpartitioned
|--10:SCAN HDFS       3    1s528ms    1s771ms   28.80M      28.80M  533.70 MB        1.88 GB  store_sales_unpartitioned
01:SCAN HDFS          3    1s853ms    2s149ms   28.80M      28.80M  526.53 MB        1.88 GB  store_sales_unpartitioned
Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
---
M be/src/exec/exchange-node.cc
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/row-batch.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-planner/queries/PlannerTest/with-clause.test
M testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
M tests/query_test/test_queries.py
35 files changed, 1,330 insertions(+), 607 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/14
--
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Taras Bobrovytsky has posted comments on this change.
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
Patch Set 9:
(12 comments)
http://gerrit.cloudera.org:8080/#/c/5816/9//COMMIT_MSG
Commit Message:
Line 15: as a precaution and for testing purposes.
> is this really needed? the more query options we have the larger the test m
We made this decision with Alex in person a while ago. There are 2 reasons:
1. In the very unlikely event that this patch introduces a bug for some customers, the query option is a workaround.
2. Without the option, we would lose existing test coverage of the non-passthrough path (in most of our existing union tests, the union node will now act as a passthrough node). This option allows us to run tests in both passthrough and non-passthrough mode. See tests/query_test/test_queries.py in this patch.
http://gerrit.cloudera.org:8080/#/c/5816/9/be/src/exec/exchange-node.cc
File be/src/exec/exchange-node.cc:
PS9, Line 196: IsPrefixOfEquivalentLayout
> why is this the right check rather than IsPrefixOf()?
We just want to know if the layout of the received batch is acceptable. IsPrefixOf() would fail here if the child is a union node that is passing through a batch from its child.
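To make the distinction concrete, here is a minimal sketch contrasting an id-based prefix check with a layout-based one. It assumes heavily simplified, hypothetical descriptor structs in which a slot's physical layout is just an offset, a size, and a null-indicator bit. The function names are illustrative and are not the signatures in descriptors.h.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, pared-down descriptors; the real ones carry far more state.
struct SlotLayout {
  int byte_offset;
  int byte_size;
  int null_indicator_bit;
};

struct TupleDesc {
  int id;
  std::vector<SlotLayout> slots;
};

using RowDesc = std::vector<TupleDesc>;

bool SameSlotLayout(const SlotLayout& a, const SlotLayout& b) {
  return a.byte_offset == b.byte_offset && a.byte_size == b.byte_size &&
         a.null_indicator_bit == b.null_indicator_bit;
}

// Compares physical layout only and deliberately ignores tuple ids.
bool LayoutEquals(const TupleDesc& a, const TupleDesc& b) {
  if (a.slots.size() != b.slots.size()) return false;
  for (std::size_t i = 0; i < a.slots.size(); ++i) {
    if (!SameSlotLayout(a.slots[i], b.slots[i])) return false;
  }
  return true;
}

// Id-based check: the tuples of 'prefix' must be the leading tuples of 'full'.
bool IsPrefixOfById(const RowDesc& prefix, const RowDesc& full) {
  if (prefix.size() > full.size()) return false;
  for (std::size_t i = 0; i < prefix.size(); ++i) {
    if (prefix[i].id != full[i].id) return false;
  }
  return true;
}

// Layout-based check: the leading tuples only need matching physical layouts.
bool IsLayoutPrefixOf(const RowDesc& prefix, const RowDesc& full) {
  if (prefix.size() > full.size()) return false;
  for (std::size_t i = 0; i < prefix.size(); ++i) {
    if (!LayoutEquals(prefix[i], full[i])) return false;
  }
  return true;
}
```

A batch passed through from a union's child carries the child's tuple id, so in this model the id-based check rejects it while the layout-based check accepts it.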
http://gerrit.cloudera.org:8080/#/c/5816/9/be/src/exec/union-node.cc
File be/src/exec/union-node.cc:
PS9, Line 101: Ensures that rows are available for clients to fetch after this Open() has
: // succeeded.
> what does this comment mean now that we don't do GetNext() here?
It means the same thing. We open the first child, so fetching from it in GetNext() should be fast, which in turn means that fetching from this union node should be fast.
Line 130: // this)
> i don't understand this comment given the dcheck on the next line, which is
Yeah, I think it makes sense to remove it.
PS9, Line 147: next GetNext() call
> is there guaranteed to be another GetNext() call?
Yes, I think so. But even if there is no further GetNext() call, all children get closed in UnionNode::Close() anyway.
Line 148: row_batch->MarkNeedsDeepCopy();
> Yeah IMPALA-4179 covers that. IMPALA-4179 lists a few examples. One of the
Added a todo
Line 151: return Status::OK();
> GetNext() is quite long. how about moving this code into GetNextPassThroug
Done
http://gerrit.cloudera.org:8080/#/c/5816/9/be/src/exec/union-node.h
File be/src/exec/union-node.h:
Line 67: // Which children can be passed through, without evaluating and materializing their
> single line, 3 slashes
Done
PS9, Line 99: isChildPassThrough
> IsChildPassThrough
Done
http://gerrit.cloudera.org:8080/#/c/5816/9/be/src/runtime/descriptors.cc
File be/src/runtime/descriptors.cc:
Line 478: }
> how about just calling prefix routine rather than duplicating this?
Done
http://gerrit.cloudera.org:8080/#/c/5816/9/be/src/runtime/descriptors.h
File be/src/runtime/descriptors.h:
Line 104: }
> do we need these overloads? (i.e. is this now used in stl)? if not, we pre
Done
PS9, Line 562: IsPrefixOfEquivalentLayout
> this name is hard to understand because the object of "of" should be the ot
Done
--
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Taras Bobrovytsky has uploaded a new patch set (#13).
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
IMPALA-3586: Implement union passthrough
The union node acts as a pass-through operator and forwards row batches
from its children without materializing them. This is done in the case
when the child's tuple layout is identical to the union node's tuple
layout and no functions need to be applied to the child row batches.
A new query option DISABLE_UNION_PASSTHROUGH was added in this patch
as a precaution and for testing purposes.
Testing:
- Added new planner and end to end tests that cover the new
functionality.
- Updated existing tests to reflect the new behavior.
Perf:
Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned
version of the store_sales table. There was over a 2x performance
improvement for the following query:
SELECT
COUNT(ss_sold_time_sk),
COUNT(ss_item_sk),
COUNT(ss_customer_sk),
COUNT(ss_cdemo_sk),
COUNT(ss_hdemo_sk),
COUNT(ss_addr_sk),
COUNT(ss_store_sk),
COUNT(ss_promo_sk),
COUNT(ss_ticket_number),
COUNT(ss_quantity),
COUNT(ss_wholesale_cost),
COUNT(ss_list_price),
COUNT(ss_sales_price),
COUNT(ss_ext_discount_amt),
COUNT(ss_ext_sales_price),
COUNT(ss_ext_wholesale_cost),
COUNT(ss_ext_list_price),
COUNT(ss_ext_tax),
COUNT(ss_coupon_amt),
COUNT(ss_net_paid),
COUNT(ss_net_paid_inc_tax),
COUNT(ss_net_profit),
COUNT(ss_sold_date_sk)
FROM (
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
) t
Before:
Total Time: 43s164ms
Summary:
Operator         #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem   Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE     1       224.721us  224.721us  1        1           28.00 KB   -1.00 B        FINALIZE
12:EXCHANGE      1       24.578us   24.578us   3        1           0          -1.00 B        UNPARTITIONED
11:AGGREGATE     3       2s402ms    3s060ms    3        1           119.00 KB  10.00 MB
00:UNION         3       35s380ms   37s846ms   288.01M  288.01M     3.08 MB    0
|--02:SCAN HDFS  3       184.197ms  219.931ms  28.80M   28.80M      535.03 MB  1.88 GB        store_sales_unpartitioned
|--03:SCAN HDFS  3       131.956ms  153.401ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--04:SCAN HDFS  3       178.456ms  247.721ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--05:SCAN HDFS  3       189.398ms  242.251ms  28.80M   28.80M      535.01 MB  1.88 GB        store_sales_unpartitioned
|--06:SCAN HDFS  3       122.786ms  156.528ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--07:SCAN HDFS  3       147.467ms  183.391ms  28.80M   28.80M      535.13 MB  1.88 GB        store_sales_unpartitioned
|--08:SCAN HDFS  3       147.502ms  186.273ms  28.80M   28.80M      535.01 MB  1.88 GB        store_sales_unpartitioned
|--09:SCAN HDFS  3       130.086ms  154.682ms  28.80M   28.80M      535.04 MB  1.88 GB        store_sales_unpartitioned
|--10:SCAN HDFS  3       122.701ms  161.056ms  28.80M   28.80M      534.89 MB  1.88 GB        store_sales_unpartitioned
01:SCAN HDFS     3       287.863ms  330.436ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
After:
Total Time: 20s660ms
Summary:
Operator         #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem   Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE     1       187.474us  187.474us  1        1           28.00 KB   -1.00 B        FINALIZE
12:EXCHANGE      1       15.238us   15.238us   3        1           0          -1.00 B        UNPARTITIONED
11:AGGREGATE     3       2s958ms    3s749ms    3        1           3.08 MB    10.00 MB
00:UNION         3       211.510ms  224.667ms  288.01M  288.01M     0          0
|--02:SCAN HDFS  3       1s637ms    1s734ms    28.80M   28.80M      528.68 MB  1.88 GB        store_sales_unpartitioned
|--03:SCAN HDFS  3       1s697ms    1s708ms    28.80M   28.80M      528.48 MB  1.88 GB        store_sales_unpartitioned
|--04:SCAN HDFS  3       1s681ms    1s748ms    28.80M   28.80M      529.34 MB  1.88 GB        store_sales_unpartitioned
|--05:SCAN HDFS  3       1s665ms    1s756ms    28.80M   28.80M      533.81 MB  1.88 GB        store_sales_unpartitioned
|--06:SCAN HDFS  3       1s675ms    1s800ms    28.80M   28.80M      530.70 MB  1.88 GB        store_sales_unpartitioned
|--07:SCAN HDFS  3       1s677ms    1s759ms    28.80M   28.80M      525.95 MB  1.88 GB        store_sales_unpartitioned
|--08:SCAN HDFS  3       1s621ms    1s790ms    28.80M   28.80M      534.64 MB  1.88 GB        store_sales_unpartitioned
|--09:SCAN HDFS  3       1s684ms    1s743ms    28.80M   28.80M      528.55 MB  1.88 GB        store_sales_unpartitioned
|--10:SCAN HDFS  3       1s528ms    1s771ms    28.80M   28.80M      533.70 MB  1.88 GB        store_sales_unpartitioned
01:SCAN HDFS     3       1s853ms    2s149ms    28.80M   28.80M      526.53 MB  1.88 GB        store_sales_unpartitioned
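The "over a 2x" claim can be checked against the reported times with a small calculation. The sketch below is not part of the patch: `parse_duration` and `speedup` are hypothetical helpers, and the parser only handles the `s`, `ms`, and `us` suffixes that appear in the summaries above.

```python
import re

# Seconds per unit, for the duration suffixes that appear in the summaries.
_UNIT_SECONDS = {"s": 1.0, "ms": 1e-3, "us": 1e-6}
_PART = re.compile(r"(\d+(?:\.\d+)?)(ms|us|s)")
_WHOLE = re.compile(r"(?:\d+(?:\.\d+)?(?:ms|us|s))+")


def parse_duration(text: str) -> float:
    """Convert a string such as '43s164ms' or '211.510ms' to seconds."""
    if not _WHOLE.fullmatch(text):
        raise ValueError(f"unrecognized duration: {text!r}")
    return sum(float(value) * _UNIT_SECONDS[unit]
               for value, unit in _PART.findall(text))


def speedup(before: str, after: str) -> float:
    """Ratio of the 'before' time to the 'after' time."""
    return parse_duration(before) / parse_duration(after)


if __name__ == "__main__":
    print(f"total time: {speedup('43s164ms', '20s660ms'):.2f}x")
    print(f"00:UNION avg time: {speedup('35s380ms', '211.510ms'):.0f}x")
```

With the figures reported here, the total query time improves by roughly 2.09x, and the average time attributed to the 00:UNION operator drops by a factor of roughly 167.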
Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
---
M be/src/exec/exchange-node.cc
M be/src/exec/exec-node.h
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/row-batch.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
M tests/query_test/test_queries.py
34 files changed, 740 insertions(+), 62 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/13
--
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Hello Alex Behm, Dan Hecht,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/5816
to look at the new patch set (#14).
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
IMPALA-3586: Implement union passthrough
The union node acts as a pass-through operator and forwards row batches
from its children without materializing them. This is done when
the child's tuple layout is identical to the union node's tuple layout
and no functions need to be applied to the child row batches.
Removed operand reordering in the FE because it's simpler and safer to
handle all passthrough children before non-passthrough children in the
BE. The recent improvements to memory management allowed us to remove
this requirement.
A new query option DISABLE_UNION_PASSTHROUGH was added in this patch
as a precaution and for testing purposes.
Testing:
- Added new planner and end to end tests that cover the new
functionality.
- Updated existing tests to reflect the new behavior.
Perf:
Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned
version of the store_sales table. There was over a 2x performance
improvement for the following query:
SELECT
COUNT(ss_sold_time_sk),
COUNT(ss_item_sk),
COUNT(ss_customer_sk),
COUNT(ss_cdemo_sk),
COUNT(ss_hdemo_sk),
COUNT(ss_addr_sk),
COUNT(ss_store_sk),
COUNT(ss_promo_sk),
COUNT(ss_ticket_number),
COUNT(ss_quantity),
COUNT(ss_wholesale_cost),
COUNT(ss_list_price),
COUNT(ss_sales_price),
COUNT(ss_ext_discount_amt),
COUNT(ss_ext_sales_price),
COUNT(ss_ext_wholesale_cost),
COUNT(ss_ext_list_price),
COUNT(ss_ext_tax),
COUNT(ss_coupon_amt),
COUNT(ss_net_paid),
COUNT(ss_net_paid_inc_tax),
COUNT(ss_net_profit),
COUNT(ss_sold_date_sk)
FROM (
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
) t
Before:
Total Time: 43s164ms
Summary:
Operator         #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem   Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE     1       224.721us  224.721us  1        1           28.00 KB   -1.00 B        FINALIZE
12:EXCHANGE      1       24.578us   24.578us   3        1           0          -1.00 B        UNPARTITIONED
11:AGGREGATE     3       2s402ms    3s060ms    3        1           119.00 KB  10.00 MB
00:UNION         3       35s380ms   37s846ms   288.01M  288.01M     3.08 MB    0
|--02:SCAN HDFS  3       184.197ms  219.931ms  28.80M   28.80M      535.03 MB  1.88 GB        store_sales_unpartitioned
|--03:SCAN HDFS  3       131.956ms  153.401ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--04:SCAN HDFS  3       178.456ms  247.721ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--05:SCAN HDFS  3       189.398ms  242.251ms  28.80M   28.80M      535.01 MB  1.88 GB        store_sales_unpartitioned
|--06:SCAN HDFS  3       122.786ms  156.528ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--07:SCAN HDFS  3       147.467ms  183.391ms  28.80M   28.80M      535.13 MB  1.88 GB        store_sales_unpartitioned
|--08:SCAN HDFS  3       147.502ms  186.273ms  28.80M   28.80M      535.01 MB  1.88 GB        store_sales_unpartitioned
|--09:SCAN HDFS  3       130.086ms  154.682ms  28.80M   28.80M      535.04 MB  1.88 GB        store_sales_unpartitioned
|--10:SCAN HDFS  3       122.701ms  161.056ms  28.80M   28.80M      534.89 MB  1.88 GB        store_sales_unpartitioned
01:SCAN HDFS     3       287.863ms  330.436ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
After:
Total Time: 20s660ms
Summary:
Operator         #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem   Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE     1       187.474us  187.474us  1        1           28.00 KB   -1.00 B        FINALIZE
12:EXCHANGE      1       15.238us   15.238us   3        1           0          -1.00 B        UNPARTITIONED
11:AGGREGATE     3       2s958ms    3s749ms    3        1           3.08 MB    10.00 MB
00:UNION         3       211.510ms  224.667ms  288.01M  288.01M     0          0
|--02:SCAN HDFS  3       1s637ms    1s734ms    28.80M   28.80M      528.68 MB  1.88 GB        store_sales_unpartitioned
|--03:SCAN HDFS  3       1s697ms    1s708ms    28.80M   28.80M      528.48 MB  1.88 GB        store_sales_unpartitioned
|--04:SCAN HDFS  3       1s681ms    1s748ms    28.80M   28.80M      529.34 MB  1.88 GB        store_sales_unpartitioned
|--05:SCAN HDFS  3       1s665ms    1s756ms    28.80M   28.80M      533.81 MB  1.88 GB        store_sales_unpartitioned
|--06:SCAN HDFS  3       1s675ms    1s800ms    28.80M   28.80M      530.70 MB  1.88 GB        store_sales_unpartitioned
|--07:SCAN HDFS  3       1s677ms    1s759ms    28.80M   28.80M      525.95 MB  1.88 GB        store_sales_unpartitioned
|--08:SCAN HDFS  3       1s621ms    1s790ms    28.80M   28.80M      534.64 MB  1.88 GB        store_sales_unpartitioned
|--09:SCAN HDFS  3       1s684ms    1s743ms    28.80M   28.80M      528.55 MB  1.88 GB        store_sales_unpartitioned
|--10:SCAN HDFS  3       1s528ms    1s771ms    28.80M   28.80M      533.70 MB  1.88 GB        store_sales_unpartitioned
01:SCAN HDFS     3       1s853ms    2s149ms    28.80M   28.80M      526.53 MB  1.88 GB        store_sales_unpartitioned
Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
---
M be/src/exec/exchange-node.cc
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/row-batch.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-planner/queries/PlannerTest/with-clause.test
M testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
M tests/query_test/test_queries.py
35 files changed, 1,331 insertions(+), 607 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/14
--
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-3586 (Part 1): Implement Union Pass Through
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Taras Bobrovytsky has uploaded a new patch set (#4).
Change subject: IMPALA-3586 (Part 1): Implement Union Pass Through
......................................................................
IMPALA-3586 (Part 1): Implement Union Pass Through
The union node acts as a pass-through operator and forwards row batches
from its children without materializing them. This is done when
the child's tuple layout is identical to the union node's tuple layout
and no functions need to be applied to the child row batches.
Testing:
Verified that existing tests cover the cases where none, some, or all
children of the union node can be passed through.
Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
---
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/runtime/descriptors.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
20 files changed, 435 insertions(+), 15 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/4
--
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
Patch Set 9: Code-Review+1
(1 comment)
FE changes lgtm.
http://gerrit.cloudera.org:8080/#/c/5816/9/be/src/exec/union-node.h
File be/src/exec/union-node.h:
Line 67: // Which children can be passed through, without evaluating and materializing their
single line, 3 slashes
--
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Taras Bobrovytsky has uploaded a new patch set (#9).
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
IMPALA-3586: Implement union passthrough
The union node acts as a pass-through operator and forwards row batches
from its children without materializing them. This is done when
the child's tuple layout is identical to the union node's tuple layout
and no functions need to be applied to the child row batches.
A new query option DISABLE_UNION_PASSTHROUGH was added in this patch
as a precaution and for testing purposes.
Testing:
- Added new planner and end to end tests that cover the new
functionality.
- Updated existing tests to reflect the new behavior.
Perf:
Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned
version of the store_sales table. There was over a 2x performance
improvement for the following query:
SELECT
COUNT(ss_sold_time_sk),
COUNT(ss_item_sk),
COUNT(ss_customer_sk),
COUNT(ss_cdemo_sk),
COUNT(ss_hdemo_sk),
COUNT(ss_addr_sk),
COUNT(ss_store_sk),
COUNT(ss_promo_sk),
COUNT(ss_ticket_number),
COUNT(ss_quantity),
COUNT(ss_wholesale_cost),
COUNT(ss_list_price),
COUNT(ss_sales_price),
COUNT(ss_ext_discount_amt),
COUNT(ss_ext_sales_price),
COUNT(ss_ext_wholesale_cost),
COUNT(ss_ext_list_price),
COUNT(ss_ext_tax),
COUNT(ss_coupon_amt),
COUNT(ss_net_paid),
COUNT(ss_net_paid_inc_tax),
COUNT(ss_net_profit),
COUNT(ss_sold_date_sk)
FROM (
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
) t
Before:
Total Time: 43s164ms
Summary:
Operator         #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem   Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE     1       224.721us  224.721us  1        1           28.00 KB   -1.00 B        FINALIZE
12:EXCHANGE      1       24.578us   24.578us   3        1           0          -1.00 B        UNPARTITIONED
11:AGGREGATE     3       2s402ms    3s060ms    3        1           119.00 KB  10.00 MB
00:UNION         3       35s380ms   37s846ms   288.01M  288.01M     3.08 MB    0
|--02:SCAN HDFS  3       184.197ms  219.931ms  28.80M   28.80M      535.03 MB  1.88 GB        store_sales_unpartitioned
|--03:SCAN HDFS  3       131.956ms  153.401ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--04:SCAN HDFS  3       178.456ms  247.721ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--05:SCAN HDFS  3       189.398ms  242.251ms  28.80M   28.80M      535.01 MB  1.88 GB        store_sales_unpartitioned
|--06:SCAN HDFS  3       122.786ms  156.528ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--07:SCAN HDFS  3       147.467ms  183.391ms  28.80M   28.80M      535.13 MB  1.88 GB        store_sales_unpartitioned
|--08:SCAN HDFS  3       147.502ms  186.273ms  28.80M   28.80M      535.01 MB  1.88 GB        store_sales_unpartitioned
|--09:SCAN HDFS  3       130.086ms  154.682ms  28.80M   28.80M      535.04 MB  1.88 GB        store_sales_unpartitioned
|--10:SCAN HDFS  3       122.701ms  161.056ms  28.80M   28.80M      534.89 MB  1.88 GB        store_sales_unpartitioned
01:SCAN HDFS     3       287.863ms  330.436ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
After:
Total Time: 20s660ms
Summary:
Operator         #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem   Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE     1       187.474us  187.474us  1        1           28.00 KB   -1.00 B        FINALIZE
12:EXCHANGE      1       15.238us   15.238us   3        1           0          -1.00 B        UNPARTITIONED
11:AGGREGATE     3       2s958ms    3s749ms    3        1           3.08 MB    10.00 MB
00:UNION         3       211.510ms  224.667ms  288.01M  288.01M     0          0
|--02:SCAN HDFS  3       1s637ms    1s734ms    28.80M   28.80M      528.68 MB  1.88 GB        store_sales_unpartitioned
|--03:SCAN HDFS  3       1s697ms    1s708ms    28.80M   28.80M      528.48 MB  1.88 GB        store_sales_unpartitioned
|--04:SCAN HDFS  3       1s681ms    1s748ms    28.80M   28.80M      529.34 MB  1.88 GB        store_sales_unpartitioned
|--05:SCAN HDFS  3       1s665ms    1s756ms    28.80M   28.80M      533.81 MB  1.88 GB        store_sales_unpartitioned
|--06:SCAN HDFS  3       1s675ms    1s800ms    28.80M   28.80M      530.70 MB  1.88 GB        store_sales_unpartitioned
|--07:SCAN HDFS  3       1s677ms    1s759ms    28.80M   28.80M      525.95 MB  1.88 GB        store_sales_unpartitioned
|--08:SCAN HDFS  3       1s621ms    1s790ms    28.80M   28.80M      534.64 MB  1.88 GB        store_sales_unpartitioned
|--09:SCAN HDFS  3       1s684ms    1s743ms    28.80M   28.80M      528.55 MB  1.88 GB        store_sales_unpartitioned
|--10:SCAN HDFS  3       1s528ms    1s771ms    28.80M   28.80M      533.70 MB  1.88 GB        store_sales_unpartitioned
01:SCAN HDFS     3       1s853ms    2s149ms    28.80M   28.80M      526.53 MB  1.88 GB        store_sales_unpartitioned
Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
---
M be/src/exec/exchange-node.cc
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/row-batch.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
M tests/query_test/test_queries.py
33 files changed, 726 insertions(+), 54 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/9
--
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Hello Alex Behm, Dan Hecht,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/5816
to look at the new patch set (#15).
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
IMPALA-3586: Implement union passthrough
The union node acts as a pass-through operator and forwards row batches
from its children without materializing them. This is done when
the child's tuple layout is identical to the union node's tuple layout
and no functions need to be applied to the child row batches.
Removed operand reordering in the FE because it's simpler and safer to
handle all passthrough children before non-passthrough children in the
BE. The recent improvements to memory management allowed us to remove
this requirement.
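One possible reading of the paragraph above is that the planner keeps the operands in their original order and the backend simply visits the passthrough children first. The sketch below illustrates that ordering only; `child_processing_order` is a hypothetical name and does not come from the patch.

```python
from typing import List, Sequence


def child_processing_order(is_passthrough: Sequence[bool]) -> List[int]:
    """Return child indices with passthrough children first.

    The relative order within each group is preserved, i.e. this is a
    stable partition of the child indices.
    """
    passthrough = [i for i, flag in enumerate(is_passthrough) if flag]
    materialized = [i for i, flag in enumerate(is_passthrough) if not flag]
    return passthrough + materialized
```

For example, with children flagged `[False, True, False, True]`, this ordering visits children 1 and 3 before children 0 and 2, without the plan itself being reordered.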
A new query option DISABLE_UNION_PASSTHROUGH was added in this patch
as a precaution and for testing purposes.
Testing:
- Added new planner and end to end tests that cover the new
functionality.
- Updated existing tests to reflect the new behavior.
Perf:
Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned
version of the store_sales table. There was over a 2x performance
improvement for the following query:
SELECT
COUNT(ss_sold_time_sk),
COUNT(ss_item_sk),
COUNT(ss_customer_sk),
COUNT(ss_cdemo_sk),
COUNT(ss_hdemo_sk),
COUNT(ss_addr_sk),
COUNT(ss_store_sk),
COUNT(ss_promo_sk),
COUNT(ss_ticket_number),
COUNT(ss_quantity),
COUNT(ss_wholesale_cost),
COUNT(ss_list_price),
COUNT(ss_sales_price),
COUNT(ss_ext_discount_amt),
COUNT(ss_ext_sales_price),
COUNT(ss_ext_wholesale_cost),
COUNT(ss_ext_list_price),
COUNT(ss_ext_tax),
COUNT(ss_coupon_amt),
COUNT(ss_net_paid),
COUNT(ss_net_paid_inc_tax),
COUNT(ss_net_profit),
COUNT(ss_sold_date_sk)
FROM (
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
) t
Before:
Total Time: 43s164ms
Summary:
Operator         #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem   Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE     1       224.721us  224.721us  1        1           28.00 KB   -1.00 B        FINALIZE
12:EXCHANGE      1       24.578us   24.578us   3        1           0          -1.00 B        UNPARTITIONED
11:AGGREGATE     3       2s402ms    3s060ms    3        1           119.00 KB  10.00 MB
00:UNION         3       35s380ms   37s846ms   288.01M  288.01M     3.08 MB    0
|--02:SCAN HDFS  3       184.197ms  219.931ms  28.80M   28.80M      535.03 MB  1.88 GB        store_sales_unpartitioned
|--03:SCAN HDFS  3       131.956ms  153.401ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--04:SCAN HDFS  3       178.456ms  247.721ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--05:SCAN HDFS  3       189.398ms  242.251ms  28.80M   28.80M      535.01 MB  1.88 GB        store_sales_unpartitioned
|--06:SCAN HDFS  3       122.786ms  156.528ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--07:SCAN HDFS  3       147.467ms  183.391ms  28.80M   28.80M      535.13 MB  1.88 GB        store_sales_unpartitioned
|--08:SCAN HDFS  3       147.502ms  186.273ms  28.80M   28.80M      535.01 MB  1.88 GB        store_sales_unpartitioned
|--09:SCAN HDFS  3       130.086ms  154.682ms  28.80M   28.80M      535.04 MB  1.88 GB        store_sales_unpartitioned
|--10:SCAN HDFS  3       122.701ms  161.056ms  28.80M   28.80M      534.89 MB  1.88 GB        store_sales_unpartitioned
01:SCAN HDFS     3       287.863ms  330.436ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
After:
Total Time: 20s660ms
Summary:
Operator         #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem   Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE     1       187.474us  187.474us  1        1           28.00 KB   -1.00 B        FINALIZE
12:EXCHANGE      1       15.238us   15.238us   3        1           0          -1.00 B        UNPARTITIONED
11:AGGREGATE     3       2s958ms    3s749ms    3        1           3.08 MB    10.00 MB
00:UNION         3       211.510ms  224.667ms  288.01M  288.01M     0          0
|--02:SCAN HDFS  3       1s637ms    1s734ms    28.80M   28.80M      528.68 MB  1.88 GB        store_sales_unpartitioned
|--03:SCAN HDFS  3       1s697ms    1s708ms    28.80M   28.80M      528.48 MB  1.88 GB        store_sales_unpartitioned
|--04:SCAN HDFS  3       1s681ms    1s748ms    28.80M   28.80M      529.34 MB  1.88 GB        store_sales_unpartitioned
|--05:SCAN HDFS  3       1s665ms    1s756ms    28.80M   28.80M      533.81 MB  1.88 GB        store_sales_unpartitioned
|--06:SCAN HDFS  3       1s675ms    1s800ms    28.80M   28.80M      530.70 MB  1.88 GB        store_sales_unpartitioned
|--07:SCAN HDFS  3       1s677ms    1s759ms    28.80M   28.80M      525.95 MB  1.88 GB        store_sales_unpartitioned
|--08:SCAN HDFS  3       1s621ms    1s790ms    28.80M   28.80M      534.64 MB  1.88 GB        store_sales_unpartitioned
|--09:SCAN HDFS  3       1s684ms    1s743ms    28.80M   28.80M      528.55 MB  1.88 GB        store_sales_unpartitioned
|--10:SCAN HDFS  3       1s528ms    1s771ms    28.80M   28.80M      533.70 MB  1.88 GB        store_sales_unpartitioned
01:SCAN HDFS     3       1s853ms    2s149ms    28.80M   28.80M      526.53 MB  1.88 GB        store_sales_unpartitioned
Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
---
M be/src/exec/exchange-node.cc
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/row-batch.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-planner/queries/PlannerTest/with-clause.test
M testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
M tests/query_test/test_queries.py
35 files changed, 1,343 insertions(+), 611 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/15
--
To view, visit http://gerrit.cloudera.org:8080/5816
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 15
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Taras Bobrovytsky has uploaded a new patch set (#16).
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
IMPALA-3586: Implement union passthrough
The union node acts as a passthrough operator and forwards row batches
from its children without materializing them. This is done when the
child's tuple layout is identical to the union node's tuple layout
and no functions need to be applied to the child row batches.
Removed operand reordering in the FE because it's simpler and safer to
handle all passthrough children before non-passthrough children in the
BE. The recent improvements to memory management allowed us to remove
this requirement.
A new query option DISABLE_UNION_PASSTHROUGH was added in this patch
as a precaution and for testing purposes.
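The behavior described above can be illustrated with a minimal C++ sketch.
It is not the code in union-node.cc: Row, Batch, Child, and UnionSketch are
hypothetical stand-ins, and "materializing" is reduced to a row-by-row copy.
It only shows the two ideas in this message: batches from passthrough
children are forwarded as-is, and all passthrough children are drained
before the non-passthrough ones.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration; not Impala's RowBatch or ExecNode.
using Row = int;
using Batch = std::vector<Row>;

struct Child {
  std::vector<Batch> batches;  // batches this child will produce
  bool passthrough = false;    // layout matches the union output, no exprs to apply
  std::size_t next = 0;        // index of the next batch to hand out
  bool Done() const { return next >= batches.size(); }
};

struct UnionSketch {
  std::vector<Child> children;
  int batches_forwarded = 0;
  int batches_materialized = 0;

  // Fills 'out' with the next batch and returns true, or returns false once
  // every child is exhausted. Passthrough children are handled first.
  bool GetNext(Batch* out) {
    for (bool want_passthrough : {true, false}) {
      for (Child& c : children) {
        if (c.passthrough != want_passthrough || c.Done()) continue;
        Batch& src = c.batches[c.next++];
        if (c.passthrough) {
          *out = std::move(src);  // forward the child's batch without copying rows
          ++batches_forwarded;
        } else {
          out->clear();
          for (Row r : src) out->push_back(r);  // copy row by row ("materialize")
          ++batches_materialized;
        }
        return true;
      }
    }
    return false;
  }
};
```

With one non-passthrough child listed first and one passthrough child listed
second, the sketch still returns the passthrough child's batches first.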
Testing:
- Added new planner and end to end tests that cover the new
functionality.
- Updated existing tests to reflect the new behavior.
Perf:
Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned
version of the store_sales table. There was over a 2x performance
improvement for the following query:
SELECT
  COUNT(ss_sold_time_sk),
  COUNT(ss_item_sk),
  COUNT(ss_customer_sk),
  COUNT(ss_cdemo_sk),
  COUNT(ss_hdemo_sk),
  COUNT(ss_addr_sk),
  COUNT(ss_store_sk),
  COUNT(ss_promo_sk),
  COUNT(ss_ticket_number),
  COUNT(ss_quantity),
  COUNT(ss_wholesale_cost),
  COUNT(ss_list_price),
  COUNT(ss_sales_price),
  COUNT(ss_ext_discount_amt),
  COUNT(ss_ext_sales_price),
  COUNT(ss_ext_wholesale_cost),
  COUNT(ss_ext_list_price),
  COUNT(ss_ext_tax),
  COUNT(ss_coupon_amt),
  COUNT(ss_net_paid),
  COUNT(ss_net_paid_inc_tax),
  COUNT(ss_net_profit),
  COUNT(ss_sold_date_sk)
FROM (
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
) t
Before:
Total Time: 43s164ms
Summary:
Operator          #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem   Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE      1       224.721us  224.721us  1        1           28.00 KB   -1.00 B        FINALIZE
12:EXCHANGE       1       24.578us   24.578us   3        1           0          -1.00 B        UNPARTITIONED
11:AGGREGATE      3       2s402ms    3s060ms    3        1           119.00 KB  10.00 MB
00:UNION          3       35s380ms   37s846ms   288.01M  288.01M     3.08 MB    0
|--02:SCAN HDFS   3       184.197ms  219.931ms  28.80M   28.80M      535.03 MB  1.88 GB        store_sales_unpartitioned
|--03:SCAN HDFS   3       131.956ms  153.401ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--04:SCAN HDFS   3       178.456ms  247.721ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--05:SCAN HDFS   3       189.398ms  242.251ms  28.80M   28.80M      535.01 MB  1.88 GB        store_sales_unpartitioned
|--06:SCAN HDFS   3       122.786ms  156.528ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--07:SCAN HDFS   3       147.467ms  183.391ms  28.80M   28.80M      535.13 MB  1.88 GB        store_sales_unpartitioned
|--08:SCAN HDFS   3       147.502ms  186.273ms  28.80M   28.80M      535.01 MB  1.88 GB        store_sales_unpartitioned
|--09:SCAN HDFS   3       130.086ms  154.682ms  28.80M   28.80M      535.04 MB  1.88 GB        store_sales_unpartitioned
|--10:SCAN HDFS   3       122.701ms  161.056ms  28.80M   28.80M      534.89 MB  1.88 GB        store_sales_unpartitioned
01:SCAN HDFS      3       287.863ms  330.436ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
After:
Total Time: 19s139ms
Summary:
Operator          #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem   Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE      1       166.241us  166.241us  1        1           28.00 KB   -1.00 B        FINALIZE
12:EXCHANGE       1       71.695us   71.695us   3        1           0          -1.00 B        UNPARTITIONED
11:AGGREGATE      3       2s971ms    3s809ms    3        1           3.08 MB    10.00 MB
00:UNION          3       207.956ms  222.846ms  288.01M  288.01M     0          0
|--02:SCAN HDFS   3       1s533ms    1s535ms    28.80M   28.80M      532.28 MB  1.88 GB        store_sales_unpartitioned
|--03:SCAN HDFS   3       1s554ms    1s669ms    28.80M   28.80M      525.73 MB  1.88 GB        store_sales_unpartitioned
|--04:SCAN HDFS   3       1s568ms    1s716ms    28.80M   28.80M      525.03 MB  1.88 GB        store_sales_unpartitioned
|--05:SCAN HDFS   3       1s503ms    1s617ms    28.80M   28.80M      527.43 MB  1.88 GB        store_sales_unpartitioned
|--06:SCAN HDFS   3       1s560ms    1s634ms    28.80M   28.80M      528.52 MB  1.88 GB        store_sales_unpartitioned
|--07:SCAN HDFS   3       1s489ms    1s643ms    28.80M   28.80M      534.81 MB  1.88 GB        store_sales_unpartitioned
|--08:SCAN HDFS   3       1s534ms    1s581ms    28.80M   28.80M      528.10 MB  1.88 GB        store_sales_unpartitioned
|--09:SCAN HDFS   3       1s558ms    1s674ms    28.80M   28.80M      526.77 MB  1.88 GB        store_sales_unpartitioned
|--10:SCAN HDFS   3       1s504ms    1s692ms    28.80M   28.80M      527.83 MB  1.88 GB        store_sales_unpartitioned
01:SCAN HDFS      3       1s682ms    1s911ms    28.80M   28.80M      526.14 MB  1.88 GB        store_sales_unpartitioned
Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
---
M be/src/exec/exchange-node.cc
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/row-batch.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-planner/queries/PlannerTest/with-clause.test
M testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
M tests/query_test/test_queries.py
35 files changed, 1,480 insertions(+), 758 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/16
--
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 16
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change.
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
Patch Set 11:
> No, we do not break out of the loop, we return right away (see line
> 149 in the original union-node.cc). The next getnext call can
> return 0 rows.
I'm talking about the case where we don't hit the limit. We normally won't take the return at line 149 (unless we happen to fill the row batch at the same time as finishing the children, but that's not the usual case). Instead, if we still have room after evaluating the children (and constant exprs), we'll get all the way to 192. So, the normal non-limit case is inconsistently handled. I agree it's not a functional bug, though.
It'd still be nice to exercise the multiple child row-batch path, if we're not already. Hopefully we already are (row-batch-size might be an exec option dimension in pytest), but I just thought it'd be good to check that.
--
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Taras Bobrovytsky has posted comments on this change.
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
Patch Set 12:
As Dan pointed out, we don't set eos in the passthrough case, which is a little weird because there will be an unnecessary call to getnext() at the end which will return 0 rows. I fixed this issue in patch 12.
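The difference between the two behaviors can be sketched as follows. This
is a toy model, not the actual union-node.cc logic: ChildSketch,
UnionEosSketch, the eager_eos flag, and CallsUntilEos are all hypothetical
names. With eager_eos set, the node reports eos together with the final
batch; without it, the caller needs one more call that returns 0 rows.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical child that hands out a fixed sequence of batch sizes.
struct ChildSketch {
  std::vector<int> batch_sizes;
  std::size_t next = 0;
  // Returns the row count of the next batch; sets *eos with the last one.
  int GetNext(bool* eos) {
    int rows = next < batch_sizes.size() ? batch_sizes[next++] : 0;
    *eos = next >= batch_sizes.size();
    return rows;
  }
};

struct UnionEosSketch {
  std::vector<ChildSketch> children;
  std::size_t child_idx = 0;
  bool eager_eos = true;  // true: report eos with the final batch

  int GetNext(bool* eos) {
    *eos = false;
    if (child_idx >= children.size()) {  // nothing left: the 0-row call
      *eos = true;
      return 0;
    }
    bool child_eos = false;
    int rows = children[child_idx].GetNext(&child_eos);
    if (child_eos) ++child_idx;  // move on to the next child
    if (eager_eos && child_idx >= children.size()) *eos = true;
    return rows;
  }
};

// Counts how many GetNext() calls a caller makes before it sees eos.
int CallsUntilEos(UnionEosSketch node) {
  int calls = 0;
  bool eos = false;
  while (!eos) {
    node.GetNext(&eos);
    ++calls;
  }
  return calls;
}
```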
--
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Taras Bobrovytsky has posted comments on this change.
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
Patch Set 11:
Yes, it's true that we never set eos=true in the passthrough case if there is no limit. Do you think it's weird because the last getnext call might return 0 rows? The same thing can happen in the non-passthrough case though (without my patch).
--
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
Patch Set 7:
(40 comments)
http://gerrit.cloudera.org:8080/#/c/5816/7/be/src/exec/analytic-eval-node.cc
File be/src/exec/analytic-eval-node.cc:
Line 145: DCHECK(child(0)->row_desc().IsPrefixOfEquivalent(row_desc()));
This should be IsPrefixOf() because we are sanity-checking the row descriptors of exec nodes (and not row batches).
http://gerrit.cloudera.org:8080/#/c/5816/7/be/src/exec/union-node.cc
File be/src/exec/union-node.cc:
Line 115: // passthrough case, the child can be closed right away because the row batch received
the child can be closed right away -> the child was already closed?
Line 116: // from the child is copied (more details below).
accuracy: the row batch wasn't copied
Line 121: if (child_idx_ < children_.size() && isChildPassThrough(child_idx_)) {
Suggest comment:
// Handle passthrough children. We pass 'row_batch' directly into the GetNext() call on the child.
Line 122: // If the rows from the current child can be passed through, the physical row layout
This comment doesn't seem to add anything, I suggest removing it.
Line 131: // It may be possible that the row batch is not empty, so we save the previous value.
More details would be helpful. If the batch has rows at this point, I'd imagine it can cause all sorts of other problems. How can the batch already have rows?
Line 148: // 'needs_deep_copy' lets us safely close the child in the next GetNext() call.
style: 'needs_deep_copy' is not a visible variable here, suggest just saying "Marking the batch as needing a deep copy lets us ..."
Line 154:
DCHECK that child_idx_ is not passthrough here
http://gerrit.cloudera.org:8080/#/c/5816/7/be/src/exec/union-node.h
File be/src/exec/union-node.h:
Line 67: // Which children can be passed through, without being materialized.
... without evaluating and materializing their exprs.
http://gerrit.cloudera.org:8080/#/c/5816/7/be/src/runtime/descriptors.h
File be/src/runtime/descriptors.h:
Line 412: /// Return true if the physical layout of this descriptor matches the physical layout
matches that of other_desc
Line 413: /// of other_desc, but not necessarily ids.
but not necessarily the id.
Line 565: /// of other_desc, but not necessarily ids.
but not necessarily the ids
http://gerrit.cloudera.org:8080/#/c/5816/7/be/src/service/query-options.h
File be/src/service/query-options.h:
Line 38: TImpalaQueryOptions::DISABLE_UNION_PASSTHROUGH + 1);\
I tend to prefer ENABLE_UNION_PASSTHROUGH. To me positive phrasing is a little easier to understand. What do you think?
http://gerrit.cloudera.org:8080/#/c/5816/7/fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
File fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java:
Line 227: public boolean isEquivalent(SlotDescriptor other) {
Unfortunately, the term 'equivalent' already has a different meaning in the FE for slots, so it would be good to distinguish the existing term from this new one. Maybe isLayoutEquivalent()?
http://gerrit.cloudera.org:8080/#/c/5816/7/fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
File fe/src/main/java/org/apache/impala/analysis/UnionStmt.java:
Line 616: public List<Expr> getUnionNodeResultExprs() {
getUnionResultExprs()
http://gerrit.cloudera.org:8080/#/c/5816/7/fe/src/main/java/org/apache/impala/planner/UnionNode.java
File fe/src/main/java/org/apache/impala/planner/UnionNode.java:
Line 47: * a child has an identical tuple layout as the output of the union node, the
... as the output of the union node, and the child only has naked SlotRefs as result exprs, then the child is marked as 'passthrough'. The rows of passthrough children are directly returned by the union node, instead of materializing the child's result exprs into new tuples.
Line 57: protected final List<Expr> resultExprs_;
unionResultExprs_ to distinguish it from the resultExprLists_ and such
Line 73: // If false, no batches from child nodes would be passed through in the backend.
Comment should describe what this flag is. Also you mean "true" right?
Line 76: // Indicates for which child nodes batches can be passed through in the backend.
remove "in the backend" (it's clear that execution happens there)
Line 81: protected UnionNode(PlanNodeId id, TupleId tupleId) {
Is this c'tor still needed? If not, remove.
Line 89: List<Expr> resultExprs, boolean isInSubplan) {
indentation, unionResultExprs
Line 182: * Compute whether we can pass through rows without materializing for the given child.
Can combine/simplify like this:
Returns true if rows from the child with 'childTupleIds' and 'childResultExprs' can be returned directly by the union node (without materialization into a new tuple).
Might be good to list the conditions for passthrough compatibility.
Line 189: Analyzer analyzer, List<TupleId> childTupleIds, List<Expr> childExprList) {
childExprList -> childResultExprs
Line 190: if (analyzer.getQueryOptions().isDisable_union_passthrough()) return false;
seems clearer to move this into init() and not execute any of the passthrough code
Line 193: // TODO: Remove this as part of IMPALA-4179.
Move TODO to the class comment
Line 194: if (isInSubplan_) return false;
same here, seems easier to move this check into init()
Line 205: // Verify that the union node has one slot for every expression.
union node -> union tuple descriptor
Line 212: if (resultExprs_.size() != childTupleDescriptor.getSlots().size()) return false;
I don't think this tricky check is correct because it won't allow passthrough for something like:
select int_col, int_col, int_col from functional.alltypes
union all
select int_col, int_col, int_col from functional.alltypes
Line 218: SlotRef unionSlot = resultExprs_.get(i).unwrapSlotRef(false);
unionSlotRef, childSlotRef
Line 220: if (!unionTupleDescriptor.getSlots().get(i).isMaterialized()) continue;
move above the unwrapSlotRef() calls
Line 221: Preconditions.checkState(unionSlot.getDesc().getParent().getId().equals(tupleId_));
Don't think we need this check, but something like this would be good:
Preconditions.checkNotNull(unionSlotRef);
Line 223: Preconditions.checkState(!(childSlot instanceof SlotRef));
No need for this check
Line 262: // Compute which nodes can be passed through.
which child nodes
Line 266: // Check that if the child outputs a single tuple, then it's not nullable. Tuple
move into computePassThrough
http://gerrit.cloudera.org:8080/#/c/5816/7/testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
File testdata/workloads/functional-planner/queries/PlannerTest/kudu.test:
Line 329: # IMPALA-3586: The operand with the Kudu scan cannot be passed through because it's not
because id is not-nullable (primary key)
Line 346: select id from functional_kudu.alltypes
do select *
http://gerrit.cloudera.org:8080/#/c/5816/7/testdata/workloads/functional-planner/queries/PlannerTest/union.test
File testdata/workloads/functional-planner/queries/PlannerTest/union.test:
Line 3104: # IMPALA-3678: Both union operands should produce rows with non-nullable slots which
remove "should"
Line 3124: # IMPALA-3678: The Union operands that contain a join should not be passed through,
nice
Line 3184: select COUNT(c.c_custkey), COUNT(v.tot_price)
lowercase count for consistency
http://gerrit.cloudera.org:8080/#/c/5816/7/testdata/workloads/functional-query/queries/QueryTest/union.test
File testdata/workloads/functional-query/queries/QueryTest/union.test:
Line 1126: select COUNT(c.c_custkey), COUNT(v.tot_price)
lowercase count for consistency
--
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
Patch Set 9:
(1 comment)
http://gerrit.cloudera.org:8080/#/c/5816/9/be/src/exec/union-node.cc
File be/src/exec/union-node.cc:
Line 148: row_batch->MarkNeedsDeepCopy();
> Which memory is this? buffered-tuple-stream buffer? anything else? Will t
Yeah IMPALA-4179 covers that. IMPALA-4179 lists a few examples. One of the trickier cases right now is "Local" allocations in FunctionContexts - batches can reference that memory but there's no way to transfer it out of the FunctionContext.
I think the path of least resistance is to add a RowBatch parameter to Close() and attach such resources to the RowBatch. Ref-counting may give more flexibility but probably would require rewriting many things.
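A rough sketch of the first option (passing the batch to Close() so that
memory the batch still references changes owner instead of being freed).
This is only an illustration of the idea under simplifying assumptions:
BatchSketch and NodeSketch are hypothetical types, buffers are modeled as
heap-allocated strings, and none of this is Impala's actual API.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical batch: rows point into buffers that may be owned elsewhere.
struct BatchSketch {
  std::vector<const std::string*> rows;
  std::vector<std::unique_ptr<std::string>> attached;  // buffers the batch owns
};

// Hypothetical node that allocates "local" buffers referenced by its output rows.
struct NodeSketch {
  std::vector<std::unique_ptr<std::string>> local_buffers;

  void Produce(BatchSketch* batch, const std::string& value) {
    local_buffers.push_back(std::unique_ptr<std::string>(new std::string(value)));
    batch->rows.push_back(local_buffers.back().get());
  }

  // Close() takes the batch and hands over every buffer the batch may still
  // reference, so the rows stay valid after the node is gone.
  void Close(BatchSketch* batch) {
    for (auto& buf : local_buffers) batch->attached.push_back(std::move(buf));
    local_buffers.clear();
  }
};
```

Moving a unique_ptr does not move the string it owns, so the row pointers
stored in the batch stay valid after Close().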
--
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Marcel Kornacker (Code Review)" <ge...@cloudera.org>.
Marcel Kornacker has posted comments on this change.
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
Patch Set 9:
I'll review the be side.
--
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3586 (Part 1): Implement Union Pass Through
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Taras Bobrovytsky has posted comments on this change.
Change subject: IMPALA-3586 (Part 1): Implement Union Pass Through
......................................................................
Patch Set 5:
(56 comments)
Thanks for the reviews.
http://gerrit.cloudera.org:8080/#/c/5816/5//COMMIT_MSG
Commit Message:
Line 14: Testing:
> Would be nice to get some idea of the perf improvement. The JIRA has an int
I'll do a benchmark after the next patch (codegen). Or do you think it's worth doing a benchmark for both patches?
http://gerrit.cloudera.org:8080/#/c/5816/4/be/src/exec/union-node.cc
File be/src/exec/union-node.cc:
Line 122: if (child_eos_ && child_idx_ > 0 && !IsInSubplan()) child(child_idx_ - 1)->Close(state);
> Can you add a comment that this only applies to passthrough? E.g. "The prev
Done
Line 133: row_batch->set_num_rows(limit_ - num_rows_returned_);
> There's a corner case that breaks this calculation. The problem is that 'ro
Done. I now save the number of rows. I am not sure this case is possible, though, because passthrough is disabled if we're in a subplan. For the same reason, I don't think adding the test you suggested would be useful.
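One way to read this corner case, sketched in C++. The helper names and the
exact semantics are assumptions (in particular, that rows already in the
batch before the child's GetNext() do not count against this node's limit);
the quoted review comment is truncated, so this is not necessarily the fix
that went into the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of rows the batch should hold after the limit
// is applied.
//   rows_before:     rows already in the batch before the child appended to it
//   rows_after:      rows in the batch after the child's GetNext()
//   returned_so_far: rows this node returned in earlier calls
int64_t RowsAfterLimit(int64_t rows_before, int64_t rows_after,
                       int64_t returned_so_far, int64_t limit) {
  int64_t added = rows_after - rows_before;
  int64_t allowed = std::max<int64_t>(0, limit - returned_so_far);
  return rows_before + std::min(added, allowed);
}

// Naive variant that assumes the batch was empty before the call.
int64_t NaiveRowsAfterLimit(int64_t rows_after, int64_t returned_so_far,
                            int64_t limit) {
  return std::min(rows_after, limit - returned_so_far);
}
```

When the batch starts out empty the two agree; when it already holds rows,
the naive variant truncates some of them away.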
Line 150: if (child_idx_ < children_.size() ||
> Is this just an optimisation? Might be best to remove it and keep the code
Moved this check to the top as Alex suggested. That also takes care of the case where someone calls getnext after the Union node has set eos to true (which shouldn't happen).
http://gerrit.cloudera.org:8080/#/c/5816/5/be/src/exec/union-node.cc
File be/src/exec/union-node.cc:
Line 102
> There was a specific reason why Open() was called here. There is an expecta
Done
Line 62: pass_through_children_ = tnode.union_node.pass_through;
> move to initializer list in constructor
I noticed that vectors don't get initialized in the constructor in other nodes. For example, is_asc_order_ in sort_node.h (It's done in Init there).
Do you still think it's a good idea to move it out of Init?
Line 122: if (child_eos_ && child_idx_ > 0 && !IsInSubplan()) child(child_idx_ - 1)->Close(state);
> Needs comment
Done
Line 124: if (child_idx_ < children_.size() && isPassThrough(child_idx_)) {
> High-level comment what is happening here (passthrough).
Done
Line 140: if (child_eos_) {
> Add a comment why it's not ok to Close() the child in this passthrough mode
Done
Line 141: row_batch->MarkNeedsDeepCopy();
> Needs comment
Done
Line 150: if (child_idx_ < children_.size() ||
> Might as well reverse this check and move it up, set eos to true and return
Great suggestion, Done.
http://gerrit.cloudera.org:8080/#/c/5816/4/be/src/exec/union-node.h
File be/src/exec/union-node.h:
Line 99: /// Returns true if the child can be passed through.
> Nit: "if the child at 'idx'"
Done
http://gerrit.cloudera.org:8080/#/c/5816/5/be/src/exec/union-node.h
File be/src/exec/union-node.h:
Line 36: /// Node that merges the results of its children by materializing their
> update comment
Done
Line 79: std::vector<bool> pass_through_children_;
> is_child_passthrough_?
Done
Line 100: inline bool isPassThrough(int idx) {
> idx -> child_idx
Done
Line 101: DCHECK(idx < pass_through_children_.size());
> DCHECK_LT
Done.
Why do we have DCHECK_LT or DCHECK_GT? Why not just use DCHECK? Is it because then it will be able to print the actual values if the dcheck fails?
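For context, glog-style comparison macros such as CHECK_LT/DCHECK_LT do
include both operand values in the failure message, which a plain boolean
check cannot do. The toy macros below illustrate the difference; TOY_CHECK,
TOY_CHECK_LT, and the helper functions are made up for this sketch and
return the message as a string instead of aborting like the real macros.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Plain boolean check: only the expression text is available on failure.
// Returns an empty string when the check passes.
inline std::string CheckMessage(bool ok, const char* expr_text) {
  return ok ? std::string() : std::string("Check failed: ") + expr_text;
}

// Comparison check: both operands are still available, so their values can
// be appended to the message.
template <typename A, typename B>
std::string CheckLtMessage(const A& a, const B& b, const char* a_text,
                           const char* b_text) {
  if (a < b) return std::string();
  std::ostringstream msg;
  msg << "Check failed: " << a_text << " < " << b_text
      << " (" << a << " vs. " << b << ")";
  return msg.str();
}

#define TOY_CHECK(cond) CheckMessage((cond), #cond)
#define TOY_CHECK_LT(a, b) CheckLtMessage((a), (b), #a, #b)
```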
http://gerrit.cloudera.org:8080/#/c/5816/5/be/src/runtime/descriptors.cc
File be/src/runtime/descriptors.cc:
Line 132: if (this->type() != other_desc.type()) return false;
> can get rid of this->
Done
http://gerrit.cloudera.org:8080/#/c/5816/4/be/src/runtime/descriptors.h
File be/src/runtime/descriptors.h:
Line 541:
> This comment needs updating to reflect the new behaviour.
I kept Equals unmodified and added Equivalent().
http://gerrit.cloudera.org:8080/#/c/5816/5/be/src/runtime/descriptors.h
File be/src/runtime/descriptors.h:
Line 554: /// Comparison is done by the contents of the tuple descriptors and not the ids.
> I'd prefer to preserve the meaning of these existing functions (IsPrefixOf(
Done
http://gerrit.cloudera.org:8080/#/c/5816/5/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:
Line 433: // List of booleans that indicates which children can be passed through
> Remove "List of booleans" since that's redundant
Done
Line 435: 4: required list<bool> pass_through
> is_child_passthrough (good to keep names consistent)
Done
http://gerrit.cloudera.org:8080/#/c/5816/4/fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
File fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java:
Line 228: if (this.getNullIndicatorByte() != other.getNullIndicatorByte()) return false;
> Also need to compare the NullIndicatorBit().
Done. It actually turns out that a non-nullable Kudu column does not get a null bit. For example, in this query the tuple size is 8 bytes (no null bits) for both operands:
select kudu_idx from functional_kudu.alltypesagg_idx limit 5 union all select count(*) from functional.alltypestiny;
Added it to planner tests.
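The kind of layout comparison discussed in these comments can be sketched
like this. SlotLayout and its fields are hypothetical simplifications (the
real descriptors carry more state), and the rule that null-indicator fields
only matter for nullable slots is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified slot layout; field names are illustrative only.
struct SlotLayout {
  int type_id;           // stand-in for the column type
  int byte_offset;       // offset of the value within the tuple
  int byte_size;         // size of the value in bytes
  bool nullable;         // whether the slot has a null indicator at all
  int null_byte_offset;  // only meaningful when nullable
  int null_bit;          // only meaningful when nullable
};

// True if both slots have the same physical layout, ignoring ids.
bool LayoutEquals(const SlotLayout& a, const SlotLayout& b) {
  if (a.type_id != b.type_id) return false;
  if (a.byte_offset != b.byte_offset || a.byte_size != b.byte_size) return false;
  if (a.nullable != b.nullable) return false;
  if (a.nullable &&
      (a.null_byte_offset != b.null_byte_offset || a.null_bit != b.null_bit)) {
    return false;
  }
  return true;
}

// True if both tuples have the same number of slots with pairwise equal layouts.
bool TupleLayoutEquals(const std::vector<SlotLayout>& a,
                       const std::vector<SlotLayout>& b) {
  if (a.size() != b.size()) return false;
  for (std::size_t i = 0; i < a.size(); ++i) {
    if (!LayoutEquals(a[i], b[i])) return false;
  }
  return true;
}
```

In this model, two non-nullable 8-byte slots compare equal (as in the query
above, where neither operand needs a null bit), while a nullable slot and a
non-nullable slot of the same type do not.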
http://gerrit.cloudera.org:8080/#/c/5816/5/fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
File fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java:
Line 222:
> Would be good to keep the name of this function and the BE equivalent the s
Done
Line 223: public boolean hasEqualPhysicalLayout(SlotDescriptor other) {
> needs brief comment
Done
Line 224: if (!this.getType().matchesType(other.getType())) return false;
> shouldn't the types be equal()?
Done
Line 226: if (this.getByteSize() != other.getByteSize()) return false;
> can remove 'this'
Done
http://gerrit.cloudera.org:8080/#/c/5816/5/fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
File fe/src/main/java/org/apache/impala/analysis/UnionStmt.java:
Line 155: // List of output expressions of the Union node. This should be the same as result
> List of output expressions produced by the union without the ORDER BY porti
Done
Line 156: // resultExprs_ if the UnionStmt does not have an Order By. Otherwise resultExprs_
> Let's avoid referring to specific plan nodes at this stage and instead try
Done
Line 158: private List<Expr> unionNodeResultExprs_ = Lists.newArrayList();
> unionResultExprs_
Done
Line 191: for (Expr e: other.unionNodeResultExprs_) unionNodeResultExprs_.add(e.clone());
> use Expr.cloneList()
Done
Line 501: for (UnionOperand op: operands_) {
> combine with the loop over operands_ in L526
Done
http://gerrit.cloudera.org:8080/#/c/5816/4/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:
Line 1536: if (ctx_.hasSubplan()) unionNode.disablePassthrough();
> Add a TODO to remove this as part of IMPALA-4179. Otherwise I might forget.
Added a TODO in a different file.
http://gerrit.cloudera.org:8080/#/c/5816/4/fe/src/main/java/org/apache/impala/planner/UnionNode.java
File fe/src/main/java/org/apache/impala/planner/UnionNode.java:
Line 68: // If false, no child nodes would be passed through in the backend.
> nit: "batches from child nodes" to be a bit clearer
Done
Line 71: // Indicates which child nodes can be passed through in the backend.
> nit: "batches from child nodes" to be a bit clearer
Done
PS4, Line 176: /*
> /**
Done
Line 177: * Compute the children for which rows can be forwarded by the Union node without being
> It's a little unclear what the input is.
Done
Line 183: // Pass through is only done for the simple case where the row has a single tuple.
> What's the motivation for this? Is it because the union output is always a
Yes, I think so. Added comment.
Another motivation is that it's very rare for the tuple layout to match exactly for all operands if the number of tuples is greater than 1 (for example, both sides would have to have a join with an identical layout).
Line 266: analyzer, children_.get(i).getTupleIds(), resultExprLists_.get(i)));
> I *think* in principle we may need to also check the nullable tuple IDs, si
Not exactly sure what you mean. We are guaranteed that the layout will be identical by line 214 in this file.
Also, I don't think the output tuple of Union is always non-nullable.
Line 299: if (!passThrough_.isEmpty()) {
> We probably don't want to print this at explain_level MINIMAL.
Done
http://gerrit.cloudera.org:8080/#/c/5816/5/fe/src/main/java/org/apache/impala/planner/UnionNode.java
File fe/src/main/java/org/apache/impala/planner/UnionNode.java:
Line 56: // List of output expressions of the Union node.
> List of union result exprs of the originating UnionStmt. Used for determini
Done. I think this still needs to be initialized to an empty list because the union node might not have any children. The alternative is to set it to null, but that would make the code less clean. I moved the initialization to the constructor.
Line 69: protected boolean passThroughEnabled = true;
> Seems clearer to make this isInSubplan_ or something and pass that in the c
Done
Line 72: protected List<Boolean> passThrough_ = Lists.newArrayList();
> isChildPassthrough (at least something that is consistent across FE/BE and
Done
Line 76: protected UnionNode(PlanNodeId id, TupleId tupleId) {
> remove?
We actually use both constructors. This constructor is used for creating a node with no children (only constant expressions).
Line 263: TupleDescriptor this_tuple_desc = analyzer.getDescTbl().getTupleDesc(tupleId_);
> use FE camel-case style
removed
Line 281: msg.union_node = new TUnionNode(
> add Preconditions check that asserts the correct size of passThrough
Done
Line 299: if (!passThrough_.isEmpty()) {
> Isn't this always non-empty?
It's empty when this node only has const exprs. Removed the check anyway because the for loop takes care of the empty case.
Line 300: List<String> passThroughNodes = Lists.newArrayList();
> passThroughNodeIds
Done
Line 308: Joiner.on(", ").join(passThroughNodes) + "\n");
> nit: we usually don't add a spaces after the comma for explain plan stuff
Done
Line 309: }
> might be nice to produce "all" instead of listing all plan node ids if all
Done
http://gerrit.cloudera.org:8080/#/c/5816/5/testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
File testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test:
Line 187: | pass through nodes: 01, 02
> pass-through-operands:
Done
http://gerrit.cloudera.org:8080/#/c/5816/5/testdata/workloads/functional-planner/queries/PlannerTest/empty.test
File testdata/workloads/functional-planner/queries/PlannerTest/empty.test:
Line 506: # IMPALA-3586: Verify that Union pass through is disabled in subplans.
> Might be good to add a separate test for this, since this one is kind of we
Done
http://gerrit.cloudera.org:8080/#/c/5816/5/testdata/workloads/functional-planner/queries/PlannerTest/union.test
File testdata/workloads/functional-planner/queries/PlannerTest/union.test:
Line 3085: | partitions=4/4 files=4 size=460B
> Add a new Kudu planner test (kudu.test) for:
Done
http://gerrit.cloudera.org:8080/#/c/5816/4/testdata/workloads/functional-query/queries/QueryTest/union.test
File testdata/workloads/functional-query/queries/QueryTest/union.test:
Line 1069: # IMPALA-3586: This query caused an issue because the tuple size of the children
> I think this may be something to do with count(*) being non-nullable. Maybe
The bug is already fixed. During development I did not take into account whether a slot is nullable when comparing tuples. This is fixed now.
http://gerrit.cloudera.org:8080/#/c/5816/5/testdata/workloads/functional-query/queries/QueryTest/union.test
File testdata/workloads/functional-query/queries/QueryTest/union.test:
Line 1069: # IMPALA-3586: This query caused an issue because the tuple size of the children
> No need to describe the failure mode of that specific bug you hit during de
Done. Yes, this should be passthrough.
Line 1082: # IMPALA-3586: Test the case where no nodes are passed though.
> Is this not already covered?
I think it's better to have an explicit test case for this. There are very few tests in this file that are similar to this one.
Line 1093: # IMPALA-3586: Test the case where 1 node is passed though, and one is not.
> Not already covered?
Same as above. Let me know if you disagree.
--
To view, visit http://gerrit.cloudera.org:8080/5816
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Hello Alex Behm, Dan Hecht,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/5816
to look at the new patch set (#19).
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
IMPALA-3586: Implement union passthrough
The union node acts as a pass-through operator and forwards row batches
from its children without materializing them. This is done in the case
when the child's tuple layout is identical to the union node's tuple layout
and no functions need to be applied to the child row batches.
Removed operand reordering in the FE because it's simpler and safer to
handle all passthrough children before non-passthrough children in the
BE. The recent improvements to memory management allowed us to remove
this requirement.
Testing:
- Added new planner and end to end tests that cover the new
functionality.
- Updated existing tests to reflect the new behavior.
Perf:
Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned
version of the store_sales table. There was over a 2x performance
improvement for the following query:
SELECT
  COUNT(ss_sold_time_sk),
  COUNT(ss_item_sk),
  COUNT(ss_customer_sk),
  COUNT(ss_cdemo_sk),
  COUNT(ss_hdemo_sk),
  COUNT(ss_addr_sk),
  COUNT(ss_store_sk),
  COUNT(ss_promo_sk),
  COUNT(ss_ticket_number),
  COUNT(ss_quantity),
  COUNT(ss_wholesale_cost),
  COUNT(ss_list_price),
  COUNT(ss_sales_price),
  COUNT(ss_ext_discount_amt),
  COUNT(ss_ext_sales_price),
  COUNT(ss_ext_wholesale_cost),
  COUNT(ss_ext_list_price),
  COUNT(ss_ext_tax),
  COUNT(ss_coupon_amt),
  COUNT(ss_net_paid),
  COUNT(ss_net_paid_inc_tax),
  COUNT(ss_net_profit),
  COUNT(ss_sold_date_sk)
FROM (
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
  union all
  select * from tpcds_10_parquet.store_sales_unpartitioned
) t
Before:
Total Time: 43s164ms
Summary:
Operator         #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem   Est. Peak Mem  Detail
-----------------------------------------------------------------------------------------------------------------------
13:AGGREGATE     1       224.721us  224.721us  1        1           28.00 KB   -1.00 B        FINALIZE
12:EXCHANGE      1       24.578us   24.578us   3        1           0          -1.00 B        UNPARTITIONED
11:AGGREGATE     3       2s402ms    3s060ms    3        1           119.00 KB  10.00 MB
00:UNION         3       35s380ms   37s846ms   288.01M  288.01M     3.08 MB    0
|--02:SCAN HDFS  3       184.197ms  219.931ms  28.80M   28.80M      535.03 MB  1.88 GB        store_sales_unpartitioned
|--03:SCAN HDFS  3       131.956ms  153.401ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--04:SCAN HDFS  3       178.456ms  247.721ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--05:SCAN HDFS  3       189.398ms  242.251ms  28.80M   28.80M      535.01 MB  1.88 GB        store_sales_unpartitioned
|--06:SCAN HDFS  3       122.786ms  156.528ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--07:SCAN HDFS  3       147.467ms  183.391ms  28.80M   28.80M      535.13 MB  1.88 GB        store_sales_unpartitioned
|--08:SCAN HDFS  3       147.502ms  186.273ms  28.80M   28.80M      535.01 MB  1.88 GB        store_sales_unpartitioned
|--09:SCAN HDFS  3       130.086ms  154.682ms  28.80M   28.80M      535.04 MB  1.88 GB        store_sales_unpartitioned
|--10:SCAN HDFS  3       122.701ms  161.056ms  28.80M   28.80M      534.89 MB  1.88 GB        store_sales_unpartitioned
01:SCAN HDFS     3       287.863ms  330.436ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
After:
Total Time: 19s139ms
Summary:
Operator         #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem   Est. Peak Mem  Detail
-----------------------------------------------------------------------------------------------------------------------
13:AGGREGATE     1       166.241us  166.241us  1        1           28.00 KB   -1.00 B        FINALIZE
12:EXCHANGE      1       71.695us   71.695us   3        1           0          -1.00 B        UNPARTITIONED
11:AGGREGATE     3       2s971ms    3s809ms    3        1           3.08 MB    10.00 MB
00:UNION         3       207.956ms  222.846ms  288.01M  288.01M     0          0
|--02:SCAN HDFS  3       1s533ms    1s535ms    28.80M   28.80M      532.28 MB  1.88 GB        store_sales_unpartitioned
|--03:SCAN HDFS  3       1s554ms    1s669ms    28.80M   28.80M      525.73 MB  1.88 GB        store_sales_unpartitioned
|--04:SCAN HDFS  3       1s568ms    1s716ms    28.80M   28.80M      525.03 MB  1.88 GB        store_sales_unpartitioned
|--05:SCAN HDFS  3       1s503ms    1s617ms    28.80M   28.80M      527.43 MB  1.88 GB        store_sales_unpartitioned
|--06:SCAN HDFS  3       1s560ms    1s634ms    28.80M   28.80M      528.52 MB  1.88 GB        store_sales_unpartitioned
|--07:SCAN HDFS  3       1s489ms    1s643ms    28.80M   28.80M      534.81 MB  1.88 GB        store_sales_unpartitioned
|--08:SCAN HDFS  3       1s534ms    1s581ms    28.80M   28.80M      528.10 MB  1.88 GB        store_sales_unpartitioned
|--09:SCAN HDFS  3       1s558ms    1s674ms    28.80M   28.80M      526.77 MB  1.88 GB        store_sales_unpartitioned
|--10:SCAN HDFS  3       1s504ms    1s692ms    28.80M   28.80M      527.83 MB  1.88 GB        store_sales_unpartitioned
01:SCAN HDFS     3       1s682ms    1s911ms    28.80M   28.80M      526.14 MB  1.88 GB        store_sales_unpartitioned
Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
---
M be/src/exec/exchange-node.cc
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/exprs/slot-ref.cc
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/row-batch.cc
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-planner/queries/PlannerTest/with-clause.test
M testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
M tests/query_test/test_queries.py
33 files changed, 1,461 insertions(+), 764 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/19
--
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 19
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-3586 (Part 1): Implement Union Pass Through
Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.
Change subject: IMPALA-3586 (Part 1): Implement Union Pass Through
......................................................................
Patch Set 4:
(15 comments)
I think this is looking pretty good. It turns out to hit a lot of interesting corner cases in the backend but I think we just need to make sure we've got them covered and tests for them all.
http://gerrit.cloudera.org:8080/#/c/5816/4/be/src/exec/union-node.cc
File be/src/exec/union-node.cc:
Line 122: if (child_eos_ && child_idx_ > 0 && !IsInSubplan()) child(child_idx_ - 1)->Close(state);
Can you add a comment that this only applies to passthrough? E.g. "The previous child may have been left open if passthrough was enabled for it". Otherwise it's hard to figure out how it fits in with the rest of it.
Line 133: row_batch->set_num_rows(limit_ - num_rows_returned_);
There's a corner case that breaks this calculation. The problem is that 'row_batch' may be non-empty when GetNext() is called. E.g. the subplan node does this. I think we just need to save the value of num_rows() before calling GetNext() and adjust the calculation accordingly.
I think we're missing a test case where we have a union node with a limit under a subplan.
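The adjustment suggested above can be sketched as follows. This is only an illustration under stated assumptions: `ApplyLimit` and `TruncateResult` are hypothetical names, a limit is assumed to be set, and `rows_before` stands for the value of num_rows() saved before the child's GetNext() call.

```cpp
#include <algorithm>
#include <cstdint>

// Result of applying the limit to a batch that may have been non-empty on entry.
struct TruncateResult {
  int64_t batch_num_rows;     // value to pass to set_num_rows()
  int64_t num_rows_returned;  // updated running total for this node
};

// 'rows_before' is the batch's row count saved before calling the child's
// GetNext(); 'rows_after' is its row count once the call returns.
TruncateResult ApplyLimit(int64_t limit, int64_t num_rows_returned,
                          int64_t rows_before, int64_t rows_after) {
  int64_t rows_added = rows_after - rows_before;
  int64_t remaining = limit - num_rows_returned;
  // Keep only as many newly added rows as the limit still allows.
  int64_t rows_kept = std::min(rows_added, remaining);
  // Rows that were already in the batch are preserved, not truncated away.
  return {rows_before + rows_kept, num_rows_returned + rows_kept};
}
```

For example, with a limit of 10, 7 rows already returned, and a batch that grew from 5 to 12 rows, the batch is truncated to 8 rows. Calling set_num_rows(limit_ - num_rows_returned_) directly would set it to 3 and drop rows that were in the batch before the call.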
Line 150: if (child_idx_ < children_.size() ||
Is this just an optimisation? Might be best to remove it and keep the code simpler unless we have data showing it's a bottleneck. If I understand it correctly it only helps on the last GetNext() call.
If we keep it we should initialise tuple_buf to NULL so if we mess up it's more debuggable.
http://gerrit.cloudera.org:8080/#/c/5816/4/be/src/exec/union-node.h
File be/src/exec/union-node.h:
Line 99: /// Returns true if the child can be passed through.
Nit: "if the child at 'idx'"
http://gerrit.cloudera.org:8080/#/c/5816/4/be/src/runtime/descriptors.h
File be/src/runtime/descriptors.h:
Line 541: /// Return true if the tuple ids of this descriptor match tuple ids of other desc.
This comment needs updating to reflect the new behaviour.
http://gerrit.cloudera.org:8080/#/c/5816/4/fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
File fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java:
Line 228: if (this.getNullIndicatorByte() != other.getNullIndicatorByte()) return false;
Also need to compare the NullIndicatorBit().
There are some subtle differences in how we compute the mem layout for Kudu tables, which can have non-nullable slots. I think there's an interesting test where we union the output of an aggregate function like count() (where slots are non-nullable and don't get a null bit) and a Kudu table with a non-nullable Kudu column, which gets a null bit regardless.
http://gerrit.cloudera.org:8080/#/c/5816/4/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:
Line 1536: if (ctx_.hasSubplan()) unionNode.disablePassthrough();
Add a TODO to remove this as part of IMPALA-4179. Otherwise I might forget.
http://gerrit.cloudera.org:8080/#/c/5816/4/fe/src/main/java/org/apache/impala/planner/UnionNode.java
File fe/src/main/java/org/apache/impala/planner/UnionNode.java:
Line 68: // If false, no child nodes would be passed through in the backend.
nit: "batches from child nodes" to be a bit clearer
Line 71: // Indicates which child nodes can be passed through in the backend.
nit: "batches from child nodes" to be a bit clearer
PS4, Line 176: /*
/**
Line 177: * Compute the children for which rows can be forwarded by the Union node without being
It's a little unclear what the input is.
Maybe "Compute whether we can pass through rows from a child where 'childExprList' is evaluated over a row with 'childTupleIds'"
Line 183: // Pass through is only done for the simple case where the row has a single tuple.
What's the motivation for this? Is it because the union output is always a row with a single tuple?
Line 266: analyzer, children_.get(i).getTupleIds(), resultExprLists_.get(i)));
I *think* in principle we may need to also check the nullable tuple IDs, since the output tuple should be non-nullable but the input could be nullable in theory. In practice I don't think it's possible for a row with 1 tuple but it would be better to be conservative.
Alex probably has more insight.
Line 299: if (!passThrough_.isEmpty()) {
We probably don't want to print this at explain_level MINIMAL.
http://gerrit.cloudera.org:8080/#/c/5816/4/testdata/workloads/functional-planner/queries/PlannerTest/union.test
File testdata/workloads/functional-planner/queries/PlannerTest/union.test:
Line 3103: ====
Can you add a couple of tests where there is a table scan on one branch of the union and a hash join on the other? I didn't see much test coverage of joins in unions. The table scan should be passed through and the hash join shouldn't.
The idea is that hash join outputs a row with multiple tuples. With an inner join both are non-nullable, with an outer join, the right tuple is nullable.
I think it would be good to have both planner and end-to-end tests along those lines.
--
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change.
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
Patch Set 19:
(14 comments)
nice cleanup
http://gerrit.cloudera.org:8080/#/c/5816/19/be/src/exec/union-node.cc
File be/src/exec/union-node.cc:
Line 111: DCHECK_LT(child_idx_, children_.size());
this DCHECK should come before any use of child_idx_ since if it's violated, it doesn't make sense to call e.g. IsChildPassthrough() with child_idx_.
Line 114: DCHECK(child(child_idx_)->row_desc().LayoutEquals(row_batch->row_desc()));
what do these dchecks have to do with child_eos_? i.e. shouldn't they be true regardless of the value of child_eos_?
PS19, Line 132: child_idx_++
++child_idx_ per our style.
PS19, Line 147: child_idx_ < children_.size()
Would make more sense as HasMoreMaterialized()
PS19, Line 150: children
children that need materialization
PS19, Line 153: Row
Child row batch
Line 168: // There are only 3 ways of getting out of this loop:
it would be helpful to concisely say what this loop is responsible for given its complexity. Something like:
This loop fetches row batches from a single child and materializes each output row, until one of these conditions:
1) ...
Line 193: COUNTER_SET(rows_returned_counter_, num_rows_returned_);
should we do: child_batch_.reset() here? we shouldn't ever reference it again, but seems cleaner to keep the invariant that child_batch_ always corresponds to the open child (otherwise its memory references aren't valid).
PS19, Line 200: // We end up here iff one of the following is true (or both).
: // 1. We are done consuming all batches from the current child and we need to move on
: // to the next child.
: // 2. The output row batch is at capacity.
this could be summarized for a quicker read:
// Either we've finished the current child or the output batch is at capacity, or both.
PS19, Line 204: In other words, the only way to not end up here if we entered the outer loop is if
: // the limit is reached.
rather than say that in a comment, how about:
DCHECK(!ReachedLimit());
Line 263: (!HasMorePassthrough() && !HasMoreMaterialized() && !HasMoreConst(state));
shouldn't we put lines 254-263 inside a do-while loop, so that we don't return partially filled row-batches when we still have output to produce? Returning partially filled row-batches is legal but can be confusing to clients (admittedly it can happen for other reasons currently, but it'd be nice to avoid doing that unless there's a good reason).
do {
...
} while (!*eos && !row_batch->AtCapacity());
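The effect of the suggested do-while can be illustrated with a toy model. This is a sketch under loud assumptions: `ToyBatch` and `ToyUnion` are hypothetical types that only count rows, and the model ignores the passthrough case, where the child's batch is handed through rather than copied into the output.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Toy output batch: tracks only a row count and a fixed capacity.
struct ToyBatch {
  int num_rows = 0;
  int capacity = 4;
  bool AtCapacity() const { return num_rows >= capacity; }
};

// Toy union node: each child is just a count of rows it still has to produce.
struct ToyUnion {
  std::vector<int> pending_rows;
  std::size_t child_idx = 0;

  // Copies as many rows as fit from the current child and advances to the
  // next child once the current one is drained.
  void GetNextFromCurrentChild(ToyBatch* batch) {
    int n = std::min(pending_rows[child_idx], batch->capacity - batch->num_rows);
    batch->num_rows += n;
    pending_rows[child_idx] -= n;
    if (pending_rows[child_idx] == 0) ++child_idx;
  }

  // Keeps filling the batch until it is full or there is no more output,
  // so a partially filled batch is returned only at end of stream.
  void GetNext(ToyBatch* batch, bool* eos) {
    if (child_idx >= pending_rows.size()) {
      *eos = true;
      return;
    }
    do {
      GetNextFromCurrentChild(batch);
      *eos = child_idx >= pending_rows.size();
    } while (!*eos && !batch->AtCapacity());
  }
};
```

With children holding 3, 3 and 1 rows and a capacity of 4, the first call returns a full batch of 4 rows and the second returns the remaining 3 rows with eos set. Without the loop, the first call would return after draining the first child, with only 3 rows.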
http://gerrit.cloudera.org:8080/#/c/5816/19/be/src/exec/union-node.h
File be/src/exec/union-node.h:
PS19, Line 115: child_idx
'child_idx'
Line 129: return child_idx_ >= first_materialized_child_idx_ && child_idx_ < children_.size();
this suggestion is okay to ignore, but i find these kinds of conditions easier to read as:
x <= y && y < z
so that it looks closer to how you'd read it in math: x <= y < z.
That said, I don't see how the condition:
first_materialized_child_idx_ <= child_idx_ < children_.size()
is correct for whether there are more materialized children. i.e. when child_idx_ < first_materialized_child_idx_, we still have more materialized children, right?
So, shouldn't this be:
// We have children that need materialization and haven't processed them all yet.
first_materialized_child_idx_ != children_.size() && child_idx_ < children_.size()
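The difference between the two conditions can be checked with a small sketch. The free functions below are hypothetical stand-ins for the member function under review, and the sketch assumes that first_materialized_child_idx equals the number of children when no child needs materialization, as the suggested condition implies.

```cpp
// Condition as written in the patch set: true only once child_idx has
// already reached the first child that needs materialization.
bool HasMoreMaterializedAsWritten(int child_idx, int first_materialized_child_idx,
                                  int num_children) {
  return first_materialized_child_idx <= child_idx && child_idx < num_children;
}

// Condition as suggested in the review: true whenever some child needs
// materialization and not all children have been processed yet.
bool HasMoreMaterializedAsSuggested(int child_idx, int first_materialized_child_idx,
                                    int num_children) {
  return first_materialized_child_idx != num_children && child_idx < num_children;
}
```

With three children where only the last needs materialization (first_materialized_child_idx = 2) and child_idx = 0, the written form returns false while the suggested form returns true. The two agree once child_idx reaches 2, and both return false when every child is passthrough.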
http://gerrit.cloudera.org:8080/#/c/5816/19/be/src/exprs/slot-ref.cc
File be/src/exprs/slot-ref.cc:
Line 94: DCHECK(false);
this is confusing. either it's an invariant or it's not. it's even more confusing because now it makes this case and the one at line 79 different for no good reason, and future code readers won't know why.
--
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 19
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Hello Alex Behm, Dan Hecht,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/5816
to look at the new patch set (#20).
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
IMPALA-3586: Implement union passthrough
Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
---
M be/src/exec/exchange-node.cc
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/row-batch.cc
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-planner/queries/PlannerTest/with-clause.test
M testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
M tests/query_test/test_queries.py
32 files changed, 1,460 insertions(+), 764 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/20
--
To view, visit http://gerrit.cloudera.org:8080/5816
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 20
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Taras Bobrovytsky has posted comments on this change.
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
Patch Set 11:
No, we do not break out of the loop; we return right away (see line 149 in the original union-node.cc). The next GetNext() call can return 0 rows.
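To make the point above concrete, here is a minimal C++ sketch (not the actual union-node.cc code) of an operator whose GetNext() returns right after pulling from a single child. The names Batch, Child, SketchUnion and Drain are hypothetical and exist only for this illustration. Because a call may hand back an empty batch without signalling end of stream, the caller has to key its loop on eos rather than on the row count.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a row batch; only the rows matter here.
struct Batch {
  std::vector<int> rows;
};

// Hypothetical child operator that replays a fixed list of batches.
class Child {
 public:
  explicit Child(std::vector<Batch> batches) : batches_(std::move(batches)) {}

  // Fills 'out' with the next batch (possibly empty) and sets 'eos' once
  // nothing is left to return.
  void GetNext(Batch* out, bool* eos) {
    assert(out != nullptr && eos != nullptr);
    if (pos_ < batches_.size()) {
      *out = batches_[pos_++];
    } else {
      out->rows.clear();
    }
    *eos = pos_ >= batches_.size();
  }

 private:
  std::vector<Batch> batches_;
  std::size_t pos_ = 0;
};

// Simplified union: every call pulls from at most one child and returns
// right away instead of looping until it has rows to hand back.
class SketchUnion {
 public:
  explicit SketchUnion(std::vector<Child> children)
      : children_(std::move(children)) {}

  void GetNext(Batch* out, bool* eos) {
    out->rows.clear();
    if (child_idx_ < children_.size()) {
      bool child_eos = false;
      children_[child_idx_].GetNext(out, &child_eos);
      if (child_eos) ++child_idx_;  // Advance; the next call uses the next child.
    }
    *eos = child_idx_ >= children_.size();
  }

 private:
  std::vector<Child> children_;
  std::size_t child_idx_ = 0;
};

struct DrainResult {
  std::size_t rows = 0;
  std::size_t empty_calls = 0;
};

// Drains the union, counting rows and the calls that returned no rows.
DrainResult Drain(SketchUnion* u) {
  DrainResult result;
  Batch batch;
  bool eos = false;
  while (!eos) {  // Loop on eos, not on the number of rows returned.
    u->GetNext(&batch, &eos);
    result.rows += batch.rows.size();
    if (batch.rows.empty()) ++result.empty_calls;
  }
  return result;
}
```

A caller that stopped at the first empty batch would miss the rows of every later child, which is why the sketch treats zero rows and end of stream as separate signals.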
--
To view, visit http://gerrit.cloudera.org:8080/5816
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Taras Bobrovytsky has uploaded a new patch set (#15).
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
IMPALA-3586: Implement union passthrough
The union node acts as a pass-through operator and forwards row batches
from its children without materializing them. This is done when
the child's tuple layout is identical to the union node's tuple layout
and no functions need to be applied to the child row batches.
Removed operand reordering in the FE because it's simpler and safer to
handle all passthrough children before non-passthrough children in the
BE. The recent improvements to memory management allowed us to remove
this requirement.
A new query option DISABLE_UNION_PASSTHROUGH was added in this patch
as a precaution and for testing purposes.
Testing:
- Added new planner and end to end tests that cover the new
functionality.
- Updated existing tests to reflect the new behavior.
Perf:
Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned
version of the store_sales table. There was over a 2x performance
improvement for the following query:
SELECT
COUNT(ss_sold_time_sk),
COUNT(ss_item_sk),
COUNT(ss_customer_sk),
COUNT(ss_cdemo_sk),
COUNT(ss_hdemo_sk),
COUNT(ss_addr_sk),
COUNT(ss_store_sk),
COUNT(ss_promo_sk),
COUNT(ss_ticket_number),
COUNT(ss_quantity),
COUNT(ss_wholesale_cost),
COUNT(ss_list_price),
COUNT(ss_sales_price),
COUNT(ss_ext_discount_amt),
COUNT(ss_ext_sales_price),
COUNT(ss_ext_wholesale_cost),
COUNT(ss_ext_list_price),
COUNT(ss_ext_tax),
COUNT(ss_coupon_amt),
COUNT(ss_net_paid),
COUNT(ss_net_paid_inc_tax),
COUNT(ss_net_profit),
COUNT(ss_sold_date_sk)
FROM (
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
) t
Before:
Total Time: 43s164ms
Summary:
Operator          #Hosts   Avg Time   Max Time    #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE           1  224.721us  224.721us        1           1   28.00 KB        -1.00 B  FINALIZE
12:EXCHANGE            1   24.578us   24.578us        3           1          0        -1.00 B  UNPARTITIONED
11:AGGREGATE           3    2s402ms    3s060ms        3           1  119.00 KB       10.00 MB
00:UNION               3   35s380ms   37s846ms  288.01M     288.01M    3.08 MB              0
|--02:SCAN HDFS        3  184.197ms  219.931ms   28.80M      28.80M  535.03 MB        1.88 GB  store_sales_unpartitioned
|--03:SCAN HDFS        3  131.956ms  153.401ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
|--04:SCAN HDFS        3  178.456ms  247.721ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
|--05:SCAN HDFS        3  189.398ms  242.251ms   28.80M      28.80M  535.01 MB        1.88 GB  store_sales_unpartitioned
|--06:SCAN HDFS        3  122.786ms  156.528ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
|--07:SCAN HDFS        3  147.467ms  183.391ms   28.80M      28.80M  535.13 MB        1.88 GB  store_sales_unpartitioned
|--08:SCAN HDFS        3  147.502ms  186.273ms   28.80M      28.80M  535.01 MB        1.88 GB  store_sales_unpartitioned
|--09:SCAN HDFS        3  130.086ms  154.682ms   28.80M      28.80M  535.04 MB        1.88 GB  store_sales_unpartitioned
|--10:SCAN HDFS        3  122.701ms  161.056ms   28.80M      28.80M  534.89 MB        1.88 GB  store_sales_unpartitioned
01:SCAN HDFS           3  287.863ms  330.436ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
After:
Total Time: 20s660ms
Summary:
Operator          #Hosts   Avg Time   Max Time    #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE           1  187.474us  187.474us        1           1   28.00 KB        -1.00 B  FINALIZE
12:EXCHANGE            1   15.238us   15.238us        3           1          0        -1.00 B  UNPARTITIONED
11:AGGREGATE           3    2s958ms    3s749ms        3           1    3.08 MB       10.00 MB
00:UNION               3  211.510ms  224.667ms  288.01M     288.01M          0              0
|--02:SCAN HDFS        3    1s637ms    1s734ms   28.80M      28.80M  528.68 MB        1.88 GB  store_sales_unpartitioned
|--03:SCAN HDFS        3    1s697ms    1s708ms   28.80M      28.80M  528.48 MB        1.88 GB  store_sales_unpartitioned
|--04:SCAN HDFS        3    1s681ms    1s748ms   28.80M      28.80M  529.34 MB        1.88 GB  store_sales_unpartitioned
|--05:SCAN HDFS        3    1s665ms    1s756ms   28.80M      28.80M  533.81 MB        1.88 GB  store_sales_unpartitioned
|--06:SCAN HDFS        3    1s675ms    1s800ms   28.80M      28.80M  530.70 MB        1.88 GB  store_sales_unpartitioned
|--07:SCAN HDFS        3    1s677ms    1s759ms   28.80M      28.80M  525.95 MB        1.88 GB  store_sales_unpartitioned
|--08:SCAN HDFS        3    1s621ms    1s790ms   28.80M      28.80M  534.64 MB        1.88 GB  store_sales_unpartitioned
|--09:SCAN HDFS        3    1s684ms    1s743ms   28.80M      28.80M  528.55 MB        1.88 GB  store_sales_unpartitioned
|--10:SCAN HDFS        3    1s528ms    1s771ms   28.80M      28.80M  533.70 MB        1.88 GB  store_sales_unpartitioned
01:SCAN HDFS           3    1s853ms    2s149ms   28.80M      28.80M  526.53 MB        1.88 GB  store_sales_unpartitioned
Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
---
M be/src/exec/exchange-node.cc
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/row-batch.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-planner/queries/PlannerTest/with-clause.test
M testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
M tests/query_test/test_queries.py
35 files changed, 1,343 insertions(+), 611 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/15
--
To view, visit http://gerrit.cloudera.org:8080/5816
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 15
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Taras Bobrovytsky has uploaded a new patch set (#19).
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
IMPALA-3586: Implement union passthrough
The union node acts as a pass-through operator and forwards row batches
from its children without materializing them. This is done when
the child's tuple layout is identical to the union node's tuple layout
and no functions need to be applied to the child row batches.
Removed operand reordering in the FE because it's simpler and safer to
handle all passthrough children before non-passthrough children in the
BE. The recent improvements to memory management allowed us to remove
this requirement.
Testing:
- Added new planner and end to end tests that cover the new
functionality.
- Updated existing tests to reflect the new behavior.
Perf:
Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned
version of the store_sales table. There was over a 2x performance
improvement for the following query:
SELECT
COUNT(ss_sold_time_sk),
COUNT(ss_item_sk),
COUNT(ss_customer_sk),
COUNT(ss_cdemo_sk),
COUNT(ss_hdemo_sk),
COUNT(ss_addr_sk),
COUNT(ss_store_sk),
COUNT(ss_promo_sk),
COUNT(ss_ticket_number),
COUNT(ss_quantity),
COUNT(ss_wholesale_cost),
COUNT(ss_list_price),
COUNT(ss_sales_price),
COUNT(ss_ext_discount_amt),
COUNT(ss_ext_sales_price),
COUNT(ss_ext_wholesale_cost),
COUNT(ss_ext_list_price),
COUNT(ss_ext_tax),
COUNT(ss_coupon_amt),
COUNT(ss_net_paid),
COUNT(ss_net_paid_inc_tax),
COUNT(ss_net_profit),
COUNT(ss_sold_date_sk)
FROM (
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
) t
Before:
Total Time: 43s164ms
Summary:
Operator          #Hosts   Avg Time   Max Time    #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE           1  224.721us  224.721us        1           1   28.00 KB        -1.00 B  FINALIZE
12:EXCHANGE            1   24.578us   24.578us        3           1          0        -1.00 B  UNPARTITIONED
11:AGGREGATE           3    2s402ms    3s060ms        3           1  119.00 KB       10.00 MB
00:UNION               3   35s380ms   37s846ms  288.01M     288.01M    3.08 MB              0
|--02:SCAN HDFS        3  184.197ms  219.931ms   28.80M      28.80M  535.03 MB        1.88 GB  store_sales_unpartitioned
|--03:SCAN HDFS        3  131.956ms  153.401ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
|--04:SCAN HDFS        3  178.456ms  247.721ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
|--05:SCAN HDFS        3  189.398ms  242.251ms   28.80M      28.80M  535.01 MB        1.88 GB  store_sales_unpartitioned
|--06:SCAN HDFS        3  122.786ms  156.528ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
|--07:SCAN HDFS        3  147.467ms  183.391ms   28.80M      28.80M  535.13 MB        1.88 GB  store_sales_unpartitioned
|--08:SCAN HDFS        3  147.502ms  186.273ms   28.80M      28.80M  535.01 MB        1.88 GB  store_sales_unpartitioned
|--09:SCAN HDFS        3  130.086ms  154.682ms   28.80M      28.80M  535.04 MB        1.88 GB  store_sales_unpartitioned
|--10:SCAN HDFS        3  122.701ms  161.056ms   28.80M      28.80M  534.89 MB        1.88 GB  store_sales_unpartitioned
01:SCAN HDFS           3  287.863ms  330.436ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
After:
Total Time: 19s139ms
Summary:
Operator          #Hosts   Avg Time   Max Time    #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE           1  166.241us  166.241us        1           1   28.00 KB        -1.00 B  FINALIZE
12:EXCHANGE            1   71.695us   71.695us        3           1          0        -1.00 B  UNPARTITIONED
11:AGGREGATE           3    2s971ms    3s809ms        3           1    3.08 MB       10.00 MB
00:UNION               3  207.956ms  222.846ms  288.01M     288.01M          0              0
|--02:SCAN HDFS        3    1s533ms    1s535ms   28.80M      28.80M  532.28 MB        1.88 GB  store_sales_unpartitioned
|--03:SCAN HDFS        3    1s554ms    1s669ms   28.80M      28.80M  525.73 MB        1.88 GB  store_sales_unpartitioned
|--04:SCAN HDFS        3    1s568ms    1s716ms   28.80M      28.80M  525.03 MB        1.88 GB  store_sales_unpartitioned
|--05:SCAN HDFS        3    1s503ms    1s617ms   28.80M      28.80M  527.43 MB        1.88 GB  store_sales_unpartitioned
|--06:SCAN HDFS        3    1s560ms    1s634ms   28.80M      28.80M  528.52 MB        1.88 GB  store_sales_unpartitioned
|--07:SCAN HDFS        3    1s489ms    1s643ms   28.80M      28.80M  534.81 MB        1.88 GB  store_sales_unpartitioned
|--08:SCAN HDFS        3    1s534ms    1s581ms   28.80M      28.80M  528.10 MB        1.88 GB  store_sales_unpartitioned
|--09:SCAN HDFS        3    1s558ms    1s674ms   28.80M      28.80M  526.77 MB        1.88 GB  store_sales_unpartitioned
|--10:SCAN HDFS        3    1s504ms    1s692ms   28.80M      28.80M  527.83 MB        1.88 GB  store_sales_unpartitioned
01:SCAN HDFS           3    1s682ms    1s911ms   28.80M      28.80M  526.14 MB        1.88 GB  store_sales_unpartitioned
Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
---
M be/src/exec/exchange-node.cc
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/exprs/slot-ref.cc
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/row-batch.cc
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-planner/queries/PlannerTest/with-clause.test
M testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
M tests/query_test/test_queries.py
33 files changed, 1,461 insertions(+), 764 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/19
--
To view, visit http://gerrit.cloudera.org:8080/5816
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 19
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
Patch Set 17:
(8 comments)
I'm pretty happy with the change. Only minor comment/naming issues left to fix.
http://gerrit.cloudera.org:8080/#/c/5816/17//COMMIT_MSG
Commit Message:
Line 19: A new query option DISABLE_UNION_PASSTHROUGH was added in this patch
No more query option.
http://gerrit.cloudera.org:8080/#/c/5816/17/be/src/exec/union-node.h
File be/src/exec/union-node.h:
Line 32: class RuntimeState;
remove
Line 70: /// evaluating its exprs.
Just for clarity, it might help to spell out what its value is when all children are materialized and when no children are materialized.
Line 71: int first_materialized_child_idx_;
make const and set in c'tor
Line 100: /// call on the child. Sets 'passthrough_todo_' to false when all passthrough children
update comments to not mention 'todo'
http://gerrit.cloudera.org:8080/#/c/5816/17/fe/src/main/java/org/apache/impala/planner/UnionNode.java
File fe/src/main/java/org/apache/impala/planner/UnionNode.java:
Line 77: // Index of the first non-passthrough child; i.e. a child that needs materializing and
Shrink to: // Index of the first non-passthrough child.
Comment says it all :). The class comment already explains what passthrough is
Line 183: * Re-order the union's operands such that the passthrough operands come before the
I'd say the main purpose of this function is to compute passthrough. The re-ordering is secondary. So how about calling this computePassthrough().
Would also be good to mention that we reorder based on passthrough mostly for simplifying the BE implementation.
Line 190: isChildPassthrough.add(computePassthrough(
You could rename this computePassthrough() to isChildPassthrough()
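The naming suggestions above can be sketched as follows. This is a hedged C++ illustration rather than the Java code in UnionNode.java; the Operand struct and its fields are hypothetical, and the passthrough test simply restates the condition from the commit message (identical tuple layout, no functions to apply). A stable partition keeps the relative order within each group and returns the index of the first non-passthrough child. In this sketch that index is 0 when every child is materialized and equals the number of children when none is.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of one union operand.
struct Operand {
  std::string name;
  bool layout_matches_union;  // Child tuple layout identical to the union's.
  bool needs_expr_eval;       // Functions must be applied to the child's rows.
};

// Per-child check: can this child's row batches be forwarded unchanged?
bool IsChildPassthrough(const Operand& op) {
  return op.layout_matches_union && !op.needs_expr_eval;
}

// Moves passthrough operands in front of the materialized ones (stable within
// each group) and returns the index of the first materialized operand.
std::size_t ComputePassthrough(std::vector<Operand>* operands) {
  assert(operands != nullptr);
  auto first_materialized = std::stable_partition(
      operands->begin(), operands->end(), IsChildPassthrough);
  return static_cast<std::size_t>(first_materialized - operands->begin());
}
```

Since the returned index is fixed once the operands are ordered, a consumer can store it in a const member initialized in its constructor, in the spirit of the comment on first_materialized_child_idx_.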
--
To view, visit http://gerrit.cloudera.org:8080/5816
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 17
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Taras Bobrovytsky has uploaded a new patch set (#8).
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
IMPALA-3586: Implement union passthrough
The union node acts as a pass-through operator and forwards row batches
from its children without materializing them. This is done when
the child's tuple layout is identical to the union node's tuple layout
and no functions need to be applied to the child row batches.
A new query option DISABLE_UNION_PASSTHROUGH was added in this patch
as a precaution and for testing purposes.
Testing:
- Added new planner and end to end tests that cover the new
functionality.
- Updated existing tests to reflect the new behavior.
Perf:
Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned
version of the store_sales table. There was over a 2x performance
improvement for the following query:
SELECT
COUNT(ss_sold_time_sk),
COUNT(ss_item_sk),
COUNT(ss_customer_sk),
COUNT(ss_cdemo_sk),
COUNT(ss_hdemo_sk),
COUNT(ss_addr_sk),
COUNT(ss_store_sk),
COUNT(ss_promo_sk),
COUNT(ss_ticket_number),
COUNT(ss_quantity),
COUNT(ss_wholesale_cost),
COUNT(ss_list_price),
COUNT(ss_sales_price),
COUNT(ss_ext_discount_amt),
COUNT(ss_ext_sales_price),
COUNT(ss_ext_wholesale_cost),
COUNT(ss_ext_list_price),
COUNT(ss_ext_tax),
COUNT(ss_coupon_amt),
COUNT(ss_net_paid),
COUNT(ss_net_paid_inc_tax),
COUNT(ss_net_profit),
COUNT(ss_sold_date_sk)
FROM (
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
) t
Before:
Total Time: 43s164ms
Summary:
Operator          #Hosts   Avg Time   Max Time    #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE           1  224.721us  224.721us        1           1   28.00 KB        -1.00 B  FINALIZE
12:EXCHANGE            1   24.578us   24.578us        3           1          0        -1.00 B  UNPARTITIONED
11:AGGREGATE           3    2s402ms    3s060ms        3           1  119.00 KB       10.00 MB
00:UNION               3   35s380ms   37s846ms  288.01M     288.01M    3.08 MB              0
|--02:SCAN HDFS        3  184.197ms  219.931ms   28.80M      28.80M  535.03 MB        1.88 GB  store_sales_unpartitioned
|--03:SCAN HDFS        3  131.956ms  153.401ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
|--04:SCAN HDFS        3  178.456ms  247.721ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
|--05:SCAN HDFS        3  189.398ms  242.251ms   28.80M      28.80M  535.01 MB        1.88 GB  store_sales_unpartitioned
|--06:SCAN HDFS        3  122.786ms  156.528ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
|--07:SCAN HDFS        3  147.467ms  183.391ms   28.80M      28.80M  535.13 MB        1.88 GB  store_sales_unpartitioned
|--08:SCAN HDFS        3  147.502ms  186.273ms   28.80M      28.80M  535.01 MB        1.88 GB  store_sales_unpartitioned
|--09:SCAN HDFS        3  130.086ms  154.682ms   28.80M      28.80M  535.04 MB        1.88 GB  store_sales_unpartitioned
|--10:SCAN HDFS        3  122.701ms  161.056ms   28.80M      28.80M  534.89 MB        1.88 GB  store_sales_unpartitioned
01:SCAN HDFS           3  287.863ms  330.436ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
After:
Total Time: 20s660ms
Summary:
Operator          #Hosts   Avg Time   Max Time    #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE           1  187.474us  187.474us        1           1   28.00 KB        -1.00 B  FINALIZE
12:EXCHANGE            1   15.238us   15.238us        3           1          0        -1.00 B  UNPARTITIONED
11:AGGREGATE           3    2s958ms    3s749ms        3           1    3.08 MB       10.00 MB
00:UNION               3  211.510ms  224.667ms  288.01M     288.01M          0              0
|--02:SCAN HDFS        3    1s637ms    1s734ms   28.80M      28.80M  528.68 MB        1.88 GB  store_sales_unpartitioned
|--03:SCAN HDFS        3    1s697ms    1s708ms   28.80M      28.80M  528.48 MB        1.88 GB  store_sales_unpartitioned
|--04:SCAN HDFS        3    1s681ms    1s748ms   28.80M      28.80M  529.34 MB        1.88 GB  store_sales_unpartitioned
|--05:SCAN HDFS        3    1s665ms    1s756ms   28.80M      28.80M  533.81 MB        1.88 GB  store_sales_unpartitioned
|--06:SCAN HDFS        3    1s675ms    1s800ms   28.80M      28.80M  530.70 MB        1.88 GB  store_sales_unpartitioned
|--07:SCAN HDFS        3    1s677ms    1s759ms   28.80M      28.80M  525.95 MB        1.88 GB  store_sales_unpartitioned
|--08:SCAN HDFS        3    1s621ms    1s790ms   28.80M      28.80M  534.64 MB        1.88 GB  store_sales_unpartitioned
|--09:SCAN HDFS        3    1s684ms    1s743ms   28.80M      28.80M  528.55 MB        1.88 GB  store_sales_unpartitioned
|--10:SCAN HDFS        3    1s528ms    1s771ms   28.80M      28.80M  533.70 MB        1.88 GB  store_sales_unpartitioned
01:SCAN HDFS           3    1s853ms    2s149ms   28.80M      28.80M  526.53 MB        1.88 GB  store_sales_unpartitioned
Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
---
M be/src/exec/exchange-node.cc
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/row-batch.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
M tests/query_test/test_queries.py
32 files changed, 729 insertions(+), 52 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/8
--
To view, visit http://gerrit.cloudera.org:8080/5816
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Taras Bobrovytsky has posted comments on this change.
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
Patch Set 11:
Dan, I verified that for many of the existing tests (even in non-exhaustive mode), we call GetNext() several times per child, so each child returns several row batches. I also verified that the path where the previous passthrough child gets closed in the GetNext() call is well exercised. (I verified this by inserting print statements and examining the logs.)
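The behaviour described above can be sketched in C++ with counters standing in for the print statements. This is an illustration under stated assumptions, not the patch itself: CountingChild and DeferredCloseUnion are hypothetical names, and the sketch assumes the finished child is closed at the start of the following GetNext() call rather than immediately (the motivation for deferring is not stated in the comment).

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical child that hands out a fixed number of batches and records
// how it is used, in place of print statements.
struct CountingChild {
  explicit CountingChild(int num_batches) : batches_left(num_batches) {}

  // Returns true when this call produced the child's last batch.
  bool GetNext() {
    assert(!closed);  // A closed child must never be asked for more rows.
    ++get_next_calls;
    --batches_left;
    return batches_left <= 0;
  }

  void Close() { closed = true; }

  int batches_left;
  int get_next_calls = 0;
  bool closed = false;
};

class DeferredCloseUnion {
 public:
  explicit DeferredCloseUnion(std::vector<CountingChild> children)
      : children_(std::move(children)) {}

  // Pulls one batch from the current child; returns true once every child is
  // exhausted.
  bool GetNext() {
    // Assumption of this sketch: the child that finished on the previous call
    // is closed here, at the start of the next call.
    if (to_close_ >= 0) {
      children_[static_cast<std::size_t>(to_close_)].Close();
      to_close_ = -1;
    }
    if (idx_ >= children_.size()) return true;
    if (children_[idx_].GetNext()) {
      to_close_ = static_cast<int>(idx_);
      ++idx_;
    }
    return idx_ >= children_.size();
  }

  // Closes whatever is still open, including the last finished child.
  void Close() {
    for (CountingChild& child : children_) child.Close();
  }

  const std::vector<CountingChild>& children() const { return children_; }

 private:
  std::vector<CountingChild> children_;
  std::size_t idx_ = 0;
  int to_close_ = -1;
};
```

With two children of three and two batches, the first child is still open after its third batch and is closed by the fourth call; the per-child counters then show that each child was asked for rows several times.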
--
To view, visit http://gerrit.cloudera.org:8080/5816
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Taras Bobrovytsky has uploaded a new patch set (#11).
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
IMPALA-3586: Implement union passthrough
The union node acts as a pass-through operator and forwards row batches
from its children without materializing them. This is done when
the child's tuple layout is identical to the union node's tuple layout
and no functions need to be applied to the child row batches.
A new query option DISABLE_UNION_PASSTHROUGH was added in this patch
as a precaution and for testing purposes.
Testing:
- Added new planner and end to end tests that cover the new
functionality.
- Updated existing tests to reflect the new behavior.
Perf:
Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned
version of the store_sales table. There was over a 2x performance
improvement for the following query:
SELECT
COUNT(ss_sold_time_sk),
COUNT(ss_item_sk),
COUNT(ss_customer_sk),
COUNT(ss_cdemo_sk),
COUNT(ss_hdemo_sk),
COUNT(ss_addr_sk),
COUNT(ss_store_sk),
COUNT(ss_promo_sk),
COUNT(ss_ticket_number),
COUNT(ss_quantity),
COUNT(ss_wholesale_cost),
COUNT(ss_list_price),
COUNT(ss_sales_price),
COUNT(ss_ext_discount_amt),
COUNT(ss_ext_sales_price),
COUNT(ss_ext_wholesale_cost),
COUNT(ss_ext_list_price),
COUNT(ss_ext_tax),
COUNT(ss_coupon_amt),
COUNT(ss_net_paid),
COUNT(ss_net_paid_inc_tax),
COUNT(ss_net_profit),
COUNT(ss_sold_date_sk)
FROM (
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
) t
Before:
Total Time: 43s164ms
Summary:
Operator          #Hosts   Avg Time   Max Time    #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE           1  224.721us  224.721us        1           1   28.00 KB        -1.00 B  FINALIZE
12:EXCHANGE            1   24.578us   24.578us        3           1          0        -1.00 B  UNPARTITIONED
11:AGGREGATE           3    2s402ms    3s060ms        3           1  119.00 KB       10.00 MB
00:UNION               3   35s380ms   37s846ms  288.01M     288.01M    3.08 MB              0
|--02:SCAN HDFS        3  184.197ms  219.931ms   28.80M      28.80M  535.03 MB        1.88 GB  store_sales_unpartitioned
|--03:SCAN HDFS        3  131.956ms  153.401ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
|--04:SCAN HDFS        3  178.456ms  247.721ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
|--05:SCAN HDFS        3  189.398ms  242.251ms   28.80M      28.80M  535.01 MB        1.88 GB  store_sales_unpartitioned
|--06:SCAN HDFS        3  122.786ms  156.528ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
|--07:SCAN HDFS        3  147.467ms  183.391ms   28.80M      28.80M  535.13 MB        1.88 GB  store_sales_unpartitioned
|--08:SCAN HDFS        3  147.502ms  186.273ms   28.80M      28.80M  535.01 MB        1.88 GB  store_sales_unpartitioned
|--09:SCAN HDFS        3  130.086ms  154.682ms   28.80M      28.80M  535.04 MB        1.88 GB  store_sales_unpartitioned
|--10:SCAN HDFS        3  122.701ms  161.056ms   28.80M      28.80M  534.89 MB        1.88 GB  store_sales_unpartitioned
01:SCAN HDFS           3  287.863ms  330.436ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
After:
Total Time: 20s660ms
Summary:
Operator         #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem   Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE     1       187.474us  187.474us  1        1           28.00 KB   -1.00 B        FINALIZE
12:EXCHANGE      1       15.238us   15.238us   3        1           0          -1.00 B        UNPARTITIONED
11:AGGREGATE     3       2s958ms    3s749ms    3        1           3.08 MB    10.00 MB
00:UNION         3       211.510ms  224.667ms  288.01M  288.01M     0          0
|--02:SCAN HDFS  3       1s637ms    1s734ms    28.80M   28.80M      528.68 MB  1.88 GB        store_sales_unpartitioned
|--03:SCAN HDFS  3       1s697ms    1s708ms    28.80M   28.80M      528.48 MB  1.88 GB        store_sales_unpartitioned
|--04:SCAN HDFS  3       1s681ms    1s748ms    28.80M   28.80M      529.34 MB  1.88 GB        store_sales_unpartitioned
|--05:SCAN HDFS  3       1s665ms    1s756ms    28.80M   28.80M      533.81 MB  1.88 GB        store_sales_unpartitioned
|--06:SCAN HDFS  3       1s675ms    1s800ms    28.80M   28.80M      530.70 MB  1.88 GB        store_sales_unpartitioned
|--07:SCAN HDFS  3       1s677ms    1s759ms    28.80M   28.80M      525.95 MB  1.88 GB        store_sales_unpartitioned
|--08:SCAN HDFS  3       1s621ms    1s790ms    28.80M   28.80M      534.64 MB  1.88 GB        store_sales_unpartitioned
|--09:SCAN HDFS  3       1s684ms    1s743ms    28.80M   28.80M      528.55 MB  1.88 GB        store_sales_unpartitioned
|--10:SCAN HDFS  3       1s528ms    1s771ms    28.80M   28.80M      533.70 MB  1.88 GB        store_sales_unpartitioned
01:SCAN HDFS     3       1s853ms    2s149ms    28.80M   28.80M      526.53 MB  1.88 GB        store_sales_unpartitioned
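As a rough check of the "over a 2x" figure, the times reported in the two summaries can be converted to seconds and compared. This is only a small Python sketch: parse_duration is a hypothetical helper (not part of Impala), and the input strings are copied from the Total Time lines and the 00:UNION rows above.

```python
import re

# Seconds per unit for the duration suffixes that appear in the summaries.
_UNITS = {"s": 1.0, "ms": 1e-3, "us": 1e-6}


def parse_duration(text: str) -> float:
    """Convert a string such as '43s164ms' or '211.510ms' to seconds."""
    # "ms" and "us" are listed before "s" so the longer suffixes win.
    parts = re.findall(r"(\d+(?:\.\d+)?)(ms|us|s)", text)
    return sum(float(value) * _UNITS[unit] for value, unit in parts)


# End-to-end query time, before vs. after.
total_speedup = parse_duration("43s164ms") / parse_duration("20s660ms")
# Average time attributed to the 00:UNION operator, before vs. after.
union_speedup = parse_duration("35s380ms") / parse_duration("211.510ms")

print(f"total time ratio: {total_speedup:.2f}x")
print(f"00:UNION avg time ratio: {union_speedup:.0f}x")
```

Note that the SCAN HDFS rows report more time after the change than before, so the per-operator UNION ratio is much larger than the end-to-end one.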
Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
---
M be/src/exec/exchange-node.cc
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/row-batch.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
M tests/query_test/test_queries.py
33 files changed, 718 insertions(+), 54 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/11
--
To view, visit http://gerrit.cloudera.org:8080/5816
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
Patch Set 7:
(7 comments)
I'm pretty happy with this change
http://gerrit.cloudera.org:8080/#/c/5816/8/be/src/exec/union-node.h
File be/src/exec/union-node.h:
Line 67: // Which children can be passed through, without being materialized.
duplicate 'without'
http://gerrit.cloudera.org:8080/#/c/5816/8/common/thrift/ImpalaInternalService.thrift
File common/thrift/ImpalaInternalService.thrift:
Line 225: // Indicates whether passthrough should be disabled in union nodes.
enabled
http://gerrit.cloudera.org:8080/#/c/5816/8/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:
PS8, Line 255: d
enabled
http://gerrit.cloudera.org:8080/#/c/5816/8/fe/src/main/java/org/apache/impala/planner/UnionNode.java
File fe/src/main/java/org/apache/impala/planner/UnionNode.java:
Line 267: // nullability can be considered to be part of the physical row layout.
remove (this was moved inside computePassThrough)
http://gerrit.cloudera.org:8080/#/c/5816/8/testdata/workloads/functional-query/queries/QueryTest/union.test
File testdata/workloads/functional-query/queries/QueryTest/union.test:
Line 1059: select bigint_col from functional.alltypestiny where bigint_col > 0
use unqualified table names everywhere (and fix the one above while you are here), so we get coverage over all file formats
Line 1124: ---- QUERY
Move this test into nested-types-subplan.test, otherwise this test will fail on the legacy joins/agg build since nested types are not supported there (nested-types-subplan.test will be skipped)
Line 1126: select COUNT(c.c_custkey), COUNT(v.tot_price)
lowercase count
--
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
Patch Set 21:
Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/402/
--
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 21
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Hello Alex Behm, Dan Hecht,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/5816
to look at the new patch set (#14).
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
IMPALA-3586: Implement union passthrough
The union node acts as a pass-through operator and forwards row batches
from its children without materializing them. This is done when the
child's tuple layout is identical to the union node's tuple layout
and no functions need to be applied to the child row batches.
A new query option DISABLE_UNION_PASSTHROUGH was added in this patch
as a precaution and for testing purposes.
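For illustration only (this is not code from the patch): a minimal Python sketch of the pass-through idea described above. The names Child, can_pass_through and union_all are hypothetical, and a layout is modeled simply as a tuple of type names; the actual change is in the files listed below.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Optional, Sequence, Tuple

Row = Tuple[object, ...]
Batch = List[Row]


@dataclass
class Child:
    """Hypothetical union input: row batches plus layout information."""
    layout: Tuple[str, ...]          # slot types in physical order
    batches: Sequence[Batch]
    # None means the child's rows are used as-is (no functions to apply).
    exprs: Optional[Sequence[Callable[[Row], object]]] = None


def can_pass_through(child: Child, union_layout: Tuple[str, ...]) -> bool:
    # Pass-through applies only when the layouts are identical and no
    # expressions have to be evaluated over the child's rows.
    return child.layout == union_layout and child.exprs is None


def union_all(children: Sequence[Child],
              union_layout: Tuple[str, ...]) -> Iterator[Batch]:
    for child in children:
        if can_pass_through(child, union_layout):
            # Forward the child's batch objects themselves; nothing is copied.
            yield from child.batches
        else:
            # Materialize: build new rows (and a new batch) for the union.
            for batch in child.batches:
                if child.exprs is None:
                    yield [tuple(row) for row in batch]
                else:
                    yield [tuple(e(row) for e in child.exprs) for row in batch]
```

In this sketch a passed-through batch is the same object the child produced, while a materialized batch is a newly built list, which is the cost the change avoids when the layouts match.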
Testing:
- Added new planner and end-to-end tests that cover the new
functionality.
- Updated existing tests to reflect the new behavior.
Perf:
Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned
version of the store_sales table. There was over a 2x performance
improvement for the following query:
SELECT
COUNT(ss_sold_time_sk),
COUNT(ss_item_sk),
COUNT(ss_customer_sk),
COUNT(ss_cdemo_sk),
COUNT(ss_hdemo_sk),
COUNT(ss_addr_sk),
COUNT(ss_store_sk),
COUNT(ss_promo_sk),
COUNT(ss_ticket_number),
COUNT(ss_quantity),
COUNT(ss_wholesale_cost),
COUNT(ss_list_price),
COUNT(ss_sales_price),
COUNT(ss_ext_discount_amt),
COUNT(ss_ext_sales_price),
COUNT(ss_ext_wholesale_cost),
COUNT(ss_ext_list_price),
COUNT(ss_ext_tax),
COUNT(ss_coupon_amt),
COUNT(ss_net_paid),
COUNT(ss_net_paid_inc_tax),
COUNT(ss_net_profit),
COUNT(ss_sold_date_sk)
FROM (
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
) t
Before:
Total Time: 43s164ms
Summary:
Operator         #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem   Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE     1       224.721us  224.721us  1        1           28.00 KB   -1.00 B        FINALIZE
12:EXCHANGE      1       24.578us   24.578us   3        1           0          -1.00 B        UNPARTITIONED
11:AGGREGATE     3       2s402ms    3s060ms    3        1           119.00 KB  10.00 MB
00:UNION         3       35s380ms   37s846ms   288.01M  288.01M     3.08 MB    0
|--02:SCAN HDFS  3       184.197ms  219.931ms  28.80M   28.80M      535.03 MB  1.88 GB        store_sales_unpartitioned
|--03:SCAN HDFS  3       131.956ms  153.401ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--04:SCAN HDFS  3       178.456ms  247.721ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--05:SCAN HDFS  3       189.398ms  242.251ms  28.80M   28.80M      535.01 MB  1.88 GB        store_sales_unpartitioned
|--06:SCAN HDFS  3       122.786ms  156.528ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--07:SCAN HDFS  3       147.467ms  183.391ms  28.80M   28.80M      535.13 MB  1.88 GB        store_sales_unpartitioned
|--08:SCAN HDFS  3       147.502ms  186.273ms  28.80M   28.80M      535.01 MB  1.88 GB        store_sales_unpartitioned
|--09:SCAN HDFS  3       130.086ms  154.682ms  28.80M   28.80M      535.04 MB  1.88 GB        store_sales_unpartitioned
|--10:SCAN HDFS  3       122.701ms  161.056ms  28.80M   28.80M      534.89 MB  1.88 GB        store_sales_unpartitioned
01:SCAN HDFS     3       287.863ms  330.436ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
After:
Total Time: 20s660ms
Summary:
Operator         #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem   Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE     1       187.474us  187.474us  1        1           28.00 KB   -1.00 B        FINALIZE
12:EXCHANGE      1       15.238us   15.238us   3        1           0          -1.00 B        UNPARTITIONED
11:AGGREGATE     3       2s958ms    3s749ms    3        1           3.08 MB    10.00 MB
00:UNION         3       211.510ms  224.667ms  288.01M  288.01M     0          0
|--02:SCAN HDFS  3       1s637ms    1s734ms    28.80M   28.80M      528.68 MB  1.88 GB        store_sales_unpartitioned
|--03:SCAN HDFS  3       1s697ms    1s708ms    28.80M   28.80M      528.48 MB  1.88 GB        store_sales_unpartitioned
|--04:SCAN HDFS  3       1s681ms    1s748ms    28.80M   28.80M      529.34 MB  1.88 GB        store_sales_unpartitioned
|--05:SCAN HDFS  3       1s665ms    1s756ms    28.80M   28.80M      533.81 MB  1.88 GB        store_sales_unpartitioned
|--06:SCAN HDFS  3       1s675ms    1s800ms    28.80M   28.80M      530.70 MB  1.88 GB        store_sales_unpartitioned
|--07:SCAN HDFS  3       1s677ms    1s759ms    28.80M   28.80M      525.95 MB  1.88 GB        store_sales_unpartitioned
|--08:SCAN HDFS  3       1s621ms    1s790ms    28.80M   28.80M      534.64 MB  1.88 GB        store_sales_unpartitioned
|--09:SCAN HDFS  3       1s684ms    1s743ms    28.80M   28.80M      528.55 MB  1.88 GB        store_sales_unpartitioned
|--10:SCAN HDFS  3       1s528ms    1s771ms    28.80M   28.80M      533.70 MB  1.88 GB        store_sales_unpartitioned
01:SCAN HDFS     3       1s853ms    2s149ms    28.80M   28.80M      526.53 MB  1.88 GB        store_sales_unpartitioned
Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
---
M be/src/exec/exchange-node.cc
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/row-batch.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-planner/queries/PlannerTest/with-clause.test
M testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
M tests/query_test/test_queries.py
35 files changed, 1,317 insertions(+), 607 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/14
--
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Hello Alex Behm,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/5816
to look at the new patch set (#10).
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
IMPALA-3586: Implement union passthrough
The union node acts as a pass-through operator and forwards row batches
from its children without materializing them. This is done when the
child's tuple layout is identical to the union node's tuple layout
and no functions need to be applied to the child row batches.
A new query option DISABLE_UNION_PASSTHROUGH was added in this patch
as a precaution and for testing purposes.
Testing:
- Added new planner and end-to-end tests that cover the new
functionality.
- Updated existing tests to reflect the new behavior.
Perf:
Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned
version of the store_sales table. There was over a 2x performance
improvement for the following query:
SELECT
COUNT(ss_sold_time_sk),
COUNT(ss_item_sk),
COUNT(ss_customer_sk),
COUNT(ss_cdemo_sk),
COUNT(ss_hdemo_sk),
COUNT(ss_addr_sk),
COUNT(ss_store_sk),
COUNT(ss_promo_sk),
COUNT(ss_ticket_number),
COUNT(ss_quantity),
COUNT(ss_wholesale_cost),
COUNT(ss_list_price),
COUNT(ss_sales_price),
COUNT(ss_ext_discount_amt),
COUNT(ss_ext_sales_price),
COUNT(ss_ext_wholesale_cost),
COUNT(ss_ext_list_price),
COUNT(ss_ext_tax),
COUNT(ss_coupon_amt),
COUNT(ss_net_paid),
COUNT(ss_net_paid_inc_tax),
COUNT(ss_net_profit),
COUNT(ss_sold_date_sk)
FROM (
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
) t
Before:
Total Time: 43s164ms
Summary:
Operator         #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem   Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE     1       224.721us  224.721us  1        1           28.00 KB   -1.00 B        FINALIZE
12:EXCHANGE      1       24.578us   24.578us   3        1           0          -1.00 B        UNPARTITIONED
11:AGGREGATE     3       2s402ms    3s060ms    3        1           119.00 KB  10.00 MB
00:UNION         3       35s380ms   37s846ms   288.01M  288.01M     3.08 MB    0
|--02:SCAN HDFS  3       184.197ms  219.931ms  28.80M   28.80M      535.03 MB  1.88 GB        store_sales_unpartitioned
|--03:SCAN HDFS  3       131.956ms  153.401ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--04:SCAN HDFS  3       178.456ms  247.721ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--05:SCAN HDFS  3       189.398ms  242.251ms  28.80M   28.80M      535.01 MB  1.88 GB        store_sales_unpartitioned
|--06:SCAN HDFS  3       122.786ms  156.528ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--07:SCAN HDFS  3       147.467ms  183.391ms  28.80M   28.80M      535.13 MB  1.88 GB        store_sales_unpartitioned
|--08:SCAN HDFS  3       147.502ms  186.273ms  28.80M   28.80M      535.01 MB  1.88 GB        store_sales_unpartitioned
|--09:SCAN HDFS  3       130.086ms  154.682ms  28.80M   28.80M      535.04 MB  1.88 GB        store_sales_unpartitioned
|--10:SCAN HDFS  3       122.701ms  161.056ms  28.80M   28.80M      534.89 MB  1.88 GB        store_sales_unpartitioned
01:SCAN HDFS     3       287.863ms  330.436ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
After:
Total Time: 20s660ms
Summary:
Operator         #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem   Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE     1       187.474us  187.474us  1        1           28.00 KB   -1.00 B        FINALIZE
12:EXCHANGE      1       15.238us   15.238us   3        1           0          -1.00 B        UNPARTITIONED
11:AGGREGATE     3       2s958ms    3s749ms    3        1           3.08 MB    10.00 MB
00:UNION         3       211.510ms  224.667ms  288.01M  288.01M     0          0
|--02:SCAN HDFS  3       1s637ms    1s734ms    28.80M   28.80M      528.68 MB  1.88 GB        store_sales_unpartitioned
|--03:SCAN HDFS  3       1s697ms    1s708ms    28.80M   28.80M      528.48 MB  1.88 GB        store_sales_unpartitioned
|--04:SCAN HDFS  3       1s681ms    1s748ms    28.80M   28.80M      529.34 MB  1.88 GB        store_sales_unpartitioned
|--05:SCAN HDFS  3       1s665ms    1s756ms    28.80M   28.80M      533.81 MB  1.88 GB        store_sales_unpartitioned
|--06:SCAN HDFS  3       1s675ms    1s800ms    28.80M   28.80M      530.70 MB  1.88 GB        store_sales_unpartitioned
|--07:SCAN HDFS  3       1s677ms    1s759ms    28.80M   28.80M      525.95 MB  1.88 GB        store_sales_unpartitioned
|--08:SCAN HDFS  3       1s621ms    1s790ms    28.80M   28.80M      534.64 MB  1.88 GB        store_sales_unpartitioned
|--09:SCAN HDFS  3       1s684ms    1s743ms    28.80M   28.80M      528.55 MB  1.88 GB        store_sales_unpartitioned
|--10:SCAN HDFS  3       1s528ms    1s771ms    28.80M   28.80M      533.70 MB  1.88 GB        store_sales_unpartitioned
01:SCAN HDFS     3       1s853ms    2s149ms    28.80M   28.80M      526.53 MB  1.88 GB        store_sales_unpartitioned
Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
---
M be/src/exec/exchange-node.cc
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/row-batch.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
M tests/query_test/test_queries.py
33 files changed, 719 insertions(+), 54 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/10
--
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Taras Bobrovytsky has uploaded a new patch set (#12).
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
IMPALA-3586: Implement union passthrough
The union node acts as a pass-through operator and forwards row batches
from its children without materializing them. This is done when the
child's tuple layout is identical to the union node's tuple layout
and no functions need to be applied to the child row batches.
A new query option DISABLE_UNION_PASSTHROUGH was added in this patch
as a precaution and for testing purposes.
Testing:
- Added new planner and end-to-end tests that cover the new
functionality.
- Updated existing tests to reflect the new behavior.
Perf:
Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned
version of the store_sales table. There was over a 2x performance
improvement for the following query:
SELECT
COUNT(ss_sold_time_sk),
COUNT(ss_item_sk),
COUNT(ss_customer_sk),
COUNT(ss_cdemo_sk),
COUNT(ss_hdemo_sk),
COUNT(ss_addr_sk),
COUNT(ss_store_sk),
COUNT(ss_promo_sk),
COUNT(ss_ticket_number),
COUNT(ss_quantity),
COUNT(ss_wholesale_cost),
COUNT(ss_list_price),
COUNT(ss_sales_price),
COUNT(ss_ext_discount_amt),
COUNT(ss_ext_sales_price),
COUNT(ss_ext_wholesale_cost),
COUNT(ss_ext_list_price),
COUNT(ss_ext_tax),
COUNT(ss_coupon_amt),
COUNT(ss_net_paid),
COUNT(ss_net_paid_inc_tax),
COUNT(ss_net_profit),
COUNT(ss_sold_date_sk)
FROM (
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
) t
Before:
Total Time: 43s164ms
Summary:
Operator         #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem   Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE     1       224.721us  224.721us  1        1           28.00 KB   -1.00 B        FINALIZE
12:EXCHANGE      1       24.578us   24.578us   3        1           0          -1.00 B        UNPARTITIONED
11:AGGREGATE     3       2s402ms    3s060ms    3        1           119.00 KB  10.00 MB
00:UNION         3       35s380ms   37s846ms   288.01M  288.01M     3.08 MB    0
|--02:SCAN HDFS  3       184.197ms  219.931ms  28.80M   28.80M      535.03 MB  1.88 GB        store_sales_unpartitioned
|--03:SCAN HDFS  3       131.956ms  153.401ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--04:SCAN HDFS  3       178.456ms  247.721ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--05:SCAN HDFS  3       189.398ms  242.251ms  28.80M   28.80M      535.01 MB  1.88 GB        store_sales_unpartitioned
|--06:SCAN HDFS  3       122.786ms  156.528ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--07:SCAN HDFS  3       147.467ms  183.391ms  28.80M   28.80M      535.13 MB  1.88 GB        store_sales_unpartitioned
|--08:SCAN HDFS  3       147.502ms  186.273ms  28.80M   28.80M      535.01 MB  1.88 GB        store_sales_unpartitioned
|--09:SCAN HDFS  3       130.086ms  154.682ms  28.80M   28.80M      535.04 MB  1.88 GB        store_sales_unpartitioned
|--10:SCAN HDFS  3       122.701ms  161.056ms  28.80M   28.80M      534.89 MB  1.88 GB        store_sales_unpartitioned
01:SCAN HDFS     3       287.863ms  330.436ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
After:
Total Time: 20s660ms
Summary:
Operator         #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem   Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE     1       187.474us  187.474us  1        1           28.00 KB   -1.00 B        FINALIZE
12:EXCHANGE      1       15.238us   15.238us   3        1           0          -1.00 B        UNPARTITIONED
11:AGGREGATE     3       2s958ms    3s749ms    3        1           3.08 MB    10.00 MB
00:UNION         3       211.510ms  224.667ms  288.01M  288.01M     0          0
|--02:SCAN HDFS  3       1s637ms    1s734ms    28.80M   28.80M      528.68 MB  1.88 GB        store_sales_unpartitioned
|--03:SCAN HDFS  3       1s697ms    1s708ms    28.80M   28.80M      528.48 MB  1.88 GB        store_sales_unpartitioned
|--04:SCAN HDFS  3       1s681ms    1s748ms    28.80M   28.80M      529.34 MB  1.88 GB        store_sales_unpartitioned
|--05:SCAN HDFS  3       1s665ms    1s756ms    28.80M   28.80M      533.81 MB  1.88 GB        store_sales_unpartitioned
|--06:SCAN HDFS  3       1s675ms    1s800ms    28.80M   28.80M      530.70 MB  1.88 GB        store_sales_unpartitioned
|--07:SCAN HDFS  3       1s677ms    1s759ms    28.80M   28.80M      525.95 MB  1.88 GB        store_sales_unpartitioned
|--08:SCAN HDFS  3       1s621ms    1s790ms    28.80M   28.80M      534.64 MB  1.88 GB        store_sales_unpartitioned
|--09:SCAN HDFS  3       1s684ms    1s743ms    28.80M   28.80M      528.55 MB  1.88 GB        store_sales_unpartitioned
|--10:SCAN HDFS  3       1s528ms    1s771ms    28.80M   28.80M      533.70 MB  1.88 GB        store_sales_unpartitioned
01:SCAN HDFS     3       1s853ms    2s149ms    28.80M   28.80M      526.53 MB  1.88 GB        store_sales_unpartitioned
Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
---
M be/src/exec/exchange-node.cc
M be/src/exec/exec-node.h
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/row-batch.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
M tests/query_test/test_queries.py
34 files changed, 735 insertions(+), 62 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/12
--
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Taras Bobrovytsky has posted comments on this change.
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
Patch Set 6:
(6 comments)
http://gerrit.cloudera.org:8080/#/c/5816/4/be/src/exec/union-node.cc
File be/src/exec/union-node.cc:
Line 133: DCHECK(!IsInSubplan());
> Do we have coverage of that kind of plan shape though? If someone else come
Done. Added a Union + subplan test to Planner tests and end to end tests.
http://gerrit.cloudera.org:8080/#/c/5816/6/be/src/exec/union-node.cc
File be/src/exec/union-node.cc:
Line 103: if (!children_.empty()) RETURN_IF_ERROR(child(0)->Open(state));
> I think we need to reset 'child_batch_' here so that it has the correct 'ro
I don't think we need to reset here. It already gets reset on line 179 in this file.
Line 114: if (UNLIKELY(child_idx_ >= children_.size() &&
> My preference is to remove this code if it's not needed for correctness and
Done
Line 153: // Even though the child is at eos, it's not OK to Close() it here, like we do in
> I think this comment over-explains some resource management things that are
Done
http://gerrit.cloudera.org:8080/#/c/5816/4/fe/src/main/java/org/apache/impala/planner/UnionNode.java
File fe/src/main/java/org/apache/impala/planner/UnionNode.java:
Line 266: Preconditions.checkState(exprList.size() == slots.size());
> nullableTupleIds_ in child PlanNode is essentially part of the row layout a
Done
http://gerrit.cloudera.org:8080/#/c/5816/4/testdata/workloads/functional-planner/queries/PlannerTest/union.test
File testdata/workloads/functional-planner/queries/PlannerTest/union.test:
Line 3103: ====
> Was this addressed?
Yes, added a union + join test with several joins in Planner and end to end tests. See PlannerTest/union.test and QueryTest/union.test
--
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
Patch Set 16:
(9 comments)
http://gerrit.cloudera.org:8080/#/c/5816/14/be/src/exec/union-node.cc
File be/src/exec/union-node.cc:
Line 214: COUNTER_SET(rows_returned_counter_, num_rows_returned_);
> Done. So we don't need query maintenance at all?
We do need it, but only occasionally. It's already called at the beginning of GetNext(), so we don't need it here in addition.
Line 286: }
> Not sure if that makes sense. The DCHECK would only be true on the first ca
Ahh right. I was trying to add some DCHECKs to assert child_eos_ is in the expected state because that was a little tricky to understand. I suspect my comment is no longer relevant after your new changes.
http://gerrit.cloudera.org:8080/#/c/5816/16/be/src/exec/union-node.cc
File be/src/exec/union-node.cc:
Line 104: // Figure out which the sources from which we will need to fetch rows. This is being
Figure out which -> Determine
Line 261: Status UnionNode::GetNext(RuntimeState* state, RowBatch* row_batch, bool* eos) {
This function looks so much better!
http://gerrit.cloudera.org:8080/#/c/5816/16/be/src/exec/union-node.h
File be/src/exec/union-node.h:
Line 91: bool const_todo_;
Minor naming thing, I don't feel too strongly but 'todo' seems unusual. Alternative:
has_more_const_
has_more_passthrough_
has_more_materialized_
Line 107: /// call on the child. Sets "passthrough_todo_" to false when all passthrough children
we typically use single quotes
Line 109: Status GetNextPassThrough(RuntimeState* state, RowBatch* row_batch);
I liked your original idea of passing a 'has_more' output parameter better. Yes, we'll always pass &passthrough_todo_ in here, but function params are easier to read and reason about than side-effects. Better to avoid side effects if we can.
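To illustrate the point about explicit outputs versus side effects, here is a minimal sketch in Python (a toy model, not Impala code). Python has no output parameters, so the 'has_more' flag is returned alongside the batch. The class and method names (PassthroughUnion, get_next_passthrough, drain) are hypothetical and chosen only for this example.

```python
from typing import List, Tuple

Batch = List[int]  # stand-in for a row batch


class PassthroughUnion:
    """Toy model only; names are hypothetical and do not mirror Impala's classes."""

    def __init__(self, children: List[List[Batch]]) -> None:
        # Drop children with no batches so every remaining child yields something.
        self._children = [c for c in children if c]
        self._child_idx = 0
        self._batch_idx = 0

    def has_children(self) -> bool:
        return bool(self._children)

    def get_next_passthrough(self) -> Tuple[Batch, bool]:
        """Return the next child batch plus an explicit 'has_more' flag.

        The flag is part of the return value, so the caller decides where to
        store it, instead of this method silently flipping a member variable.
        """
        batch = self._children[self._child_idx][self._batch_idx]
        self._batch_idx += 1
        if self._batch_idx == len(self._children[self._child_idx]):
            # The current child is exhausted; move on to the next one.
            self._child_idx += 1
            self._batch_idx = 0
        has_more = self._child_idx < len(self._children)
        return batch, has_more


def drain(union: PassthroughUnion) -> List[Batch]:
    """Caller-side loop: the only place that tracks 'has_more'."""
    out: List[Batch] = []
    has_more = union.has_children()
    while has_more:
        batch, has_more = union.get_next_passthrough()
        out.append(batch)
    return out
```

The caller can see from the call site alone that the flag may change, which is the readability benefit argued for above.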
http://gerrit.cloudera.org:8080/#/c/5816/16/fe/src/main/java/org/apache/impala/planner/UnionNode.java
File fe/src/main/java/org/apache/impala/planner/UnionNode.java:
Line 215: isChildPassthrough_.clear();
PlanNode.init() generally needs to be idempotent because it could be called multiple times on the same PlanNode, e.g., once during single node planning and again later during distributed planning. Pretty sure that's the case for UnionNode.
That means you could potentially end up in a weird state because the reordering does not consider the unionResultExprs_.
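A minimal sketch of what an idempotent init() could look like, written in Python as a toy model rather than the actual Java planner code. It assumes each child has exactly one matching result-expr entry. All names here (UnionPlanModel, passthrough_ok, and so on) are hypothetical. The two ideas shown are that derived state is rebuilt from scratch on every call, and that children and their result exprs are reordered together as pairs.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class UnionPlanModel:
    """Hypothetical stand-in for a planner union node; not Impala's UnionNode."""
    children: List[str]
    result_exprs: List[str]            # one entry per child, same order as children
    passthrough_ok: Dict[str, bool]    # child name -> can it be passed through?
    is_child_passthrough: List[bool] = field(default_factory=list)

    def init(self) -> None:
        # Reorder children and their result exprs together so they stay aligned.
        pairs = list(zip(self.children, self.result_exprs))
        # list.sort() is stable: passthrough children move to the front and
        # keep their relative order, so a second call changes nothing.
        pairs.sort(key=lambda pair: not self.passthrough_ok[pair[0]])
        self.children = [child for child, _ in pairs]
        self.result_exprs = [expr for _, expr in pairs]
        # Rebuild (rather than append to) the derived list on every call.
        self.is_child_passthrough = [self.passthrough_ok[c] for c in self.children]
```

Calling init() a second time on the same object leaves children, result_exprs, and is_child_passthrough unchanged, which is the property the comment above asks for.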
Line 221: // Order the children, such that all passthrough children come before the
move this reordering into a private helper function
--
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 16
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Hello Alex Behm, Dan Hecht,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/5816
to look at the new patch set (#17).
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
IMPALA-3586: Implement union passthrough
The union node acts as a pass-through operator and forwards row batches
from its children without materializing them. This is done in the case
when the child's tuple layout is identical to the union node's tuple layout
and no functions need to be applied to the child row batches.
Removed operand reordering in the FE because it's simpler and safer to
handle all passthrough children before non-passthrough children in the
BE. The recent improvements to memory management allowed us to remove
this requirement.
A new query option DISABLE_UNION_PASSTHROUGH was added in this patch
as a precaution and for testing purposes.
Testing:
- Added new planner and end to end tests that cover the new
functionality.
- Updated existing tests to reflect the new behavior.
Perf:
Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned
version of the store_sales table. There was over a 2x performance
improvement for the following query:
SELECT
COUNT(ss_sold_time_sk),
COUNT(ss_item_sk),
COUNT(ss_customer_sk),
COUNT(ss_cdemo_sk),
COUNT(ss_hdemo_sk),
COUNT(ss_addr_sk),
COUNT(ss_store_sk),
COUNT(ss_promo_sk),
COUNT(ss_ticket_number),
COUNT(ss_quantity),
COUNT(ss_wholesale_cost),
COUNT(ss_list_price),
COUNT(ss_sales_price),
COUNT(ss_ext_discount_amt),
COUNT(ss_ext_sales_price),
COUNT(ss_ext_wholesale_cost),
COUNT(ss_ext_list_price),
COUNT(ss_ext_tax),
COUNT(ss_coupon_amt),
COUNT(ss_net_paid),
COUNT(ss_net_paid_inc_tax),
COUNT(ss_net_profit),
COUNT(ss_sold_date_sk)
FROM (
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
) t
Before:
Total Time: 43s164ms
Summary:
Operator         #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem   Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE     1       224.721us  224.721us  1        1           28.00 KB   -1.00 B        FINALIZE
12:EXCHANGE      1       24.578us   24.578us   3        1           0          -1.00 B        UNPARTITIONED
11:AGGREGATE     3       2s402ms    3s060ms    3        1           119.00 KB  10.00 MB
00:UNION         3       35s380ms   37s846ms   288.01M  288.01M     3.08 MB    0
|--02:SCAN HDFS  3       184.197ms  219.931ms  28.80M   28.80M      535.03 MB  1.88 GB        store_sales_unpartitioned
|--03:SCAN HDFS  3       131.956ms  153.401ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--04:SCAN HDFS  3       178.456ms  247.721ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--05:SCAN HDFS  3       189.398ms  242.251ms  28.80M   28.80M      535.01 MB  1.88 GB        store_sales_unpartitioned
|--06:SCAN HDFS  3       122.786ms  156.528ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--07:SCAN HDFS  3       147.467ms  183.391ms  28.80M   28.80M      535.13 MB  1.88 GB        store_sales_unpartitioned
|--08:SCAN HDFS  3       147.502ms  186.273ms  28.80M   28.80M      535.01 MB  1.88 GB        store_sales_unpartitioned
|--09:SCAN HDFS  3       130.086ms  154.682ms  28.80M   28.80M      535.04 MB  1.88 GB        store_sales_unpartitioned
|--10:SCAN HDFS  3       122.701ms  161.056ms  28.80M   28.80M      534.89 MB  1.88 GB        store_sales_unpartitioned
01:SCAN HDFS     3       287.863ms  330.436ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
After:
Total Time: 19s139ms
Summary:
Operator         #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem   Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE     1       166.241us  166.241us  1        1           28.00 KB   -1.00 B        FINALIZE
12:EXCHANGE      1       71.695us   71.695us   3        1           0          -1.00 B        UNPARTITIONED
11:AGGREGATE     3       2s971ms    3s809ms    3        1           3.08 MB    10.00 MB
00:UNION         3       207.956ms  222.846ms  288.01M  288.01M     0          0
|--02:SCAN HDFS  3       1s533ms    1s535ms    28.80M   28.80M      532.28 MB  1.88 GB        store_sales_unpartitioned
|--03:SCAN HDFS  3       1s554ms    1s669ms    28.80M   28.80M      525.73 MB  1.88 GB        store_sales_unpartitioned
|--04:SCAN HDFS  3       1s568ms    1s716ms    28.80M   28.80M      525.03 MB  1.88 GB        store_sales_unpartitioned
|--05:SCAN HDFS  3       1s503ms    1s617ms    28.80M   28.80M      527.43 MB  1.88 GB        store_sales_unpartitioned
|--06:SCAN HDFS  3       1s560ms    1s634ms    28.80M   28.80M      528.52 MB  1.88 GB        store_sales_unpartitioned
|--07:SCAN HDFS  3       1s489ms    1s643ms    28.80M   28.80M      534.81 MB  1.88 GB        store_sales_unpartitioned
|--08:SCAN HDFS  3       1s534ms    1s581ms    28.80M   28.80M      528.10 MB  1.88 GB        store_sales_unpartitioned
|--09:SCAN HDFS  3       1s558ms    1s674ms    28.80M   28.80M      526.77 MB  1.88 GB        store_sales_unpartitioned
|--10:SCAN HDFS  3       1s504ms    1s692ms    28.80M   28.80M      527.83 MB  1.88 GB        store_sales_unpartitioned
01:SCAN HDFS     3       1s682ms    1s911ms    28.80M   28.80M      526.14 MB  1.88 GB        store_sales_unpartitioned
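As a quick check of the "over 2x" claim, the figures above can be turned into ratios. The sketch below is a hypothetical helper, not part of Impala. It assumes durations are written as concatenated value/unit pairs such as "43s164ms" or "207.956ms". Only the s, ms, and us units appear in the profiles above; support for h, m, and ns is an assumption.

```python
import re

# Seconds per unit. Only s, ms and us occur in the profiles above;
# h, m and ns are assumed for completeness.
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 1e-3, "us": 1e-6, "ns": 1e-9}
# Two-letter units come first so "ms" is not read as "m" followed by "s".
_PART = re.compile(r"(\d+(?:\.\d+)?)(ms|us|ns|h|m|s)")


def to_seconds(text: str) -> float:
    """Convert a duration such as '43s164ms' or '224.721us' to seconds."""
    parts = _PART.findall(text)
    if not parts:
        raise ValueError(f"unrecognized duration: {text!r}")
    return sum(float(value) * _UNIT_SECONDS[unit] for value, unit in parts)


# Total query time before and after the change.
total_speedup = to_seconds("43s164ms") / to_seconds("19s139ms")
# Average time attributed to the 00:UNION operator before and after.
union_speedup = to_seconds("35s380ms") / to_seconds("207.956ms")
print(f"total: {total_speedup:.2f}x, union avg time: {union_speedup:.0f}x")
```

With the numbers reported above, the total query time improves by roughly 2.26x, and the average time attributed to the union operator drops by a factor of about 170.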
Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
---
M be/src/exec/exchange-node.cc
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/row-batch.cc
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-planner/queries/PlannerTest/with-clause.test
M testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
M tests/query_test/test_queries.py
32 files changed, 1,465 insertions(+), 760 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/17
--
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 17
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
Patch Set 9:
(1 comment)
http://gerrit.cloudera.org:8080/#/c/5816/9//COMMIT_MSG
Commit Message:
Line 15: as a precaution and for testing purposes.
> We made this decision with Alex in person a while ago. There are 2 reasons:
I agree that "blindly" adding query options is bad, but there are so many things that could go wrong with this optimization. There is the issue of tuple-layout compatibility and of memory management since we don't materialize any more. If there's a bug or a case we are missing or even a resource regression (e.g. increased memory consumption), then we have no recourse without this query option.
The alternative is to test this to death until we are 100% confident there are no bugs. Still, there are cases where we know the passthrough might lead to a higher peak memory consumption.
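The escape-hatch argument can be sketched as a single gate in front of the passthrough decision. This is a simplified Python model, not the Impala frontend code. The Slot representation, the QueryOptions class, and the function name are hypothetical. Only the two conditions (identical tuple layout, no functions applied to the child's rows) and the DISABLE_UNION_PASSTHROUGH option name come from the change description.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

# A slot is modeled as (type name, byte offset, nullable); purely illustrative.
Slot = Tuple[str, int, bool]


@dataclass(frozen=True)
class QueryOptions:
    # Models the DISABLE_UNION_PASSTHROUGH query option from the commit message.
    disable_union_passthrough: bool = False


def is_child_passthrough(options: QueryOptions,
                         union_layout: Sequence[Slot],
                         child_layout: Sequence[Slot],
                         child_exprs_are_plain_slot_refs: bool) -> bool:
    """Decide whether a child's row batches may be forwarded unmaterialized."""
    if options.disable_union_passthrough:
        # Escape hatch: fall back to materializing every child.
        return False
    if not child_exprs_are_plain_slot_refs:
        # Some function must be applied to the child's rows.
        return False
    # Layouts must match slot for slot, including nullability.
    return tuple(child_layout) == tuple(union_layout)
```

With the option set, every child takes the materializing path, so a layout or memory-management bug in the passthrough path can be worked around without a new release.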
--
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Taras Bobrovytsky has uploaded a new patch set (#21).
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
IMPALA-3586: Implement union passthrough
The union node acts as a pass-through operator and forwards row batches
from its children without materializing them. This is done in the case
when the child's tuple layout is identical to the union node's tuple layout
and no functions need to be applied to the child row batches.
Removed operand reordering in the FE because it's simpler and safer to
handle all passthrough children before non-passthrough children in the
BE. The recent improvements to memory management allowed us to remove
this requirement.
Testing:
- Added new planner and end to end tests that cover the new
functionality.
- Updated existing tests to reflect the new behavior.
Perf:
Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned
version of the store_sales table. There was over a 2x performance
improvement for the following query:
SELECT
COUNT(ss_sold_time_sk),
COUNT(ss_item_sk),
COUNT(ss_customer_sk),
COUNT(ss_cdemo_sk),
COUNT(ss_hdemo_sk),
COUNT(ss_addr_sk),
COUNT(ss_store_sk),
COUNT(ss_promo_sk),
COUNT(ss_ticket_number),
COUNT(ss_quantity),
COUNT(ss_wholesale_cost),
COUNT(ss_list_price),
COUNT(ss_sales_price),
COUNT(ss_ext_discount_amt),
COUNT(ss_ext_sales_price),
COUNT(ss_ext_wholesale_cost),
COUNT(ss_ext_list_price),
COUNT(ss_ext_tax),
COUNT(ss_coupon_amt),
COUNT(ss_net_paid),
COUNT(ss_net_paid_inc_tax),
COUNT(ss_net_profit),
COUNT(ss_sold_date_sk)
FROM (
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
) t
Before:
Total Time: 43s164ms
Summary:
Operator         #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem   Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE     1       224.721us  224.721us  1        1           28.00 KB   -1.00 B        FINALIZE
12:EXCHANGE      1       24.578us   24.578us   3        1           0          -1.00 B        UNPARTITIONED
11:AGGREGATE     3       2s402ms    3s060ms    3        1           119.00 KB  10.00 MB
00:UNION         3       35s380ms   37s846ms   288.01M  288.01M     3.08 MB    0
|--02:SCAN HDFS  3       184.197ms  219.931ms  28.80M   28.80M      535.03 MB  1.88 GB        store_sales_unpartitioned
|--03:SCAN HDFS  3       131.956ms  153.401ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--04:SCAN HDFS  3       178.456ms  247.721ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--05:SCAN HDFS  3       189.398ms  242.251ms  28.80M   28.80M      535.01 MB  1.88 GB        store_sales_unpartitioned
|--06:SCAN HDFS  3       122.786ms  156.528ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--07:SCAN HDFS  3       147.467ms  183.391ms  28.80M   28.80M      535.13 MB  1.88 GB        store_sales_unpartitioned
|--08:SCAN HDFS  3       147.502ms  186.273ms  28.80M   28.80M      535.01 MB  1.88 GB        store_sales_unpartitioned
|--09:SCAN HDFS  3       130.086ms  154.682ms  28.80M   28.80M      535.04 MB  1.88 GB        store_sales_unpartitioned
|--10:SCAN HDFS  3       122.701ms  161.056ms  28.80M   28.80M      534.89 MB  1.88 GB        store_sales_unpartitioned
01:SCAN HDFS     3       287.863ms  330.436ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
After:
Total Time: 19s139ms
Summary:
Operator         #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem   Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE     1       166.241us  166.241us  1        1           28.00 KB   -1.00 B        FINALIZE
12:EXCHANGE      1       71.695us   71.695us   3        1           0          -1.00 B        UNPARTITIONED
11:AGGREGATE     3       2s971ms    3s809ms    3        1           3.08 MB    10.00 MB
00:UNION         3       207.956ms  222.846ms  288.01M  288.01M     0          0
|--02:SCAN HDFS  3       1s533ms    1s535ms    28.80M   28.80M      532.28 MB  1.88 GB        store_sales_unpartitioned
|--03:SCAN HDFS  3       1s554ms    1s669ms    28.80M   28.80M      525.73 MB  1.88 GB        store_sales_unpartitioned
|--04:SCAN HDFS  3       1s568ms    1s716ms    28.80M   28.80M      525.03 MB  1.88 GB        store_sales_unpartitioned
|--05:SCAN HDFS  3       1s503ms    1s617ms    28.80M   28.80M      527.43 MB  1.88 GB        store_sales_unpartitioned
|--06:SCAN HDFS  3       1s560ms    1s634ms    28.80M   28.80M      528.52 MB  1.88 GB        store_sales_unpartitioned
|--07:SCAN HDFS  3       1s489ms    1s643ms    28.80M   28.80M      534.81 MB  1.88 GB        store_sales_unpartitioned
|--08:SCAN HDFS  3       1s534ms    1s581ms    28.80M   28.80M      528.10 MB  1.88 GB        store_sales_unpartitioned
|--09:SCAN HDFS  3       1s558ms    1s674ms    28.80M   28.80M      526.77 MB  1.88 GB        store_sales_unpartitioned
|--10:SCAN HDFS  3       1s504ms    1s692ms    28.80M   28.80M      527.83 MB  1.88 GB        store_sales_unpartitioned
01:SCAN HDFS     3       1s682ms    1s911ms    28.80M   28.80M      526.14 MB  1.88 GB        store_sales_unpartitioned
Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
---
M be/src/exec/exchange-node.cc
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/row-batch.cc
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-planner/queries/PlannerTest/with-clause.test
M testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
M tests/query_test/test_queries.py
32 files changed, 1,461 insertions(+), 764 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/21
--
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 21
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-3586 (Part 1): Implement Union Pass Through
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Taras Bobrovytsky has uploaded a new patch set (#2).
Change subject: IMPALA-3586 (Part 1): Implement Union Pass Through
......................................................................
IMPALA-3586 (Part 1): Implement Union Pass Through
The union node acts as a pass-through operator and forwards row batches
from its children without materializing them. This is done in the case
when the child's tuple layout is identical to the union node's tuple layout
and no functions need to be applied to the child row batches.
Testing:
Verified that existing tests cover the case where no/some/all union
children of the union node can be passed through.
Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
---
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/runtime/descriptors.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
19 files changed, 410 insertions(+), 15 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/2
--
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change.
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
Patch Set 10: Code-Review+2
(2 comments)
Do any of the tests exercise multiple child row-batches (to exercise that child Close() on next GetNext() call logic more)? If not, please add coverage for that (maybe by setting the row batch size to something smaller).
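The coverage request can be illustrated with a toy model: shrinking the batch size forces each child to produce several batches, so the per-batch hand-off logic runs more than once per child. The Python sketch below is only an illustration. The helper names are hypothetical and it does not use Impala's test framework.

```python
from typing import Iterator, List


def batches(rows: List[int], batch_size: int) -> Iterator[List[int]]:
    """Split one child's rows into batches of at most batch_size rows."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def union_all(children: List[List[int]], batch_size: int) -> List[List[int]]:
    """Forward each child's batches in order; a toy stand-in for the union node."""
    out: List[List[int]] = []
    for child_rows in children:
        out.extend(batches(child_rows, batch_size))
    return out


def flatten(batch_list: List[List[int]]) -> List[int]:
    return [row for batch in batch_list for row in batch]


children = [list(range(0, 5)), list(range(5, 10))]
small = union_all(children, batch_size=2)   # several batches per child
large = union_all(children, batch_size=100) # one batch per child
```

The property worth asserting in a real test is the same as here: the rows returned must not depend on the batch size, while the small setting guarantees that more than one batch flows from each child.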
http://gerrit.cloudera.org:8080/#/c/5816/10/be/src/exec/union-node.cc
File be/src/exec/union-node.cc:
Line 124: DCHECK(child_idx_ >= children_.size() || !IsChildPassThrough(child_idx_));
this dcheck isn't helpful given the inverse is used as the if-condition only two lines above
PS10, Line 228: this
Remove this and clean up the child Close() logic as part of...
--
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3586 (Part 1): Implement Union Pass Through
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Taras Bobrovytsky has uploaded a new patch set (#5).
Change subject: IMPALA-3586 (Part 1): Implement Union Pass Through
......................................................................
IMPALA-3586 (Part 1): Implement Union Pass Through
The union node acts as a pass-through operator and forwards row batches
from its children without materializing them. This is done in the case
when the child's tuple layout is identical to the union node's tuple layout
and no functions need to be applied to the child row batches.
Testing:
Verified that existing tests cover the case where no/some/all union
children of the union node can be passed through.
Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
---
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
22 files changed, 467 insertions(+), 21 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/5
--
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Taras Bobrovytsky has uploaded a new patch set (#7).
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
IMPALA-3586: Implement union passthrough
The union node acts as a pass-through operator and forwards row batches
from its children without materializing them. This is done in the case
when the child's tuple layout is identical to the union node's tuple layout
and no functions need to be applied to the child row batches.
A new query option DISABLE_UNION_PASSTHROUGH was added in this patch
as a precaution and for testing purposes.
Testing:
- Added new planner and end to end tests that cover the new
functionality.
- Updated existing tests to reflect the new behavior.
Perf:
Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned
version of the store_sales table. There was over a 2x performance
improvement for the following query:
SELECT
COUNT(ss_sold_time_sk),
COUNT(ss_item_sk),
COUNT(ss_customer_sk),
COUNT(ss_cdemo_sk),
COUNT(ss_hdemo_sk),
COUNT(ss_addr_sk),
COUNT(ss_store_sk),
COUNT(ss_promo_sk),
COUNT(ss_ticket_number),
COUNT(ss_quantity),
COUNT(ss_wholesale_cost),
COUNT(ss_list_price),
COUNT(ss_sales_price),
COUNT(ss_ext_discount_amt),
COUNT(ss_ext_sales_price),
COUNT(ss_ext_wholesale_cost),
COUNT(ss_ext_list_price),
COUNT(ss_ext_tax),
COUNT(ss_coupon_amt),
COUNT(ss_net_paid),
COUNT(ss_net_paid_inc_tax),
COUNT(ss_net_profit),
COUNT(ss_sold_date_sk)
FROM (
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
) t
Before:
Total Time: 43s164ms
Summary:
Operator         #Hosts   Avg Time   Max Time    #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE          1  224.721us  224.721us        1           1   28.00 KB        -1.00 B  FINALIZE
12:EXCHANGE           1   24.578us   24.578us        3           1          0        -1.00 B  UNPARTITIONED
11:AGGREGATE          3    2s402ms    3s060ms        3           1  119.00 KB       10.00 MB
00:UNION              3   35s380ms   37s846ms  288.01M     288.01M    3.08 MB              0
|--02:SCAN HDFS       3  184.197ms  219.931ms   28.80M      28.80M  535.03 MB        1.88 GB  store_sales_unpartitioned
|--03:SCAN HDFS       3  131.956ms  153.401ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
|--04:SCAN HDFS       3  178.456ms  247.721ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
|--05:SCAN HDFS       3  189.398ms  242.251ms   28.80M      28.80M  535.01 MB        1.88 GB  store_sales_unpartitioned
|--06:SCAN HDFS       3  122.786ms  156.528ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
|--07:SCAN HDFS       3  147.467ms  183.391ms   28.80M      28.80M  535.13 MB        1.88 GB  store_sales_unpartitioned
|--08:SCAN HDFS       3  147.502ms  186.273ms   28.80M      28.80M  535.01 MB        1.88 GB  store_sales_unpartitioned
|--09:SCAN HDFS       3  130.086ms  154.682ms   28.80M      28.80M  535.04 MB        1.88 GB  store_sales_unpartitioned
|--10:SCAN HDFS       3  122.701ms  161.056ms   28.80M      28.80M  534.89 MB        1.88 GB  store_sales_unpartitioned
01:SCAN HDFS          3  287.863ms  330.436ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
After:
Total Time: 20s660ms
Summary:
Operator         #Hosts   Avg Time   Max Time    #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE          1  187.474us  187.474us        1           1   28.00 KB        -1.00 B  FINALIZE
12:EXCHANGE           1   15.238us   15.238us        3           1          0        -1.00 B  UNPARTITIONED
11:AGGREGATE          3    2s958ms    3s749ms        3           1    3.08 MB       10.00 MB
00:UNION              3  211.510ms  224.667ms  288.01M     288.01M          0              0
|--02:SCAN HDFS       3    1s637ms    1s734ms   28.80M      28.80M  528.68 MB        1.88 GB  store_sales_unpartitioned
|--03:SCAN HDFS       3    1s697ms    1s708ms   28.80M      28.80M  528.48 MB        1.88 GB  store_sales_unpartitioned
|--04:SCAN HDFS       3    1s681ms    1s748ms   28.80M      28.80M  529.34 MB        1.88 GB  store_sales_unpartitioned
|--05:SCAN HDFS       3    1s665ms    1s756ms   28.80M      28.80M  533.81 MB        1.88 GB  store_sales_unpartitioned
|--06:SCAN HDFS       3    1s675ms    1s800ms   28.80M      28.80M  530.70 MB        1.88 GB  store_sales_unpartitioned
|--07:SCAN HDFS       3    1s677ms    1s759ms   28.80M      28.80M  525.95 MB        1.88 GB  store_sales_unpartitioned
|--08:SCAN HDFS       3    1s621ms    1s790ms   28.80M      28.80M  534.64 MB        1.88 GB  store_sales_unpartitioned
|--09:SCAN HDFS       3    1s684ms    1s743ms   28.80M      28.80M  528.55 MB        1.88 GB  store_sales_unpartitioned
|--10:SCAN HDFS       3    1s528ms    1s771ms   28.80M      28.80M  533.70 MB        1.88 GB  store_sales_unpartitioned
01:SCAN HDFS          3    1s853ms    2s149ms   28.80M      28.80M  526.53 MB        1.88 GB  store_sales_unpartitioned
Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
---
M be/src/exec/analytic-eval-node.cc
M be/src/exec/exchange-node.cc
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/row-batch.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
M tests/query_test/test_queries.py
33 files changed, 730 insertions(+), 53 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/7
--
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-3586: Implement Union Passthrough
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Taras Bobrovytsky has uploaded a new patch set (#6).
Change subject: IMPALA-3586: Implement Union Passthrough
......................................................................
IMPALA-3586: Implement Union Passthrough
The union node acts as a pass-through operator and forwards row batches
from its children without materializing them. This is done when
the child's tuple layout is identical to the union node's tuple layout
and no functions need to be applied to the child row batches.
Testing:
Verified that existing tests cover the case where no/some/all union
children of the union node can be passed through.
Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
---
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/row-batch.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
M tests/query_test/test_queries.py
30 files changed, 682 insertions(+), 31 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/6
--
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Taras Bobrovytsky has uploaded a new patch set (#8).
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
IMPALA-3586: Implement union passthrough
The union node acts as a pass-through operator and forwards row batches
from its children without materializing them. This is done when
the child's tuple layout is identical to the union node's tuple layout
and no functions need to be applied to the child row batches.
A new query option DISABLE_UNION_PASSTHROUGH was added in this patch
as a precaution and for testing purposes.
Testing:
- Added new planner and end-to-end tests that cover the new
functionality.
- Updated existing tests to reflect the new behavior.
Perf:
Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned
version of the store_sales table. There was over a 2x performance
improvement for the following query:
SELECT
COUNT(ss_sold_time_sk),
COUNT(ss_item_sk),
COUNT(ss_customer_sk),
COUNT(ss_cdemo_sk),
COUNT(ss_hdemo_sk),
COUNT(ss_addr_sk),
COUNT(ss_store_sk),
COUNT(ss_promo_sk),
COUNT(ss_ticket_number),
COUNT(ss_quantity),
COUNT(ss_wholesale_cost),
COUNT(ss_list_price),
COUNT(ss_sales_price),
COUNT(ss_ext_discount_amt),
COUNT(ss_ext_sales_price),
COUNT(ss_ext_wholesale_cost),
COUNT(ss_ext_list_price),
COUNT(ss_ext_tax),
COUNT(ss_coupon_amt),
COUNT(ss_net_paid),
COUNT(ss_net_paid_inc_tax),
COUNT(ss_net_profit),
COUNT(ss_sold_date_sk)
FROM (
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
) t
Before:
Total Time: 43s164ms
Summary:
Operator         #Hosts   Avg Time   Max Time    #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE          1  224.721us  224.721us        1           1   28.00 KB        -1.00 B  FINALIZE
12:EXCHANGE           1   24.578us   24.578us        3           1          0        -1.00 B  UNPARTITIONED
11:AGGREGATE          3    2s402ms    3s060ms        3           1  119.00 KB       10.00 MB
00:UNION              3   35s380ms   37s846ms  288.01M     288.01M    3.08 MB              0
|--02:SCAN HDFS       3  184.197ms  219.931ms   28.80M      28.80M  535.03 MB        1.88 GB  store_sales_unpartitioned
|--03:SCAN HDFS       3  131.956ms  153.401ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
|--04:SCAN HDFS       3  178.456ms  247.721ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
|--05:SCAN HDFS       3  189.398ms  242.251ms   28.80M      28.80M  535.01 MB        1.88 GB  store_sales_unpartitioned
|--06:SCAN HDFS       3  122.786ms  156.528ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
|--07:SCAN HDFS       3  147.467ms  183.391ms   28.80M      28.80M  535.13 MB        1.88 GB  store_sales_unpartitioned
|--08:SCAN HDFS       3  147.502ms  186.273ms   28.80M      28.80M  535.01 MB        1.88 GB  store_sales_unpartitioned
|--09:SCAN HDFS       3  130.086ms  154.682ms   28.80M      28.80M  535.04 MB        1.88 GB  store_sales_unpartitioned
|--10:SCAN HDFS       3  122.701ms  161.056ms   28.80M      28.80M  534.89 MB        1.88 GB  store_sales_unpartitioned
01:SCAN HDFS          3  287.863ms  330.436ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
After:
Total Time: 20s660ms
Summary:
Operator         #Hosts   Avg Time   Max Time    #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE          1  187.474us  187.474us        1           1   28.00 KB        -1.00 B  FINALIZE
12:EXCHANGE           1   15.238us   15.238us        3           1          0        -1.00 B  UNPARTITIONED
11:AGGREGATE          3    2s958ms    3s749ms        3           1    3.08 MB       10.00 MB
00:UNION              3  211.510ms  224.667ms  288.01M     288.01M          0              0
|--02:SCAN HDFS       3    1s637ms    1s734ms   28.80M      28.80M  528.68 MB        1.88 GB  store_sales_unpartitioned
|--03:SCAN HDFS       3    1s697ms    1s708ms   28.80M      28.80M  528.48 MB        1.88 GB  store_sales_unpartitioned
|--04:SCAN HDFS       3    1s681ms    1s748ms   28.80M      28.80M  529.34 MB        1.88 GB  store_sales_unpartitioned
|--05:SCAN HDFS       3    1s665ms    1s756ms   28.80M      28.80M  533.81 MB        1.88 GB  store_sales_unpartitioned
|--06:SCAN HDFS       3    1s675ms    1s800ms   28.80M      28.80M  530.70 MB        1.88 GB  store_sales_unpartitioned
|--07:SCAN HDFS       3    1s677ms    1s759ms   28.80M      28.80M  525.95 MB        1.88 GB  store_sales_unpartitioned
|--08:SCAN HDFS       3    1s621ms    1s790ms   28.80M      28.80M  534.64 MB        1.88 GB  store_sales_unpartitioned
|--09:SCAN HDFS       3    1s684ms    1s743ms   28.80M      28.80M  528.55 MB        1.88 GB  store_sales_unpartitioned
|--10:SCAN HDFS       3    1s528ms    1s771ms   28.80M      28.80M  533.70 MB        1.88 GB  store_sales_unpartitioned
01:SCAN HDFS          3    1s853ms    2s149ms   28.80M      28.80M  526.53 MB        1.88 GB  store_sales_unpartitioned
Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
---
M be/src/exec/exchange-node.cc
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/row-batch.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
M tests/query_test/test_queries.py
34 files changed, 735 insertions(+), 58 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/8
--
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Hello Alex Behm, Dan Hecht,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/5816
to look at the new patch set (#15).
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
IMPALA-3586: Implement union passthrough
The union node acts as a pass-through operator and forwards row batches
from its children without materializing them. This is done when
the child's tuple layout is identical to the union node's tuple layout
and no functions need to be applied to the child row batches.
Removed operand reordering in the FE because it's simpler and safer to
handle all passthrough children before non-passthrough children in the
BE. The recent improvements to memory management allowed us to remove
this requirement.
A new query option DISABLE_UNION_PASSTHROUGH was added in this patch
as a precaution and for testing purposes.
Testing:
- Added new planner and end-to-end tests that cover the new
functionality.
- Updated existing tests to reflect the new behavior.
Perf:
Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned
version of the store_sales table. There was over a 2x performance
improvement for the following query:
SELECT
COUNT(ss_sold_time_sk),
COUNT(ss_item_sk),
COUNT(ss_customer_sk),
COUNT(ss_cdemo_sk),
COUNT(ss_hdemo_sk),
COUNT(ss_addr_sk),
COUNT(ss_store_sk),
COUNT(ss_promo_sk),
COUNT(ss_ticket_number),
COUNT(ss_quantity),
COUNT(ss_wholesale_cost),
COUNT(ss_list_price),
COUNT(ss_sales_price),
COUNT(ss_ext_discount_amt),
COUNT(ss_ext_sales_price),
COUNT(ss_ext_wholesale_cost),
COUNT(ss_ext_list_price),
COUNT(ss_ext_tax),
COUNT(ss_coupon_amt),
COUNT(ss_net_paid),
COUNT(ss_net_paid_inc_tax),
COUNT(ss_net_profit),
COUNT(ss_sold_date_sk)
FROM (
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
) t
Before:
Total Time: 43s164ms
Summary:
Operator         #Hosts   Avg Time   Max Time    #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE          1  224.721us  224.721us        1           1   28.00 KB        -1.00 B  FINALIZE
12:EXCHANGE           1   24.578us   24.578us        3           1          0        -1.00 B  UNPARTITIONED
11:AGGREGATE          3    2s402ms    3s060ms        3           1  119.00 KB       10.00 MB
00:UNION              3   35s380ms   37s846ms  288.01M     288.01M    3.08 MB              0
|--02:SCAN HDFS       3  184.197ms  219.931ms   28.80M      28.80M  535.03 MB        1.88 GB  store_sales_unpartitioned
|--03:SCAN HDFS       3  131.956ms  153.401ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
|--04:SCAN HDFS       3  178.456ms  247.721ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
|--05:SCAN HDFS       3  189.398ms  242.251ms   28.80M      28.80M  535.01 MB        1.88 GB  store_sales_unpartitioned
|--06:SCAN HDFS       3  122.786ms  156.528ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
|--07:SCAN HDFS       3  147.467ms  183.391ms   28.80M      28.80M  535.13 MB        1.88 GB  store_sales_unpartitioned
|--08:SCAN HDFS       3  147.502ms  186.273ms   28.80M      28.80M  535.01 MB        1.88 GB  store_sales_unpartitioned
|--09:SCAN HDFS       3  130.086ms  154.682ms   28.80M      28.80M  535.04 MB        1.88 GB  store_sales_unpartitioned
|--10:SCAN HDFS       3  122.701ms  161.056ms   28.80M      28.80M  534.89 MB        1.88 GB  store_sales_unpartitioned
01:SCAN HDFS          3  287.863ms  330.436ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
After:
Total Time: 20s660ms
Summary:
Operator         #Hosts   Avg Time   Max Time    #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE          1  187.474us  187.474us        1           1   28.00 KB        -1.00 B  FINALIZE
12:EXCHANGE           1   15.238us   15.238us        3           1          0        -1.00 B  UNPARTITIONED
11:AGGREGATE          3    2s958ms    3s749ms        3           1    3.08 MB       10.00 MB
00:UNION              3  211.510ms  224.667ms  288.01M     288.01M          0              0
|--02:SCAN HDFS       3    1s637ms    1s734ms   28.80M      28.80M  528.68 MB        1.88 GB  store_sales_unpartitioned
|--03:SCAN HDFS       3    1s697ms    1s708ms   28.80M      28.80M  528.48 MB        1.88 GB  store_sales_unpartitioned
|--04:SCAN HDFS       3    1s681ms    1s748ms   28.80M      28.80M  529.34 MB        1.88 GB  store_sales_unpartitioned
|--05:SCAN HDFS       3    1s665ms    1s756ms   28.80M      28.80M  533.81 MB        1.88 GB  store_sales_unpartitioned
|--06:SCAN HDFS       3    1s675ms    1s800ms   28.80M      28.80M  530.70 MB        1.88 GB  store_sales_unpartitioned
|--07:SCAN HDFS       3    1s677ms    1s759ms   28.80M      28.80M  525.95 MB        1.88 GB  store_sales_unpartitioned
|--08:SCAN HDFS       3    1s621ms    1s790ms   28.80M      28.80M  534.64 MB        1.88 GB  store_sales_unpartitioned
|--09:SCAN HDFS       3    1s684ms    1s743ms   28.80M      28.80M  528.55 MB        1.88 GB  store_sales_unpartitioned
|--10:SCAN HDFS       3    1s528ms    1s771ms   28.80M      28.80M  533.70 MB        1.88 GB  store_sales_unpartitioned
01:SCAN HDFS          3    1s853ms    2s149ms   28.80M      28.80M  526.53 MB        1.88 GB  store_sales_unpartitioned
Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
---
M be/src/exec/exchange-node.cc
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/row-batch.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-planner/queries/PlannerTest/with-clause.test
M testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
M tests/query_test/test_queries.py
35 files changed, 1,343 insertions(+), 611 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/15
--
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 15
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-3586 (Part 1): Implement Union Pass Through
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Taras Bobrovytsky has uploaded a new patch set (#4).
Change subject: IMPALA-3586 (Part 1): Implement Union Pass Through
......................................................................
IMPALA-3586 (Part 1): Implement Union Pass Through
The union node acts as a pass-through operator and forwards row batches
from its children without materializing them. This is done when
the child's tuple layout is identical to the union node's tuple layout
and no functions need to be applied to the child row batches.
Testing:
Verified that existing tests cover the case where no/some/all union
children of the union node can be passed through.
Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
---
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/runtime/descriptors.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
20 files changed, 435 insertions(+), 15 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/4
--
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-3586 (Part 1): Implement Union Pass Through
Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.
Change subject: IMPALA-3586 (Part 1): Implement Union Pass Through
......................................................................
Patch Set 2:
(6 comments)
Need to look at fe and tests, but had some comments on the backend.
http://gerrit.cloudera.org:8080/#/c/5816/2/be/src/exec/union-node.cc
File be/src/exec/union-node.cc:
PS2, Line 122: isPassThrough
I think we should either pass child_idx_ into isPassThrough() or call it something like currentChildIsPassthrough() so that it's clear it refers to the child.
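A small sketch of the first option, assuming a hypothetical `UnionNodeSketch` class with a per-child flag vector (neither the class nor the member name is taken from the patch). Passing the index makes it explicit at the call site that the question is about one child, not about the union node itself:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for the union node; only the naming is the point here.
class UnionNodeSketch {
 public:
  explicit UnionNodeSketch(std::vector<bool> child_is_passthrough)
    : child_is_passthrough_(std::move(child_is_passthrough)) {}

  // Takes the child index explicitly instead of implicitly reading a
  // member such as child_idx_.
  bool IsChildPassthrough(std::size_t child_idx) const {
    assert(child_idx < child_is_passthrough_.size());
    return child_is_passthrough_[child_idx];
  }

 private:
  std::vector<bool> child_is_passthrough_;
};
```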
Line 123: DCHECK(child(child_idx_)->row_desc().Equals(row_batch->row_desc()));
Move this into the Open() branch so we don't execute it unnecessarily.
Line 126: // We will not close any PassThrough children nodes here, they will be closed in
We need to think about the implications of this. This could increase resource consumption quite a bit in some cases. E.g. if you had a union of N hash join nodes, each of which had a limit applied, this could increase resource consumption Nx.
We need to fix the memory management model to allow early closing in cases like this, but that shouldn't block progress here.
The workaround we generally use is to set the MarkNeedsDeepCopy() flag on the last row batch. This forces all operators to either finish processing the batch (in the common case) or copy the data (in the case of nested loop join). It generally works ok except in subplans, where it can cause performance problems by forcing many tiny batches.
I think the best course would be to disable passthrough in subplans and set the MarkNeedsDeepCopy() flag on the last batch returned from each child. We do something like this in PartitionedAggregationNode::HandleOutputStrings().
I'm eventually going to get rid of MarkNeedsDeepCopy(), but using it here would flag it as one of the places we need to clean up.
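To illustrate the idea behind the workaround, here is a toy model, assuming simplified stand-in types. `ToyBatch` and `ToyConsumer` are hypothetical and do not mirror Impala's RowBatch or its MarkNeedsDeepCopy() implementation. The producer flags its last batch, and a consumer that sees the flag copies the rows instead of keeping a pointer into memory the child may release:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Toy stand-in for a row batch; this is not Impala's RowBatch.
struct ToyBatch {
  const std::vector<std::string>* rows;  // memory owned by the producing child
  bool needs_deep_copy;                  // set by the producer on its last batch
};

// Toy consumer that normally keeps a pointer into the child's memory.
struct ToyConsumer {
  const std::vector<std::string>* borrowed = nullptr;
  std::vector<std::string> owned;

  void Consume(const ToyBatch& batch) {
    if (batch.needs_deep_copy) {
      // The producer may release its memory after this batch, so copy now.
      owned = *batch.rows;
      borrowed = nullptr;
    } else {
      borrowed = batch.rows;
    }
  }

  // Rows that stay valid regardless of what the child does afterwards.
  const std::vector<std::string>& SafeRows() const { return owned; }
};
```

In this model the child can free its memory right after handing over the flagged batch, which is the early-release behavior the comment is after.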
http://gerrit.cloudera.org:8080/#/c/5816/2/be/src/runtime/descriptors.cc
File be/src/runtime/descriptors.cc:
PS2, Line 438: verify
Verify (caps)
PS2, Line 442: std::
Don't need std::, we automatically import it in common/names.h.
PS2, Line 442:
Make these references with & to avoid copying the vector.
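A minimal sketch of the reference suggestion, assuming a hypothetical `ToyRowDescriptor` with a `TupleIds()` accessor (not the real descriptor API). The standalone example spells out `std::` because it does not include Impala's common/names.h:

```cpp
#include <cassert>
#include <vector>

// Hypothetical descriptor that exposes a vector through a const accessor.
struct ToyRowDescriptor {
  std::vector<int> tuple_ids;
  const std::vector<int>& TupleIds() const { return tuple_ids; }
};

bool SameTupleIds(const ToyRowDescriptor& lhs, const ToyRowDescriptor& rhs) {
  // Binding to const references avoids copying either vector; writing
  // "std::vector<int> lhs_ids = lhs.TupleIds();" would copy it.
  const std::vector<int>& lhs_ids = lhs.TupleIds();
  const std::vector<int>& rhs_ids = rhs.TupleIds();
  return lhs_ids == rhs_ids;
}
```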
--
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Hello Alex Behm, Dan Hecht,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/5816
to look at the new patch set (#12).
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
IMPALA-3586: Implement union passthrough
The union node acts as a pass-through operator and forwards row batches
from its children without materializing them. This is done when
the child's tuple layout is identical to the union node's tuple layout
and no functions need to be applied to the child row batches.
A new query option DISABLE_UNION_PASSTHROUGH was added in this patch
as a precaution and for testing purposes.
Testing:
- Added new planner and end-to-end tests that cover the new
functionality.
- Updated existing tests to reflect the new behavior.
Perf:
Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned
version of the store_sales table. There was over a 2x performance
improvement for the following query:
SELECT
COUNT(ss_sold_time_sk),
COUNT(ss_item_sk),
COUNT(ss_customer_sk),
COUNT(ss_cdemo_sk),
COUNT(ss_hdemo_sk),
COUNT(ss_addr_sk),
COUNT(ss_store_sk),
COUNT(ss_promo_sk),
COUNT(ss_ticket_number),
COUNT(ss_quantity),
COUNT(ss_wholesale_cost),
COUNT(ss_list_price),
COUNT(ss_sales_price),
COUNT(ss_ext_discount_amt),
COUNT(ss_ext_sales_price),
COUNT(ss_ext_wholesale_cost),
COUNT(ss_ext_list_price),
COUNT(ss_ext_tax),
COUNT(ss_coupon_amt),
COUNT(ss_net_paid),
COUNT(ss_net_paid_inc_tax),
COUNT(ss_net_profit),
COUNT(ss_sold_date_sk)
FROM (
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
) t
Before:
Total Time: 43s164ms
Summary:
Operator         #Hosts   Avg Time   Max Time    #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail
-----------------------------------------------------------------------------------------------------------------------
13:AGGREGATE          1  224.721us  224.721us        1           1   28.00 KB        -1.00 B  FINALIZE
12:EXCHANGE           1   24.578us   24.578us        3           1          0        -1.00 B  UNPARTITIONED
11:AGGREGATE          3    2s402ms    3s060ms        3           1  119.00 KB       10.00 MB
00:UNION              3   35s380ms   37s846ms  288.01M     288.01M    3.08 MB              0
|--02:SCAN HDFS       3  184.197ms  219.931ms   28.80M      28.80M  535.03 MB        1.88 GB  store_sales_unpartitioned
|--03:SCAN HDFS       3  131.956ms  153.401ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
|--04:SCAN HDFS       3  178.456ms  247.721ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
|--05:SCAN HDFS       3  189.398ms  242.251ms   28.80M      28.80M  535.01 MB        1.88 GB  store_sales_unpartitioned
|--06:SCAN HDFS       3  122.786ms  156.528ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
|--07:SCAN HDFS       3  147.467ms  183.391ms   28.80M      28.80M  535.13 MB        1.88 GB  store_sales_unpartitioned
|--08:SCAN HDFS       3  147.502ms  186.273ms   28.80M      28.80M  535.01 MB        1.88 GB  store_sales_unpartitioned
|--09:SCAN HDFS       3  130.086ms  154.682ms   28.80M      28.80M  535.04 MB        1.88 GB  store_sales_unpartitioned
|--10:SCAN HDFS       3  122.701ms  161.056ms   28.80M      28.80M  534.89 MB        1.88 GB  store_sales_unpartitioned
01:SCAN HDFS          3  287.863ms  330.436ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
After:
Total Time: 20s660ms
Summary:
Operator         #Hosts   Avg Time   Max Time    #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail
-----------------------------------------------------------------------------------------------------------------------
13:AGGREGATE          1  187.474us  187.474us        1           1   28.00 KB        -1.00 B  FINALIZE
12:EXCHANGE           1   15.238us   15.238us        3           1          0        -1.00 B  UNPARTITIONED
11:AGGREGATE          3    2s958ms    3s749ms        3           1    3.08 MB       10.00 MB
00:UNION              3  211.510ms  224.667ms  288.01M     288.01M          0              0
|--02:SCAN HDFS       3    1s637ms    1s734ms   28.80M      28.80M  528.68 MB        1.88 GB  store_sales_unpartitioned
|--03:SCAN HDFS       3    1s697ms    1s708ms   28.80M      28.80M  528.48 MB        1.88 GB  store_sales_unpartitioned
|--04:SCAN HDFS       3    1s681ms    1s748ms   28.80M      28.80M  529.34 MB        1.88 GB  store_sales_unpartitioned
|--05:SCAN HDFS       3    1s665ms    1s756ms   28.80M      28.80M  533.81 MB        1.88 GB  store_sales_unpartitioned
|--06:SCAN HDFS       3    1s675ms    1s800ms   28.80M      28.80M  530.70 MB        1.88 GB  store_sales_unpartitioned
|--07:SCAN HDFS       3    1s677ms    1s759ms   28.80M      28.80M  525.95 MB        1.88 GB  store_sales_unpartitioned
|--08:SCAN HDFS       3    1s621ms    1s790ms   28.80M      28.80M  534.64 MB        1.88 GB  store_sales_unpartitioned
|--09:SCAN HDFS       3    1s684ms    1s743ms   28.80M      28.80M  528.55 MB        1.88 GB  store_sales_unpartitioned
|--10:SCAN HDFS       3    1s528ms    1s771ms   28.80M      28.80M  533.70 MB        1.88 GB  store_sales_unpartitioned
01:SCAN HDFS          3    1s853ms    2s149ms   28.80M      28.80M  526.53 MB        1.88 GB  store_sales_unpartitioned
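The "over 2x" claim can be checked from the two Total Time lines above. The helper below is a hypothetical sketch, not part of Impala; it parses only the "<seconds>s<milliseconds>ms" form used by those lines and does not handle other duration formats such as "211.510ms".

```cpp
#include <stdexcept>
#include <string>

// Parses durations of the form "<sec>s<ms>ms", e.g. "43s164ms", into
// milliseconds. Other formats are out of scope for this sketch.
long ParseSecMs(const std::string& text) {
  std::string::size_type s_pos = text.find('s');
  if (s_pos == std::string::npos) {
    throw std::invalid_argument("expected <sec>s<ms>ms: " + text);
  }
  long sec = std::stol(text.substr(0, s_pos));
  // std::stol stops at the trailing "ms" suffix.
  long ms = std::stol(text.substr(s_pos + 1));
  return sec * 1000 + ms;
}

// Ratio of the "before" wall-clock time to the "after" wall-clock time.
double Speedup(const std::string& before, const std::string& after) {
  return static_cast<double>(ParseSecMs(before)) / ParseSecMs(after);
}
```

With the totals reported above, `Speedup("43s164ms", "20s660ms")` is 43164 / 20660, or roughly 2.09.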
Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
---
M be/src/exec/exchange-node.cc
M be/src/exec/exec-node.h
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/row-batch.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
M tests/query_test/test_queries.py
34 files changed, 735 insertions(+), 62 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/12
--
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Hello Alex Behm, Dan Hecht,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/5816
to look at the new patch set (#18).
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
IMPALA-3586: Implement union passthrough
The union node acts as a pass-through operator and forwards row batches
from its children without materializing them. This is done when the
child's tuple layout is identical to the union node's tuple layout
and no functions need to be applied to the child row batches.
Removed operand reordering in the FE because it's simpler and safer to
handle all passthrough children before non-passthrough children in the
BE. The recent improvements to memory management allowed us to remove
this requirement.
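The ordering mentioned above, all passthrough children before non-passthrough children, can be illustrated independently of where it is enforced. The sketch below is an assumption-laden illustration rather than the patch's implementation: `ProcessingOrder` is a hypothetical helper, and keeping the original relative order within each group is an added assumption.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Given one flag per child (true if the child can be passed through),
// returns child indices with all passthrough children first. The relative
// order inside each group is preserved (an assumption of this sketch).
std::vector<int> ProcessingOrder(const std::vector<bool>& is_passthrough) {
  std::vector<int> order(is_passthrough.size());
  std::iota(order.begin(), order.end(), 0);  // 0, 1, 2, ...
  std::stable_partition(
      order.begin(), order.end(),
      [&](int child_idx) { return is_passthrough[child_idx]; });
  return order;
}
```

For flags `{false, true, false, true}` the resulting order is `1, 3, 0, 2`.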
Testing:
- Added new planner and end-to-end tests that cover the new
functionality.
- Updated existing tests to reflect the new behavior.
Perf:
Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned
version of the store_sales table. There was over a 2x performance
improvement for the following query:
SELECT
COUNT(ss_sold_time_sk),
COUNT(ss_item_sk),
COUNT(ss_customer_sk),
COUNT(ss_cdemo_sk),
COUNT(ss_hdemo_sk),
COUNT(ss_addr_sk),
COUNT(ss_store_sk),
COUNT(ss_promo_sk),
COUNT(ss_ticket_number),
COUNT(ss_quantity),
COUNT(ss_wholesale_cost),
COUNT(ss_list_price),
COUNT(ss_sales_price),
COUNT(ss_ext_discount_amt),
COUNT(ss_ext_sales_price),
COUNT(ss_ext_wholesale_cost),
COUNT(ss_ext_list_price),
COUNT(ss_ext_tax),
COUNT(ss_coupon_amt),
COUNT(ss_net_paid),
COUNT(ss_net_paid_inc_tax),
COUNT(ss_net_profit),
COUNT(ss_sold_date_sk)
FROM (
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
) t
Before:
Total Time: 43s164ms
Summary:
Operator         #Hosts   Avg Time   Max Time    #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail
-----------------------------------------------------------------------------------------------------------------------
13:AGGREGATE          1  224.721us  224.721us        1           1   28.00 KB        -1.00 B  FINALIZE
12:EXCHANGE           1   24.578us   24.578us        3           1          0        -1.00 B  UNPARTITIONED
11:AGGREGATE          3    2s402ms    3s060ms        3           1  119.00 KB       10.00 MB
00:UNION              3   35s380ms   37s846ms  288.01M     288.01M    3.08 MB              0
|--02:SCAN HDFS       3  184.197ms  219.931ms   28.80M      28.80M  535.03 MB        1.88 GB  store_sales_unpartitioned
|--03:SCAN HDFS       3  131.956ms  153.401ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
|--04:SCAN HDFS       3  178.456ms  247.721ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
|--05:SCAN HDFS       3  189.398ms  242.251ms   28.80M      28.80M  535.01 MB        1.88 GB  store_sales_unpartitioned
|--06:SCAN HDFS       3  122.786ms  156.528ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
|--07:SCAN HDFS       3  147.467ms  183.391ms   28.80M      28.80M  535.13 MB        1.88 GB  store_sales_unpartitioned
|--08:SCAN HDFS       3  147.502ms  186.273ms   28.80M      28.80M  535.01 MB        1.88 GB  store_sales_unpartitioned
|--09:SCAN HDFS       3  130.086ms  154.682ms   28.80M      28.80M  535.04 MB        1.88 GB  store_sales_unpartitioned
|--10:SCAN HDFS       3  122.701ms  161.056ms   28.80M      28.80M  534.89 MB        1.88 GB  store_sales_unpartitioned
01:SCAN HDFS          3  287.863ms  330.436ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
After:
Total Time: 19s139ms
Summary:
Operator         #Hosts   Avg Time   Max Time    #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail
-----------------------------------------------------------------------------------------------------------------------
13:AGGREGATE          1  166.241us  166.241us        1           1   28.00 KB        -1.00 B  FINALIZE
12:EXCHANGE           1   71.695us   71.695us        3           1          0        -1.00 B  UNPARTITIONED
11:AGGREGATE          3    2s971ms    3s809ms        3           1    3.08 MB       10.00 MB
00:UNION              3  207.956ms  222.846ms  288.01M     288.01M          0              0
|--02:SCAN HDFS       3    1s533ms    1s535ms   28.80M      28.80M  532.28 MB        1.88 GB  store_sales_unpartitioned
|--03:SCAN HDFS       3    1s554ms    1s669ms   28.80M      28.80M  525.73 MB        1.88 GB  store_sales_unpartitioned
|--04:SCAN HDFS       3    1s568ms    1s716ms   28.80M      28.80M  525.03 MB        1.88 GB  store_sales_unpartitioned
|--05:SCAN HDFS       3    1s503ms    1s617ms   28.80M      28.80M  527.43 MB        1.88 GB  store_sales_unpartitioned
|--06:SCAN HDFS       3    1s560ms    1s634ms   28.80M      28.80M  528.52 MB        1.88 GB  store_sales_unpartitioned
|--07:SCAN HDFS       3    1s489ms    1s643ms   28.80M      28.80M  534.81 MB        1.88 GB  store_sales_unpartitioned
|--08:SCAN HDFS       3    1s534ms    1s581ms   28.80M      28.80M  528.10 MB        1.88 GB  store_sales_unpartitioned
|--09:SCAN HDFS       3    1s558ms    1s674ms   28.80M      28.80M  526.77 MB        1.88 GB  store_sales_unpartitioned
|--10:SCAN HDFS       3    1s504ms    1s692ms   28.80M      28.80M  527.83 MB        1.88 GB  store_sales_unpartitioned
01:SCAN HDFS          3    1s682ms    1s911ms   28.80M      28.80M  526.14 MB        1.88 GB  store_sales_unpartitioned
Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
---
M be/src/exec/exchange-node.cc
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/exprs/slot-ref.cc
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/row-batch.cc
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-planner/queries/PlannerTest/with-clause.test
M testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
M tests/query_test/test_queries.py
33 files changed, 1,462 insertions(+), 764 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/18
--
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 18
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-3586 (Part 1): Implement Union Pass Through
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Taras Bobrovytsky has uploaded a new patch set (#5).
Change subject: IMPALA-3586 (Part 1): Implement Union Pass Through
......................................................................
IMPALA-3586 (Part 1): Implement Union Pass Through
The union node acts as a pass-through operator and forwards row batches
from its children without materializing them. This is done when the
child's tuple layout is identical to the union node's tuple layout
and no functions need to be applied to the child row batches.
Testing:
Verified that existing tests cover the cases where none, some, or all
children of the union node can be passed through.
Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
---
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
22 files changed, 467 insertions(+), 21 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/5
--
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
Patch Set 19: Code-Review+1
--
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 19
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Hello Alex Behm, Dan Hecht,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/5816
to look at the new patch set (#14).
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
IMPALA-3586: Implement union passthrough
The union node acts as a pass-through operator and forwards row batches
from its children without materializing them. This is done when the
child's tuple layout is identical to the union node's tuple layout
and no functions need to be applied to the child row batches.
Removed operand reordering in the FE because it's simpler and safer to
handle all passthrough children before non-passthrough children in the
BE. The recent improvements to memory management allowed us to remove
this requirement.
A new query option DISABLE_UNION_PASSTHROUGH was added in this patch
as a precaution and for testing purposes.
Testing:
- Added new planner and end-to-end tests that cover the new
functionality.
- Updated existing tests to reflect the new behavior.
Perf:
Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned
version of the store_sales table. There was over a 2x performance
improvement for the following query:
SELECT
COUNT(ss_sold_time_sk),
COUNT(ss_item_sk),
COUNT(ss_customer_sk),
COUNT(ss_cdemo_sk),
COUNT(ss_hdemo_sk),
COUNT(ss_addr_sk),
COUNT(ss_store_sk),
COUNT(ss_promo_sk),
COUNT(ss_ticket_number),
COUNT(ss_quantity),
COUNT(ss_wholesale_cost),
COUNT(ss_list_price),
COUNT(ss_sales_price),
COUNT(ss_ext_discount_amt),
COUNT(ss_ext_sales_price),
COUNT(ss_ext_wholesale_cost),
COUNT(ss_ext_list_price),
COUNT(ss_ext_tax),
COUNT(ss_coupon_amt),
COUNT(ss_net_paid),
COUNT(ss_net_paid_inc_tax),
COUNT(ss_net_profit),
COUNT(ss_sold_date_sk)
FROM (
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
) t
Before:
Total Time: 43s164ms
Summary:
Operator         #Hosts   Avg Time   Max Time    #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail
-----------------------------------------------------------------------------------------------------------------------
13:AGGREGATE          1  224.721us  224.721us        1           1   28.00 KB        -1.00 B  FINALIZE
12:EXCHANGE           1   24.578us   24.578us        3           1          0        -1.00 B  UNPARTITIONED
11:AGGREGATE          3    2s402ms    3s060ms        3           1  119.00 KB       10.00 MB
00:UNION              3   35s380ms   37s846ms  288.01M     288.01M    3.08 MB              0
|--02:SCAN HDFS       3  184.197ms  219.931ms   28.80M      28.80M  535.03 MB        1.88 GB  store_sales_unpartitioned
|--03:SCAN HDFS       3  131.956ms  153.401ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
|--04:SCAN HDFS       3  178.456ms  247.721ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
|--05:SCAN HDFS       3  189.398ms  242.251ms   28.80M      28.80M  535.01 MB        1.88 GB  store_sales_unpartitioned
|--06:SCAN HDFS       3  122.786ms  156.528ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
|--07:SCAN HDFS       3  147.467ms  183.391ms   28.80M      28.80M  535.13 MB        1.88 GB  store_sales_unpartitioned
|--08:SCAN HDFS       3  147.502ms  186.273ms   28.80M      28.80M  535.01 MB        1.88 GB  store_sales_unpartitioned
|--09:SCAN HDFS       3  130.086ms  154.682ms   28.80M      28.80M  535.04 MB        1.88 GB  store_sales_unpartitioned
|--10:SCAN HDFS       3  122.701ms  161.056ms   28.80M      28.80M  534.89 MB        1.88 GB  store_sales_unpartitioned
01:SCAN HDFS          3  287.863ms  330.436ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
After:
Total Time: 20s660ms
Summary:
Operator         #Hosts   Avg Time   Max Time    #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail
-----------------------------------------------------------------------------------------------------------------------
13:AGGREGATE          1  187.474us  187.474us        1           1   28.00 KB        -1.00 B  FINALIZE
12:EXCHANGE           1   15.238us   15.238us        3           1          0        -1.00 B  UNPARTITIONED
11:AGGREGATE          3    2s958ms    3s749ms        3           1    3.08 MB       10.00 MB
00:UNION              3  211.510ms  224.667ms  288.01M     288.01M          0              0
|--02:SCAN HDFS       3    1s637ms    1s734ms   28.80M      28.80M  528.68 MB        1.88 GB  store_sales_unpartitioned
|--03:SCAN HDFS       3    1s697ms    1s708ms   28.80M      28.80M  528.48 MB        1.88 GB  store_sales_unpartitioned
|--04:SCAN HDFS       3    1s681ms    1s748ms   28.80M      28.80M  529.34 MB        1.88 GB  store_sales_unpartitioned
|--05:SCAN HDFS       3    1s665ms    1s756ms   28.80M      28.80M  533.81 MB        1.88 GB  store_sales_unpartitioned
|--06:SCAN HDFS       3    1s675ms    1s800ms   28.80M      28.80M  530.70 MB        1.88 GB  store_sales_unpartitioned
|--07:SCAN HDFS       3    1s677ms    1s759ms   28.80M      28.80M  525.95 MB        1.88 GB  store_sales_unpartitioned
|--08:SCAN HDFS       3    1s621ms    1s790ms   28.80M      28.80M  534.64 MB        1.88 GB  store_sales_unpartitioned
|--09:SCAN HDFS       3    1s684ms    1s743ms   28.80M      28.80M  528.55 MB        1.88 GB  store_sales_unpartitioned
|--10:SCAN HDFS       3    1s528ms    1s771ms   28.80M      28.80M  533.70 MB        1.88 GB  store_sales_unpartitioned
01:SCAN HDFS          3    1s853ms    2s149ms   28.80M      28.80M  526.53 MB        1.88 GB  store_sales_unpartitioned
Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
---
M be/src/exec/exchange-node.cc
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/row-batch.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-planner/queries/PlannerTest/with-clause.test
M testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
M tests/query_test/test_queries.py
35 files changed, 1,330 insertions(+), 607 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/14
--
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Hello Alex Behm, Dan Hecht,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/5816
to look at the new patch set (#20).
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
IMPALA-3586: Implement union passthrough
The union node acts as a pass-through operator and forwards row batches
from its children without materializing them. This is done when the
child's tuple layout is identical to the union node's tuple layout
and no functions need to be applied to the child row batches.
Removed operand reordering in the FE because it's simpler and safer to
handle all passthrough children before non-passthrough children in the
BE. The recent improvements to memory management allowed us to remove
this requirement.
Testing:
- Added new planner and end-to-end tests that cover the new
functionality.
- Updated existing tests to reflect the new behavior.
Perf:
Ran a benchmark on a local 10 GB tpcds dataset. I used an unpartitioned
version of the store_sales table. There was over a 2x performance
improvement for the following query:
SELECT
COUNT(ss_sold_time_sk),
COUNT(ss_item_sk),
COUNT(ss_customer_sk),
COUNT(ss_cdemo_sk),
COUNT(ss_hdemo_sk),
COUNT(ss_addr_sk),
COUNT(ss_store_sk),
COUNT(ss_promo_sk),
COUNT(ss_ticket_number),
COUNT(ss_quantity),
COUNT(ss_wholesale_cost),
COUNT(ss_list_price),
COUNT(ss_sales_price),
COUNT(ss_ext_discount_amt),
COUNT(ss_ext_sales_price),
COUNT(ss_ext_wholesale_cost),
COUNT(ss_ext_list_price),
COUNT(ss_ext_tax),
COUNT(ss_coupon_amt),
COUNT(ss_net_paid),
COUNT(ss_net_paid_inc_tax),
COUNT(ss_net_profit),
COUNT(ss_sold_date_sk)
FROM (
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
union all
select * from tpcds_10_parquet.store_sales_unpartitioned
) t
Before:
Total Time: 43s164ms
Summary:
Operator         #Hosts   Avg Time   Max Time    #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail
-----------------------------------------------------------------------------------------------------------------------
13:AGGREGATE          1  224.721us  224.721us        1           1   28.00 KB        -1.00 B  FINALIZE
12:EXCHANGE           1   24.578us   24.578us        3           1          0        -1.00 B  UNPARTITIONED
11:AGGREGATE          3    2s402ms    3s060ms        3           1  119.00 KB       10.00 MB
00:UNION              3   35s380ms   37s846ms  288.01M     288.01M    3.08 MB              0
|--02:SCAN HDFS       3  184.197ms  219.931ms   28.80M      28.80M  535.03 MB        1.88 GB  store_sales_unpartitioned
|--03:SCAN HDFS       3  131.956ms  153.401ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
|--04:SCAN HDFS       3  178.456ms  247.721ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
|--05:SCAN HDFS       3  189.398ms  242.251ms   28.80M      28.80M  535.01 MB        1.88 GB  store_sales_unpartitioned
|--06:SCAN HDFS       3  122.786ms  156.528ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
|--07:SCAN HDFS       3  147.467ms  183.391ms   28.80M      28.80M  535.13 MB        1.88 GB  store_sales_unpartitioned
|--08:SCAN HDFS       3  147.502ms  186.273ms   28.80M      28.80M  535.01 MB        1.88 GB  store_sales_unpartitioned
|--09:SCAN HDFS       3  130.086ms  154.682ms   28.80M      28.80M  535.04 MB        1.88 GB  store_sales_unpartitioned
|--10:SCAN HDFS       3  122.701ms  161.056ms   28.80M      28.80M  534.89 MB        1.88 GB  store_sales_unpartitioned
01:SCAN HDFS          3  287.863ms  330.436ms   28.80M      28.80M  534.98 MB        1.88 GB  store_sales_unpartitioned
After:
Total Time: 19s139ms
Summary:
Operator         #Hosts   Avg Time   Max Time    #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail
-----------------------------------------------------------------------------------------------------------------------
13:AGGREGATE          1  166.241us  166.241us        1           1   28.00 KB        -1.00 B  FINALIZE
12:EXCHANGE           1   71.695us   71.695us        3           1          0        -1.00 B  UNPARTITIONED
11:AGGREGATE          3    2s971ms    3s809ms        3           1    3.08 MB       10.00 MB
00:UNION              3  207.956ms  222.846ms  288.01M     288.01M          0              0
|--02:SCAN HDFS       3    1s533ms    1s535ms   28.80M      28.80M  532.28 MB        1.88 GB  store_sales_unpartitioned
|--03:SCAN HDFS       3    1s554ms    1s669ms   28.80M      28.80M  525.73 MB        1.88 GB  store_sales_unpartitioned
|--04:SCAN HDFS       3    1s568ms    1s716ms   28.80M      28.80M  525.03 MB        1.88 GB  store_sales_unpartitioned
|--05:SCAN HDFS       3    1s503ms    1s617ms   28.80M      28.80M  527.43 MB        1.88 GB  store_sales_unpartitioned
|--06:SCAN HDFS       3    1s560ms    1s634ms   28.80M      28.80M  528.52 MB        1.88 GB  store_sales_unpartitioned
|--07:SCAN HDFS       3    1s489ms    1s643ms   28.80M      28.80M  534.81 MB        1.88 GB  store_sales_unpartitioned
|--08:SCAN HDFS       3    1s534ms    1s581ms   28.80M      28.80M  528.10 MB        1.88 GB  store_sales_unpartitioned
|--09:SCAN HDFS       3    1s558ms    1s674ms   28.80M      28.80M  526.77 MB        1.88 GB  store_sales_unpartitioned
|--10:SCAN HDFS       3    1s504ms    1s692ms   28.80M      28.80M  527.83 MB        1.88 GB  store_sales_unpartitioned
01:SCAN HDFS          3    1s682ms    1s911ms   28.80M      28.80M  526.14 MB        1.88 GB  store_sales_unpartitioned
Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
---
M be/src/exec/exchange-node.cc
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/row-batch.cc
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-planner/queries/PlannerTest/with-clause.test
M testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
M tests/query_test/test_queries.py
32 files changed, 1,460 insertions(+), 764 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/20
--
To view, visit http://gerrit.cloudera.org:8080/5816
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 20
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Taras Bobrovytsky has uploaded a new patch set (#10).
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
IMPALA-3586: Implement union passthrough
The union node acts as a pass-through operator and forwards row batches
from its children without materializing them. This is done when the
child's tuple layout is identical to the union node's tuple layout and
no functions need to be applied to the child row batches.
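As a rough illustration of the condition described above (not the code in
this patch), the check could be sketched as follows. Slot, TupleLayout and
CanPassThrough are hypothetical names standing in for Impala's real
descriptor classes, and the flag child_needs_exprs is an assumed summary of
"functions need to be applied":

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified slot description: byte offset and size in a tuple.
struct Slot {
  int offset;
  int size;
  bool operator==(const Slot& o) const {
    return offset == o.offset && size == o.size;
  }
};

// Hypothetical, simplified tuple layout: total byte size plus its slots.
struct TupleLayout {
  int byte_size;
  std::vector<Slot> slots;
};

// Returns true if the child's row batches could be forwarded unchanged:
// no expressions have to be evaluated and both layouts match exactly.
bool CanPassThrough(const TupleLayout& union_layout,
                    const TupleLayout& child_layout,
                    bool child_needs_exprs) {
  if (child_needs_exprs) return false;
  return union_layout.byte_size == child_layout.byte_size &&
         union_layout.slots == child_layout.slots;
}
```

A child whose layout differs in any slot, or that needs an expression
evaluated, would fall back to the materializing path in this sketch.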
A new query option DISABLE_UNION_PASSTHROUGH was added in this patch
as a precaution and for testing purposes.
Testing:
- Added new planner and end-to-end tests that cover the new
functionality.
- Updated existing tests to reflect the new behavior.
Perf:
Ran a benchmark on a local 10 GB TPC-DS dataset, using an unpartitioned
version of the store_sales table. The following query ran more than 2x
faster (total time went from 43s164ms to 20s660ms):
SELECT
  COUNT(ss_sold_time_sk),
  COUNT(ss_item_sk),
  COUNT(ss_customer_sk),
  COUNT(ss_cdemo_sk),
  COUNT(ss_hdemo_sk),
  COUNT(ss_addr_sk),
  COUNT(ss_store_sk),
  COUNT(ss_promo_sk),
  COUNT(ss_ticket_number),
  COUNT(ss_quantity),
  COUNT(ss_wholesale_cost),
  COUNT(ss_list_price),
  COUNT(ss_sales_price),
  COUNT(ss_ext_discount_amt),
  COUNT(ss_ext_sales_price),
  COUNT(ss_ext_wholesale_cost),
  COUNT(ss_ext_list_price),
  COUNT(ss_ext_tax),
  COUNT(ss_coupon_amt),
  COUNT(ss_net_paid),
  COUNT(ss_net_paid_inc_tax),
  COUNT(ss_net_profit),
  COUNT(ss_sold_date_sk)
FROM (
  SELECT * FROM tpcds_10_parquet.store_sales_unpartitioned
  UNION ALL
  SELECT * FROM tpcds_10_parquet.store_sales_unpartitioned
  UNION ALL
  SELECT * FROM tpcds_10_parquet.store_sales_unpartitioned
  UNION ALL
  SELECT * FROM tpcds_10_parquet.store_sales_unpartitioned
  UNION ALL
  SELECT * FROM tpcds_10_parquet.store_sales_unpartitioned
  UNION ALL
  SELECT * FROM tpcds_10_parquet.store_sales_unpartitioned
  UNION ALL
  SELECT * FROM tpcds_10_parquet.store_sales_unpartitioned
  UNION ALL
  SELECT * FROM tpcds_10_parquet.store_sales_unpartitioned
  UNION ALL
  SELECT * FROM tpcds_10_parquet.store_sales_unpartitioned
  UNION ALL
  SELECT * FROM tpcds_10_parquet.store_sales_unpartitioned
) t
Before:
Total Time: 43s164ms
Summary:
Operator         #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem   Est. Peak Mem  Detail
-----------------------------------------------------------------------------------------------------------------------
13:AGGREGATE     1       224.721us  224.721us  1        1           28.00 KB   -1.00 B        FINALIZE
12:EXCHANGE      1       24.578us   24.578us   3        1           0          -1.00 B        UNPARTITIONED
11:AGGREGATE     3       2s402ms    3s060ms    3        1           119.00 KB  10.00 MB
00:UNION         3       35s380ms   37s846ms   288.01M  288.01M     3.08 MB    0
|--02:SCAN HDFS  3       184.197ms  219.931ms  28.80M   28.80M      535.03 MB  1.88 GB        store_sales_unpartitioned
|--03:SCAN HDFS  3       131.956ms  153.401ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--04:SCAN HDFS  3       178.456ms  247.721ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--05:SCAN HDFS  3       189.398ms  242.251ms  28.80M   28.80M      535.01 MB  1.88 GB        store_sales_unpartitioned
|--06:SCAN HDFS  3       122.786ms  156.528ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--07:SCAN HDFS  3       147.467ms  183.391ms  28.80M   28.80M      535.13 MB  1.88 GB        store_sales_unpartitioned
|--08:SCAN HDFS  3       147.502ms  186.273ms  28.80M   28.80M      535.01 MB  1.88 GB        store_sales_unpartitioned
|--09:SCAN HDFS  3       130.086ms  154.682ms  28.80M   28.80M      535.04 MB  1.88 GB        store_sales_unpartitioned
|--10:SCAN HDFS  3       122.701ms  161.056ms  28.80M   28.80M      534.89 MB  1.88 GB        store_sales_unpartitioned
01:SCAN HDFS     3       287.863ms  330.436ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
After:
Total Time: 20s660ms
Summary:
Operator         #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem   Est. Peak Mem  Detail
-----------------------------------------------------------------------------------------------------------------------
13:AGGREGATE     1       187.474us  187.474us  1        1           28.00 KB   -1.00 B        FINALIZE
12:EXCHANGE      1       15.238us   15.238us   3        1           0          -1.00 B        UNPARTITIONED
11:AGGREGATE     3       2s958ms    3s749ms    3        1           3.08 MB    10.00 MB
00:UNION         3       211.510ms  224.667ms  288.01M  288.01M     0          0
|--02:SCAN HDFS  3       1s637ms    1s734ms    28.80M   28.80M      528.68 MB  1.88 GB        store_sales_unpartitioned
|--03:SCAN HDFS  3       1s697ms    1s708ms    28.80M   28.80M      528.48 MB  1.88 GB        store_sales_unpartitioned
|--04:SCAN HDFS  3       1s681ms    1s748ms    28.80M   28.80M      529.34 MB  1.88 GB        store_sales_unpartitioned
|--05:SCAN HDFS  3       1s665ms    1s756ms    28.80M   28.80M      533.81 MB  1.88 GB        store_sales_unpartitioned
|--06:SCAN HDFS  3       1s675ms    1s800ms    28.80M   28.80M      530.70 MB  1.88 GB        store_sales_unpartitioned
|--07:SCAN HDFS  3       1s677ms    1s759ms    28.80M   28.80M      525.95 MB  1.88 GB        store_sales_unpartitioned
|--08:SCAN HDFS  3       1s621ms    1s790ms    28.80M   28.80M      534.64 MB  1.88 GB        store_sales_unpartitioned
|--09:SCAN HDFS  3       1s684ms    1s743ms    28.80M   28.80M      528.55 MB  1.88 GB        store_sales_unpartitioned
|--10:SCAN HDFS  3       1s528ms    1s771ms    28.80M   28.80M      533.70 MB  1.88 GB        store_sales_unpartitioned
01:SCAN HDFS     3       1s853ms    2s149ms    28.80M   28.80M      526.53 MB  1.88 GB        store_sales_unpartitioned
Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
---
M be/src/exec/exchange-node.cc
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/row-batch.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
M tests/query_test/test_queries.py
33 files changed, 719 insertions(+), 54 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/10
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
Patch Set 9:
(1 comment)
http://gerrit.cloudera.org:8080/#/c/5816/9//COMMIT_MSG
Commit Message:
Line 15: as a precaution and for testing purposes.
> if we're worried about bugs, we should add more tests. i'm not in favor of
I don't think it's unusual to have "feature flags" for disabling invasive changes quickly to unblock users if they do hit a problem - these flags exist because it's infeasible to test all possible scenarios.
Let's do more testing and remove the flag.
Taras, let's talk about what additional testing we should do.
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change.
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
Patch Set 16:
(1 comment)
http://gerrit.cloudera.org:8080/#/c/5816/16/be/src/exec/union-node.h
File be/src/exec/union-node.h:
Line 91: bool const_todo_;
> Minor naming thing, I don't feel too strongly but 'todo' seems unusual. Alt
Rather than having these variables, how about just saving the index of the first materialized child? Then these can be simple functions, e.g. HasMorePassthrough() and HasMoreMaterialized(), that are derived from existing state. That way, there's less dynamic state that needs to be updated and reasoned about.
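A minimal sketch of that suggestion, assuming passthrough children are ordered before materialized ones. UnionState and its fields are hypothetical names used for illustration, not the actual members in union-node.h:

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for the union node's iteration state.
struct UnionState {
  int num_children;                  // number of non-constant children
  int first_materialized_child_idx;  // children [0, idx) are passed through
  int child_idx;                     // child currently being consumed

  UnionState(int num_children, int first_materialized_child_idx)
      : num_children(num_children),
        first_materialized_child_idx(first_materialized_child_idx),
        child_idx(0) {}

  // True while the current child is still one of the passthrough children.
  bool HasMorePassthrough() const {
    return child_idx < first_materialized_child_idx;
  }

  // True if at least one materialized child exists and not every child
  // has been consumed yet.
  bool HasMoreMaterialized() const {
    return first_materialized_child_idx < num_children &&
           child_idx < num_children;
  }
};
```

Both predicates are computed from the two indices alone, so the only value that changes during execution in this sketch is child_idx.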
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 16
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change.
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
Patch Set 9:
(1 comment)
http://gerrit.cloudera.org:8080/#/c/5816/9/be/src/exec/union-node.cc
File be/src/exec/union-node.cc:
Line 148: row_batch->MarkNeedsDeepCopy();
> The problematic memory is memory that is never attached to any batch and is
Which memory is this? The buffered-tuple-stream buffer? Anything else? Will this be addressed by IMPALA-4179, or is it different? This is probably an argument for needing ref counting on buffers, so that a reference can be held by both the row batch and a child node and Close() can happen "early" without freeing the buffer.
We need to revisit this and clean up how the child closing happens here. Let's leave a TODO for that, referencing the appropriate JIRA.
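To illustrate the ref-counting idea only, here is a sketch built on std::shared_ptr. Buffer, RowBatch and ChildNode are hypothetical types for this example and do not reflect Impala's real classes or memory management:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

using Buffer = std::vector<char>;  // hypothetical stand-in for a data buffer

// Hypothetical row batch that keeps shared references to attached buffers.
struct RowBatch {
  std::vector<std::shared_ptr<Buffer>> attached;
  void AttachBuffer(std::shared_ptr<Buffer> b) {
    attached.push_back(std::move(b));
  }
};

// Hypothetical child node that owns one buffer until it is closed.
class ChildNode {
 public:
  explicit ChildNode(std::size_t n) : buffer_(std::make_shared<Buffer>(n)) {}

  // Hands the batch a second reference to the same buffer.
  void GetNext(RowBatch* batch) { batch->AttachBuffer(buffer_); }

  // Drops only the child's reference; the buffer is freed once the last
  // holder (here, the batch) releases it.
  void Close() { buffer_.reset(); }

  bool closed() const { return buffer_ == nullptr; }

 private:
  std::shared_ptr<Buffer> buffer_;
};
```

With this shape, closing the child early leaves the batch's data valid, at the cost of a reference count per buffer.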
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Taras Bobrovytsky has uploaded a new patch set (#6).
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
IMPALA-3586: Implement union passthrough
The union node acts as a pass-through operator and forwards row batches
from its children without materializing them. This is done when the
child's tuple layout is identical to the union node's tuple layout and
no functions need to be applied to the child row batches.
A new query option DISABLE_UNION_PASSTHROUGH was added in this patch
as a precaution and for testing purposes.
Testing:
- Added new planner and end-to-end tests that cover the new
functionality.
- Updated existing tests to reflect the new behavior.
- TODO: run a performance benchmark.
Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
---
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/row-batch.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
M tests/query_test/test_queries.py
30 files changed, 679 insertions(+), 48 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/6
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change.
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
Patch Set 19:
(1 comment)
http://gerrit.cloudera.org:8080/#/c/5816/19/be/src/exec/union-node.cc
File be/src/exec/union-node.cc:
Line 263: (!HasMorePassthrough() && !HasMoreMaterialized() && !HasMoreConst(state));
> shouldn't we put lines 254-263 inside a do-while loop, so that we don't ret
I guess we can't really make this work easily when there are multiple passthrough children, so let's not worry about doing this.
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 19
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Hello Alex Behm,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/5816
to look at the new patch set (#10).
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
IMPALA-3586: Implement union passthrough
The union node acts as a pass-through operator and forwards row batches
from its children without materializing them. This is done when the
child's tuple layout is identical to the union node's tuple layout and
no functions need to be applied to the child row batches.
A new query option DISABLE_UNION_PASSTHROUGH was added in this patch
as a precaution and for testing purposes.
Testing:
- Added new planner and end-to-end tests that cover the new
functionality.
- Updated existing tests to reflect the new behavior.
Perf:
Ran a benchmark on a local 10 GB TPC-DS dataset, using an unpartitioned
version of the store_sales table. The following query ran more than 2x
faster (total time went from 43s164ms to 20s660ms):
SELECT
  COUNT(ss_sold_time_sk),
  COUNT(ss_item_sk),
  COUNT(ss_customer_sk),
  COUNT(ss_cdemo_sk),
  COUNT(ss_hdemo_sk),
  COUNT(ss_addr_sk),
  COUNT(ss_store_sk),
  COUNT(ss_promo_sk),
  COUNT(ss_ticket_number),
  COUNT(ss_quantity),
  COUNT(ss_wholesale_cost),
  COUNT(ss_list_price),
  COUNT(ss_sales_price),
  COUNT(ss_ext_discount_amt),
  COUNT(ss_ext_sales_price),
  COUNT(ss_ext_wholesale_cost),
  COUNT(ss_ext_list_price),
  COUNT(ss_ext_tax),
  COUNT(ss_coupon_amt),
  COUNT(ss_net_paid),
  COUNT(ss_net_paid_inc_tax),
  COUNT(ss_net_profit),
  COUNT(ss_sold_date_sk)
FROM (
  SELECT * FROM tpcds_10_parquet.store_sales_unpartitioned
  UNION ALL
  SELECT * FROM tpcds_10_parquet.store_sales_unpartitioned
  UNION ALL
  SELECT * FROM tpcds_10_parquet.store_sales_unpartitioned
  UNION ALL
  SELECT * FROM tpcds_10_parquet.store_sales_unpartitioned
  UNION ALL
  SELECT * FROM tpcds_10_parquet.store_sales_unpartitioned
  UNION ALL
  SELECT * FROM tpcds_10_parquet.store_sales_unpartitioned
  UNION ALL
  SELECT * FROM tpcds_10_parquet.store_sales_unpartitioned
  UNION ALL
  SELECT * FROM tpcds_10_parquet.store_sales_unpartitioned
  UNION ALL
  SELECT * FROM tpcds_10_parquet.store_sales_unpartitioned
  UNION ALL
  SELECT * FROM tpcds_10_parquet.store_sales_unpartitioned
) t
Before:
Total Time: 43s164ms
Summary:
Operator         #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem   Est. Peak Mem  Detail
-----------------------------------------------------------------------------------------------------------------------
13:AGGREGATE     1       224.721us  224.721us  1        1           28.00 KB   -1.00 B        FINALIZE
12:EXCHANGE      1       24.578us   24.578us   3        1           0          -1.00 B        UNPARTITIONED
11:AGGREGATE     3       2s402ms    3s060ms    3        1           119.00 KB  10.00 MB
00:UNION         3       35s380ms   37s846ms   288.01M  288.01M     3.08 MB    0
|--02:SCAN HDFS  3       184.197ms  219.931ms  28.80M   28.80M      535.03 MB  1.88 GB        store_sales_unpartitioned
|--03:SCAN HDFS  3       131.956ms  153.401ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--04:SCAN HDFS  3       178.456ms  247.721ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--05:SCAN HDFS  3       189.398ms  242.251ms  28.80M   28.80M      535.01 MB  1.88 GB        store_sales_unpartitioned
|--06:SCAN HDFS  3       122.786ms  156.528ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--07:SCAN HDFS  3       147.467ms  183.391ms  28.80M   28.80M      535.13 MB  1.88 GB        store_sales_unpartitioned
|--08:SCAN HDFS  3       147.502ms  186.273ms  28.80M   28.80M      535.01 MB  1.88 GB        store_sales_unpartitioned
|--09:SCAN HDFS  3       130.086ms  154.682ms  28.80M   28.80M      535.04 MB  1.88 GB        store_sales_unpartitioned
|--10:SCAN HDFS  3       122.701ms  161.056ms  28.80M   28.80M      534.89 MB  1.88 GB        store_sales_unpartitioned
01:SCAN HDFS     3       287.863ms  330.436ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
After:
Total Time: 20s660ms
Summary:
Operator         #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem   Est. Peak Mem  Detail
-----------------------------------------------------------------------------------------------------------------------
13:AGGREGATE     1       187.474us  187.474us  1        1           28.00 KB   -1.00 B        FINALIZE
12:EXCHANGE      1       15.238us   15.238us   3        1           0          -1.00 B        UNPARTITIONED
11:AGGREGATE     3       2s958ms    3s749ms    3        1           3.08 MB    10.00 MB
00:UNION         3       211.510ms  224.667ms  288.01M  288.01M     0          0
|--02:SCAN HDFS  3       1s637ms    1s734ms    28.80M   28.80M      528.68 MB  1.88 GB        store_sales_unpartitioned
|--03:SCAN HDFS  3       1s697ms    1s708ms    28.80M   28.80M      528.48 MB  1.88 GB        store_sales_unpartitioned
|--04:SCAN HDFS  3       1s681ms    1s748ms    28.80M   28.80M      529.34 MB  1.88 GB        store_sales_unpartitioned
|--05:SCAN HDFS  3       1s665ms    1s756ms    28.80M   28.80M      533.81 MB  1.88 GB        store_sales_unpartitioned
|--06:SCAN HDFS  3       1s675ms    1s800ms    28.80M   28.80M      530.70 MB  1.88 GB        store_sales_unpartitioned
|--07:SCAN HDFS  3       1s677ms    1s759ms    28.80M   28.80M      525.95 MB  1.88 GB        store_sales_unpartitioned
|--08:SCAN HDFS  3       1s621ms    1s790ms    28.80M   28.80M      534.64 MB  1.88 GB        store_sales_unpartitioned
|--09:SCAN HDFS  3       1s684ms    1s743ms    28.80M   28.80M      528.55 MB  1.88 GB        store_sales_unpartitioned
|--10:SCAN HDFS  3       1s528ms    1s771ms    28.80M   28.80M      533.70 MB  1.88 GB        store_sales_unpartitioned
01:SCAN HDFS     3       1s853ms    2s149ms    28.80M   28.80M      526.53 MB  1.88 GB        store_sales_unpartitioned
Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
---
M be/src/exec/exchange-node.cc
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/row-batch.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
M tests/query_test/test_queries.py
33 files changed, 720 insertions(+), 54 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/10
--
To view, visit http://gerrit.cloudera.org:8080/5816
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
[Impala-ASF-CR] IMPALA-3586: Implement union passthrough
Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Hello Alex Behm, Dan Hecht,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/5816
to look at the new patch set (#19).
Change subject: IMPALA-3586: Implement union passthrough
......................................................................
IMPALA-3586: Implement union passthrough
The union node acts as a pass-through operator and forwards row batches
from its children without materializing them. This is done when the
child's tuple layout is identical to the union node's tuple layout and
no functions need to be applied to the child's row batches.
Removed operand reordering in the FE because it's simpler and safer to
handle all passthrough children before non-passthrough children in the
BE. The recent improvements to memory management allowed us to remove
this requirement.
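For readers following the thread, the passthrough rule and the
passthrough-first ordering described above can be modeled roughly as
follows. This is a minimal sketch, not the code in union-node.cc: the
names Child, is_passthrough and union_all are hypothetical, a "layout"
is reduced to a tuple of type names, and a "row batch" to a Python list.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List, Tuple

Row = Tuple[object, ...]
Batch = List[Row]


@dataclass
class Child:
    """Hypothetical stand-in for one union operand (not an Impala class)."""
    layout: Tuple[str, ...]          # slot types in tuple order
    batches: List[Batch]             # row batches the child produces
    # Expressions to evaluate per row; empty means nothing needs to be applied.
    exprs: List[Callable[[Row], object]] = field(default_factory=list)


def is_passthrough(child: Child, union_layout: Tuple[str, ...]) -> bool:
    # Assumed rule, taken from the commit message: identical tuple layout
    # and no functions to apply to the child's rows.
    return not child.exprs and child.layout == union_layout


def union_all(children: List[Child],
              union_layout: Tuple[str, ...]) -> Iterator[Batch]:
    # Pass 1: passthrough children. Their batches are forwarded untouched.
    for child in children:
        if is_passthrough(child, union_layout):
            for batch in child.batches:
                yield batch
    # Pass 2: remaining children. Each row is materialized into the
    # union's layout by evaluating the child's expressions.
    for child in children:
        if not is_passthrough(child, union_layout):
            if not child.exprs:
                raise ValueError("non-passthrough child needs expressions")
            for batch in child.batches:
                yield [tuple(e(row) for e in child.exprs) for row in batch]
```

In this model a forwarded batch is the very same list object the child
produced, which stands in for "no materialization"; UNION ALL places no
ordering requirement on its output, so emitting passthrough operands
first does not change the result set.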
Testing:
- Added new planner and end-to-end tests that cover the new
functionality.
- Updated existing tests to reflect the new behavior.
Perf:
Ran a benchmark on a local 10 GB TPC-DS dataset, using an unpartitioned
version of the store_sales table. The following query ran more than 2x
faster (43s164ms before, 19s139ms after):
SELECT
  COUNT(ss_sold_time_sk),
  COUNT(ss_item_sk),
  COUNT(ss_customer_sk),
  COUNT(ss_cdemo_sk),
  COUNT(ss_hdemo_sk),
  COUNT(ss_addr_sk),
  COUNT(ss_store_sk),
  COUNT(ss_promo_sk),
  COUNT(ss_ticket_number),
  COUNT(ss_quantity),
  COUNT(ss_wholesale_cost),
  COUNT(ss_list_price),
  COUNT(ss_sales_price),
  COUNT(ss_ext_discount_amt),
  COUNT(ss_ext_sales_price),
  COUNT(ss_ext_wholesale_cost),
  COUNT(ss_ext_list_price),
  COUNT(ss_ext_tax),
  COUNT(ss_coupon_amt),
  COUNT(ss_net_paid),
  COUNT(ss_net_paid_inc_tax),
  COUNT(ss_net_profit),
  COUNT(ss_sold_date_sk)
FROM (
  SELECT * FROM tpcds_10_parquet.store_sales_unpartitioned
  UNION ALL
  SELECT * FROM tpcds_10_parquet.store_sales_unpartitioned
  UNION ALL
  SELECT * FROM tpcds_10_parquet.store_sales_unpartitioned
  UNION ALL
  SELECT * FROM tpcds_10_parquet.store_sales_unpartitioned
  UNION ALL
  SELECT * FROM tpcds_10_parquet.store_sales_unpartitioned
  UNION ALL
  SELECT * FROM tpcds_10_parquet.store_sales_unpartitioned
  UNION ALL
  SELECT * FROM tpcds_10_parquet.store_sales_unpartitioned
  UNION ALL
  SELECT * FROM tpcds_10_parquet.store_sales_unpartitioned
  UNION ALL
  SELECT * FROM tpcds_10_parquet.store_sales_unpartitioned
  UNION ALL
  SELECT * FROM tpcds_10_parquet.store_sales_unpartitioned
) t
Before:
Total Time: 43s164ms
Summary:
Operator         #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem   Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE     1       224.721us  224.721us  1        1           28.00 KB   -1.00 B        FINALIZE
12:EXCHANGE      1       24.578us   24.578us   3        1           0          -1.00 B        UNPARTITIONED
11:AGGREGATE     3       2s402ms    3s060ms    3        1           119.00 KB  10.00 MB
00:UNION         3       35s380ms   37s846ms   288.01M  288.01M     3.08 MB    0
|--02:SCAN HDFS  3       184.197ms  219.931ms  28.80M   28.80M      535.03 MB  1.88 GB        store_sales_unpartitioned
|--03:SCAN HDFS  3       131.956ms  153.401ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--04:SCAN HDFS  3       178.456ms  247.721ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--05:SCAN HDFS  3       189.398ms  242.251ms  28.80M   28.80M      535.01 MB  1.88 GB        store_sales_unpartitioned
|--06:SCAN HDFS  3       122.786ms  156.528ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
|--07:SCAN HDFS  3       147.467ms  183.391ms  28.80M   28.80M      535.13 MB  1.88 GB        store_sales_unpartitioned
|--08:SCAN HDFS  3       147.502ms  186.273ms  28.80M   28.80M      535.01 MB  1.88 GB        store_sales_unpartitioned
|--09:SCAN HDFS  3       130.086ms  154.682ms  28.80M   28.80M      535.04 MB  1.88 GB        store_sales_unpartitioned
|--10:SCAN HDFS  3       122.701ms  161.056ms  28.80M   28.80M      534.89 MB  1.88 GB        store_sales_unpartitioned
01:SCAN HDFS     3       287.863ms  330.436ms  28.80M   28.80M      534.98 MB  1.88 GB        store_sales_unpartitioned
After:
Total Time: 19s139ms
Summary:
Operator         #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem   Est. Peak Mem  Detail
------------------------------------------------------------------------------------------------------------------------------
13:AGGREGATE     1       166.241us  166.241us  1        1           28.00 KB   -1.00 B        FINALIZE
12:EXCHANGE      1       71.695us   71.695us   3        1           0          -1.00 B        UNPARTITIONED
11:AGGREGATE     3       2s971ms    3s809ms    3        1           3.08 MB    10.00 MB
00:UNION         3       207.956ms  222.846ms  288.01M  288.01M     0          0
|--02:SCAN HDFS  3       1s533ms    1s535ms    28.80M   28.80M      532.28 MB  1.88 GB        store_sales_unpartitioned
|--03:SCAN HDFS  3       1s554ms    1s669ms    28.80M   28.80M      525.73 MB  1.88 GB        store_sales_unpartitioned
|--04:SCAN HDFS  3       1s568ms    1s716ms    28.80M   28.80M      525.03 MB  1.88 GB        store_sales_unpartitioned
|--05:SCAN HDFS  3       1s503ms    1s617ms    28.80M   28.80M      527.43 MB  1.88 GB        store_sales_unpartitioned
|--06:SCAN HDFS  3       1s560ms    1s634ms    28.80M   28.80M      528.52 MB  1.88 GB        store_sales_unpartitioned
|--07:SCAN HDFS  3       1s489ms    1s643ms    28.80M   28.80M      534.81 MB  1.88 GB        store_sales_unpartitioned
|--08:SCAN HDFS  3       1s534ms    1s581ms    28.80M   28.80M      528.10 MB  1.88 GB        store_sales_unpartitioned
|--09:SCAN HDFS  3       1s558ms    1s674ms    28.80M   28.80M      526.77 MB  1.88 GB        store_sales_unpartitioned
|--10:SCAN HDFS  3       1s504ms    1s692ms    28.80M   28.80M      527.83 MB  1.88 GB        store_sales_unpartitioned
01:SCAN HDFS     3       1s682ms    1s911ms    28.80M   28.80M      526.14 MB  1.88 GB        store_sales_unpartitioned
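As a sanity check on the "over 2x" figure, the durations in the two
summaries can be converted to seconds and compared. This is a small
sketch: parse_duration is a hypothetical helper written for the duration
strings seen in this message (for example 43s164ms, 207.956ms,
224.721us), not an Impala utility.

```python
import re

# Seconds per unit for the suffixes that appear in the summaries, plus a
# few assumed neighbours (h, m, ns) for completeness.
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0,
                 "ms": 1e-3, "us": 1e-6, "ns": 1e-9}
# Two-letter units are listed first so "ms" is not read as "m" followed by "s".
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)(ms|us|ns|h|m|s)")


def parse_duration(text: str) -> float:
    """Convert a string such as '43s164ms' or '207.956ms' to seconds."""
    total, pos = 0.0, 0
    for match in _TOKEN.finditer(text):
        if match.start() != pos:          # reject gaps or unknown characters
            raise ValueError(f"cannot parse duration: {text!r}")
        total += float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        pos = match.end()
    if pos == 0 or pos != len(text):
        raise ValueError(f"cannot parse duration: {text!r}")
    return total


# Figures copied from the Before/After sections of this message.
total_speedup = parse_duration("43s164ms") / parse_duration("19s139ms")
union_speedup = parse_duration("35s380ms") / parse_duration("207.956ms")
print(f"total query time: {total_speedup:.2f}x faster")
print(f"00:UNION avg time: {union_speedup:.0f}x faster")
```

With the numbers above, the total query time improves by about 2.26x
and the average time attributed to the 00:UNION operator drops by
roughly 170x.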
Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
---
M be/src/exec/exchange-node.cc
M be/src/exec/union-node.cc
M be/src/exec/union-node.h
M be/src/exprs/slot-ref.cc
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/row-batch.cc
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/order.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/small-query-opt.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/topn.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-planner/queries/PlannerTest/with-clause.test
M testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
M testdata/workloads/functional-query/queries/QueryTest/union.test
M tests/query_test/test_queries.py
33 files changed, 1,462 insertions(+), 764 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/5816/19
--
To view, visit http://gerrit.cloudera.org:8080/5816
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8f6d5062724ba5b78174c3227a7a796d10d8416
Gerrit-PatchSet: 19
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>