You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Fucun Chu (Code Review)" <ge...@cloudera.org> on 2020/11/11 12:46:06 UTC
[Impala-ASF-CR] IMPALA-10317: Add join rows produced limit per query
Fucun Chu has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16706
Change subject: IMPALA-10317: Add join rows produced limit per query
......................................................................
IMPALA-10317: Add join rows produced limit per query
This patch limits the number of join rows produced by a query by
tracking join profile. When the NUM_JOIN_ROWS_PRODUCED_LIMIT
is set, it cancels a query when its execution produces more rows
than the specified limit.
Testing:
Added tests for NUM_JOIN_ROWS_PRODUCED_LIMIT
Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-state.cc
M be/src/service/impala-server.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/protobuf/control_service.proto
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/generate_error_codes.py
M testdata/workloads/functional-query/queries/QueryTest/query-resource-limits.test
13 files changed, 120 insertions(+), 5 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/16706/2
--
To view, visit http://gerrit.cloudera.org:8080/16706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02
Gerrit-Change-Number: 16706
Gerrit-PatchSet: 2
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
[Impala-ASF-CR] IMPALA-10317: Add query option that limits huge joins at runtime
Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16706 )
Change subject: IMPALA-10317: Add query option that limits huge joins at runtime
......................................................................
Patch Set 7:
Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6799/ DRY_RUN=false
--
To view, visit http://gerrit.cloudera.org:8080/16706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02
Gerrit-Change-Number: 16706
Gerrit-PatchSet: 7
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Tue, 22 Dec 2020 00:31:25 +0000
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10317: Add query option that limits huge joins at runtime
Posted by "Aman Sinha (Code Review)" <ge...@cloudera.org>.
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/16706 )
Change subject: IMPALA-10317: Add query option that limits huge joins at runtime
......................................................................
Patch Set 6: Code-Review+2
--
To view, visit http://gerrit.cloudera.org:8080/16706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02
Gerrit-Change-Number: 16706
Gerrit-PatchSet: 6
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Tue, 22 Dec 2020 00:27:59 +0000
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10317: Add query option that limits huge joins at runtime
Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16706 )
Change subject: IMPALA-10317: Add query option that limits huge joins at runtime
......................................................................
Patch Set 7: Verified+1
--
To view, visit http://gerrit.cloudera.org:8080/16706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02
Gerrit-Change-Number: 16706
Gerrit-PatchSet: 7
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Tue, 22 Dec 2020 06:10:38 +0000
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10317: Add query option that limits huge joins at runtime
Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16706 )
Change subject: IMPALA-10317: Add query option that limits huge joins at runtime
......................................................................
Patch Set 3:
Build Successful
https://jenkins.impala.io/job/gerrit-code-review-checks/7747/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.
--
To view, visit http://gerrit.cloudera.org:8080/16706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02
Gerrit-Change-Number: 16706
Gerrit-PatchSet: 3
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Mon, 30 Nov 2020 01:43:14 +0000
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10317: Add query option that limits huge joins at runtime
Posted by "Fucun Chu (Code Review)" <ge...@cloudera.org>.
Fucun Chu has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/16706 )
Change subject: IMPALA-10317: Add query option that limits huge joins at runtime
......................................................................
IMPALA-10317: Add query option that limits huge joins at runtime
This patch adds support for limiting the rows produced by Join node.
The limit is specified by a query option. Queries exceeding that limit
get terminated. The checking runs periodically, so the actual rows
produced may go somewhat over the limit.
JOIN_ROWS_PRODUCED_LIMIT is exposed as an advanced query option.
Rows produced Query profile is updated to include query wide and per
backend metrics for RowsReturned. Example from "
set JOIN_ROWS_PRODUCED_LIMIT = 10000000;
select count(*) from tpch_parquet.lineitem l1 cross join
(select * from tpch_parquet.lineitem l2 limit 5) l3;":
NESTED_LOOP_JOIN_NODE (id=2):
- InactiveTotalTime: 107.534ms
- PeakMemoryUsage: 16.00 KB (16384)
- ProbeRows: 1.02K (1024)
- ProbeTime: 0.000ns
- RowsReturned: 10.00M (10002025)
- RowsReturnedRate: 749.58 K/sec
- TotalTime: 13s337ms
Testing:
Added tests for JOIN_ROWS_PRODUCED_LIMIT
Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-state.cc
M be/src/service/impala-server.cc
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/protobuf/control_service.proto
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/generate_error_codes.py
M testdata/workloads/functional-query/queries/QueryTest/query-resource-limits.test
14 files changed, 167 insertions(+), 5 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/16706/5
--
To view, visit http://gerrit.cloudera.org:8080/16706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02
Gerrit-Change-Number: 16706
Gerrit-PatchSet: 5
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
[Impala-ASF-CR] IMPALA-10317: Add query option that limits huge joins at runtime
Posted by "Fucun Chu (Code Review)" <ge...@cloudera.org>.
Fucun Chu has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/16706 )
Change subject: IMPALA-10317: Add query option that limits huge joins at runtime
......................................................................
IMPALA-10317: Add query option that limits huge joins at runtime
This patch adds support for limiting the rows produced by Join node.
The limit is specified by a query option. Queries exceeding that limit
get terminated. The checking runs periodically, so the actual rows
produced may go somewhat over the limit.
JOIN_ROWS_PRODUCED_LIMIT is exposed as an advanced query option.
Rows produced Query profile is updated to include query wide and per
backend metrics for RowsReturned. Example from "
set JOIN_ROWS_PRODUCED_LIMIT = 10000000;
select count(*) from tpch_parquet.lineitem l1 cross join
(select * from tpch_parquet.lineitem l2 limit 5) l3;":
NESTED_LOOP_JOIN_NODE (id=2):
- InactiveTotalTime: 107.534ms
- PeakMemoryUsage: 16.00 KB (16384)
- ProbeRows: 1.02K (1024)
- ProbeTime: 0.000ns
- RowsReturned: 10.00M (10002025)
- RowsReturnedRate: 749.58 K/sec
- TotalTime: 13s337ms
Testing:
Added tests for JOIN_ROWS_PRODUCED_LIMIT
Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-state.cc
M be/src/service/impala-server.cc
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/protobuf/control_service.proto
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/generate_error_codes.py
M testdata/workloads/functional-query/queries/QueryTest/query-resource-limits.test
14 files changed, 166 insertions(+), 5 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/16706/4
--
To view, visit http://gerrit.cloudera.org:8080/16706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02
Gerrit-Change-Number: 16706
Gerrit-PatchSet: 4
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
[Impala-ASF-CR] IMPALA-10317: Add query option that limits huge joins at runtime
Posted by "Fucun Chu (Code Review)" <ge...@cloudera.org>.
Fucun Chu has posted comments on this change. ( http://gerrit.cloudera.org:8080/16706 )
Change subject: IMPALA-10317: Add query option that limits huge joins at runtime
......................................................................
Patch Set 6:
(3 comments)
http://gerrit.cloudera.org:8080/#/c/16706/5//COMMIT_MSG
Commit Message:
http://gerrit.cloudera.org:8080/#/c/16706/5//COMMIT_MSG@9
PS5, Line 9: a join no
> You may want to mention why it is useful even though it may be somewhat obv
Done
http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/coordinator.h
File be/src/runtime/coordinator.h:
http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/coordinator.h@230
PS3, Line 230: /// Total num rows produced by each join node. The key is join node id.
> Good !
Done
http://gerrit.cloudera.org:8080/#/c/16706/5/common/protobuf/control_service.proto
File common/protobuf/control_service.proto:
http://gerrit.cloudera.org:8080/#/c/16706/5/common/protobuf/control_service.proto@261
PS5, Line 261: map<int32, int64> per_join_rows_produced = 16;
> Mark this as 'optional' field ?
Maps cannot be repeated, optional, or required.
from: https://developers.google.com/protocol-buffers/docs/proto#maps
--
To view, visit http://gerrit.cloudera.org:8080/16706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02
Gerrit-Change-Number: 16706
Gerrit-PatchSet: 6
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Mon, 21 Dec 2020 14:44:58 +0000
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10317: Add query option that limits huge joins at runtime
Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16706 )
Change subject: IMPALA-10317: Add query option that limits huge joins at runtime
......................................................................
IMPALA-10317: Add query option that limits huge joins at runtime
This patch adds support for limiting the rows produced by a join node
such that runaway join queries can be prevented.
The limit is specified by a query option. Queries exceeding that limit
get terminated. The checking runs periodically, so the actual rows
produced may go somewhat over the limit.
JOIN_ROWS_PRODUCED_LIMIT is exposed as an advanced query option.
Rows produced Query profile is updated to include query wide and per
backend metrics for RowsReturned. Example from "
set JOIN_ROWS_PRODUCED_LIMIT = 10000000;
select count(*) from tpch_parquet.lineitem l1 cross join
(select * from tpch_parquet.lineitem l2 limit 5) l3;":
NESTED_LOOP_JOIN_NODE (id=2):
- InactiveTotalTime: 107.534ms
- PeakMemoryUsage: 16.00 KB (16384)
- ProbeRows: 1.02K (1024)
- ProbeTime: 0.000ns
- RowsReturned: 10.00M (10002025)
- RowsReturnedRate: 749.58 K/sec
- TotalTime: 13s337ms
Testing:
Added tests for JOIN_ROWS_PRODUCED_LIMIT
Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02
Reviewed-on: http://gerrit.cloudera.org:8080/16706
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-state.cc
M be/src/service/impala-server.cc
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/protobuf/control_service.proto
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/generate_error_codes.py
M testdata/workloads/functional-query/queries/QueryTest/query-resource-limits.test
14 files changed, 167 insertions(+), 5 deletions(-)
Approvals:
Impala Public Jenkins: Looks good to me, approved; Verified
--
To view, visit http://gerrit.cloudera.org:8080/16706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02
Gerrit-Change-Number: 16706
Gerrit-PatchSet: 8
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
[Impala-ASF-CR] IMPALA-10317: Add query option that limits huge joins at runtime
Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16706 )
Change subject: IMPALA-10317: Add query option that limits huge joins at runtime
......................................................................
Patch Set 7: Code-Review+2
--
To view, visit http://gerrit.cloudera.org:8080/16706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02
Gerrit-Change-Number: 16706
Gerrit-PatchSet: 7
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Tue, 22 Dec 2020 00:31:24 +0000
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10317: Add query option that limits huge joins at runtime
Posted by "Fucun Chu (Code Review)" <ge...@cloudera.org>.
Fucun Chu has posted comments on this change. ( http://gerrit.cloudera.org:8080/16706 )
Change subject: IMPALA-10317: Add query option that limits huge joins at runtime
......................................................................
Patch Set 4:
(12 comments)
Thanks for the review! Addressed the comments.
http://gerrit.cloudera.org:8080/#/c/16706/3//COMMIT_MSG
Commit Message:
http://gerrit.cloudera.org:8080/#/c/16706/3//COMMIT_MSG@11
PS3, Line 11: exceed
> nit: 'exceeding' instead of exceed
Done
http://gerrit.cloudera.org:8080/#/c/16706/3//COMMIT_MSG@18
PS3, Line 18: backend metrics for RowsReturned. Example from "
> For this example, can you also add the query option ? I assume it was set t
Done
http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/coordinator.h
File be/src/runtime/coordinator.h:
http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/coordinator.h@230
PS3, Line 230: /// Total num rows produced by each join node. The key is join node id.
> Pls indicate what the key of the map is. I am also wondering whether the st
The key has been changed to node id
http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/coordinator.h@231
PS3, Line 231: join_rows_produced;
> This could more simply be named 'per_join_rows_produced' (see similar comm
Done
http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/fragment-instance-state.cc
File be/src/runtime/fragment-instance-state.cc:
http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/fragment-instance-state.cc@331
PS3, Line 331: // row count stats for a join node
> nit: this comment is a bit confusing. Suggest rephrasing to 'row count sta
Done
http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/fragment-instance-state.cc@335
PS3, Line 335: rfind(nested_loop_type
> A bit odd that for hash_type you are passing a second argument of 0 while n
To implement the startsWith function, the two variable judgments are the same.
Done
http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/fragment-instance-state.cc@336
PS3, Line 336: per_join_rows_produced[node->metadata().plan_node_id] = rows_counter->value();
> Is there not a unique numeric id that can be used here instead of the node
Done
http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:
http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/service/impala-server.cc@2688
PS3, Line 2688: Status err = Status::Expected(TErrorCode::JOIN_ROWS_PRODUCED_LIMIT_EXCEEDED,
> it would be useful to know which join node (with the node id) in a complex
Done
http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/service/query-options.cc
File be/src/service/query-options.cc:
http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/service/query-options.cc@991
PS3, Line 991: case TImpalaQueryOptions::USE_DOP_FOR_COSTING: {
> I think the prefix 'NUM' can be removed and simplify the naming. JOIN_ROWS
Done
http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/service/query-options.cc@996
PS3, Line 996: StringParser::ParseResult result;
> 'rows returned limit' doesn't match the name of the query option. Should th
Done
http://gerrit.cloudera.org:8080/#/c/16706/3/testdata/workloads/functional-query/queries/QueryTest/query-resource-limits.test
File testdata/workloads/functional-query/queries/QueryTest/query-resource-limits.test:
http://gerrit.cloudera.org:8080/#/c/16706/3/testdata/workloads/functional-query/queries/QueryTest/query-resource-limits.test@71
PS3, Line 71: ---- QUERY
> Suggest adding a test with a mix of joins ..i.e couple of hash joins and a
Done
http://gerrit.cloudera.org:8080/#/c/16706/3/testdata/workloads/functional-query/queries/QueryTest/query-resource-limits.test@87
PS3, Line 87: in
> nit: remove trailing whitespace
Done
--
To view, visit http://gerrit.cloudera.org:8080/16706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02
Gerrit-Change-Number: 16706
Gerrit-PatchSet: 4
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Sun, 20 Dec 2020 04:12:02 +0000
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10317: Add query option that limits huge joins at runtime
Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16706 )
Change subject: IMPALA-10317: Add query option that limits huge joins at runtime
......................................................................
Patch Set 4:
Build Successful
https://jenkins.impala.io/job/gerrit-code-review-checks/7881/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.
--
To view, visit http://gerrit.cloudera.org:8080/16706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02
Gerrit-Change-Number: 16706
Gerrit-PatchSet: 4
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Sun, 20 Dec 2020 04:33:24 +0000
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10317: Add query option that limits huge joins at runtime
Posted by "Aman Sinha (Code Review)" <ge...@cloudera.org>.
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/16706 )
Change subject: IMPALA-10317: Add query option that limits huge joins at runtime
......................................................................
Patch Set 5:
(3 comments)
Looking good. Just couple of minor comments.
http://gerrit.cloudera.org:8080/#/c/16706/5//COMMIT_MSG
Commit Message:
http://gerrit.cloudera.org:8080/#/c/16706/5//COMMIT_MSG@9
PS5, Line 9: Join node
You may want to mention why it is useful even though it may be somewhat obvious. Something like '..produced by a join node such that runaway join queries can be prevented.'
http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/coordinator.h
File be/src/runtime/coordinator.h:
http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/coordinator.h@230
PS3, Line 230: /// Total num rows produced by each join node. The key is join node id.
> The key has been changed to node id
Good !
http://gerrit.cloudera.org:8080/#/c/16706/5/common/protobuf/control_service.proto
File common/protobuf/control_service.proto:
http://gerrit.cloudera.org:8080/#/c/16706/5/common/protobuf/control_service.proto@261
PS5, Line 261: map<int32, int64> per_join_rows_produced = 16;
Mark this as 'optional' field ?
--
To view, visit http://gerrit.cloudera.org:8080/16706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02
Gerrit-Change-Number: 16706
Gerrit-PatchSet: 5
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Mon, 21 Dec 2020 07:53:47 +0000
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10317: Add query option that limits huge joins at runtime
Posted by "Fucun Chu (Code Review)" <ge...@cloudera.org>.
Fucun Chu has posted comments on this change. ( http://gerrit.cloudera.org:8080/16706 )
Change subject: IMPALA-10317: Add query option that limits huge joins at runtime
......................................................................
Patch Set 5:
(1 comment)
http://gerrit.cloudera.org:8080/#/c/16706/4/common/thrift/generate_error_codes.py
File common/thrift/generate_error_codes.py:
http://gerrit.cloudera.org:8080/#/c/16706/4/common/thrift/generate_error_codes.py@468
PS4, Line 468:
> flake8: E501 line too long (95 > 90 characters)
Done
--
To view, visit http://gerrit.cloudera.org:8080/16706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02
Gerrit-Change-Number: 16706
Gerrit-PatchSet: 5
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Sun, 20 Dec 2020 04:26:31 +0000
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10317: Add query option that limits huge joins at runtime
Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16706 )
Change subject: IMPALA-10317: Add query option that limits huge joins at runtime
......................................................................
Patch Set 4:
(1 comment)
http://gerrit.cloudera.org:8080/#/c/16706/4/common/thrift/generate_error_codes.py
File common/thrift/generate_error_codes.py:
http://gerrit.cloudera.org:8080/#/c/16706/4/common/thrift/generate_error_codes.py@468
PS4, Line 468: $
flake8: E501 line too long (95 > 90 characters)
--
To view, visit http://gerrit.cloudera.org:8080/16706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02
Gerrit-Change-Number: 16706
Gerrit-PatchSet: 4
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Sun, 20 Dec 2020 04:12:03 +0000
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10317: Add query option that limits huge joins at runtime
Posted by "Fucun Chu (Code Review)" <ge...@cloudera.org>.
Fucun Chu has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/16706 )
Change subject: IMPALA-10317: Add query option that limits huge joins at runtime
......................................................................
IMPALA-10317: Add query option that limits huge joins at runtime
This patch adds support for limiting the rows produced by a join node
such that runaway join queries can be prevented.
The limit is specified by a query option. Queries exceeding that limit
get terminated. The checking runs periodically, so the actual rows
produced may go somewhat over the limit.
JOIN_ROWS_PRODUCED_LIMIT is exposed as an advanced query option.
Rows produced Query profile is updated to include query wide and per
backend metrics for RowsReturned. Example from "
set JOIN_ROWS_PRODUCED_LIMIT = 10000000;
select count(*) from tpch_parquet.lineitem l1 cross join
(select * from tpch_parquet.lineitem l2 limit 5) l3;":
NESTED_LOOP_JOIN_NODE (id=2):
- InactiveTotalTime: 107.534ms
- PeakMemoryUsage: 16.00 KB (16384)
- ProbeRows: 1.02K (1024)
- ProbeTime: 0.000ns
- RowsReturned: 10.00M (10002025)
- RowsReturnedRate: 749.58 K/sec
- TotalTime: 13s337ms
Testing:
Added tests for JOIN_ROWS_PRODUCED_LIMIT
Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-state.cc
M be/src/service/impala-server.cc
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/protobuf/control_service.proto
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/generate_error_codes.py
M testdata/workloads/functional-query/queries/QueryTest/query-resource-limits.test
14 files changed, 167 insertions(+), 5 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/16706/6
--
To view, visit http://gerrit.cloudera.org:8080/16706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02
Gerrit-Change-Number: 16706
Gerrit-PatchSet: 6
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
[Impala-ASF-CR] IMPALA-10317: Add query option that limits huge joins at runtime
Posted by "Fucun Chu (Code Review)" <ge...@cloudera.org>.
Fucun Chu has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/16706 )
Change subject: IMPALA-10317: Add query option that limits huge joins at runtime
......................................................................
IMPALA-10317: Add query option that limits huge joins at runtime
This patch adds support for limiting the rows produced by Join node.
The limit is specified by a query option. Queries exceed that limit get
terminated. The checking runs periodically, so the actual rows produced
may go somewhat over the limit.
NUM_JOIN_ROWS_PRODUCED_LIMIT is exposed as an advanced query option.
Rows produced Query profile is updated to include query wide and per
backend metrics for RowsReturned. Example from "select count(*) from
tpch_parquet.lineitem l1 cross join (select * from tpch_parquet.lineitem
l2 limit 5) l3;":
NESTED_LOOP_JOIN_NODE (id=2):
- InactiveTotalTime: 107.534ms
- PeakMemoryUsage: 16.00 KB (16384)
- ProbeRows: 1.02K (1024)
- ProbeTime: 0.000ns
- RowsReturned: 10.00M (10002025)
- RowsReturnedRate: 749.58 K/sec
- TotalTime: 13s337ms
Testing:
Added tests for NUM_JOIN_ROWS_PRODUCED_LIMIT
Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-state.cc
M be/src/service/impala-server.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/protobuf/control_service.proto
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/generate_error_codes.py
M testdata/workloads/functional-query/queries/QueryTest/query-resource-limits.test
13 files changed, 125 insertions(+), 5 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/16706/3
--
To view, visit http://gerrit.cloudera.org:8080/16706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02
Gerrit-Change-Number: 16706
Gerrit-PatchSet: 3
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
[Impala-ASF-CR] IMPALA-10317: Add join rows produced limit per query
Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16706 )
Change subject: IMPALA-10317: Add join rows produced limit per query
......................................................................
Patch Set 2:
Build Successful
https://jenkins.impala.io/job/gerrit-code-review-checks/7633/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.
--
To view, visit http://gerrit.cloudera.org:8080/16706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02
Gerrit-Change-Number: 16706
Gerrit-PatchSet: 2
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 11 Nov 2020 13:06:40 +0000
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10317: Add join rows produced limit per query
Posted by "Aman Sinha (Code Review)" <ge...@cloudera.org>.
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/16706 )
Change subject: IMPALA-10317: Add join rows produced limit per query
......................................................................
Patch Set 2:
Thanks for your contribution. I left a comment in the JIRA to flush out the motivation some more before looking at the code.
--
To view, visit http://gerrit.cloudera.org:8080/16706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02
Gerrit-Change-Number: 16706
Gerrit-PatchSet: 2
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Thu, 12 Nov 2020 02:03:59 +0000
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10317: Add query option that limits huge joins at runtime
Posted by "Aman Sinha (Code Review)" <ge...@cloudera.org>.
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/16706 )
Change subject: IMPALA-10317: Add query option that limits huge joins at runtime
......................................................................
Patch Set 3:
(12 comments)
http://gerrit.cloudera.org:8080/#/c/16706/3//COMMIT_MSG
Commit Message:
http://gerrit.cloudera.org:8080/#/c/16706/3//COMMIT_MSG@11
PS3, Line 11: exceed
nit: 'exceeding' instead of exceed
http://gerrit.cloudera.org:8080/#/c/16706/3//COMMIT_MSG@18
PS3, Line 18: backend metrics for RowsReturned. Example from "select count(*) from
For this example, can you also add the query option ? I assume it was set to 10M ?
http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/coordinator.h
File be/src/runtime/coordinator.h:
http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/coordinator.h@230
PS3, Line 230: /// Total num rows produced by each join node
Pls indicate what the key of the map is. I am also wondering whether the string (which is the node name) is the only way to identify the node. Is there no numeric id that you can use ?
http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/coordinator.h@231
PS3, Line 231: per_num_join_rows_produced
This could more simply be named 'per_join_rows_produced' (see similar comment elsewhere on the option name)
http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/fragment-instance-state.cc
File be/src/runtime/fragment-instance-state.cc:
http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/fragment-instance-state.cc@331
PS3, Line 331: // stats join node
nit: this comment is a bit confusing. Suggest rephrasing to 'row count stats for a join node'
http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/fragment-instance-state.cc@335
PS3, Line 335: rfind(nested_loop_type
A bit odd that for hash_type you are passing a second argument of 0 while not doing the same for nested_loop_type ? If there's a reason for that it would be good to add a comment.
http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/fragment-instance-state.cc@336
PS3, Line 336: per_num_join_rows_produced[node->name()] = rows_counter->value();
Is there not a unique numeric id that can be used here instead of the node name ? (related comment in coordinator.h)
http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:
http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/service/impala-server.cc@2688
PS3, Line 2688: Status err = Status::Expected(TErrorCode::JOIN_ROWS_PRODUCED_LIMIT_EXCEEDED,
it would be useful to know which join node (with the node id) in a complex plan exceeded this limit. Is it possible to print that info in the error message ?
http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/service/query-options.cc
File be/src/service/query-options.cc:
http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/service/query-options.cc@991
PS3, Line 991: case TImpalaQueryOptions::NUM_JOIN_ROWS_PRODUCED_LIMIT: {
I think the prefix 'NUM' can be removed and simplify the naming. JOIN_ROWS_PRODUCED_LIMIT gives sufficient information that it is about the number of rows. Similar simplification of the variable names can be done in other parts of the code.
http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/service/query-options.cc@996
PS3, Line 996: return Status(Substitute("Invalid rows returned limit: '$0'. "
'rows returned limit' doesn't match the name of the query option. Should this be 'Invalid join rows produced limit'?
http://gerrit.cloudera.org:8080/#/c/16706/3/testdata/workloads/functional-query/queries/QueryTest/query-resource-limits.test
File testdata/workloads/functional-query/queries/QueryTest/query-resource-limits.test:
http://gerrit.cloudera.org:8080/#/c/16706/3/testdata/workloads/functional-query/queries/QueryTest/query-resource-limits.test@71
PS3, Line 71: ---- QUERY
Suggest adding a test with a mix of joins ..i.e couple of hash joins and a nested loop join where one or more of these might exceed the threshold.
http://gerrit.cloudera.org:8080/#/c/16706/3/testdata/workloads/functional-query/queries/QueryTest/query-resource-limits.test@87
PS3, Line 87: in
nit: remove trailing whitespace
--
To view, visit http://gerrit.cloudera.org:8080/16706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02
Gerrit-Change-Number: 16706
Gerrit-PatchSet: 3
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 16 Dec 2020 08:11:26 +0000
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10317: Add query option that limits huge joins at runtime
Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16706 )
Change subject: IMPALA-10317: Add query option that limits huge joins at runtime
......................................................................
Patch Set 5:
Build Successful
https://jenkins.impala.io/job/gerrit-code-review-checks/7882/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.
--
To view, visit http://gerrit.cloudera.org:8080/16706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02
Gerrit-Change-Number: 16706
Gerrit-PatchSet: 5
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Sun, 20 Dec 2020 04:49:55 +0000
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10317: Add query option that limits huge joins at runtime
Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16706 )
Change subject: IMPALA-10317: Add query option that limits huge joins at runtime
......................................................................
Patch Set 6:
Build Successful
https://jenkins.impala.io/job/gerrit-code-review-checks/7885/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.
--
To view, visit http://gerrit.cloudera.org:8080/16706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02
Gerrit-Change-Number: 16706
Gerrit-PatchSet: 6
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Mon, 21 Dec 2020 15:06:06 +0000
Gerrit-HasComments: No