You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Fucun Chu (Code Review)" <ge...@cloudera.org> on 2020/11/11 12:46:06 UTC

[Impala-ASF-CR] IMPALA-10317: Add join rows produced limit per query

Fucun Chu has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16706


Change subject: IMPALA-10317: Add join rows produced limit per query
......................................................................

IMPALA-10317: Add join rows produced limit per query

This patch limits the number of join rows produced by a query by
tracking join profile. When the NUM_JOIN_ROWS_PRODUCED_LIMIT
is set, it cancels a query when its execution produces more rows
than the specified limit.

Testing:
Added tests for NUM_JOIN_ROWS_PRODUCED_LIMIT

Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-state.cc
M be/src/service/impala-server.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/protobuf/control_service.proto
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/generate_error_codes.py
M testdata/workloads/functional-query/queries/QueryTest/query-resource-limits.test
13 files changed, 120 insertions(+), 5 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/16706/2
-- 
To view, visit http://gerrit.cloudera.org:8080/16706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02
Gerrit-Change-Number: 16706
Gerrit-PatchSet: 2
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-10317: Add query option that limits huge joins at runtime

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16706 )

Change subject: IMPALA-10317: Add query option that limits huge joins at runtime
......................................................................


Patch Set 7:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6799/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/16706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02
Gerrit-Change-Number: 16706
Gerrit-PatchSet: 7
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Tue, 22 Dec 2020 00:31:25 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10317: Add query option that limits huge joins at runtime

Posted by "Aman Sinha (Code Review)" <ge...@cloudera.org>.
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/16706 )

Change subject: IMPALA-10317: Add query option that limits huge joins at runtime
......................................................................


Patch Set 6: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/16706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02
Gerrit-Change-Number: 16706
Gerrit-PatchSet: 6
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Tue, 22 Dec 2020 00:27:59 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10317: Add query option that limits huge joins at runtime

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16706 )

Change subject: IMPALA-10317: Add query option that limits huge joins at runtime
......................................................................


Patch Set 7: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/16706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02
Gerrit-Change-Number: 16706
Gerrit-PatchSet: 7
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Tue, 22 Dec 2020 06:10:38 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10317: Add query option that limits huge joins at runtime

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16706 )

Change subject: IMPALA-10317: Add query option that limits huge joins at runtime
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/7747/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/16706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02
Gerrit-Change-Number: 16706
Gerrit-PatchSet: 3
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Mon, 30 Nov 2020 01:43:14 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10317: Add query option that limits huge joins at runtime

Posted by "Fucun Chu (Code Review)" <ge...@cloudera.org>.
Fucun Chu has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/16706 )

Change subject: IMPALA-10317: Add query option that limits huge joins at runtime
......................................................................

IMPALA-10317: Add query option that limits huge joins at runtime

This patch adds support for limiting the rows produced by Join node.

The limit is specified by a query option. Queries exceeding that limit
get terminated. The checking runs periodically, so the actual rows
produced may go somewhat over the limit.

JOIN_ROWS_PRODUCED_LIMIT is exposed as an advanced query option.

Rows produced Query profile is updated to include query wide and per
backend metrics for RowsReturned. Example from "
set JOIN_ROWS_PRODUCED_LIMIT = 10000000;
select count(*) from tpch_parquet.lineitem l1 cross join
(select * from tpch_parquet.lineitem l2 limit 5) l3;":

NESTED_LOOP_JOIN_NODE (id=2):
   - InactiveTotalTime: 107.534ms
   - PeakMemoryUsage: 16.00 KB (16384)
   - ProbeRows: 1.02K (1024)
   - ProbeTime: 0.000ns
   - RowsReturned: 10.00M (10002025)
   - RowsReturnedRate: 749.58 K/sec
   - TotalTime: 13s337ms

Testing:
 Added tests for JOIN_ROWS_PRODUCED_LIMIT

Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-state.cc
M be/src/service/impala-server.cc
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/protobuf/control_service.proto
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/generate_error_codes.py
M testdata/workloads/functional-query/queries/QueryTest/query-resource-limits.test
14 files changed, 167 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/16706/5
-- 
To view, visit http://gerrit.cloudera.org:8080/16706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02
Gerrit-Change-Number: 16706
Gerrit-PatchSet: 5
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-10317: Add query option that limits huge joins at runtime

Posted by "Fucun Chu (Code Review)" <ge...@cloudera.org>.
Fucun Chu has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/16706 )

Change subject: IMPALA-10317: Add query option that limits huge joins at runtime
......................................................................

IMPALA-10317: Add query option that limits huge joins at runtime

This patch adds support for limiting the rows produced by Join node.

The limit is specified by a query option. Queries exceeding that limit
get terminated. The checking runs periodically, so the actual rows
produced may go somewhat over the limit.

JOIN_ROWS_PRODUCED_LIMIT is exposed as an advanced query option.

Rows produced Query profile is updated to include query wide and per
backend metrics for RowsReturned. Example from "
set JOIN_ROWS_PRODUCED_LIMIT = 10000000;
select count(*) from tpch_parquet.lineitem l1 cross join
(select * from tpch_parquet.lineitem l2 limit 5) l3;":

NESTED_LOOP_JOIN_NODE (id=2):
   - InactiveTotalTime: 107.534ms
   - PeakMemoryUsage: 16.00 KB (16384)
   - ProbeRows: 1.02K (1024)
   - ProbeTime: 0.000ns
   - RowsReturned: 10.00M (10002025)
   - RowsReturnedRate: 749.58 K/sec
   - TotalTime: 13s337ms

Testing:
 Added tests for JOIN_ROWS_PRODUCED_LIMIT

Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-state.cc
M be/src/service/impala-server.cc
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/protobuf/control_service.proto
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/generate_error_codes.py
M testdata/workloads/functional-query/queries/QueryTest/query-resource-limits.test
14 files changed, 166 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/16706/4
-- 
To view, visit http://gerrit.cloudera.org:8080/16706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02
Gerrit-Change-Number: 16706
Gerrit-PatchSet: 4
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-10317: Add query option that limits huge joins at runtime

Posted by "Fucun Chu (Code Review)" <ge...@cloudera.org>.
Fucun Chu has posted comments on this change. ( http://gerrit.cloudera.org:8080/16706 )

Change subject: IMPALA-10317: Add query option that limits huge joins at runtime
......................................................................


Patch Set 6:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/16706/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16706/5//COMMIT_MSG@9
PS5, Line 9:  a join no
> You may want to mention why it is useful even though it may be somewhat obv
Done


http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/coordinator.h
File be/src/runtime/coordinator.h:

http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/coordinator.h@230
PS3, Line 230:     /// Total num rows produced by each join node. The key is join node id.
> Good !
Done


http://gerrit.cloudera.org:8080/#/c/16706/5/common/protobuf/control_service.proto
File common/protobuf/control_service.proto:

http://gerrit.cloudera.org:8080/#/c/16706/5/common/protobuf/control_service.proto@261
PS5, Line 261:   map<int32, int64> per_join_rows_produced = 16;
> Mark this as 'optional' field ?
Maps cannot be repeated, optional, or required.
from: https://developers.google.com/protocol-buffers/docs/proto#maps



-- 
To view, visit http://gerrit.cloudera.org:8080/16706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02
Gerrit-Change-Number: 16706
Gerrit-PatchSet: 6
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Mon, 21 Dec 2020 14:44:58 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10317: Add query option that limits huge joins at runtime

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16706 )

Change subject: IMPALA-10317: Add query option that limits huge joins at runtime
......................................................................

IMPALA-10317: Add query option that limits huge joins at runtime

This patch adds support for limiting the rows produced by a join node
such that runaway join queries can be prevented.

The limit is specified by a query option. Queries exceeding that limit
get terminated. The checking runs periodically, so the actual rows
produced may go somewhat over the limit.

JOIN_ROWS_PRODUCED_LIMIT is exposed as an advanced query option.

Rows produced Query profile is updated to include query wide and per
backend metrics for RowsReturned. Example from "
set JOIN_ROWS_PRODUCED_LIMIT = 10000000;
select count(*) from tpch_parquet.lineitem l1 cross join
(select * from tpch_parquet.lineitem l2 limit 5) l3;":

NESTED_LOOP_JOIN_NODE (id=2):
   - InactiveTotalTime: 107.534ms
   - PeakMemoryUsage: 16.00 KB (16384)
   - ProbeRows: 1.02K (1024)
   - ProbeTime: 0.000ns
   - RowsReturned: 10.00M (10002025)
   - RowsReturnedRate: 749.58 K/sec
   - TotalTime: 13s337ms

Testing:
 Added tests for JOIN_ROWS_PRODUCED_LIMIT

Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02
Reviewed-on: http://gerrit.cloudera.org:8080/16706
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-state.cc
M be/src/service/impala-server.cc
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/protobuf/control_service.proto
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/generate_error_codes.py
M testdata/workloads/functional-query/queries/QueryTest/query-resource-limits.test
14 files changed, 167 insertions(+), 5 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/16706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02
Gerrit-Change-Number: 16706
Gerrit-PatchSet: 8
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-10317: Add query option that limits huge joins at runtime

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16706 )

Change subject: IMPALA-10317: Add query option that limits huge joins at runtime
......................................................................


Patch Set 7: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/16706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02
Gerrit-Change-Number: 16706
Gerrit-PatchSet: 7
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Tue, 22 Dec 2020 00:31:24 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10317: Add query option that limits huge joins at runtime

Posted by "Fucun Chu (Code Review)" <ge...@cloudera.org>.
Fucun Chu has posted comments on this change. ( http://gerrit.cloudera.org:8080/16706 )

Change subject: IMPALA-10317: Add query option that limits huge joins at runtime
......................................................................


Patch Set 4:

(12 comments)

Thanks for the review! Addressed the comments.

http://gerrit.cloudera.org:8080/#/c/16706/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16706/3//COMMIT_MSG@11
PS3, Line 11: exceed
> nit: 'exceeding' instead of exceed
Done


http://gerrit.cloudera.org:8080/#/c/16706/3//COMMIT_MSG@18
PS3, Line 18: backend metrics for RowsReturned. Example from "
> For this example, can you also add the query option ? I assume it was set t
Done


http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/coordinator.h
File be/src/runtime/coordinator.h:

http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/coordinator.h@230
PS3, Line 230:     /// Total num rows produced by each join node. The key is join node id.
> Pls indicate what the key of the map is. I am also wondering whether the st
The key has been changed to node id


http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/coordinator.h@231
PS3, Line 231: join_rows_produced;
> This could more simply be named 'per_join_rows_produced'  (see similar comm
Done


http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/fragment-instance-state.cc
File be/src/runtime/fragment-instance-state.cc:

http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/fragment-instance-state.cc@331
PS3, Line 331:         // row count stats for a join node
> nit: this comment is a bit confusing.  Suggest rephrasing to 'row count sta
Done


http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/fragment-instance-state.cc@335
PS3, Line 335: rfind(nested_loop_type
> A bit odd that for hash_type you are passing a second argument of 0 while n
To implement the startsWith function, the two variable judgments are the same.
Done


http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/fragment-instance-state.cc@336
PS3, Line 336:           per_join_rows_produced[node->metadata().plan_node_id] = rows_counter->value();
> Is there not a unique numeric id that can be used here instead of the node 
Done


http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/service/impala-server.cc@2688
PS3, Line 2688:     Status err = Status::Expected(TErrorCode::JOIN_ROWS_PRODUCED_LIMIT_EXCEEDED,
> it would be useful to know which join node (with the node id) in a complex 
Done


http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/service/query-options.cc
File be/src/service/query-options.cc:

http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/service/query-options.cc@991
PS3, Line 991:       case TImpalaQueryOptions::USE_DOP_FOR_COSTING: {
> I think the prefix 'NUM' can be removed and simplify the naming.  JOIN_ROWS
Done


http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/service/query-options.cc@996
PS3, Line 996:         StringParser::ParseResult result;
> 'rows returned limit' doesn't match the name of the query option. Should th
Done


http://gerrit.cloudera.org:8080/#/c/16706/3/testdata/workloads/functional-query/queries/QueryTest/query-resource-limits.test
File testdata/workloads/functional-query/queries/QueryTest/query-resource-limits.test:

http://gerrit.cloudera.org:8080/#/c/16706/3/testdata/workloads/functional-query/queries/QueryTest/query-resource-limits.test@71
PS3, Line 71: ---- QUERY
> Suggest adding a test with a mix of joins ..i.e couple of hash joins and a 
Done


http://gerrit.cloudera.org:8080/#/c/16706/3/testdata/workloads/functional-query/queries/QueryTest/query-resource-limits.test@87
PS3, Line 87: in
> nit: remove trailing whitespace
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/16706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02
Gerrit-Change-Number: 16706
Gerrit-PatchSet: 4
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Sun, 20 Dec 2020 04:12:02 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10317: Add query option that limits huge joins at runtime

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16706 )

Change subject: IMPALA-10317: Add query option that limits huge joins at runtime
......................................................................


Patch Set 4:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/7881/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/16706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02
Gerrit-Change-Number: 16706
Gerrit-PatchSet: 4
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Sun, 20 Dec 2020 04:33:24 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10317: Add query option that limits huge joins at runtime

Posted by "Aman Sinha (Code Review)" <ge...@cloudera.org>.
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/16706 )

Change subject: IMPALA-10317: Add query option that limits huge joins at runtime
......................................................................


Patch Set 5:

(3 comments)

Looking good. Just couple of minor comments.

http://gerrit.cloudera.org:8080/#/c/16706/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16706/5//COMMIT_MSG@9
PS5, Line 9:  Join node
You may want to mention why it is useful even though it may be somewhat obvious.  Something like '..produced by a join node such that runaway join queries can be prevented.'


http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/coordinator.h
File be/src/runtime/coordinator.h:

http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/coordinator.h@230
PS3, Line 230:     /// Total num rows produced by each join node. The key is join node id.
> The key has been changed to node id
Good !


http://gerrit.cloudera.org:8080/#/c/16706/5/common/protobuf/control_service.proto
File common/protobuf/control_service.proto:

http://gerrit.cloudera.org:8080/#/c/16706/5/common/protobuf/control_service.proto@261
PS5, Line 261:   map<int32, int64> per_join_rows_produced = 16;
Mark this as 'optional' field ?



-- 
To view, visit http://gerrit.cloudera.org:8080/16706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02
Gerrit-Change-Number: 16706
Gerrit-PatchSet: 5
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Mon, 21 Dec 2020 07:53:47 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10317: Add query option that limits huge joins at runtime

Posted by "Fucun Chu (Code Review)" <ge...@cloudera.org>.
Fucun Chu has posted comments on this change. ( http://gerrit.cloudera.org:8080/16706 )

Change subject: IMPALA-10317: Add query option that limits huge joins at runtime
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16706/4/common/thrift/generate_error_codes.py
File common/thrift/generate_error_codes.py:

http://gerrit.cloudera.org:8080/#/c/16706/4/common/thrift/generate_error_codes.py@468
PS4, Line 468: 
> flake8: E501 line too long (95 > 90 characters)
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/16706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02
Gerrit-Change-Number: 16706
Gerrit-PatchSet: 5
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Sun, 20 Dec 2020 04:26:31 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10317: Add query option that limits huge joins at runtime

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16706 )

Change subject: IMPALA-10317: Add query option that limits huge joins at runtime
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16706/4/common/thrift/generate_error_codes.py
File common/thrift/generate_error_codes.py:

http://gerrit.cloudera.org:8080/#/c/16706/4/common/thrift/generate_error_codes.py@468
PS4, Line 468: $
flake8: E501 line too long (95 > 90 characters)



-- 
To view, visit http://gerrit.cloudera.org:8080/16706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02
Gerrit-Change-Number: 16706
Gerrit-PatchSet: 4
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Sun, 20 Dec 2020 04:12:03 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10317: Add query option that limits huge joins at runtime

Posted by "Fucun Chu (Code Review)" <ge...@cloudera.org>.
Fucun Chu has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/16706 )

Change subject: IMPALA-10317: Add query option that limits huge joins at runtime
......................................................................

IMPALA-10317: Add query option that limits huge joins at runtime

This patch adds support for limiting the rows produced by a join node
such that runaway join queries can be prevented.

The limit is specified by a query option. Queries exceeding that limit
get terminated. The checking runs periodically, so the actual rows
produced may go somewhat over the limit.

JOIN_ROWS_PRODUCED_LIMIT is exposed as an advanced query option.

Rows produced Query profile is updated to include query wide and per
backend metrics for RowsReturned. Example from "
set JOIN_ROWS_PRODUCED_LIMIT = 10000000;
select count(*) from tpch_parquet.lineitem l1 cross join
(select * from tpch_parquet.lineitem l2 limit 5) l3;":

NESTED_LOOP_JOIN_NODE (id=2):
   - InactiveTotalTime: 107.534ms
   - PeakMemoryUsage: 16.00 KB (16384)
   - ProbeRows: 1.02K (1024)
   - ProbeTime: 0.000ns
   - RowsReturned: 10.00M (10002025)
   - RowsReturnedRate: 749.58 K/sec
   - TotalTime: 13s337ms

Testing:
 Added tests for JOIN_ROWS_PRODUCED_LIMIT

Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-state.cc
M be/src/service/impala-server.cc
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/protobuf/control_service.proto
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/generate_error_codes.py
M testdata/workloads/functional-query/queries/QueryTest/query-resource-limits.test
14 files changed, 167 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/16706/6
-- 
To view, visit http://gerrit.cloudera.org:8080/16706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02
Gerrit-Change-Number: 16706
Gerrit-PatchSet: 6
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-10317: Add query option that limits huge joins at runtime

Posted by "Fucun Chu (Code Review)" <ge...@cloudera.org>.
Fucun Chu has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/16706 )

Change subject: IMPALA-10317: Add query option that limits huge joins at runtime
......................................................................

IMPALA-10317: Add query option that limits huge joins at runtime

This patch adds support for limiting the rows produced by Join node.

The limit is specified by a query option. Queries exceed that limit get
terminated. The checking runs periodically, so the actual rows produced
may go somewhat over the limit.

NUM_JOIN_ROWS_PRODUCED_LIMIT is exposed as an advanced query option.

Rows produced Query profile is updated to include query wide and per
backend metrics for RowsReturned. Example from "select count(*) from
tpch_parquet.lineitem l1 cross join (select * from tpch_parquet.lineitem
l2 limit 5) l3;":

NESTED_LOOP_JOIN_NODE (id=2):
   - InactiveTotalTime: 107.534ms
   - PeakMemoryUsage: 16.00 KB (16384)
   - ProbeRows: 1.02K (1024)
   - ProbeTime: 0.000ns
   - RowsReturned: 10.00M (10002025)
   - RowsReturnedRate: 749.58 K/sec
   - TotalTime: 13s337ms

Testing:
 Added tests for NUM_JOIN_ROWS_PRODUCED_LIMIT

Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-state.cc
M be/src/service/impala-server.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/protobuf/control_service.proto
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/generate_error_codes.py
M testdata/workloads/functional-query/queries/QueryTest/query-resource-limits.test
13 files changed, 125 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/16706/3
-- 
To view, visit http://gerrit.cloudera.org:8080/16706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02
Gerrit-Change-Number: 16706
Gerrit-PatchSet: 3
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-10317: Add join rows produced limit per query

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16706 )

Change subject: IMPALA-10317: Add join rows produced limit per query
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/7633/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/16706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02
Gerrit-Change-Number: 16706
Gerrit-PatchSet: 2
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 11 Nov 2020 13:06:40 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10317: Add join rows produced limit per query

Posted by "Aman Sinha (Code Review)" <ge...@cloudera.org>.
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/16706 )

Change subject: IMPALA-10317: Add join rows produced limit per query
......................................................................


Patch Set 2:

Thanks for your contribution. I left a comment in the JIRA to flush out the motivation some more before looking at the code.


-- 
To view, visit http://gerrit.cloudera.org:8080/16706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02
Gerrit-Change-Number: 16706
Gerrit-PatchSet: 2
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Thu, 12 Nov 2020 02:03:59 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10317: Add query option that limits huge joins at runtime

Posted by "Aman Sinha (Code Review)" <ge...@cloudera.org>.
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/16706 )

Change subject: IMPALA-10317: Add query option that limits huge joins at runtime
......................................................................


Patch Set 3:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/16706/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16706/3//COMMIT_MSG@11
PS3, Line 11: exceed
nit: 'exceeding' instead of exceed


http://gerrit.cloudera.org:8080/#/c/16706/3//COMMIT_MSG@18
PS3, Line 18: backend metrics for RowsReturned. Example from "select count(*) from
For this example, can you also add the query option ? I assume it was set to 10M ?


http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/coordinator.h
File be/src/runtime/coordinator.h:

http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/coordinator.h@230
PS3, Line 230:     /// Total num rows produced by each join node
Pls indicate what the key of the map is. I am also wondering whether the string (which is the node name) is the only way to identify the node. Is there no numeric id that you can use ?


http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/coordinator.h@231
PS3, Line 231: per_num_join_rows_produced
This could more simply be named 'per_join_rows_produced'  (see similar comment elsewhere  on the option name)


http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/fragment-instance-state.cc
File be/src/runtime/fragment-instance-state.cc:

http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/fragment-instance-state.cc@331
PS3, Line 331:         // stats join node
nit: this comment is a bit confusing.  Suggest rephrasing to 'row count stats for a join node'


http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/fragment-instance-state.cc@335
PS3, Line 335: rfind(nested_loop_type
A bit odd that for hash_type you are passing a second argument of 0 while not doing the same for nested_loop_type ? If there's a reason for that it would be good to add a comment.


http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/fragment-instance-state.cc@336
PS3, Line 336:           per_num_join_rows_produced[node->name()] = rows_counter->value();
Is there not a unique numeric id that can be used here instead of the node name ? (related comment in coordinator.h)


http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/service/impala-server.cc@2688
PS3, Line 2688:     Status err = Status::Expected(TErrorCode::JOIN_ROWS_PRODUCED_LIMIT_EXCEEDED,
it would be useful to know which join node (with the node id) in a complex plan exceeded this limit. Is it possible to print that info in the error message ?


http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/service/query-options.cc
File be/src/service/query-options.cc:

http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/service/query-options.cc@991
PS3, Line 991:       case TImpalaQueryOptions::NUM_JOIN_ROWS_PRODUCED_LIMIT: {
I think the prefix 'NUM' can be removed and simplify the naming.  JOIN_ROWS_PRODUCED_LIMIT gives sufficient information that it is about the number of rows.  Similar simplification of the variable names can be done in other parts of the code.


http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/service/query-options.cc@996
PS3, Line 996:           return Status(Substitute("Invalid rows returned limit: '$0'. "
'rows returned limit' doesn't match the name of the query option. Should this be 'Invalid join rows produced limit'?


http://gerrit.cloudera.org:8080/#/c/16706/3/testdata/workloads/functional-query/queries/QueryTest/query-resource-limits.test
File testdata/workloads/functional-query/queries/QueryTest/query-resource-limits.test:

http://gerrit.cloudera.org:8080/#/c/16706/3/testdata/workloads/functional-query/queries/QueryTest/query-resource-limits.test@71
PS3, Line 71: ---- QUERY
Suggest adding a test with a mix of joins ..i.e couple of hash joins and a nested loop join where one or more of these might exceed the threshold.


http://gerrit.cloudera.org:8080/#/c/16706/3/testdata/workloads/functional-query/queries/QueryTest/query-resource-limits.test@87
PS3, Line 87: in 
nit: remove trailing whitespace



-- 
To view, visit http://gerrit.cloudera.org:8080/16706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02
Gerrit-Change-Number: 16706
Gerrit-PatchSet: 3
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 16 Dec 2020 08:11:26 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10317: Add query option that limits huge joins at runtime

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16706 )

Change subject: IMPALA-10317: Add query option that limits huge joins at runtime
......................................................................


Patch Set 5:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/7882/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/16706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02
Gerrit-Change-Number: 16706
Gerrit-PatchSet: 5
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Sun, 20 Dec 2020 04:49:55 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10317: Add query option that limits huge joins at runtime

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16706 )

Change subject: IMPALA-10317: Add query option that limits huge joins at runtime
......................................................................


Patch Set 6:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/7885/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/16706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02
Gerrit-Change-Number: 16706
Gerrit-PatchSet: 6
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Mon, 21 Dec 2020 15:06:06 +0000
Gerrit-HasComments: No