You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Michael Ho (Code Review)" <ge...@cloudera.org> on 2018/10/08 17:45:33 UTC

[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

Michael Ho has uploaded this change for review. ( http://gerrit.cloudera.org:8080/11615


Change subject: IMPALA-4063: Merge report of query fragment instances per executor
......................................................................

IMPALA-4063: Merge report of query fragment instances per executor

Previously, each fragment instance executing on an executor will
independently report its status to the coordinator periodically.
This creates a huge amount of RPCs to the coordinator under highly
conurrent workloads, causing lock contention in the coordinator's
backend states as each fragment instance independently tries to
update the coordinator backend states. In addition, due to the lack
of coordinator between query fragment instances, a query may end
without collecting the profiles from all fragment instances when
one of them hits an error before some other fragment instance manages
to finish calling Prepare(), leading to missing profile for certain
fragment instances.

This change fixes the problem above by making a thread per QueryState
(started by QueryExecMgr) to be responsible for periodically reporting
the status and profiles of all fragment instances of a query running
on a backend. As part of this refactoring, each query fragment instance
will not report their errors individually. Instead, there is a cumulative
status maintained per QueryState. It's set to the error status of the first
fragment instance which hits an error or any general error (e.g. failure
to start a thread). With this change, the status reporting thread is
also removed.

Testing done: exhaustive tests

This patch is based on a patch by Sailesh Mukil

Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-exec-mgr.cc
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/control-service.cc
M common/protobuf/control_service.proto
M common/thrift/RuntimeProfile.thrift
M testdata/workloads/functional-query/queries/QueryTest/udf-non-deterministic.test
M testdata/workloads/functional-query/queries/QueryTest/udf.test
16 files changed, 377 insertions(+), 420 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/11615/1
-- 
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho <kw...@cloudera.com>

[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

Posted by "Thomas Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per executor
......................................................................


Patch Set 2:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/11615/2/be/src/runtime/query-state.h
File be/src/runtime/query-state.h:

http://gerrit.cloudera.org:8080/#/c/11615/2/be/src/runtime/query-state.h@185
PS2, Line 185: status report periodically
             :   /// the
nit: status reports periodically to the


http://gerrit.cloudera.org:8080/#/c/11615/2/be/src/runtime/query-state.h@428
PS2, Line 428: A state transition happens
             :   /// if the current state is a non-terminal state
A little confusing - its required to currently be in a non-terminal state when this is called.


http://gerrit.cloudera.org:8080/#/c/11615/2/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/11615/2/be/src/runtime/query-state.cc@227
PS2, Line 227: BackendExecState old_state = backend_exec_state_;
any reason to do this here, when its only used within the lock below?


http://gerrit.cloudera.org:8080/#/c/11615/2/be/src/runtime/query-state.cc@243
PS2, Line 243: backend_exec_state_ = old_state == BackendExecState::PREPARING ?
             :           BackendExecState::EXECUTING : BackendExecState::FINISHED;
This feels a little bit weird - would it work to make this more explicit by having UpdateBackendExecState take a param of the state to transition to, and then DCHECK-ing that the param is the next state in the lifecycle?


http://gerrit.cloudera.org:8080/#/c/11615/2/common/protobuf/control_service.proto
File common/protobuf/control_service.proto:

http://gerrit.cloudera.org:8080/#/c/11615/2/common/protobuf/control_service.proto@150
PS2, Line 150: // The fragment instance id of the first failed fragment instance.
Maybe worth making it more explicit that the reason we chose the 'first' here is that it corresponds to the value of 'overall_status'

Also, what's the expected value here if 'overall_status' is for a non-fragment-specific "general" error?



-- 
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 10 Oct 2018 21:04:19 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per executor
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/1017/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 3
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 10 Oct 2018 22:32:01 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per executor
......................................................................


Patch Set 1:

(11 comments)

Had a few comments. Overall direction of the patch makes sense to me

http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/coordinator-backend-state.cc@277
PS1, Line 277:   lock_guard<SpinLock> l1(exec_summary->lock);
Isn't this just reverting the change that the previous patch made?


http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/fragment-instance-state.h
File be/src/runtime/fragment-instance-state.h:

http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/fragment-instance-state.h@93
PS1, Line 93:   /// Called periodically to get the current status of this fragment instance.
Since we're describing the usage pattern, might as well mention which thread is calling it.


http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/fragment-instance-state.h@159
PS1, Line 159:   bool final_report_sent_ = false;
What's the thread safety story here?


http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/fragment-instance-state.cc
File be/src/runtime/fragment-instance-state.cc:

http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/fragment-instance-state.cc@97
PS1, Line 97: intance
instance


http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/fragment-instance-state.cc@239
PS1, Line 239: 
nit: unnecessary blank line


http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/fragment-instance-state.cc@243
PS1, Line 243: void FragmentInstanceState::GetStatusReport(FragmentInstanceExecStatusPB* instance_status,
nit: could probably reduce vertical whitespace in this function.


http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/query-exec-mgr.cc
File be/src/runtime/query-exec-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/query-exec-mgr.cc@128
PS1, Line 128: void QueryExecMgr::StartQueryHelper(QueryState* qs) {
I feel like we should rename this function since it now runs until the query finishes on this backend, e.g. ExecuteQueryHelper()


http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/query-state.h
File be/src/runtime/query-state.h:

http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/query-state.h@91
PS1, Line 91: /// the final report (it will not be overridden by the resulting cancellation).
Is there anything short we can say here to clarify the relationship with the overall state on the coordinator?


http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/query-state.h@418
PS1, Line 418:   bool IsStatusSet() const {
Maybe something like "HasErrorStatus()". It's confusing that setting overall_status_ to cancelled counts as "not set".


http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/query-state.h@419
PS1, Line 419:     return !overall_status_.ok() && !overall_status_.IsCancelled();
This is called without holding status_lock_, right? I don't think IsCancelled() is thread-safe? Since we could could overwrite msg_ with an error status from a different thread and free the previous msg_ object.

I don't really like this pattern of doing unlocked reads to Status objects given that the interface of status is not documented as thread-safe - I get that .ok() works since Status is implemented as a pointer and we're on x86 but I think we need to clarify the interfaces - feels like it wouldn't take much for someone to invalidate one of the assumptions made by accident.

I wonder if the optimisation of deferring acquisition of the lock really matters. If we think it's necessary, it seems like we should document which Status methods are threadsafe, and change the implementation to use atomics so that the memory ordering assumptions are explicit.


http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/query-state.cc@236
PS1, Line 236:     if (overall_status_.IsCancelled()) {
Can we refine this with the internal/client-driven cancellation distinction that was added here https://gerrit.cloudera.org/#/c/11464/



-- 
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 08 Oct 2018 22:56:38 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per executor
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/1007/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 10 Oct 2018 08:54:18 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per executor
......................................................................


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11615/8/testdata/workloads/functional-query/queries/QueryTest/udf-no-expr-rewrite.test
File testdata/workloads/functional-query/queries/QueryTest/udf-no-expr-rewrite.test:

http://gerrit.cloudera.org:8080/#/c/11615/8/testdata/workloads/functional-query/queries/QueryTest/udf-no-expr-rewrite.test@26
PS8, Line 26: #A fragment instance's last status report may be sent before calling Close() which is
> This in fact may be racy as the final profile may be sent before Close() is
So, workaround this flakiness in test by changing this to row_regex:.* for now.



-- 
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 9
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 05 Nov 2018 18:57:10 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per executor
......................................................................


Patch Set 10: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 10
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 06 Nov 2018 01:01:06 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Hello Thomas Marshall, Tim Armstrong, Bikramjeet Vig, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11615

to look at the new patch set (#2).

Change subject: IMPALA-4063: Merge report of query fragment instances per executor
......................................................................

IMPALA-4063: Merge report of query fragment instances per executor

Previously, each fragment instance executing on an executor will
independently report its status to the coordinator periodically.
This creates a huge amount of RPCs to the coordinator under highly
concurrent workloads, causing lock contention in the coordinator's
backend states as each fragment instance tries to them at the same
time. In addition, due to the lack of coordination between query
fragment instances, a query may end without collecting the profiles
from all fragment instances when one of them hits an error before
another fragment instance manages to finish calling Prepare(), leading
to missing profiles for certain fragment instances.

This change fixes the problem above by making a thread per QueryState
(started by QueryExecMgr) to be responsible for periodically reporting
the status and profiles of all fragment instances of a query running
on a backend. As part of this refactoring, each query fragment instance
will not report their errors individually. Instead, there is a cumulative
status maintained per QueryState. It's set to the error status of the first
fragment instance which hits an error or any general error (e.g. failure
to start a thread) when starting fragment instances. With this change,
the status reporting threads are also removed.

Testing done: exhaustive tests

This patch is based on a patch by Sailesh Mukil

Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-exec-mgr.cc
M be/src/runtime/query-exec-mgr.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/control-service.cc
M common/protobuf/control_service.proto
M common/thrift/RuntimeProfile.thrift
M testdata/workloads/functional-query/queries/QueryTest/udf-non-deterministic.test
M testdata/workloads/functional-query/queries/QueryTest/udf.test
17 files changed, 375 insertions(+), 426 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/11615/2
-- 
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

Posted by "Thomas Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per executor
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11615/2/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/11615/2/be/src/runtime/query-state.cc@243
PS2, Line 243: backend_exec_state_ = cur_state == BackendExecState::PREPARING ?
             :           BackendExecState::EXECUTING : BackendExecState::FINISHED;
> The state transition logic is determined by 'overall_status_'.  So, making 
Sorry, to be clearer - I was suggesting leaving the logic around 'overall_state_' here and just changing the contents of this 'else' branch.

My thinking was that for a particular call site of UpdateBackendExecState() we would know what state we expect to transition to if overall_state_ is OK, eg. the call on line 518 would just take EXECUTING as a param, which would help ensure we don't accidentally call UpdateBackendExecState() twice for the PREPARE->EXECUTING transition. Looking more closely, though, I see that this doesn't make sense for the call at line 510.

I think its fine as is, no need to significantly rewrite this. Maybe just add a DCHECK at line 520 that the state is EXECUTING.



-- 
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 3
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 10 Oct 2018 22:51:15 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per executor
......................................................................


Patch Set 6:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/1150/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 6
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 25 Oct 2018 02:53:05 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per executor
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/11615/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11615/1//COMMIT_MSG@12
PS1, Line 12: conurrent
concurrent


http://gerrit.cloudera.org:8080/#/c/11615/1//COMMIT_MSG@15
PS1, Line 15: coordinator
coordination



-- 
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 08 Oct 2018 22:24:46 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per executor
......................................................................


Patch Set 10:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3426/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 10
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 05 Nov 2018 21:09:28 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

Posted by "Bikramjeet Vig (Code Review)" <ge...@cloudera.org>.
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per executor
......................................................................


Patch Set 5: Code-Review+1


-- 
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 5
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 16 Oct 2018 21:34:43 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per executor
......................................................................


Patch Set 9:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/1278/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 9
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 05 Nov 2018 19:39:43 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Hello Thomas Marshall, Tim Armstrong, Joe McDonnell, Bikramjeet Vig, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11615

to look at the new patch set (#6).

Change subject: IMPALA-4063: Merge report of query fragment instances per executor
......................................................................

IMPALA-4063: Merge report of query fragment instances per executor

Previously, each fragment instance executing on an executor will
independently report its status to the coordinator periodically.
This creates a huge amount of RPCs to the coordinator under highly
concurrent workloads, causing lock contention in the coordinator's
backend states when multiple fragment instances send them at the
same time. In addition, due to the lack of coordination between query
fragment instances, a query may end without collecting the profiles
from all fragment instances when one of them hits an error before
another fragment instance manages to finish Prepare(), leading to
missing profiles for certain fragment instances.

This change fixes the problem above by making a thread per QueryState
(started by QueryExecMgr) to be responsible for periodically reporting
the status and profiles of all fragment instances of a query running
on a backend. As part of this refactoring, each query fragment instance
will not report their errors individually. Instead, there is a cumulative
status maintained per QueryState. It's set to the error status of the first
fragment instance which hits an error or any general error (e.g. failure
to start a thread) when starting fragment instances. With this change,
the status reporting threads are also removed.

Testing done: exhaustive tests

This patch is based on a patch by Sailesh Mukil

Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-exec-mgr.cc
M be/src/runtime/query-exec-mgr.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/control-service.cc
M be/src/service/control-service.h
M common/protobuf/control_service.proto
M common/thrift/RuntimeProfile.thrift
M testdata/workloads/functional-query/queries/QueryTest/bloom_filters.test
A testdata/workloads/functional-query/queries/QueryTest/udf-no-expr-rewrite.test
D testdata/workloads/functional-query/queries/QueryTest/udf-non-deterministic.test
M testdata/workloads/functional-query/queries/QueryTest/udf.test
M tests/query_test/test_udfs.py
21 files changed, 419 insertions(+), 473 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/11615/6
-- 
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 6
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per executor
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11615/5/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/11615/5/be/src/runtime/query-state.cc@495
PS5, Line 495: error:
             :   // This point is reached if there were general errors to start query fragment instances.
             :   // Wait for all running fragment instances to finish preparing and report status to the
             :   // coordinator to start query cancellation.
> Since we are changing some error handling for thread creation, a way to tes
Thanks for pointing that out. We already have a test to exercise the thread creation failure here (https://github.com/apache/impala/blob/master/tests/failure/test_failpoints.py#L159-L173) but I think it's also a good idea to exercise the paths stress option you mentioned. Thanks for the suggestion.



-- 
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 5
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 17 Oct 2018 00:31:28 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per executor
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11615/2/common/protobuf/control_service.proto
File common/protobuf/control_service.proto:

http://gerrit.cloudera.org:8080/#/c/11615/2/common/protobuf/control_service.proto@113
PS2, Line 113: int32
int64. Will update in PS3.



-- 
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 10 Oct 2018 18:21:24 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per executor
......................................................................


Patch Set 6: Code-Review+1

Carry +1


-- 
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 6
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 25 Oct 2018 02:18:00 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per executor
......................................................................


Patch Set 3:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/11615/2/be/src/runtime/query-state.h
File be/src/runtime/query-state.h:

http://gerrit.cloudera.org:8080/#/c/11615/2/be/src/runtime/query-state.h@185
PS2, Line 185: status reports periodically
             :   /// to 
> nit: status reports periodically to the
Done


http://gerrit.cloudera.org:8080/#/c/11615/2/be/src/runtime/query-state.h@428
PS2, Line 428: Should only be called when
             :   /// the current state is a non-terminal state. T
> A little confusing - its required to currently be in a non-terminal state w
Done


http://gerrit.cloudera.org:8080/#/c/11615/2/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/11615/2/be/src/runtime/query-state.cc@227
PS2, Line 227: {
> any reason to do this here, when its only used within the lock below?
Done


http://gerrit.cloudera.org:8080/#/c/11615/2/be/src/runtime/query-state.cc@243
PS2, Line 243: backend_exec_state_ = cur_state == BackendExecState::PREPARING ?
             :           BackendExecState::EXECUTING : BackendExecState::FINISHED;
> This feels a little bit weird - would it work to make this more explicit by
The state transition logic is determined by 'overall_status_'.  So, making UpdateBackendExecState() take a param of the next state means moving the logic encoded in the if-stmt above elsewhere and UpdateBackendExecState() simply becomes a setter of backend_exec_state_ with some DCHECKs(). I can see how it may be slightly clearer for the transition from PREPARING state to the next legal state but it may make the code slightly more hairy at the multiple call sites.

Will it be clearer if we update the DCHECK at line 231 to make it more explicit which valid states we could be at so it's easier to reason about the state transition logic ?


http://gerrit.cloudera.org:8080/#/c/11615/2/common/protobuf/control_service.proto
File common/protobuf/control_service.proto:

http://gerrit.cloudera.org:8080/#/c/11615/2/common/protobuf/control_service.proto@150
PS2, Line 150: // The fragment instance id of the first failed fragment instance.
> Maybe worth making it more explicit that the reason we chose the 'first' he
It's not set for "general" error. Comments clarified to update its relationship with "overall_status".



-- 
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 3
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 10 Oct 2018 21:57:22 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per executor
......................................................................


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/11615/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11615/5//COMMIT_MSG@14
PS5, Line 14: coordinator backend states. In addition, due to the lack
            : of coordinator between query fragment instances, a query may end
            : without collecting the profiles from all fragment instances when
            : one of them hits an error before some other fragment instance manages
            : to finish calling Prepare(), leading to missing 
> Does this mean that we've fixed the problem in IMPALA-7148 and can re-enabl
Yes. Uncommented some of the tests in bloom_filters.test.


http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/query-state.cc@236
PS1, Line 236:     if (overall_status_.IsCancelled()) {
> Right, I was thinking that CANCELLED_INTERNALLY shouldn't leak out of fragm
Keep it as-is for now.


http://gerrit.cloudera.org:8080/#/c/11615/5/testdata/workloads/functional-query/queries/QueryTest/udf-non-deterministic.test
File testdata/workloads/functional-query/queries/QueryTest/udf-non-deterministic.test:

http://gerrit.cloudera.org:8080/#/c/11615/5/testdata/workloads/functional-query/queries/QueryTest/udf-non-deterministic.test@18
PS5, Line 18: ---- QUERY
> This test doesn't really have anything to do with nondeterminism. Would you
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 25 Oct 2018 02:13:45 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per executor
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11615/2/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/11615/2/be/src/runtime/query-state.cc@243
PS2, Line 243: backend_exec_state_ = cur_state == BackendExecState::PREPARING ?
             :           BackendExecState::EXECUTING : BackendExecState::FINISHED;
> Sorry, to be clearer - I was suggesting leaving the logic around 'overall_s
There is one already at line 523.



-- 
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Sun, 14 Oct 2018 02:21:41 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

Posted by "Bikramjeet Vig (Code Review)" <ge...@cloudera.org>.
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per executor
......................................................................


Patch Set 4:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/11615/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11615/3//COMMIT_MSG@13
PS3, Line 13: backend states as each fragment instance tries to them at the same
nit:send


http://gerrit.cloudera.org:8080/#/c/11615/3/be/src/runtime/fragment-instance-state.cc
File be/src/runtime/fragment-instance-state.cc:

http://gerrit.cloudera.org:8080/#/c/11615/3/be/src/runtime/fragment-instance-state.cc@444
PS3, Line 444:   if (fragment_ctx_.fragment.output_sink.type != TDataSinkType::PLAN_ROOT_SINK) {
             :     // if we haven't already release this thread token in Prepare(), release it now
             :     ReleaseThreadToken();
             :   }
this is called at only one site and now does only one thing, should we just get rid of this method ?


http://gerrit.cloudera.org:8080/#/c/11615/3/be/src/runtime/query-exec-mgr.h
File be/src/runtime/query-exec-mgr.h:

http://gerrit.cloudera.org:8080/#/c/11615/3/be/src/runtime/query-exec-mgr.h@78
PS3, Line 78: some instances hit
            :   /// some errors
nit: unless an instance hits an error they are cancelled


http://gerrit.cloudera.org:8080/#/c/11615/3/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/11615/3/be/src/runtime/query-state.cc@397
PS3, Line 397: void QueryState::ErrorDuringExecute(const Status& status, const TUniqueId& finst_id) {
previous patch had a racy check on the status, was there any benefit that earlier?
If this also called by every fragment when they are cancelled, then it might be worth retaining the racy call


http://gerrit.cloudera.org:8080/#/c/11615/3/be/src/runtime/query-state.cc@509
PS3, Line 509:   }
nit: retain this comment here from previously at L506: 
 // Block until all the already started fragment instances finish Prepare()-ing to
    // to transition to the next state and finally report an error.



-- 
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Sun, 14 Oct 2018 03:22:30 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per executor
......................................................................


Patch Set 1:

I won't get a chance to review this for the next two weeks but please don't let that hold things up.


-- 
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 08 Oct 2018 20:53:03 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per executor
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/query-state.cc@236
PS1, Line 236:     if (overall_status_.IsCancelled()) {
> Sorry, I don't quite get this comment. In either cases, won't the state sti
Right, I was thinking that CANCELLED_INTERNALLY shouldn't leak out of fragment instances so should really result in an ERROR status rather than CANCELLED. Ok not to change this now since it's preserving the current behaviour.



-- 
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 22 Oct 2018 18:32:24 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per executor
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/query-state.h
File be/src/runtime/query-state.h:

http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/query-state.h@353
PS1, Line 353:   /// 'overall_status_' will be set once this is unblocked and so will 'failed_instance_id_'
line too long (92 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 08 Oct 2018 17:46:42 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Hello Thomas Marshall, Tim Armstrong, Joe McDonnell, Bikramjeet Vig, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11615

to look at the new patch set (#4).

Change subject: IMPALA-4063: Merge report of query fragment instances per executor
......................................................................

IMPALA-4063: Merge report of query fragment instances per executor

Previously, each fragment instance executing on an executor will
independently report its status to the coordinator periodically.
This creates a huge amount of RPCs to the coordinator under highly
concurrent workloads, causing lock contention in the coordinator's
backend states as each fragment instance tries to them at the same
time. In addition, due to the lack of coordination between query
fragment instances, a query may end without collecting the profiles
from all fragment instances when one of them hits an error before
another fragment instance manages to finish calling Prepare(), leading
to missing profiles for certain fragment instances.

This change fixes the problem above by making a thread per QueryState
(started by QueryExecMgr) to be responsible for periodically reporting
the status and profiles of all fragment instances of a query running
on a backend. As part of this refactoring, each query fragment instance
will not report their errors individually. Instead, there is a cumulative
status maintained per QueryState. It's set to the error status of the first
fragment instance which hits an error or any general error (e.g. failure
to start a thread) when starting fragment instances. With this change,
the status reporting threads are also removed.

Testing done: exhaustive tests

This patch is based on a patch by Sailesh Mukil

Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-exec-mgr.cc
M be/src/runtime/query-exec-mgr.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/control-service.cc
M be/src/service/control-service.h
M common/protobuf/control_service.proto
M common/thrift/RuntimeProfile.thrift
M testdata/workloads/functional-query/queries/QueryTest/udf-non-deterministic.test
M testdata/workloads/functional-query/queries/QueryTest/udf.test
18 files changed, 387 insertions(+), 436 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/11615/4
-- 
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per executor
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/981/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 08 Oct 2018 18:21:10 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per executor
......................................................................


Patch Set 1:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/fragment-instance-state.h
File be/src/runtime/fragment-instance-state.h:

http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/fragment-instance-state.h@93
PS1, Line 93:   /// Called periodically to get the current status of this fragment instance.
> Since we're describing the usage pattern, might as well mention which threa
Done


http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/fragment-instance-state.h@159
PS1, Line 159:   bool final_report_sent_ = false;
> What's the thread safety story here?
Added some comments. This is exclusively accessed by the query state thread only.


http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/fragment-instance-state.cc
File be/src/runtime/fragment-instance-state.cc:

http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/fragment-instance-state.cc@97
PS1, Line 97: intance
> instance
Done


http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/fragment-instance-state.cc@239
PS1, Line 239: 
> nit: unnecessary blank line
Done


http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/fragment-instance-state.cc@243
PS1, Line 243: void FragmentInstanceState::GetStatusReport(FragmentInstanceExecStatusPB* instance_status,
> nit: could probably reduce vertical whitespace in this function.
Done


http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/query-exec-mgr.cc
File be/src/runtime/query-exec-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/query-exec-mgr.cc@128
PS1, Line 128: void QueryExecMgr::StartQueryHelper(QueryState* qs) {
> I feel like we should rename this function since it now runs until the quer
Done


http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/query-state.h
File be/src/runtime/query-state.h:

http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/query-state.h@353
PS1, Line 353:   /// 'overall_status_' will be set once this is unblocked and so will 'failed_instance_id_'
> line too long (92 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/query-state.h@418
PS1, Line 418:   bool IsStatusSet() const {
> Maybe something like "HasErrorStatus()". It's confusing that setting overal
Done


http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/query-state.h@419
PS1, Line 419:     return !overall_status_.ok() && !overall_status_.IsCancelled();
> This is called without holding status_lock_, right? I don't think IsCancell
Good point. I think the optimization doesn't really matter that much given it's only exercised in the error path. It's okay to have a minor bit of lock contention if multiple fragment instances hit errors at the same time.


http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/query-state.cc@a553
PS1, Line 553: 
             : 
The deletion of this line warrants some explanation:

In the old code, if there is any error returned from fis->Exec(), the final report for "fis" would have been sent already so the coordinator will already record the failure of "fis" before the cancellation is issued.

In the new code, the reporting is done periodically by the query state thread so the final report may or may not be sent at this point after "fis" hit an error above. Calling Cancel() here may actually cause the coordinator fragment (if there is one) to prematurely set the backend status at the coordinator, causing Coordinator::BackendState::ApplyExecStatusReport() to ignore subsequent updates from "fis".  As a result, the actual error status was masked.

The implicit contract in the code (even before this change) is that the final report of the first fragment instance to hit an (non-cancellation) error must have been sent before initiating the cancellation. Will add a comment here.


http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/query-state.cc@236
PS1, Line 236:     if (overall_status_.IsCancelled()) {
> Can we refine this with the internal/client-driven cancellation distinction
Sorry, I don't quite get this comment. In either cases, won't the state still transit to BackendExecState::CANCELLED ?



-- 
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 10 Oct 2018 08:11:17 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

Posted by "Thomas Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per executor
......................................................................


Patch Set 9: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 9
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 05 Nov 2018 19:43:03 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Hello Thomas Marshall, Tim Armstrong, Joe McDonnell, Bikramjeet Vig, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11615

to look at the new patch set (#3).

Change subject: IMPALA-4063: Merge report of query fragment instances per executor
......................................................................

IMPALA-4063: Merge report of query fragment instances per executor

Previously, each fragment instance executing on an executor will
independently report its status to the coordinator periodically.
This creates a huge amount of RPCs to the coordinator under highly
concurrent workloads, causing lock contention in the coordinator's
backend states as each fragment instance tries to them at the same
time. In addition, due to the lack of coordination between query
fragment instances, a query may end without collecting the profiles
from all fragment instances when one of them hits an error before
another fragment instance manages to finish calling Prepare(), leading
to missing profiles for certain fragment instances.

This change fixes the problem above by making a thread per QueryState
(started by QueryExecMgr) to be responsible for periodically reporting
the status and profiles of all fragment instances of a query running
on a backend. As part of this refactoring, each query fragment instance
will not report their errors individually. Instead, there is a cumulative
status maintained per QueryState. It's set to the error status of the first
fragment instance which hits an error or any general error (e.g. failure
to start a thread) when starting fragment instances. With this change,
the status reporting threads are also removed.

Testing done: exhaustive tests

This patch is based on a patch by Sailesh Mukil

Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-exec-mgr.cc
M be/src/runtime/query-exec-mgr.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/control-service.cc
M common/protobuf/control_service.proto
M common/thrift/RuntimeProfile.thrift
M testdata/workloads/functional-query/queries/QueryTest/udf-non-deterministic.test
M testdata/workloads/functional-query/queries/QueryTest/udf.test
17 files changed, 382 insertions(+), 426 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/11615/3
-- 
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 3
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per executor
......................................................................


Patch Set 5:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/11615/3/be/src/runtime/fragment-instance-state.cc
File be/src/runtime/fragment-instance-state.cc:

http://gerrit.cloudera.org:8080/#/c/11615/3/be/src/runtime/fragment-instance-state.cc@444
PS3, Line 444:   if (fragment_ctx_.fragment.output_sink.type != TDataSinkType::PLAN_ROOT_SINK) {
             :     // if we haven't already release this thread token in Prepare(), release it now
             :     ReleaseThreadToken();
             :   }
> this is called at only one site and now does only one thing, should we just
This still seems like a conceptual step in the fragment life cycle so it makes sense to encapsulate it as a function.


http://gerrit.cloudera.org:8080/#/c/11615/3/be/src/runtime/query-exec-mgr.h
File be/src/runtime/query-exec-mgr.h:

http://gerrit.cloudera.org:8080/#/c/11615/3/be/src/runtime/query-exec-mgr.h@78
PS3, Line 78: an instances hit
            :   /// an error or
> nit: unless an instance hits an error they are cancelled
Done


http://gerrit.cloudera.org:8080/#/c/11615/3/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/11615/3/be/src/runtime/query-state.cc@397
PS3, Line 397: void QueryState::ErrorDuringExecute(const Status& status, const TUniqueId& finst_id) {
> previous patch had a racy check on the status, was there any benefit that e
It's okay to skip the racy call for now until evidence suggests otherwise. It's not a big deal if multiple fragment threads have to block when setting the error status. We don't expect this to be a common case so it's better to keep the code simpler and less error prone.


http://gerrit.cloudera.org:8080/#/c/11615/3/be/src/runtime/query-state.cc@509
PS3, Line 509:   }
> nit: retain this comment here from previously at L506: 
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 5
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 16 Oct 2018 00:47:50 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

Posted by "Thomas Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per executor
......................................................................


Patch Set 6: Code-Review+2

(2 comments)

http://gerrit.cloudera.org:8080/#/c/11615/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11615/5//COMMIT_MSG@14
PS5, Line 14: In addition, due to the lack of coordination between query
            : fragment instances, a query may end without collecting the profiles
            : from all fragment instances when one of them hits an error before
            : another fragment instance manages to finish Prepare(), leading to
            : missing profiles for certain fragment instances.
> Yes. Uncommented some of the tests in bloom_filters.test.
Great. Could you also remove the xfail from test_profile_fragment_instances


http://gerrit.cloudera.org:8080/#/c/11615/6/tests/query_test/test_udfs.py
File tests/query_test/test_udfs.py:

http://gerrit.cloudera.org:8080/#/c/11615/6/tests/query_test/test_udfs.py@290
PS6, Line 290:     # Some tests assume determinism or non-determinism, which depends on expr rewrites.
This comment is outdated, and below



-- 
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 6
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 25 Oct 2018 22:50:33 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Hello Thomas Marshall, Tim Armstrong, Joe McDonnell, Bikramjeet Vig, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11615

to look at the new patch set (#9).

Change subject: IMPALA-4063: Merge report of query fragment instances per executor
......................................................................

IMPALA-4063: Merge report of query fragment instances per executor

Previously, each fragment instance executing on an executor will
independently report its status to the coordinator periodically.
This creates a huge amount of RPCs to the coordinator under highly
concurrent workloads, causing lock contention in the coordinator's
backend states when multiple fragment instances send them at the
same time. In addition, due to the lack of coordination between query
fragment instances, a query may end without collecting the profiles
from all fragment instances when one of them hits an error before
another fragment instance manages to finish Prepare(), leading to
missing profiles for certain fragment instances.

This change fixes the problem above by making a thread per QueryState
(started by QueryExecMgr) to be responsible for periodically reporting
the status and profiles of all fragment instances of a query running
on a backend. As part of this refactoring, each query fragment instance
will not report their errors individually. Instead, there is a cumulative
status maintained per QueryState. It's set to the error status of the first
fragment instance which hits an error or any general error (e.g. failure
to start a thread) when starting fragment instances. With this change,
the status reporting threads are also removed.

Testing done: exhaustive tests

This patch is based on a patch by Sailesh Mukil

Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-exec-mgr.cc
M be/src/runtime/query-exec-mgr.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/control-service.cc
M be/src/service/control-service.h
M common/protobuf/control_service.proto
M common/thrift/RuntimeProfile.thrift
M testdata/workloads/functional-query/queries/QueryTest/bloom_filters.test
A testdata/workloads/functional-query/queries/QueryTest/udf-no-expr-rewrite.test
D testdata/workloads/functional-query/queries/QueryTest/udf-non-deterministic.test
M testdata/workloads/functional-query/queries/QueryTest/udf.test
M tests/query_test/test_observability.py
M tests/query_test/test_udfs.py
22 files changed, 423 insertions(+), 475 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/11615/9
-- 
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 9
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Hello Thomas Marshall, Tim Armstrong, Joe McDonnell, Bikramjeet Vig, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11615

to look at the new patch set (#5).

Change subject: IMPALA-4063: Merge report of query fragment instances per executor
......................................................................

IMPALA-4063: Merge report of query fragment instances per executor

Previously, each fragment instance executing on an executor will
independently report its status to the coordinator periodically.
This creates a huge amount of RPCs to the coordinator under highly
concurrent workloads, causing lock contention in the coordinator's
backend states when multiple fragment instances send them at the
same time. In addition, due to the lack of coordination between query
fragment instances, a query may end without collecting the profiles
from all fragment instances when one of them hits an error before
another fragment instance manages to finish Prepare(), leading to
missing profiles for certain fragment instances.

This change fixes the problem above by making a thread per QueryState
(started by QueryExecMgr) to be responsible for periodically reporting
the status and profiles of all fragment instances of a query running
on a backend. As part of this refactoring, each query fragment instance
will not report their errors individually. Instead, there is a cumulative
status maintained per QueryState. It's set to the error status of the first
fragment instance which hits an error or any general error (e.g. failure
to start a thread) when starting fragment instances. With this change,
the status reporting threads are also removed.

Testing done: exhaustive tests

This patch is based on a patch by Sailesh Mukil

Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-exec-mgr.cc
M be/src/runtime/query-exec-mgr.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/control-service.cc
M be/src/service/control-service.h
M common/protobuf/control_service.proto
M common/thrift/RuntimeProfile.thrift
M testdata/workloads/functional-query/queries/QueryTest/udf-non-deterministic.test
M testdata/workloads/functional-query/queries/QueryTest/udf.test
18 files changed, 389 insertions(+), 436 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/11615/5
-- 
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 5
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per executor
......................................................................

IMPALA-4063: Merge report of query fragment instances per executor

Previously, each fragment instance executing on an executor will
independently report its status to the coordinator periodically.
This creates a huge amount of RPCs to the coordinator under highly
concurrent workloads, causing lock contention in the coordinator's
backend states when multiple fragment instances send them at the
same time. In addition, due to the lack of coordination between query
fragment instances, a query may end without collecting the profiles
from all fragment instances when one of them hits an error before
another fragment instance manages to finish Prepare(), leading to
missing profiles for certain fragment instances.

This change fixes the problem above by making a thread per QueryState
(started by QueryExecMgr) to be responsible for periodically reporting
the status and profiles of all fragment instances of a query running
on a backend. As part of this refactoring, each query fragment instance
will not report their errors individually. Instead, there is a cumulative
status maintained per QueryState. It's set to the error status of the first
fragment instance which hits an error or any general error (e.g. failure
to start a thread) when starting fragment instances. With this change,
the status reporting threads are also removed.

Testing done: exhaustive tests

This patch is based on a patch by Sailesh Mukil

Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Reviewed-on: http://gerrit.cloudera.org:8080/11615
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-exec-mgr.cc
M be/src/runtime/query-exec-mgr.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/control-service.cc
M be/src/service/control-service.h
M common/protobuf/control_service.proto
M common/thrift/RuntimeProfile.thrift
M testdata/workloads/functional-query/queries/QueryTest/bloom_filters.test
A testdata/workloads/functional-query/queries/QueryTest/udf-no-expr-rewrite.test
D testdata/workloads/functional-query/queries/QueryTest/udf-non-deterministic.test
M testdata/workloads/functional-query/queries/QueryTest/udf.test
M tests/query_test/test_observability.py
M tests/query_test/test_udfs.py
22 files changed, 423 insertions(+), 475 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 11
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per executor
......................................................................


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11615/8/testdata/workloads/functional-query/queries/QueryTest/udf-no-expr-rewrite.test
File testdata/workloads/functional-query/queries/QueryTest/udf-no-expr-rewrite.test:

http://gerrit.cloudera.org:8080/#/c/11615/8/testdata/workloads/functional-query/queries/QueryTest/udf-no-expr-rewrite.test@26
PS8, Line 26: UDF WARNING: Memory leaked via FunctionContext::Allocate(), 100 bytes leaked via FunctionContext::TrackAl
This in fact may be racy as the final profile may be sent before Close() is called on a Fragment Instance. So, this is not 100% deterministic.



-- 
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 8
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 02 Nov 2018 00:49:14 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

Posted by "Thomas Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per executor
......................................................................


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/11615/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11615/5//COMMIT_MSG@14
PS5, Line 14: In addition, due to the lack of coordination between query
            : fragment instances, a query may end without collecting the profiles
            : from all fragment instances when one of them hits an error before
            : another fragment instance manages to finish Prepare(), leading to
            : missing profiles for certain fragment instances.
Does this mean that we've fixed the problem in IMPALA-7148 and can re-enable those tests?

In particular, are we now guaranteed that in a query that completes successfully but has some fragments cancelled internally (eg. because of a LIMIT) the coordinator will receive and apply at least one profile from every fragment instance?


http://gerrit.cloudera.org:8080/#/c/11615/5/testdata/workloads/functional-query/queries/QueryTest/udf-non-deterministic.test
File testdata/workloads/functional-query/queries/QueryTest/udf-non-deterministic.test:

http://gerrit.cloudera.org:8080/#/c/11615/5/testdata/workloads/functional-query/queries/QueryTest/udf-non-deterministic.test@18
PS5, Line 18: ---- QUERY
This test doesn't really have anything to do with nondeterminism. Would you mind renaming this file to something like udf-no-expr-rewrites?



-- 
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 5
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 18 Oct 2018 22:56:52 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per executor
......................................................................


Patch Set 8:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3405/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 8
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 01 Nov 2018 21:15:03 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Hello Thomas Marshall, Tim Armstrong, Joe McDonnell, Bikramjeet Vig, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11615

to look at the new patch set (#7).

Change subject: IMPALA-4063: Merge report of query fragment instances per executor
......................................................................

IMPALA-4063: Merge report of query fragment instances per executor

Previously, each fragment instance executing on an executor will
independently report its status to the coordinator periodically.
This creates a huge amount of RPCs to the coordinator under highly
concurrent workloads, causing lock contention in the coordinator's
backend states when multiple fragment instances send them at the
same time. In addition, due to the lack of coordination between query
fragment instances, a query may end without collecting the profiles
from all fragment instances when one of them hits an error before
another fragment instance manages to finish Prepare(), leading to
missing profiles for certain fragment instances.

This change fixes the problem above by making a thread per QueryState
(started by QueryExecMgr) to be responsible for periodically reporting
the status and profiles of all fragment instances of a query running
on a backend. As part of this refactoring, each query fragment instance
will not report their errors individually. Instead, there is a cumulative
status maintained per QueryState. It's set to the error status of the first
fragment instance which hits an error or any general error (e.g. failure
to start a thread) when starting fragment instances. With this change,
the status reporting threads are also removed.

Testing done: exhaustive tests

This patch is based on a patch by Sailesh Mukil

Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-exec-mgr.cc
M be/src/runtime/query-exec-mgr.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/control-service.cc
M be/src/service/control-service.h
M common/protobuf/control_service.proto
M common/thrift/RuntimeProfile.thrift
M testdata/workloads/functional-query/queries/QueryTest/bloom_filters.test
A testdata/workloads/functional-query/queries/QueryTest/udf-no-expr-rewrite.test
D testdata/workloads/functional-query/queries/QueryTest/udf-non-deterministic.test
M testdata/workloads/functional-query/queries/QueryTest/udf.test
M tests/query_test/test_observability.py
M tests/query_test/test_udfs.py
22 files changed, 420 insertions(+), 475 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/11615/7
-- 
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 7
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per executor
......................................................................


Patch Set 5:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/1052/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 5
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 16 Oct 2018 01:24:08 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per executor
......................................................................


Patch Set 8: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 8
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 01 Nov 2018 21:15:02 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per executor
......................................................................


Patch Set 7:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/1248/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 7
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 01 Nov 2018 17:35:50 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per executor
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11615/5/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/11615/5/be/src/runtime/query-state.cc@495
PS5, Line 495: error:
             :   // This point is reached if there were general errors to start query fragment instances.
             :   // Wait for all running fragment instances to finish preparing and report status to the
             :   // coordinator to start query cancellation.
Since we are changing some error handling for thread creation, a way to test that is to run some tests with thread_creation_fault_injection=true and verify that nothing hangs/crashes.



-- 
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 5
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 16 Oct 2018 22:26:39 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per executor
......................................................................


Patch Set 10: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 10
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 05 Nov 2018 21:09:27 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

Posted by "Bikramjeet Vig (Code Review)" <ge...@cloudera.org>.
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per executor
......................................................................


Patch Set 5:

(3 comments)

as a followup to this jira, do we have any task that will look at how this will impact scaling up cluster sizes?

http://gerrit.cloudera.org:8080/#/c/11615/5/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/11615/5/be/src/runtime/coordinator-backend-state.cc@283
PS5, Line 283: 
nit: can you briefly mention that every instance's instanceExecStatus in backend_exec_status.instance_exec_status() and its thrift profile tree in the 'TRuntimeProfileForest' are at the same index. A reader will not have to trace it back to QueryState::ConstructReport to make sure its true.

On that note, does it make sense to add the instance id to the TRuntimeProfileTree, so that we can add a dcheck here and catch any bug that might be caused due to any future changes to that code? Or will that be an overkill?


http://gerrit.cloudera.org:8080/#/c/11615/3/be/src/runtime/fragment-instance-state.cc
File be/src/runtime/fragment-instance-state.cc:

http://gerrit.cloudera.org:8080/#/c/11615/3/be/src/runtime/fragment-instance-state.cc@444
PS3, Line 444:   if (fragment_ctx_.fragment.output_sink.type != TDataSinkType::PLAN_ROOT_SINK) {
             :     // if we haven't already release this thread token in Prepare(), release it now
             :     ReleaseThreadToken();
             :   }
> This still seems like a conceptual step in the fragment life cycle so it ma
wfm


http://gerrit.cloudera.org:8080/#/c/11615/3/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/11615/3/be/src/runtime/query-state.cc@397
PS3, Line 397: void QueryState::ErrorDuringExecute(const Status& status, const TUniqueId& finst_id) {
> It's okay to skip the racy call for now until evidence suggests otherwise. 
wfm



-- 
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 5
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 16 Oct 2018 21:34:29 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per executor
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/11615/1/be/src/runtime/coordinator-backend-state.cc@277
PS1, Line 277:   lock_guard<SpinLock> l1(exec_summary->lock);
> Isn't this just reverting the change that the previous patch made?
Yes. Will revert that change in the previous patch and add a comment instead.



-- 
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 09 Oct 2018 01:21:22 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per executor
......................................................................


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/11615/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11615/5//COMMIT_MSG@14
PS5, Line 14: In addition, due to the lack of coordination between query
            : fragment instances, a query may end without collecting the profiles
            : from all fragment instances when one of them hits an error before
            : another fragment instance manages to finish Prepare(), leading to
            : missing profiles for certain fragment instances.
> Great. Could you also remove the xfail from test_profile_fragment_instances
Done


http://gerrit.cloudera.org:8080/#/c/11615/6/tests/query_test/test_udfs.py
File tests/query_test/test_udfs.py:

http://gerrit.cloudera.org:8080/#/c/11615/6/tests/query_test/test_udfs.py@290
PS6, Line 290:       self.run_test_case('QueryTest/udf-codegen-required', vector, use_db=unique_database)
> This comment is outdated, and below
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 5
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 01 Nov 2018 16:55:45 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4063: Merge report of query fragment instances per executor

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/11615 )

Change subject: IMPALA-4063: Merge report of query fragment instances per executor
......................................................................


Patch Set 7: Code-Review+2

Carry +2


-- 
To view, visit http://gerrit.cloudera.org:8080/11615
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f95e026ba05631f33f48ce32da6db39c6f421fa
Gerrit-Change-Number: 11615
Gerrit-PatchSet: 7
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 01 Nov 2018 16:56:21 +0000
Gerrit-HasComments: No