You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Tim Armstrong (Code Review)" <ge...@cloudera.org> on 2017/06/23 17:36:10 UTC

[Impala-ASF-CR] IMPALA-5567: race in fragment instance teardown

Tim Armstrong has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7275

Change subject: IMPALA-5567: race in fragment instance teardown
......................................................................

IMPALA-5567: race in fragment instance teardown

The bug is that PlanRootSink::GetNext() calls
RuntimeState::CheckQueryState() was called concurrently with
RuntimeState::ReleaseResources() and got a reference

The other callsites of CheckQueryState() are safe because they are in two
categories:
* ExecNodes or DataSink methods executed by fragment instance execution
  threads, which must terminate before the runtime state resources are
  released.
* In FeSupport where ReleaseResources() called after CheckQueryState()

There is no need to asynchronously check for memory limit
exceeded in PlanRootSink::GetNext() since that method does
not allocate tracked memory that could push the query over
the memory limit.

Change-Id: If567e734042b5f2b82323368dd536dbf3bdf4744
---
M be/src/exec/plan-root-sink.cc
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
3 files changed, 5 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/7275/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: If567e734042b5f2b82323368dd536dbf3bdf4744
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-5567: race in fragment instance teardown

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5567: race in fragment instance teardown
......................................................................


Patch Set 6: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/7275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If567e734042b5f2b82323368dd536dbf3bdf4744
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5567: race in fragment instance teardown

Posted by "Matthew Jacobs (Code Review)" <ge...@cloudera.org>.
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5567: race in fragment instance teardown
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7275/1/be/src/runtime/runtime-state.h
File be/src/runtime/runtime-state.h:

PS1, Line 294:   /// Returns a non-OK status if query execution should stop (e.g., the query was
             :   /// cancelled or a mem limit was exceeded). Exec nodes should check this periodically so
             :   /// execution doesn't continue if the query terminates abnormally. This should not be
             :   /// called after ReleaseResources().
             :   Status CheckQueryState();
I think this method is confusing to begin with, which probably led us to misuse it. We have a very overloaded concept of 'query state'.

We don't necessarily have to address this now (e.g. the change in plan-fragment-root.cc seems good regardless), but we should probably rethink it.


-- 
To view, visit http://gerrit.cloudera.org:8080/7275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If567e734042b5f2b82323368dd536dbf3bdf4744
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5567: race in fragment instance teardown

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Hello Impala Public Jenkins, Michael Ho, Lars Volker,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/7275

to look at the new patch set (#3).

Change subject: IMPALA-5567: race in fragment instance teardown
......................................................................

IMPALA-5567: race in fragment instance teardown

The bug is that PlanRootSink::GetNext() calls
RuntimeState::CheckQueryState() was called concurrently with
RuntimeState::ReleaseResources() and got a reference

The other callsites of CheckQueryState() are safe because they are in two
categories:
* ExecNodes or DataSink methods executed by fragment instance execution
  threads, which must terminate before the runtime state resources are
  released.
* In FeSupport where ReleaseResources() called after CheckQueryState()

There is no need to asynchronously check for memory limit
exceeded in PlanRootSink::GetNext() since that method does
not allocate tracked memory that could push the query over
the memory limit.

Change-Id: If567e734042b5f2b82323368dd536dbf3bdf4744
---
M be/src/exec/plan-root-sink.cc
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
3 files changed, 6 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/7275/3
-- 
To view, visit http://gerrit.cloudera.org:8080/7275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If567e734042b5f2b82323368dd536dbf3bdf4744
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-5567: race in fragment instance teardown

Posted by "Lars Volker (Code Review)" <ge...@cloudera.org>.
Lars Volker has posted comments on this change.

Change subject: IMPALA-5567: race in fragment instance teardown
......................................................................


Patch Set 1: Code-Review+1

I ran my local reproduction build with ASAN over the weekend and the issue did not reproduce. The code looks good to me, too.

-- 
To view, visit http://gerrit.cloudera.org:8080/7275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If567e734042b5f2b82323368dd536dbf3bdf4744
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5567: race in fragment instance teardown

Posted by "Matthew Jacobs (Code Review)" <ge...@cloudera.org>.
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5567: race in fragment instance teardown
......................................................................


Patch Set 4: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/7275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If567e734042b5f2b82323368dd536dbf3bdf4744
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5567: race in fragment instance teardown

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5567: race in fragment instance teardown
......................................................................


Patch Set 1: Code-Review-2

Running tests on this but -2 to prevent merge.

-- 
To view, visit http://gerrit.cloudera.org:8080/7275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If567e734042b5f2b82323368dd536dbf3bdf4744
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5567: race in fragment instance teardown

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5567: race in fragment instance teardown
......................................................................


Patch Set 6:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/797/

-- 
To view, visit http://gerrit.cloudera.org:8080/7275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If567e734042b5f2b82323368dd536dbf3bdf4744
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5567: race in fragment instance teardown

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5567: race in fragment instance teardown
......................................................................


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/7275/1/be/src/exec/plan-root-sink.cc
File be/src/exec/plan-root-sink.cc:

Line 102:     ScalarExprEvaluator::FreeLocalAllocations(output_expr_evals_);
> Not your change but should we check for query status here too ? I can imagi
Done


PS1, Line 150:  DCHECK(
> DCHECK_GE()
Done


http://gerrit.cloudera.org:8080/#/c/7275/1/be/src/runtime/runtime-state.h
File be/src/runtime/runtime-state.h:

PS1, Line 294:   /// Returns a non-OK status if query execution should stop (e.g., the query was
             :   /// cancelled or a mem limit was exceeded). Exec nodes should check this periodically so
             :   /// execution doesn't continue if the query terminates abnormally. This should not be
             :   /// called after ReleaseResources().
             :   Status CheckQueryState();
> I think this method is confusing to begin with, which probably led us to mi
How about we rename them to be less vague. Maybe something like:

CheckQueryState() -> CheckMemLimitOrAsyncError()
GetQueryStatus() -> CheckAsyncError()
query_status_ -> async_error_status_


-- 
To view, visit http://gerrit.cloudera.org:8080/7275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If567e734042b5f2b82323368dd536dbf3bdf4744
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5567: race in fragment instance teardown

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5567: race in fragment instance teardown
......................................................................


Patch Set 1:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/783/

-- 
To view, visit http://gerrit.cloudera.org:8080/7275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If567e734042b5f2b82323368dd536dbf3bdf4744
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5567: race in fragment instance teardown

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5567: race in fragment instance teardown
......................................................................


Patch Set 2:

(1 comment)

Ok, I think we're back to the more minimal fix. Does this seem ok to everyone?

http://gerrit.cloudera.org:8080/#/c/7275/2/be/src/exec/plan-root-sink.cc
File be/src/exec/plan-root-sink.cc:

Line 101:     RETURN_IF_ERROR(state->GetQueryStatus());
> It doesn't seem strictly necessary for this change. Feel free to do it late
Reverted that part


-- 
To view, visit http://gerrit.cloudera.org:8080/7275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If567e734042b5f2b82323368dd536dbf3bdf4744
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5567: race in fragment instance teardown

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Hello Impala Public Jenkins, Lars Volker,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/7275

to look at the new patch set (#2).

Change subject: IMPALA-5567: race in fragment instance teardown
......................................................................

IMPALA-5567: race in fragment instance teardown

The bug is that PlanRootSink::GetNext() calls
RuntimeState::CheckQueryState() was called concurrently with
RuntimeState::ReleaseResources() and got a reference

The other callsites of CheckQueryState() are safe because they are in two
categories:
* ExecNodes or DataSink methods executed by fragment instance execution
  threads, which must terminate before the runtime state resources are
  released.
* In FeSupport where ReleaseResources() called after CheckQueryState()

There is no need to asynchronously check for memory limit
exceeded in PlanRootSink::GetNext() since that method does
not allocate tracked memory that could push the query over
the memory limit.

Change-Id: If567e734042b5f2b82323368dd536dbf3bdf4744
---
M be/src/exec/plan-root-sink.cc
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
3 files changed, 9 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/7275/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If567e734042b5f2b82323368dd536dbf3bdf4744
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-5567: race in fragment instance teardown

Posted by "Matthew Jacobs (Code Review)" <ge...@cloudera.org>.
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5567: race in fragment instance teardown
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7275/1/be/src/runtime/runtime-state.h
File be/src/runtime/runtime-state.h:

PS1, Line 294:   /// Returns a non-OK status if query execution should stop (e.g., the query was
             :   /// cancelled or a mem limit was exceeded). Exec nodes should check this periodically so
             :   /// execution doesn't continue if the query terminates abnormally. This should not be
             :   /// called after ReleaseResources().
             :   Status CheckQueryState();
> Yes, I wasn't suggesting that we should convert those Allocate() to TryAllo
Completely agree


-- 
To view, visit http://gerrit.cloudera.org:8080/7275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If567e734042b5f2b82323368dd536dbf3bdf4744
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5567: race in fragment instance teardown

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5567: race in fragment instance teardown
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7275/4/be/src/runtime/runtime-state.cc
File be/src/runtime/runtime-state.cc:

Line 279:   // 'instance_mem_tracker_' and its children are destroyed.
> let's add a TODO here explaining the long term direction:
I filed IMPALA-5587 for that


-- 
To view, visit http://gerrit.cloudera.org:8080/7275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If567e734042b5f2b82323368dd536dbf3bdf4744
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5567: race in fragment instance teardown

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5567: race in fragment instance teardown
......................................................................


Patch Set 1: -Code-Review

-- 
To view, visit http://gerrit.cloudera.org:8080/7275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If567e734042b5f2b82323368dd536dbf3bdf4744
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5567: race in fragment instance teardown

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5567: race in fragment instance teardown
......................................................................


Patch Set 4: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7275/4/be/src/runtime/runtime-state.cc
File be/src/runtime/runtime-state.cc:

Line 279:   // 'instance_mem_tracker_' and its children are destroyed.
let's add a TODO here explaining the long term direction:
TODO: control structures (such as MemTrackers) shouldn't be destroyed here.


-- 
To view, visit http://gerrit.cloudera.org:8080/7275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If567e734042b5f2b82323368dd536dbf3bdf4744
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5567: race in fragment instance teardown

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change.

Change subject: IMPALA-5567: race in fragment instance teardown
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7275/1/be/src/exec/plan-root-sink.cc
File be/src/exec/plan-root-sink.cc:

Line 102:     ScalarExprEvaluator::FreeLocalAllocations(output_expr_evals_);
Not your change but should we check for query status here too ? I can imagine expression evaluations above can lead to MemLimitExceeded. Feel free to do it later though.


PS1, Line 150:  DCHECK(
DCHECK_GE()


-- 
To view, visit http://gerrit.cloudera.org:8080/7275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If567e734042b5f2b82323368dd536dbf3bdf4744
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5567: race in fragment instance teardown

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5567: race in fragment instance teardown
......................................................................


Patch Set 1: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/7275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If567e734042b5f2b82323368dd536dbf3bdf4744
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5567: race in fragment instance teardown

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change.

Change subject: IMPALA-5567: race in fragment instance teardown
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7275/2/be/src/exec/plan-root-sink.cc
File be/src/exec/plan-root-sink.cc:

Line 101:     results_ = nullptr;
> I don't feel strongly. I think the idea was to move these status checks mor
It doesn't seem strictly necessary for this change. Feel free to do it later.


http://gerrit.cloudera.org:8080/#/c/7275/1/be/src/runtime/runtime-state.h
File be/src/runtime/runtime-state.h:

PS1, Line 294:   /// Returns a non-OK status if query execution should stop (e.g., the query was
             :   /// cancelled or a mem limit was exceeded). Exec nodes should check this periodically so
             :   /// execution doesn't continue if the query terminates abnormally. This should not be
             :   /// called after ReleaseResources().
             :   Status CheckQueryState();
> I think we should do the work of removing Allocate() calls at some point bu
Yes, I wasn't suggesting that we should convert those Allocate() to TryAllocate() in this change. That's just something we can do to get rid of CheckQueryState() altogether.


-- 
To view, visit http://gerrit.cloudera.org:8080/7275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If567e734042b5f2b82323368dd536dbf3bdf4744
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5567: race in fragment instance teardown

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change.

Change subject: IMPALA-5567: race in fragment instance teardown
......................................................................


Patch Set 2: Code-Review+1

(1 comment)

MJ may wanna do another pass.

http://gerrit.cloudera.org:8080/#/c/7275/1/be/src/runtime/runtime-state.h
File be/src/runtime/runtime-state.h:

PS1, Line 294:   /// Returns a non-OK status if query execution should stop (e.g., the query was
             :   /// cancelled or a mem limit was exceeded). Exec nodes should check this periodically so
             :   /// execution doesn't continue if the query terminates abnormally. This should not be
             :   /// called after ReleaseResources().
             :   Status CheckQueryState();
> How about we rename them to be less vague. Maybe something like:
I wonder if most of the remaining callers can be converted to CheckQueryStatus() given that we do mostly use TryAllocate() instead of Allocate(). Most of the remaining callers are DataSinks. May be if we fix hdfs-table-writer and friends to use TryAllocate(), things will be easier ?


-- 
To view, visit http://gerrit.cloudera.org:8080/7275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If567e734042b5f2b82323368dd536dbf3bdf4744
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5567: race in fragment instance teardown

Posted by "Matthew Jacobs (Code Review)" <ge...@cloudera.org>.
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5567: race in fragment instance teardown
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7275/2/be/src/exec/plan-root-sink.cc
File be/src/exec/plan-root-sink.cc:

Line 101:     RETURN_IF_ERROR(state->GetQueryStatus());
At first glance this looks reasonable to put here, but is this necessary for this change? Looks like the caller is FIS::ExecInternal and the error status gets propagated. I'm just asking because we'll start seeing errors sooner, which seems like a good thing but may result in some different runtime behavior.


http://gerrit.cloudera.org:8080/#/c/7275/1/be/src/runtime/runtime-state.h
File be/src/runtime/runtime-state.h:

PS1, Line 294:   /// Returns a non-OK status if query execution should stop (e.g., the query was
             :   /// cancelled or a mem limit was exceeded). Exec nodes should check this periodically so
             :   /// execution doesn't continue if the query terminates abnormally. This should not be
             :   /// called after ReleaseResources().
             :   Status CheckQueryState();
> I wonder if most of the remaining callers can be converted to CheckQuerySta
I like CheckAsyncError() for GetQueryStatus(). If we can convert all of the other callsites to CheckQueryStatus(), that'd clearly be ideal, but that seems like a bigger change to move Allocate -> TryAllocate. However, maybe we can split CheckQueryState() into a fn that just updates the status_ on mem limit exceeded, then the places where we know the memory may have been over consumed we call that before calling CheckAsyncError()? I guess CheckMemLimitOrAsyncError() is at least more clear for now. I'm OK with fixing this more localized issue now.

It's funny, CheckQueryStatus() clearly suffers from the QueryMaintenance() problem.


-- 
To view, visit http://gerrit.cloudera.org:8080/7275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If567e734042b5f2b82323368dd536dbf3bdf4744
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5567: race in fragment instance teardown

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-5567: race in fragment instance teardown
......................................................................


IMPALA-5567: race in fragment instance teardown

The bug is that PlanRootSink::GetNext() calls
RuntimeState::CheckQueryState() was called concurrently with
RuntimeState::ReleaseResources() and got a reference

The other callsites of CheckQueryState() are safe because they are in two
categories:
* ExecNodes or DataSink methods executed by fragment instance execution
  threads, which must terminate before the runtime state resources are
  released.
* In FeSupport where ReleaseResources() called after CheckQueryState()

There is no need to asynchronously check for memory limit
exceeded in PlanRootSink::GetNext() since that method does
not allocate tracked memory that could push the query over
the memory limit.

Change-Id: If567e734042b5f2b82323368dd536dbf3bdf4744
Reviewed-on: http://gerrit.cloudera.org:8080/7275
Reviewed-by: Tim Armstrong <ta...@cloudera.com>
Tested-by: Impala Public Jenkins
---
M be/src/exec/plan-root-sink.cc
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
3 files changed, 7 insertions(+), 6 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Tim Armstrong: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/7275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: If567e734042b5f2b82323368dd536dbf3bdf4744
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-5567: race in fragment instance teardown

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Hello Impala Public Jenkins, Michael Ho, Lars Volker, Matthew Jacobs, Dan Hecht,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/7275

to look at the new patch set (#5).

Change subject: IMPALA-5567: race in fragment instance teardown
......................................................................

IMPALA-5567: race in fragment instance teardown

The bug is that PlanRootSink::GetNext() calls
RuntimeState::CheckQueryState() was called concurrently with
RuntimeState::ReleaseResources() and got a reference

The other callsites of CheckQueryState() are safe because they are in two
categories:
* ExecNodes or DataSink methods executed by fragment instance execution
  threads, which must terminate before the runtime state resources are
  released.
* In FeSupport where ReleaseResources() called after CheckQueryState()

There is no need to asynchronously check for memory limit
exceeded in PlanRootSink::GetNext() since that method does
not allocate tracked memory that could push the query over
the memory limit.

Change-Id: If567e734042b5f2b82323368dd536dbf3bdf4744
---
M be/src/exec/plan-root-sink.cc
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
3 files changed, 7 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/75/7275/5
-- 
To view, visit http://gerrit.cloudera.org:8080/7275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If567e734042b5f2b82323368dd536dbf3bdf4744
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-5567: race in fragment instance teardown

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5567: race in fragment instance teardown
......................................................................


Patch Set 6: Code-Review+2

Carry +2

-- 
To view, visit http://gerrit.cloudera.org:8080/7275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If567e734042b5f2b82323368dd536dbf3bdf4744
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5567: race in fragment instance teardown

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5567: race in fragment instance teardown
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7275/4/be/src/runtime/runtime-state.cc
File be/src/runtime/runtime-state.cc:

Line 279:   // 'instance_mem_tracker_' and its children are destroyed.
> let's add a TODO here explaining the long term direction:
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If567e734042b5f2b82323368dd536dbf3bdf4744
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5567: race in fragment instance teardown

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5567: race in fragment instance teardown
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7275/2/be/src/exec/plan-root-sink.cc
File be/src/exec/plan-root-sink.cc:

Line 101:     results_ = nullptr;
> At first glance this looks reasonable to put here, but is this necessary fo
I don't feel strongly. I think the idea was to move these status checks more closely to the expr evaluations that could cause errors to be hit.


http://gerrit.cloudera.org:8080/#/c/7275/1/be/src/runtime/runtime-state.h
File be/src/runtime/runtime-state.h:

PS1, Line 294:   /// Returns a non-OK status if query execution should stop (e.g., the query was
             :   /// cancelled or a mem limit was exceeded). Exec nodes should check this periodically so
             :   /// execution doesn't continue if the query terminates abnormally. This should not be
             :   /// called after ReleaseResources().
             :   Status CheckQueryState();
> I like CheckAsyncError() for GetQueryStatus(). If we can convert all of the
I think we should do the work of removing Allocate() calls at some point but I think we should keep fix as targeted as possible. That might be an argument to defer doing the rename as well.


-- 
To view, visit http://gerrit.cloudera.org:8080/7275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If567e734042b5f2b82323368dd536dbf3bdf4744
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes