You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Qifan Chen (Code Review)" <ge...@cloudera.org> on 2021/11/04 23:53:35 UTC

[Impala-ASF-CR] IMPALA-11006: Impalad crashes during query cancel tests

Qifan Chen has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17999


Change subject: IMPALA-11006: Impalad crashes during query cancel tests
......................................................................

IMPALA-11006: Impalad crashes during query cancel tests

This patch addresses impalad crash during query cancel tests. The
root cause is that the wait thread in impalad assumes that the
coordinator is always available once the worker thread
async_exec_thread_ is joined. In reality, the coordinator may not
be available at all once the query is cancelled.

The patches specifically checks the cancel status when it is found
that the coordinator is not available and bails out immediately.
This prevents crash since the subsequent code that assumes the
coordinator is available is never reached.

Testing [TBD].

Change-Id: Ia49411f8525734b8d463d9ffbfbce705b90a8d73
---
M be/src/service/client-request-state.cc
1 file changed, 9 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/17999/1
-- 
To view, visit http://gerrit.cloudera.org:8080/17999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ia49411f8525734b8d463d9ffbfbce705b90a8d73
Gerrit-Change-Number: 17999
Gerrit-PatchSet: 1
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>

[Impala-ASF-CR] IMPALA-11006: Impalad crashes during query cancel tests

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17999 )

Change subject: IMPALA-11006: Impalad crashes during query cancel tests
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9724/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia49411f8525734b8d463d9ffbfbce705b90a8d73
Gerrit-Change-Number: 17999
Gerrit-PatchSet: 1
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Comment-Date: Fri, 05 Nov 2021 00:16:26 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11006: Impalad crashes during query cancel tests

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17999 )

Change subject: IMPALA-11006: Impalad crashes during query cancel tests
......................................................................


Patch Set 5: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/17999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia49411f8525734b8d463d9ffbfbce705b90a8d73
Gerrit-Change-Number: 17999
Gerrit-PatchSet: 5
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Fri, 12 Nov 2021 14:20:34 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11006: Impalad crashes during query cancel tests

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17999 )

Change subject: IMPALA-11006: Impalad crashes during query cancel tests
......................................................................


Patch Set 5:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7631/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/17999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia49411f8525734b8d463d9ffbfbce705b90a8d73
Gerrit-Change-Number: 17999
Gerrit-PatchSet: 5
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Fri, 12 Nov 2021 14:20:35 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11006: Impalad crashes during query cancel tests

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17999 )

Change subject: IMPALA-11006: Impalad crashes during query cancel tests
......................................................................


Patch Set 4: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7624/


-- 
To view, visit http://gerrit.cloudera.org:8080/17999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia49411f8525734b8d463d9ffbfbce705b90a8d73
Gerrit-Change-Number: 17999
Gerrit-PatchSet: 4
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Thu, 11 Nov 2021 19:50:59 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11006: Impalad crashes during query cancel tests

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17999 )

Change subject: IMPALA-11006: Impalad crashes during query cancel tests
......................................................................


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17999/1/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/17999/1/be/src/service/client-request-state.cc@1070
PS1, Line 1070:   re
> My concern is about queries that do not have a coordinator at all, as Finis
CTAS is the only DDL that we call ExecQueryOrDmlRequest() and thus FinishExecQueryOrDmlRequest(). See 
https://github.com/apache/impala/blob/master/be/src/service/client-request-state.cc#L721. 

All other DDLs, if run async, will execute the rest of the code in ExecDdlRequestImpl(), or ExecDdlRequestImplSync() if run sync. 

Added a IF test for CTAS that requires a coordinator. If true, we further test is_cancelled_.


http://gerrit.cloudera.org:8080/#/c/17999/1/be/src/service/client-request-state.cc@1076
PS1, Line 1076:  be no query
> MarkActive/Inactive affects query expiration only, which is not relevant in
MarkInactive() call is removed.



-- 
To view, visit http://gerrit.cloudera.org:8080/17999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia49411f8525734b8d463d9ffbfbce705b90a8d73
Gerrit-Change-Number: 17999
Gerrit-PatchSet: 3
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Wed, 10 Nov 2021 03:32:44 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11006: Impalad crashes during query cancel tests

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17999 )

Change subject: IMPALA-11006: Impalad crashes during query cancel tests
......................................................................


Patch Set 4: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/17999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia49411f8525734b8d463d9ffbfbce705b90a8d73
Gerrit-Change-Number: 17999
Gerrit-PatchSet: 4
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Thu, 11 Nov 2021 13:26:52 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11006: Impalad crashes during query cancel tests

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/17999 )

Change subject: IMPALA-11006: Impalad crashes during query cancel tests
......................................................................


Patch Set 3: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17999/1/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/17999/1/be/src/service/client-request-state.cc@1070
PS1, Line 1070:   re
> CTAS is the only DDL that we call ExecQueryOrDmlRequest() and thus FinishEx
Thanks, I think it looks cleaner now!



-- 
To view, visit http://gerrit.cloudera.org:8080/17999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia49411f8525734b8d463d9ffbfbce705b90a8d73
Gerrit-Change-Number: 17999
Gerrit-PatchSet: 3
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Wed, 10 Nov 2021 14:58:52 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11006: Impalad crashes during query cancel tests

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17999 )

Change subject: IMPALA-11006: Impalad crashes during query cancel tests
......................................................................


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7624/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/17999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia49411f8525734b8d463d9ffbfbce705b90a8d73
Gerrit-Change-Number: 17999
Gerrit-PatchSet: 4
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Thu, 11 Nov 2021 13:26:53 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11006: Impalad crashes during query cancel tests

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17999 )

Change subject: IMPALA-11006: Impalad crashes during query cancel tests
......................................................................


Patch Set 5: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/17999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia49411f8525734b8d463d9ffbfbce705b90a8d73
Gerrit-Change-Number: 17999
Gerrit-PatchSet: 5
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Fri, 12 Nov 2021 20:32:22 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11006: Impalad crashes during query cancel tests

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/17999 )

Change subject: IMPALA-11006: Impalad crashes during query cancel tests
......................................................................

IMPALA-11006: Impalad crashes during query cancel tests

This patch addresses impalad crash during query cancel tests. The
root cause is that the wait thread in impalad assumes that the
coordinator or computation results are always available once the
worker thread async_exec_thread_ is joined. In reality, this may
not be true at all once the query is cancelled.

The patch specifically checks the cancel status when it is found
that the coordinator is not available and bails out immediately.
This prevents crash since the subsequent code that assumes the
coordinator or computation results are available is never reached.

Testing:
  1. Ran core tests successfully.

Change-Id: Ia49411f8525734b8d463d9ffbfbce705b90a8d73
---
M be/src/service/client-request-state.cc
1 file changed, 9 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/17999/2
-- 
To view, visit http://gerrit.cloudera.org:8080/17999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia49411f8525734b8d463d9ffbfbce705b90a8d73
Gerrit-Change-Number: 17999
Gerrit-PatchSet: 2
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>

[Impala-ASF-CR] IMPALA-11006: Impalad crashes during query cancel tests

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17999 )

Change subject: IMPALA-11006: Impalad crashes during query cancel tests
......................................................................


Patch Set 1:

(2 comments)

Thanks a lot for the careful review!

http://gerrit.cloudera.org:8080/#/c/17999/1/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/17999/1/be/src/service/client-request-state.cc@1070
PS1, Line 1070: else
> What is query_status_ in case of cancellation? Shouldn't it be in an error 
CTAS is processed async with a difference before the RPC patch.

Main steps involved:
1. COMPILE
2. CREATE TABLE
3. POPULATE TABLE

Before: 
1 and 2 in main thread, 3 in async_exec_thread

After:
2 in main thread, 2 and 3 in async_exec_thread

GetCoordinator() returns TRUE iff coord_exec_called_ is true which is set only inside ClientRequestState::FinishExecQueryOrDmlRequest(), which is called only if there is no interruption (like cancel)) and step 3 completes successfully.

In the core dump case, cancel is received from the client and the above method is never called. So GetCoordinator() is FALSE.

My limited understanding of the code is that the GetCoordinator() is available (and to get results from it) only if there is no interruption. We therefore have to check it being FALSE case here.


http://gerrit.cloudera.org:8080/#/c/17999/1/be/src/service/client-request-state.cc@1076
PS1, Line 1076: MarkInactive
> I think that MarkInactive() is not needed in case of cancellation, as we ha
I instrumented the code to observe when MarkActive() and MarkInactive() for the relevant this object and found the following.

MarkActive(Exec()): this=0xe62e800, updated ref_count_=1
MarkActive(Finalize()): this=0xe62e800, updated ref_count_=2
MarkInactive(WaitInternal() for ELSE branch): this=0xe62e800, updated ref_count_=1

So if MarkInactive() is not called here, the ref count will be 2. Per code calling convention, EXEC() and WAIT() (which calls WaitInternal() are paired so ideally it is a good idea to decrement ref_count here.

But I am Okay with remove MarkInactive() as the reference count is not back to 0 anyway.



-- 
To view, visit http://gerrit.cloudera.org:8080/17999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia49411f8525734b8d463d9ffbfbce705b90a8d73
Gerrit-Change-Number: 17999
Gerrit-PatchSet: 1
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Mon, 08 Nov 2021 15:23:18 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11006: Impalad crashes during query cancel tests

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17999 )

Change subject: IMPALA-11006: Impalad crashes during query cancel tests
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9744/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia49411f8525734b8d463d9ffbfbce705b90a8d73
Gerrit-Change-Number: 17999
Gerrit-PatchSet: 2
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Tue, 09 Nov 2021 14:11:24 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11006: Impalad crashes during query cancel tests

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/17999 )

Change subject: IMPALA-11006: Impalad crashes during query cancel tests
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17999/1/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/17999/1/be/src/service/client-request-state.cc@1070
PS1, Line 1070: else
> CTAS is processed async with a difference before the RPC patch.
My concern is about queries that do not have a coordinator at all, as FinishExecQueryOrDmlRequest is not called. My understanding is that there is no coordinator in case of DDLs, with the exception of CTAS. In this case it is normal to go in this branch - we will work correctly, but I still think that we should distinguish between the two cases.


http://gerrit.cloudera.org:8080/#/c/17999/1/be/src/service/client-request-state.cc@1076
PS1, Line 1076: MarkInactive
> I instrumented the code to observe when MarkActive() and MarkInactive() for
MarkActive/Inactive affects query expiration only, which is not relevant in case the query was cancelled. ref_count_ should reach 0 only when Impala is really not doing any work on the query and we are waiting for the client to do another RPC.



-- 
To view, visit http://gerrit.cloudera.org:8080/17999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia49411f8525734b8d463d9ffbfbce705b90a8d73
Gerrit-Change-Number: 17999
Gerrit-PatchSet: 2
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Tue, 09 Nov 2021 16:08:12 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11006: Impalad crashes during query cancel tests

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/17999 )

Change subject: IMPALA-11006: Impalad crashes during query cancel tests
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17999/1/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/17999/1/be/src/service/client-request-state.cc@1070
PS1, Line 1070: else
What is query_status_ in case of cancellation? Shouldn't it be in an error status, which would lead to returning at line 1058?

I looked around and I think that we are quite inconsistent here - Cancel() only updates the query status if there is a "cause":
https://github.com/apache/impala/blob/9d2ef8564786d858db7786ad338b7daa5386eb20/be/src/service/client-request-state.cc#L1343

Otherwise only is_cancelled_  will become true. This seems intentional:
https://github.com/apache/impala/blob/9d2ef8564786d858db7786ad338b7daa5386eb20/be/src/service/client-request-state.h#L175

But if the cancellation happens at some certain times, we do set query_status_ regardless of having a cause or not:
https://github.com/apache/impala/blob/9d2ef8564786d858db7786ad338b7daa5386eb20/be/src/service/client-request-state.cc#L612

I think that this logic would be clearer if we would decide first whether GetCoordinator() should return non-null - my understanding is that there are some ddl-s that still won't have a coordinator. In case the it should have a coordinator but it doesn't, then we could check if it was cancelled and hit a DCHECK if it was not.


http://gerrit.cloudera.org:8080/#/c/17999/1/be/src/service/client-request-state.cc@1076
PS1, Line 1076: MarkInactive
I think that MarkInactive() is not needed in case of cancellation, as we have already closed the query and expect no more RPCs from the client. Calling MarkInactive() means that Impala is no longer active, but we are waiting for more activity from the client.



-- 
To view, visit http://gerrit.cloudera.org:8080/17999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia49411f8525734b8d463d9ffbfbce705b90a8d73
Gerrit-Change-Number: 17999
Gerrit-PatchSet: 1
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Comment-Date: Mon, 08 Nov 2021 13:28:38 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11006: Impalad crashes during query cancel tests

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/17999 )

Change subject: IMPALA-11006: Impalad crashes during query cancel tests
......................................................................

IMPALA-11006: Impalad crashes during query cancel tests

This patch addresses impalad crash during query cancel tests for
CTAS. The root cause is that the wait thread in impalad assumes
that the coordinator or computation results are always available
once the worker thread async_exec_thread_ is joined. In reality,
this never happens as the query is cancelled.

The patch specifically checks the cancel status when it is found
that the coordinator is not available for CTAS and bails out
immediately. This prevents crash since the subsequent code that
assumes the coordinator or computation results are available is
never reached.

Testing:
  1. Ran core tests successfully.

Change-Id: Ia49411f8525734b8d463d9ffbfbce705b90a8d73
---
M be/src/service/client-request-state.cc
1 file changed, 12 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/17999/3
-- 
To view, visit http://gerrit.cloudera.org:8080/17999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia49411f8525734b8d463d9ffbfbce705b90a8d73
Gerrit-Change-Number: 17999
Gerrit-PatchSet: 3
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>

[Impala-ASF-CR] IMPALA-11006: Impalad crashes during query cancel tests

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17999 )

Change subject: IMPALA-11006: Impalad crashes during query cancel tests
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9749/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia49411f8525734b8d463d9ffbfbce705b90a8d73
Gerrit-Change-Number: 17999
Gerrit-PatchSet: 3
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Wed, 10 Nov 2021 03:54:19 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11006: Impalad crashes during query cancel tests

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17999 )

Change subject: IMPALA-11006: Impalad crashes during query cancel tests
......................................................................

IMPALA-11006: Impalad crashes during query cancel tests

This patch addresses impalad crash during query cancel tests for
CTAS. The root cause is that the wait thread in impalad assumes
that the coordinator or computation results are always available
once the worker thread async_exec_thread_ is joined. In reality,
this never happens as the query is cancelled.

The patch specifically checks the cancel status when it is found
that the coordinator is not available for CTAS and bails out
immediately. This prevents crash since the subsequent code that
assumes the coordinator or computation results are available is
never reached.

Testing:
  1. Ran core tests successfully.

Change-Id: Ia49411f8525734b8d463d9ffbfbce705b90a8d73
Reviewed-on: http://gerrit.cloudera.org:8080/17999
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/service/client-request-state.cc
1 file changed, 12 insertions(+), 2 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/17999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ia49411f8525734b8d463d9ffbfbce705b90a8d73
Gerrit-Change-Number: 17999
Gerrit-PatchSet: 6
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>