You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Xianqing He (Code Review)" <ge...@cloudera.org> on 2020/12/30 11:37:34 UTC

[Impala-ASF-CR] IMPALA-10413: fix impalad crashes when canceling the retrying query

Xianqing He has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16911


Change subject: IMPALA-10413: fix impalad crashes when canceling the retrying query
......................................................................

IMPALA-10413: fix impalad crashes when canceling the retrying query

The crash happens when canceling the retrying query.

If the original query was unregistered while the new query was
being created, it will call HandleRetryFailure to abort the new
query. But the status is ok, so when calling Status::AddDetail
impalad will crash.

After the WaitAsync interface called and before the
retry_request_state moved to retried_client_request_state_ , if
abort the new retry query, retry_request_state need to call
Finalize, otherwise the wait-thread will leak.

In some cases like canceled the original query or closed the session
we may not create the new query, so we also check whether the query
is retried.

Testing:
Manual testing. I set sleep(10) before launching the query retry in a
separate thread, so I have time to cancel the retried query.

Change-Id: I4fd7228acd0a70d33859029052239f9b9f795e5d
---
M be/src/runtime/query-driver.cc
M be/src/service/impala-server.cc
2 files changed, 16 insertions(+), 4 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/16911/2
-- 
To view, visit http://gerrit.cloudera.org:8080/16911
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I4fd7228acd0a70d33859029052239f9b9f795e5d
Gerrit-Change-Number: 16911
Gerrit-PatchSet: 2
Gerrit-Owner: Xianqing He <he...@126.com>

[Impala-ASF-CR] IMPALA-10413: fix impalad crashes when canceling the retrying query

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16911 )

Change subject: IMPALA-10413: fix impalad crashes when canceling the retrying query
......................................................................


Patch Set 7: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6811/


-- 
To view, visit http://gerrit.cloudera.org:8080/16911
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fd7228acd0a70d33859029052239f9b9f795e5d
Gerrit-Change-Number: 16911
Gerrit-PatchSet: 7
Gerrit-Owner: Xianqing He <he...@126.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Xianqing He <he...@126.com>
Gerrit-Comment-Date: Fri, 01 Jan 2021 14:27:22 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10413: fix impalad crashes when canceling the retrying query

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16911 )

Change subject: IMPALA-10413: fix impalad crashes when canceling the retrying query
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16911/3/tests/custom_cluster/test_query_retries.py
File tests/custom_cluster/test_query_retries.py:

http://gerrit.cloudera.org:8080/#/c/16911/3/tests/custom_cluster/test_query_retries.py@711
PS3, Line 711: i
flake8: F841 local variable 'impalad_service' is assigned to but never used



-- 
To view, visit http://gerrit.cloudera.org:8080/16911
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fd7228acd0a70d33859029052239f9b9f795e5d
Gerrit-Change-Number: 16911
Gerrit-PatchSet: 3
Gerrit-Owner: Xianqing He <he...@126.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Xianqing He <he...@126.com>
Gerrit-Comment-Date: Thu, 31 Dec 2020 15:08:08 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10413: fix impalad crashes when canceling the retrying query

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/16911 )

Change subject: IMPALA-10413: fix impalad crashes when canceling the retrying query
......................................................................


Patch Set 14: Code-Review+1


-- 
To view, visit http://gerrit.cloudera.org:8080/16911
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fd7228acd0a70d33859029052239f9b9f795e5d
Gerrit-Change-Number: 16911
Gerrit-PatchSet: 14
Gerrit-Owner: Xianqing He <he...@126.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Xianqing He <he...@126.com>
Gerrit-Comment-Date: Tue, 11 May 2021 20:09:39 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10413: fix impalad crashes when canceling the retrying query

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16911 )

Change subject: IMPALA-10413: fix impalad crashes when canceling the retrying query
......................................................................


Patch Set 8:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/7927/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/16911
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fd7228acd0a70d33859029052239f9b9f795e5d
Gerrit-Change-Number: 16911
Gerrit-PatchSet: 8
Gerrit-Owner: Xianqing He <he...@126.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Xianqing He <he...@126.com>
Gerrit-Comment-Date: Sun, 03 Jan 2021 12:13:51 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10413: fix impalad crashes when canceling the retrying query

Posted by "Xianqing He (Code Review)" <ge...@cloudera.org>.
Xianqing He has uploaded a new patch set (#12). ( http://gerrit.cloudera.org:8080/16911 )

Change subject: IMPALA-10413: fix impalad crashes when canceling the retrying query
......................................................................

IMPALA-10413: fix impalad crashes when canceling the retrying query

The crash happens when canceling the retrying query.

If the original query was unregistered while the new query was
being created, it will call HandleRetryFailure to abort the new
query. But the status is ok, so when calling Status::AddDetail
impalad will crash.

After the WaitAsync interface called and before the
retry_request_state moved to retried_client_request_state_ , if
abort the new retry query, retry_request_state need to call
Finalize, otherwise the wait-thread will leak.

In some cases like canceled the original query or closed the session
we may not create the new query, so we also check whether the query
is retried.

Tests:
 Add test in tests/custom_cluster/test_query_retries.py and manually
tested 100 times to make sure that there was no Impalad crash

Change-Id: I4fd7228acd0a70d33859029052239f9b9f795e5d
---
M be/src/runtime/query-driver.cc
M be/src/service/client-request-state.h
M be/src/service/impala-server.cc
M tests/custom_cluster/test_query_retries.py
4 files changed, 107 insertions(+), 14 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/16911/12
-- 
To view, visit http://gerrit.cloudera.org:8080/16911
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4fd7228acd0a70d33859029052239f9b9f795e5d
Gerrit-Change-Number: 16911
Gerrit-PatchSet: 12
Gerrit-Owner: Xianqing He <he...@126.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Xianqing He <he...@126.com>

[Impala-ASF-CR] IMPALA-10413: fix impalad crashes when canceling the retrying query

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16911 )

Change subject: IMPALA-10413: fix impalad crashes when canceling the retrying query
......................................................................


Patch Set 10:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/7966/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/16911
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fd7228acd0a70d33859029052239f9b9f795e5d
Gerrit-Change-Number: 16911
Gerrit-PatchSet: 10
Gerrit-Owner: Xianqing He <he...@126.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Xianqing He <he...@126.com>
Gerrit-Comment-Date: Wed, 06 Jan 2021 11:09:50 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10413: fix impalad crashes when canceling the retrying query

Posted by "Xianqing He (Code Review)" <ge...@cloudera.org>.
Xianqing He has posted comments on this change. ( http://gerrit.cloudera.org:8080/16911 )

Change subject: IMPALA-10413: fix impalad crashes when canceling the retrying query
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16911/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16911/2//COMMIT_MSG@25
PS2, Line 25: Tests:
> Could you add some test coverage in tests/custom_cluster/test_query_retries
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/16911
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fd7228acd0a70d33859029052239f9b9f795e5d
Gerrit-Change-Number: 16911
Gerrit-PatchSet: 3
Gerrit-Owner: Xianqing He <he...@126.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Xianqing He <he...@126.com>
Gerrit-Comment-Date: Thu, 31 Dec 2020 15:07:49 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10413: fix impalad crashes when canceling the retrying query

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16911 )

Change subject: IMPALA-10413: fix impalad crashes when canceling the retrying query
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/7921/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/16911
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fd7228acd0a70d33859029052239f9b9f795e5d
Gerrit-Change-Number: 16911
Gerrit-PatchSet: 3
Gerrit-Owner: Xianqing He <he...@126.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Xianqing He <he...@126.com>
Gerrit-Comment-Date: Thu, 31 Dec 2020 16:14:45 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10413: fix impalad crashes when canceling the retrying query

Posted by "Xianqing He (Code Review)" <ge...@cloudera.org>.
Xianqing He has posted comments on this change. ( http://gerrit.cloudera.org:8080/16911 )

Change subject: IMPALA-10413: fix impalad crashes when canceling the retrying query
......................................................................


Patch Set 10:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/16911/8//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16911/8//COMMIT_MSG@26
PS8, Line 26: stom_cluster/test_query_retries.py and manually
            : tested 100 times to 
> I think we need a debug action for this for adding test coverage.
Done


http://gerrit.cloudera.org:8080/#/c/16911/8/be/src/runtime/query-driver.cc
File be/src/runtime/query-driver.cc:

http://gerrit.cloudera.org:8080/#/c/16911/8/be/src/runtime/query-driver.cc@273
PS8, Line 273: dleRetryFailure(&status, &error_msg, request_st
> Maybe I'm misunderstanding something. ClientRequestState::Finalize() just b
This waits to finish the retry query's wait-thread. A normal process waits for the wait-thread to finish when QueryDriver::Finalize is called. But now the retry_request_state has not been moved to retried_client_request_state_. If not call retry_request_state->Finalize will hit the DCHECK in ClientRequestState::~ClientRequestState.
You can use test_retrying_query_cancel to reproduce the leaking after fixed the HandleRetryFailure


http://gerrit.cloudera.org:8080/#/c/16911/8/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/16911/8/be/src/service/impala-server.cc@2248
PS8, Line 2248:   user_has_profile_access = query_handle.user_has_profile_access();
> Any reason for not making was_retried = query_handle.WasRetried() && query_
Done


http://gerrit.cloudera.org:8080/#/c/16911/8/tests/custom_cluster/test_query_retries.py
File tests/custom_cluster/test_query_retries.py:

http://gerrit.cloudera.org:8080/#/c/16911/8/tests/custom_cluster/test_query_retries.py@705
PS8, Line 705:       statestored_args="--statestore_heartbeat_frequency_ms=60000")
> Do you verify that this new test fails without the fix? I ran it 40 times i
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/16911
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fd7228acd0a70d33859029052239f9b9f795e5d
Gerrit-Change-Number: 16911
Gerrit-PatchSet: 10
Gerrit-Owner: Xianqing He <he...@126.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Xianqing He <he...@126.com>
Gerrit-Comment-Date: Wed, 06 Jan 2021 10:47:44 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10413: fix impalad crashes when canceling the retrying query

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/16911 )

Change subject: IMPALA-10413: fix impalad crashes when canceling the retrying query
......................................................................


Patch Set 8:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16911/8/be/src/runtime/query-driver.cc
File be/src/runtime/query-driver.cc:

http://gerrit.cloudera.org:8080/#/c/16911/8/be/src/runtime/query-driver.cc@273
PS8, Line 273: if not call Finalize, the wait-thread will leak
Maybe I'm misunderstanding something. ClientRequestState::Finalize() just blocks until the wait-thread exits. It does nothing to terminate the wait-thread. Why calling Finalize() will help to prevent leaks?

The blocking point of the wait-thread is ClientRequestState::WaitInternal(). For a SELECT statement, it should exit after the coordinator fragment instance finishes/fails Prepare() and Open(). See more in FragmentInstanceState::Exec() about how opened_promise_ is set. Usually, ClientRequestState::WaitInternal() is waiting in Coordinator::Wait() which will wait in coord_instance_->WaitForOpen(). WaitForOpen() should return when the original query is cancelled.

If the wait-thread is still waiting, maybe there are something wrong in cancelling the coordinator fragment instance thread. We need to check deeper in that point.

Do you have a way to reproduce the leaking?


http://gerrit.cloudera.org:8080/#/c/16911/8/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/16911/8/be/src/service/impala-server.cc@2248
PS8, Line 2248:     retried_query_id = make_unique<TUniqueId>(query_handle.retried_id());
Any reason for not making was_retried = query_handle.WasRetried() && query_handle.IsSetRetriedId()? The current code will introduce a new state that was_retried is true but the retried_query_id is nullptr. This will break some DCHECKs:
https://github.com/apache/impala/blob/4099a606892c377b9e8c9c6df2a45a7d42afcaea/be/src/service/impala-server.cc#L897
https://github.com/apache/impala/blob/4099a606892c377b9e8c9c6df2a45a7d42afcaea/be/src/service/impala-server.cc#L739



-- 
To view, visit http://gerrit.cloudera.org:8080/16911
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fd7228acd0a70d33859029052239f9b9f795e5d
Gerrit-Change-Number: 16911
Gerrit-PatchSet: 8
Gerrit-Owner: Xianqing He <he...@126.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Xianqing He <he...@126.com>
Gerrit-Comment-Date: Tue, 05 Jan 2021 08:35:14 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10413: fix impalad crashes when canceling the retrying query

Posted by "Xianqing He (Code Review)" <ge...@cloudera.org>.
Xianqing He has uploaded a new patch set (#9). ( http://gerrit.cloudera.org:8080/16911 )

Change subject: IMPALA-10413: fix impalad crashes when canceling the retrying query
......................................................................

IMPALA-10413: fix impalad crashes when canceling the retrying query

The crash happens when canceling the retrying query.

If the original query was unregistered while the new query was
being created, it will call HandleRetryFailure to abort the new
query. But the status is ok, so when calling Status::AddDetail
impalad will crash.

After the WaitAsync interface called and before the
retry_request_state moved to retried_client_request_state_ , if
abort the new retry query, retry_request_state need to call
Finalize, otherwise the wait-thread will leak.

In some cases like canceled the original query or closed the session
we may not create the new query, so we also check whether the query
is retried.

Tests:
 Add test in tests/custom_cluster/test_query_retries.py and manually
tested 100 times to make sure that there was no Impalad crash

Change-Id: I4fd7228acd0a70d33859029052239f9b9f795e5d
---
M be/src/runtime/query-driver.cc
M be/src/service/client-request-state.h
M be/src/service/impala-server.cc
M tests/custom_cluster/test_query_retries.py
4 files changed, 58 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/16911/9
-- 
To view, visit http://gerrit.cloudera.org:8080/16911
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4fd7228acd0a70d33859029052239f9b9f795e5d
Gerrit-Change-Number: 16911
Gerrit-PatchSet: 9
Gerrit-Owner: Xianqing He <he...@126.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Xianqing He <he...@126.com>

[Impala-ASF-CR] IMPALA-10413: fix impalad crashes when canceling the retrying query

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16911 )

Change subject: IMPALA-10413: fix impalad crashes when canceling the retrying query
......................................................................


Patch Set 15:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7130/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/16911
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fd7228acd0a70d33859029052239f9b9f795e5d
Gerrit-Change-Number: 16911
Gerrit-PatchSet: 15
Gerrit-Owner: Xianqing He <he...@126.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Xianqing He <he...@126.com>
Gerrit-Comment-Date: Wed, 12 May 2021 00:44:16 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10413: fix impalad crashes when canceling the retrying query

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16911 )

Change subject: IMPALA-10413: fix impalad crashes when canceling the retrying query
......................................................................


Patch Set 12:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8698/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/16911
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fd7228acd0a70d33859029052239f9b9f795e5d
Gerrit-Change-Number: 16911
Gerrit-PatchSet: 12
Gerrit-Owner: Xianqing He <he...@126.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Xianqing He <he...@126.com>
Gerrit-Comment-Date: Tue, 11 May 2021 04:19:02 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10413: fix impalad crashes when canceling the retrying query

Posted by "Xianqing He (Code Review)" <ge...@cloudera.org>.
Xianqing He has posted comments on this change. ( http://gerrit.cloudera.org:8080/16911 )

Change subject: IMPALA-10413: fix impalad crashes when canceling the retrying query
......................................................................


Patch Set 6:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/16911/4/be/src/runtime/query-driver.cc
File be/src/runtime/query-driver.cc:

http://gerrit.cloudera.org:8080/#/c/16911/4/be/src/runtime/query-driver.cc@296
PS4, Line 296: or che
> It's better to set runtime error for status before calling HandleRetryFailu
Done


http://gerrit.cloudera.org:8080/#/c/16911/4/be/src/runtime/query-driver.cc@357
PS4, Line 357: ueryDriver::Handl
> status should not be nullptr, otherwise line 364 will crash. Add DCHECK for
Done


http://gerrit.cloudera.org:8080/#/c/16911/4/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/16911/4/be/src/service/impala-server.cc@2244
PS4, Line 2244: cancel
> nit: canceling and closing.
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/16911
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fd7228acd0a70d33859029052239f9b9f795e5d
Gerrit-Change-Number: 16911
Gerrit-PatchSet: 6
Gerrit-Owner: Xianqing He <he...@126.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Xianqing He <he...@126.com>
Gerrit-Comment-Date: Fri, 01 Jan 2021 05:26:22 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10413: fix impalad crashes when canceling the retrying query

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16911 )

Change subject: IMPALA-10413: fix impalad crashes when canceling the retrying query
......................................................................


Patch Set 15: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/16911
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fd7228acd0a70d33859029052239f9b9f795e5d
Gerrit-Change-Number: 16911
Gerrit-PatchSet: 15
Gerrit-Owner: Xianqing He <he...@126.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Xianqing He <he...@126.com>
Gerrit-Comment-Date: Wed, 12 May 2021 06:37:42 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10413: fix impalad crashes when canceling the retrying query

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/16911 )

Change subject: IMPALA-10413: fix impalad crashes when canceling the retrying query
......................................................................


Patch Set 4:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/16911/4/be/src/runtime/query-driver.cc
File be/src/runtime/query-driver.cc:

http://gerrit.cloudera.org:8080/#/c/16911/4/be/src/runtime/query-driver.cc@296
PS4, Line 296: status
It's better to set runtime error for status before calling HandleRetryFailure().


http://gerrit.cloudera.org:8080/#/c/16911/4/be/src/runtime/query-driver.cc@357
PS4, Line 357: status != nullptr
status should not be nullptr, otherwise line 364 will crash. Add DCHECK for status instead.


http://gerrit.cloudera.org:8080/#/c/16911/4/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/16911/4/be/src/service/impala-server.cc@2244
PS4, Line 2244: cancel
nit: canceling and closing.



-- 
To view, visit http://gerrit.cloudera.org:8080/16911
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fd7228acd0a70d33859029052239f9b9f795e5d
Gerrit-Change-Number: 16911
Gerrit-PatchSet: 4
Gerrit-Owner: Xianqing He <he...@126.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Xianqing He <he...@126.com>
Gerrit-Comment-Date: Thu, 31 Dec 2020 20:05:41 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10413: fix impalad crashes when canceling the retrying query

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16911 )

Change subject: IMPALA-10413: fix impalad crashes when canceling the retrying query
......................................................................


Patch Set 4:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/7922/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/16911
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fd7228acd0a70d33859029052239f9b9f795e5d
Gerrit-Change-Number: 16911
Gerrit-PatchSet: 4
Gerrit-Owner: Xianqing He <he...@126.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Xianqing He <he...@126.com>
Gerrit-Comment-Date: Thu, 31 Dec 2020 16:47:26 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10413: fix impalad crashes when canceling the retrying query

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/16911 )

Change subject: IMPALA-10413: fix impalad crashes when canceling the retrying query
......................................................................


Patch Set 2:

(1 comment)

Thanks for catching up this!

http://gerrit.cloudera.org:8080/#/c/16911/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16911/2//COMMIT_MSG@25
PS2, Line 25: Testing:
Could you add some test coverage in tests/custom_cluster/test_query_retries.py? We may need to extend test_retry_query_cancel or add some new tests similar to it.



-- 
To view, visit http://gerrit.cloudera.org:8080/16911
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fd7228acd0a70d33859029052239f9b9f795e5d
Gerrit-Change-Number: 16911
Gerrit-PatchSet: 2
Gerrit-Owner: Xianqing He <he...@126.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Thu, 31 Dec 2020 00:46:42 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10413: fix impalad crashes when canceling the retrying query

Posted by "Xianqing He (Code Review)" <ge...@cloudera.org>.
Xianqing He has uploaded a new patch set (#10). ( http://gerrit.cloudera.org:8080/16911 )

Change subject: IMPALA-10413: fix impalad crashes when canceling the retrying query
......................................................................

IMPALA-10413: fix impalad crashes when canceling the retrying query

The crash happens when canceling the retrying query.

If the original query was unregistered while the new query was
being created, it will call HandleRetryFailure to abort the new
query. But the status is ok, so when calling Status::AddDetail
impalad will crash.

After the WaitAsync interface called and before the
retry_request_state moved to retried_client_request_state_ , if
abort the new retry query, retry_request_state need to call
Finalize, otherwise the wait-thread will leak.

In some cases like canceled the original query or closed the session
we may not create the new query, so we also check whether the query
is retried.

Tests:
 Add test in tests/custom_cluster/test_query_retries.py and manually
tested 100 times to make sure that there was no Impalad crash

Change-Id: I4fd7228acd0a70d33859029052239f9b9f795e5d
---
M be/src/runtime/query-driver.cc
M be/src/service/client-request-state.h
M be/src/service/impala-server.cc
M tests/custom_cluster/test_query_retries.py
4 files changed, 58 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/16911/10
-- 
To view, visit http://gerrit.cloudera.org:8080/16911
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4fd7228acd0a70d33859029052239f9b9f795e5d
Gerrit-Change-Number: 16911
Gerrit-PatchSet: 10
Gerrit-Owner: Xianqing He <he...@126.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Xianqing He <he...@126.com>

[Impala-ASF-CR] IMPALA-10413: fix impalad crashes when canceling the retrying query

Posted by "Xianqing He (Code Review)" <ge...@cloudera.org>.
Xianqing He has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/16911 )

Change subject: IMPALA-10413: fix impalad crashes when canceling the retrying query
......................................................................

IMPALA-10413: fix impalad crashes when canceling the retrying query

The crash happens when canceling the retrying query.

If the original query was unregistered while the new query was
being created, it will call HandleRetryFailure to abort the new
query. But the status is ok, so when calling Status::AddDetail
impalad will crash.

After the WaitAsync interface called and before the
retry_request_state moved to retried_client_request_state_ , if
abort the new retry query, retry_request_state need to call
Finalize, otherwise the wait-thread will leak.

In some cases like canceled the original query or closed the session
we may not create the new query, so we also check whether the query
is retried.

Tests:
1. Manual testing. I set sleep(10) before launching the query retry
in a separate thread to let me have time to cancel the retried query.
2. Add test in tests/custom_cluster/test_query_retries.py and manually
tested 100 times to make sure that there was no Impalad crash

Change-Id: I4fd7228acd0a70d33859029052239f9b9f795e5d
---
M be/src/runtime/query-driver.cc
M be/src/service/impala-server.cc
M tests/custom_cluster/test_query_retries.py
3 files changed, 44 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/16911/4
-- 
To view, visit http://gerrit.cloudera.org:8080/16911
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4fd7228acd0a70d33859029052239f9b9f795e5d
Gerrit-Change-Number: 16911
Gerrit-PatchSet: 4
Gerrit-Owner: Xianqing He <he...@126.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Xianqing He <he...@126.com>

[Impala-ASF-CR] IMPALA-10413: fix impalad crashes when canceling the retrying query

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16911 )

Change subject: IMPALA-10413: fix impalad crashes when canceling the retrying query
......................................................................

IMPALA-10413: fix impalad crashes when canceling the retrying query

The crash happens when canceling the retrying query.

If the original query was unregistered while the new query was
being created, it will call HandleRetryFailure to abort the new
query. But the status is ok, so when calling Status::AddDetail
impalad will crash.

After the WaitAsync interface called and before the
retry_request_state moved to retried_client_request_state_ , if
abort the new retry query, retry_request_state need to call
Finalize, otherwise the wait-thread will leak.

In some cases like canceled the original query or closed the session
we may not create the new query, so we also check whether the query
is retried.

Tests:
 Add test in tests/custom_cluster/test_query_retries.py and manually
tested 100 times to make sure that there was no Impalad crash

Change-Id: I4fd7228acd0a70d33859029052239f9b9f795e5d
Reviewed-on: http://gerrit.cloudera.org:8080/16911
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/runtime/query-driver.cc
M be/src/service/client-request-state.h
M be/src/service/impala-server.cc
M tests/custom_cluster/test_query_retries.py
4 files changed, 106 insertions(+), 14 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/16911
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I4fd7228acd0a70d33859029052239f9b9f795e5d
Gerrit-Change-Number: 16911
Gerrit-PatchSet: 16
Gerrit-Owner: Xianqing He <he...@126.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Xianqing He <he...@126.com>

[Impala-ASF-CR] IMPALA-10413: fix impalad crashes when canceling the retrying query

Posted by "Xianqing He (Code Review)" <ge...@cloudera.org>.
Xianqing He has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/16911 )

Change subject: IMPALA-10413: fix impalad crashes when canceling the retrying query
......................................................................

IMPALA-10413: fix impalad crashes when canceling the retrying query

The crash happens when canceling the retrying query.

If the original query was unregistered while the new query was
being created, it will call HandleRetryFailure to abort the new
query. But the status is ok, so when calling Status::AddDetail
impalad will crash.

After the WaitAsync interface called and before the
retry_request_state moved to retried_client_request_state_ , if
abort the new retry query, retry_request_state need to call
Finalize, otherwise the wait-thread will leak.

In some cases like canceled the original query or closed the session
we may not create the new query, so we also check whether the query
is retried.

Tests:
1. Manual testing. I set sleep(10) before launching the query retry
in a separate thread to let me have time to cancel the retried query.
2. Add test in tests/custom_cluster/test_query_retries.py and manually
tested 100 times to make sure that there was no Impalad crash

Change-Id: I4fd7228acd0a70d33859029052239f9b9f795e5d
---
M be/src/runtime/query-driver.cc
M be/src/service/impala-server.cc
M tests/custom_cluster/test_query_retries.py
3 files changed, 41 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/16911/5
-- 
To view, visit http://gerrit.cloudera.org:8080/16911
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4fd7228acd0a70d33859029052239f9b9f795e5d
Gerrit-Change-Number: 16911
Gerrit-PatchSet: 5
Gerrit-Owner: Xianqing He <he...@126.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Xianqing He <he...@126.com>

[Impala-ASF-CR] IMPALA-10413: fix impalad crashes when canceling the retrying query

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/16911 )

Change subject: IMPALA-10413: fix impalad crashes when canceling the retrying query
......................................................................


Patch Set 10:

(9 comments)

Thanks for updating the patch. The fix generally makes sense to me. Still have some minor questions to discuss.

http://gerrit.cloudera.org:8080/#/c/16911/8/be/src/runtime/query-driver.cc
File be/src/runtime/query-driver.cc:

http://gerrit.cloudera.org:8080/#/c/16911/8/be/src/runtime/query-driver.cc@273
PS8, Line 273: dleRetryFailure(&status, &error_msg, request_st
> Is thread wait_thread_ still running? If yes, what's the waiting point in C
After checking more code paths, I think I understand why we need retry_request_state->Finalize() here. But we need to make these comments more clear.

      // 'retried_client_request_state_' hasn't been updated at this point, so the active
      // ClientRequestState is still the original one. When HandleRetryFailure unregisters
      // the retried query, it actually finalizes the original ClientRequestState. So we
      // have to explicitly finalize 'retry_request_state', otherwise we'll hit some
      // illegal states in destroying it.

The illegal states include wait_thread_ not reset, coordinator still executing, etc.


http://gerrit.cloudera.org:8080/#/c/16911/10/be/src/runtime/query-driver.cc
File be/src/runtime/query-driver.cc:

http://gerrit.cloudera.org:8080/#/c/16911/10/be/src/runtime/query-driver.cc@85
PS10, Line 85:   if (client_request_state_->query_id() != query_id) return nullptr;
It's not clear to me how we will go into this state. If we encounter errors in RetryQueryFromThread, it seems no one will get the new query id. Is it possible to add test coverage for this?


http://gerrit.cloudera.org:8080/#/c/16911/10/be/src/runtime/query-driver.cc@270
PS10, Line 270:     if (!status.ok()) {
Could you also add test coverage for this case?


http://gerrit.cloudera.org:8080/#/c/16911/10/be/src/runtime/query-driver.cc@284
PS10, Line 284:   if (!status.ok()) {
Could you also add test coverage for this case?


http://gerrit.cloudera.org:8080/#/c/16911/10/be/src/runtime/query-driver.cc@292
PS10, Line 292: CHECK_QUERY_DRIVER_DELAY
I think we should mention "retry" to avoid confusion. What about "RETRY_DELAY_CHECKING_ORIGINAL_DRIVER"?


http://gerrit.cloudera.org:8080/#/c/16911/10/be/src/runtime/query-driver.cc@300
PS10, Line 300: Encountered error checking the original query
nit: "Failed to retry query since the original query was unregistered"


http://gerrit.cloudera.org:8080/#/c/16911/10/tests/custom_cluster/test_query_retries.py
File tests/custom_cluster/test_query_retries.py:

http://gerrit.cloudera.org:8080/#/c/16911/10/tests/custom_cluster/test_query_retries.py@721
PS10, Line 721: 
Could you add codes to validate the original query is not marked as retried? We can get the original query id by handle.get_handle().id, and then get the original query profile by this id. '__validate_runtime_profiles' has some example codes.


http://gerrit.cloudera.org:8080/#/c/16911/10/tests/custom_cluster/test_query_retries.py@722
PS10, Line 722:     # Close the query.
Do we really need to close a cancelled query?


http://gerrit.cloudera.org:8080/#/c/16911/10/tests/custom_cluster/test_query_retries.py@723
PS10, Line 723:     self.client.close_query(handle)
We should check whether coordinator still lives. The cancel and close operations return immediately, but the crash may happen laster. So adding the following codes help to actuallly capture the bug:

    time.sleep(2)
    assert self.cluster.impalads[0].get_pid() is not None, "Coordinator crashed"

Without these, the test still pass even the coordinator crashes.



-- 
To view, visit http://gerrit.cloudera.org:8080/16911
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fd7228acd0a70d33859029052239f9b9f795e5d
Gerrit-Change-Number: 16911
Gerrit-PatchSet: 10
Gerrit-Owner: Xianqing He <he...@126.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Xianqing He <he...@126.com>
Gerrit-Comment-Date: Mon, 11 Jan 2021 09:59:29 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10413: fix impalad crashes when canceling the retrying query

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16911 )

Change subject: IMPALA-10413: fix impalad crashes when canceling the retrying query
......................................................................


Patch Set 15: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/16911
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fd7228acd0a70d33859029052239f9b9f795e5d
Gerrit-Change-Number: 16911
Gerrit-PatchSet: 15
Gerrit-Owner: Xianqing He <he...@126.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Xianqing He <he...@126.com>
Gerrit-Comment-Date: Wed, 12 May 2021 00:44:15 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10413: fix impalad crashes when canceling the retrying query

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/16911 )

Change subject: IMPALA-10413: fix impalad crashes when canceling the retrying query
......................................................................


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16911/10/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/16911/10/be/src/service/impala-server.cc@1369
PS10, Line 1369:     LOG(ERROR) << Substitute("Query de-registration for query_id={0} failed: {1}",
               :         PrintId(query_id, cause->GetDetail()));
This is incorrect. Could you fix it by the way? It should be

    LOG(ERROR) << Substitute("Query de-registration for query_id=$0 failed: $1",
        PrintId(query_id), cause->GetDetail());

You can find this log in impalad.ERROR.



-- 
To view, visit http://gerrit.cloudera.org:8080/16911
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fd7228acd0a70d33859029052239f9b9f795e5d
Gerrit-Change-Number: 16911
Gerrit-PatchSet: 10
Gerrit-Owner: Xianqing He <he...@126.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Xianqing He <he...@126.com>
Gerrit-Comment-Date: Mon, 11 Jan 2021 10:06:25 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10413: fix impalad crashes when canceling the retrying query

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16911 )

Change subject: IMPALA-10413: fix impalad crashes when canceling the retrying query
......................................................................


Patch Set 8: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/16911
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fd7228acd0a70d33859029052239f9b9f795e5d
Gerrit-Change-Number: 16911
Gerrit-PatchSet: 8
Gerrit-Owner: Xianqing He <he...@126.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Xianqing He <he...@126.com>
Gerrit-Comment-Date: Sun, 03 Jan 2021 17:27:57 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10413: fix impalad crashes when canceling the retrying query

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16911 )

Change subject: IMPALA-10413: fix impalad crashes when canceling the retrying query
......................................................................


Patch Set 6:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/7924/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/16911
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fd7228acd0a70d33859029052239f9b9f795e5d
Gerrit-Change-Number: 16911
Gerrit-PatchSet: 6
Gerrit-Owner: Xianqing He <he...@126.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Xianqing He <he...@126.com>
Gerrit-Comment-Date: Fri, 01 Jan 2021 05:48:42 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10413: fix impalad crashes when canceling the retrying query

Posted by "Xianqing He (Code Review)" <ge...@cloudera.org>.
Xianqing He has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/16911 )

Change subject: IMPALA-10413: fix impalad crashes when canceling the retrying query
......................................................................

IMPALA-10413: fix impalad crashes when canceling the retrying query

The crash happens when canceling the retrying query.

If the original query was unregistered while the new query was
being created, it will call HandleRetryFailure to abort the new
query. But the status is ok, so when calling Status::AddDetail
impalad will crash.

After the WaitAsync interface called and before the
retry_request_state moved to retried_client_request_state_ , if
abort the new retry query, retry_request_state need to call
Finalize, otherwise the wait-thread will leak.

In some cases like canceled the original query or closed the session
we may not create the new query, so we also check whether the query
is retried.

Tests:
1. Manual testing. I set sleep(10) before launching the query retry
in a separate thread to let me have time to cancel the retried query.
2. Add test in tests/custom_cluster/test_query_retries.py and manually
tested 100 times to make sure that there was no Impalad crash

Change-Id: I4fd7228acd0a70d33859029052239f9b9f795e5d
---
M be/src/runtime/query-driver.cc
M be/src/service/impala-server.cc
M tests/custom_cluster/test_query_retries.py
3 files changed, 41 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/16911/6
-- 
To view, visit http://gerrit.cloudera.org:8080/16911
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4fd7228acd0a70d33859029052239f9b9f795e5d
Gerrit-Change-Number: 16911
Gerrit-PatchSet: 6
Gerrit-Owner: Xianqing He <he...@126.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Xianqing He <he...@126.com>

[Impala-ASF-CR] IMPALA-10413: fix impalad crashes when canceling the retrying query

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16911 )

Change subject: IMPALA-10413: fix impalad crashes when canceling the retrying query
......................................................................


Patch Set 14:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8699/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/16911
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fd7228acd0a70d33859029052239f9b9f795e5d
Gerrit-Change-Number: 16911
Gerrit-PatchSet: 14
Gerrit-Owner: Xianqing He <he...@126.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Xianqing He <he...@126.com>
Gerrit-Comment-Date: Tue, 11 May 2021 11:55:30 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10413: fix impalad crashes when canceling the retrying query

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16911 )

Change subject: IMPALA-10413: fix impalad crashes when canceling the retrying query
......................................................................


Patch Set 7:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/7925/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/16911
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fd7228acd0a70d33859029052239f9b9f795e5d
Gerrit-Change-Number: 16911
Gerrit-PatchSet: 7
Gerrit-Owner: Xianqing He <he...@126.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Xianqing He <he...@126.com>
Gerrit-Comment-Date: Fri, 01 Jan 2021 08:46:42 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10413: fix impalad crashes when canceling the retrying query

Posted by "Xianqing He (Code Review)" <ge...@cloudera.org>.
Xianqing He has uploaded a new patch set (#14). ( http://gerrit.cloudera.org:8080/16911 )

Change subject: IMPALA-10413: fix impalad crashes when canceling the retrying query
......................................................................

IMPALA-10413: fix impalad crashes when canceling the retrying query

The crash happens when canceling the retrying query.

If the original query was unregistered while the new query was
being created, it will call HandleRetryFailure to abort the new
query. But the status is ok, so when calling Status::AddDetail
impalad will crash.

After the WaitAsync interface called and before the
retry_request_state moved to retried_client_request_state_ , if
abort the new retry query, retry_request_state need to call
Finalize, otherwise the wait-thread will leak.

In some cases like canceled the original query or closed the session
we may not create the new query, so we also check whether the query
is retried.

Tests:
 Add test in tests/custom_cluster/test_query_retries.py and manually
tested 100 times to make sure that there was no Impalad crash

Change-Id: I4fd7228acd0a70d33859029052239f9b9f795e5d
---
M be/src/runtime/query-driver.cc
M be/src/service/client-request-state.h
M be/src/service/impala-server.cc
M tests/custom_cluster/test_query_retries.py
4 files changed, 106 insertions(+), 14 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/16911/14
-- 
To view, visit http://gerrit.cloudera.org:8080/16911
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4fd7228acd0a70d33859029052239f9b9f795e5d
Gerrit-Change-Number: 16911
Gerrit-PatchSet: 14
Gerrit-Owner: Xianqing He <he...@126.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Xianqing He <he...@126.com>

[Impala-ASF-CR] IMPALA-10413: fix impalad crashes when canceling the retrying query

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/16911 )

Change subject: IMPALA-10413: fix impalad crashes when canceling the retrying query
......................................................................


Patch Set 12: Code-Review+1

(2 comments)

Thanks for updating the patch! I'll see if Wenzhe want to have a final look.
Left some minor comments on the test.

http://gerrit.cloudera.org:8080/#/c/16911/12/tests/custom_cluster/test_query_retries.py
File tests/custom_cluster/test_query_retries.py:

http://gerrit.cloudera.org:8080/#/c/16911/12/tests/custom_cluster/test_query_retries.py@746
PS12, Line 746:  == True
nit: Don't need "== True" for bool.


http://gerrit.cloudera.org:8080/#/c/16911/12/tests/custom_cluster/test_query_retries.py@774
PS12, Line 774:     assert self.cluster.get_first_impalad().service.get_num_in_flight_queries() == 1
This may be flaky since we check it immediately. Could you add a wait before this line? e.g.

 self.wait_for_state(handle, self.client.QUERY_STATES['RUNNING'], 60)



-- 
To view, visit http://gerrit.cloudera.org:8080/16911
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fd7228acd0a70d33859029052239f9b9f795e5d
Gerrit-Change-Number: 16911
Gerrit-PatchSet: 12
Gerrit-Owner: Xianqing He <he...@126.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Xianqing He <he...@126.com>
Gerrit-Comment-Date: Tue, 11 May 2021 06:35:49 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10413: fix impalad crashes when canceling the retrying query

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16911 )

Change subject: IMPALA-10413: fix impalad crashes when canceling the retrying query
......................................................................


Patch Set 8:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6814/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/16911
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fd7228acd0a70d33859029052239f9b9f795e5d
Gerrit-Change-Number: 16911
Gerrit-PatchSet: 8
Gerrit-Owner: Xianqing He <he...@126.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Xianqing He <he...@126.com>
Gerrit-Comment-Date: Sun, 03 Jan 2021 11:53:03 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10413: fix impalad crashes when canceling the retrying query

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/16911 )

Change subject: IMPALA-10413: fix impalad crashes when canceling the retrying query
......................................................................


Patch Set 8:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16911/8//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16911/8//COMMIT_MSG@26
PS8, Line 26: set sleep(10) before launching the query retry
            : in a separate thread
I think we need a debug action for this for adding test coverage.


http://gerrit.cloudera.org:8080/#/c/16911/8/tests/custom_cluster/test_query_retries.py
File tests/custom_cluster/test_query_retries.py:

http://gerrit.cloudera.org:8080/#/c/16911/8/tests/custom_cluster/test_query_retries.py@705
PS8, Line 705:   def test_retrying_query_cancel(self):
Do you verify that this new test fails without the fix? I ran it 40 times in my local branch without your fix. It always succeeds.

I think it's rare to fail the query in RETRYING state. We may need to add a debug action to introduce some configurable sleeps in QueryDriver::RetryQueryFromThread().



-- 
To view, visit http://gerrit.cloudera.org:8080/16911
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fd7228acd0a70d33859029052239f9b9f795e5d
Gerrit-Change-Number: 16911
Gerrit-PatchSet: 8
Gerrit-Owner: Xianqing He <he...@126.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Xianqing He <he...@126.com>
Gerrit-Comment-Date: Mon, 04 Jan 2021 12:24:00 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10413: fix impalad crashes when canceling the retrying query

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16911 )

Change subject: IMPALA-10413: fix impalad crashes when canceling the retrying query
......................................................................


Patch Set 5:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/7923/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/16911
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fd7228acd0a70d33859029052239f9b9f795e5d
Gerrit-Change-Number: 16911
Gerrit-PatchSet: 5
Gerrit-Owner: Xianqing He <he...@126.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Xianqing He <he...@126.com>
Gerrit-Comment-Date: Fri, 01 Jan 2021 05:40:07 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10413: fix impalad crashes when canceling the retrying query

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16911 )

Change subject: IMPALA-10413: fix impalad crashes when canceling the retrying query
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/7915/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/16911
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fd7228acd0a70d33859029052239f9b9f795e5d
Gerrit-Change-Number: 16911
Gerrit-PatchSet: 2
Gerrit-Owner: Xianqing He <he...@126.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 30 Dec 2020 11:59:02 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10413: fix impalad crashes when canceling the retrying query

Posted by "Xianqing He (Code Review)" <ge...@cloudera.org>.
Xianqing He has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/16911 )

Change subject: IMPALA-10413: fix impalad crashes when canceling the retrying query
......................................................................

IMPALA-10413: fix impalad crashes when canceling the retrying query

The crash happens when canceling the retrying query.

If the original query was unregistered while the new query was
being created, it will call HandleRetryFailure to abort the new
query. But the status is ok, so when calling Status::AddDetail
impalad will crash.

After the WaitAsync interface called and before the
retry_request_state moved to retried_client_request_state_ , if
abort the new retry query, retry_request_state need to call
Finalize, otherwise the wait-thread will leak.

In some cases like canceled the original query or closed the session
we may not create the new query, so we also check whether the query
is retried.

Tests:
1. Manual testing. I set sleep(10) before launching the query retry
in a separate thread to let me have time to cancel the retried query.
2. Add test in tests/custom_cluster/test_query_retries.py and manually
tested 100 times to make sure that there was no Impalad crash

Change-Id: I4fd7228acd0a70d33859029052239f9b9f795e5d
---
M be/src/runtime/query-driver.cc
M be/src/service/impala-server.cc
M tests/custom_cluster/test_query_retries.py
3 files changed, 40 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/16911/7
-- 
To view, visit http://gerrit.cloudera.org:8080/16911
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4fd7228acd0a70d33859029052239f9b9f795e5d
Gerrit-Change-Number: 16911
Gerrit-PatchSet: 7
Gerrit-Owner: Xianqing He <he...@126.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Xianqing He <he...@126.com>

[Impala-ASF-CR] IMPALA-10413: fix impalad crashes when canceling the retrying query

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/16911 )

Change subject: IMPALA-10413: fix impalad crashes when canceling the retrying query
......................................................................


Patch Set 14: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/16911
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fd7228acd0a70d33859029052239f9b9f795e5d
Gerrit-Change-Number: 16911
Gerrit-PatchSet: 14
Gerrit-Owner: Xianqing He <he...@126.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Xianqing He <he...@126.com>
Gerrit-Comment-Date: Wed, 12 May 2021 00:43:30 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10413: fix impalad crashes when canceling the retrying query

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16911 )

Change subject: IMPALA-10413: fix impalad crashes when canceling the retrying query
......................................................................


Patch Set 9:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/7965/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/16911
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fd7228acd0a70d33859029052239f9b9f795e5d
Gerrit-Change-Number: 16911
Gerrit-PatchSet: 9
Gerrit-Owner: Xianqing He <he...@126.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Xianqing He <he...@126.com>
Gerrit-Comment-Date: Wed, 06 Jan 2021 11:06:50 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10413: fix impalad crashes when canceling the retrying query

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16911 )

Change subject: IMPALA-10413: fix impalad crashes when canceling the retrying query
......................................................................


Patch Set 7:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6811/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/16911
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fd7228acd0a70d33859029052239f9b9f795e5d
Gerrit-Change-Number: 16911
Gerrit-PatchSet: 7
Gerrit-Owner: Xianqing He <he...@126.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Xianqing He <he...@126.com>
Gerrit-Comment-Date: Fri, 01 Jan 2021 08:51:23 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10413: fix impalad crashes when canceling the retrying query

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16911 )

Change subject: IMPALA-10413: fix impalad crashes when canceling the retrying query
......................................................................


Patch Set 12:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16911/12/tests/custom_cluster/test_query_retries.py
File tests/custom_cluster/test_query_retries.py:

http://gerrit.cloudera.org:8080/#/c/16911/12/tests/custom_cluster/test_query_retries.py@746
PS12, Line 746: =
flake8: E712 comparison to True should be 'if cond is True:' or 'if cond:'


http://gerrit.cloudera.org:8080/#/c/16911/12/tests/custom_cluster/test_query_retries.py@746
PS12, Line 746: i
flake8: E501 line too long (105 > 90 characters)



-- 
To view, visit http://gerrit.cloudera.org:8080/16911
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fd7228acd0a70d33859029052239f9b9f795e5d
Gerrit-Change-Number: 16911
Gerrit-PatchSet: 12
Gerrit-Owner: Xianqing He <he...@126.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Xianqing He <he...@126.com>
Gerrit-Comment-Date: Tue, 11 May 2021 03:58:13 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10413: fix impalad crashes when canceling the retrying query

Posted by "Xianqing He (Code Review)" <ge...@cloudera.org>.
Xianqing He has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/16911 )

Change subject: IMPALA-10413: fix impalad crashes when canceling the retrying query
......................................................................

IMPALA-10413: fix impalad crashes when canceling the retrying query

The crash happens when canceling the retrying query.

If the original query was unregistered while the new query was
being created, it will call HandleRetryFailure to abort the new
query. But the status is ok, so when calling Status::AddDetail
impalad will crash.

After the WaitAsync interface called and before the
retry_request_state moved to retried_client_request_state_ , if
abort the new retry query, retry_request_state need to call
Finalize, otherwise the wait-thread will leak.

In some cases like canceled the original query or closed the session
we may not create the new query, so we also check whether the query
is retried.

Tests:
1. Manual testing. I set sleep(10) before launching the query retry
in a separate thread to let me have time to cancel the retried query.
2. Add test in tests/custom_cluster/test_query_retries.py and manually
tested 100 times to make sure that there was no Impalad crash

Change-Id: I4fd7228acd0a70d33859029052239f9b9f795e5d
---
M be/src/runtime/query-driver.cc
M be/src/service/client-request-state.h
M be/src/service/impala-server.cc
M tests/custom_cluster/test_query_retries.py
4 files changed, 46 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/16911/8
-- 
To view, visit http://gerrit.cloudera.org:8080/16911
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4fd7228acd0a70d33859029052239f9b9f795e5d
Gerrit-Change-Number: 16911
Gerrit-PatchSet: 8
Gerrit-Owner: Xianqing He <he...@126.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Xianqing He <he...@126.com>

[Impala-ASF-CR] IMPALA-10413: fix impalad crashes when canceling the retrying query

Posted by "Xianqing He (Code Review)" <ge...@cloudera.org>.
Xianqing He has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/16911 )

Change subject: IMPALA-10413: fix impalad crashes when canceling the retrying query
......................................................................

IMPALA-10413: fix impalad crashes when canceling the retrying query

The crash happens when canceling the retrying query.

If the original query was unregistered while the new query was
being created, it will call HandleRetryFailure to abort the new
query. But the status is ok, so when calling Status::AddDetail
impalad will crash.

After the WaitAsync interface called and before the
retry_request_state moved to retried_client_request_state_ , if
abort the new retry query, retry_request_state need to call
Finalize, otherwise the wait-thread will leak.

In some cases like canceled the original query or closed the session
we may not create the new query, so we also check whether the query
is retried.

Tests:
1. Manual testing. I set sleep(10) before launching the query retry
in a separate thread to let me have time to cancel the retried query.
2. Add test in tests/custom_cluster/test_query_retries.py and manually
tested 100 times to make sure that there was no Impalad crash

Change-Id: I4fd7228acd0a70d33859029052239f9b9f795e5d
---
M be/src/runtime/query-driver.cc
M be/src/service/impala-server.cc
M tests/custom_cluster/test_query_retries.py
3 files changed, 46 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/16911/3
-- 
To view, visit http://gerrit.cloudera.org:8080/16911
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4fd7228acd0a70d33859029052239f9b9f795e5d
Gerrit-Change-Number: 16911
Gerrit-PatchSet: 3
Gerrit-Owner: Xianqing He <he...@126.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-10413: fix impalad crashes when canceling the retrying query

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/16911 )

Change subject: IMPALA-10413: fix impalad crashes when canceling the retrying query
......................................................................


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16911/8/be/src/runtime/query-driver.cc
File be/src/runtime/query-driver.cc:

http://gerrit.cloudera.org:8080/#/c/16911/8/be/src/runtime/query-driver.cc@273
PS8, Line 273: dleRetryFailure(&status, &error_msg, request_st
> This waits to finish the retry query's wait-thread. A normal process waits 
Is thread wait_thread_ still running? If yes, what's the waiting point in ClientRequestState::WaitInternal()? If it's not running, we may add new simple function to reset wait_thread_, instead of calling  ClientRequestState::Finalize().



-- 
To view, visit http://gerrit.cloudera.org:8080/16911
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fd7228acd0a70d33859029052239f9b9f795e5d
Gerrit-Change-Number: 16911
Gerrit-PatchSet: 10
Gerrit-Owner: Xianqing He <he...@126.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Xianqing He <he...@126.com>
Gerrit-Comment-Date: Wed, 06 Jan 2021 23:54:11 +0000
Gerrit-HasComments: Yes