Posted to reviews@impala.apache.org by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org> on 2021/03/15 21:54:29 UTC

[Impala-ASF-CR] IMPALA-10577: Add retrying of AdmitQuery

Thomas Tauber-Marshall has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17188 )


Change subject: IMPALA-10577: Add retrying of AdmitQuery
......................................................................

IMPALA-10577: Add retrying of AdmitQuery

This patch adds retries of the AdmitQuery rpc by coordinators.
This helps to ensure that if an admissiond goes down and is restarted
or is temporarily unreachable, queries won't fail.

The retries are done with backoff and jitter to avoid overloading the
admissiond in these scenarios.

A new flag, --admission_max_retry_time_s, is added to control how long
queries will continue retrying before giving up.
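As a rough illustration of the retry policy described above, here is a minimal Python sketch of time-bounded retries with capped exponential backoff and jitter. It is not the patch's C++ code: the function name, the backoff constants, and the injectable `sleep`/`clock`/`rng` hooks are all assumptions made for this example. Only the idea of a maximum retry time comes from the commit message.

```python
import random
import time


def call_with_retries(rpc, max_retry_time_s=60.0, base_backoff_s=0.1,
                      max_backoff_s=5.0, sleep=time.sleep,
                      clock=time.monotonic, rng=random.random):
    """Call rpc() until it succeeds or the retry budget is used up.

    Hypothetical helper: every name and default here is illustrative.
    """
    deadline = clock() + max_retry_time_s
    attempt = 0
    while True:
        try:
            return rpc()
        except ConnectionError:
            attempt += 1
            # Capped exponential backoff: the window doubles per attempt.
            window = min(max_backoff_s, base_backoff_s * (2 ** (attempt - 1)))
            # Jitter: sleep a random fraction of the window so that many
            # coordinators do not all retry at the same instant.
            delay = rng() * window
            if clock() + delay > deadline:
                raise  # Out of retry budget: surface the last error.
            sleep(delay)
```

Bounding by a deadline rather than an attempt count means the worst-case wait is set directly by one number, whatever the jitter turns out to be.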

The AdmitQuery rpc is made idempotent - if a query is submitted with
the same query id as one the admissiond already knows about,
AdmitQuery will return OK without submitting the query to be scheduled
again.
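The idempotency described above can be sketched as follows. This is a toy Python model, not the admissiond implementation: the class, its fields, and the string "OK" return value are hypothetical stand-ins chosen for the example.

```python
import threading


class AdmissionService:
    """Toy stand-in for an admission service (hypothetical names)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._known_query_ids = set()
        self.scheduled = []  # Queries handed to the scheduler exactly once.

    def admit_query(self, query_id, request):
        with self._lock:
            if query_id in self._known_query_ids:
                # Duplicate submission, e.g. a retry whose first attempt
                # did reach the service: report OK, do not schedule again.
                return "OK"
            self._known_query_ids.add(query_id)
            self.scheduled.append((query_id, request))
            return "OK"
```

With this shape, a coordinator that never saw the response to its first attempt can safely resend the same request.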

Testing:
- Added a custom cluster test that checks that queries won't fail when
  the admissiond goes down.

Change-Id: I8bc0cac666bbd613a1143c0e2c4f84d3b0ad003a
---
M be/src/scheduling/admission-control-service.cc
M be/src/scheduling/remote-admission-control-client.cc
M be/src/scheduling/remote-admission-control-client.h
M common/protobuf/admission_control_service.proto
M tests/custom_cluster/test_admission_controller.py
5 files changed, 126 insertions(+), 28 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/88/17188/1
-- 
To view, visit http://gerrit.cloudera.org:8080/17188
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I8bc0cac666bbd613a1143c0e2c4f84d3b0ad003a
Gerrit-Change-Number: 17188
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>

[Impala-ASF-CR] IMPALA-10577: Add retrying of AdmitQuery

Posted by "Bikramjeet Vig (Code Review)" <ge...@cloudera.org>.
Bikramjeet Vig has removed a vote on this change.

Change subject: IMPALA-10577: Add retrying of AdmitQuery
......................................................................


Removed Code-Review+2 by Bikramjeet Vig <bi...@cloudera.com>
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I8bc0cac666bbd613a1143c0e2c4f84d3b0ad003a
Gerrit-Change-Number: 17188
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-10577: Add retrying of AdmitQuery

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/17188 )

Change subject: IMPALA-10577: Add retrying of AdmitQuery
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17188/1/be/src/scheduling/remote-admission-control-client.cc
File be/src/scheduling/remote-admission-control-client.cc:

http://gerrit.cloudera.org:8080/#/c/17188/1/be/src/scheduling/remote-admission-control-client.cc@40
PS1, Line 40: admission_max_retry_time_s
I am wondering if it's good to set a maximum number of times for retrying the RPC call, rather than a maximum amount of time for retrying it. What's the typical time for an admissiond to be ready to accept requests after it's restarted?


http://gerrit.cloudera.org:8080/#/c/17188/1/tests/custom_cluster/test_admission_controller.py
File tests/custom_cluster/test_admission_controller.py:

http://gerrit.cloudera.org:8080/#/c/17188/1/tests/custom_cluster/test_admission_controller.py@1385
PS1, Line 1385:     assert result.data == ["730"]
Could you add another test case that sleeps for more than admission_max_retry_time_s before restarting the admissiond, so that the retry fails?
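The shape of the requested test can be sketched on a simulated clock, without Impala's custom cluster test framework. Everything below is hypothetical: `FakeAdmissiond`, `admit_with_retries`, and the fixed retry interval (used instead of backoff and jitter to keep the timing deterministic) are assumptions for illustration only.

```python
class FakeAdmissiond:
    """Hypothetical stand-in that is unreachable until `up_at` seconds."""

    def __init__(self, up_at):
        self.up_at = up_at

    def admit(self, now):
        if now < self.up_at:
            raise ConnectionError("admissiond unreachable")
        return "OK"


def admit_with_retries(service, max_retry_time_s, retry_interval_s=1.0):
    """Retry at a fixed interval on a simulated clock (no real sleeping)."""
    now = 0.0
    while True:
        try:
            return service.admit(now)
        except ConnectionError:
            if now + retry_interval_s > max_retry_time_s:
                raise  # The retry budget is exhausted: the query fails.
            now += retry_interval_s


def test_retry_succeeds_when_restart_is_within_budget():
    service = FakeAdmissiond(up_at=5)
    assert admit_with_retries(service, max_retry_time_s=10) == "OK"


def test_retry_fails_when_restart_exceeds_budget():
    service = FakeAdmissiond(up_at=15)
    try:
        admit_with_retries(service, max_retry_time_s=10)
    except ConnectionError:
        return
    raise AssertionError("expected the query to fail")
```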



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8bc0cac666bbd613a1143c0e2c4f84d3b0ad003a
Gerrit-Change-Number: 17188
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Tue, 16 Mar 2021 00:33:44 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10577: Add retrying of AdmitQuery

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17188 )

Change subject: IMPALA-10577: Add retrying of AdmitQuery
......................................................................


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6972/ DRY_RUN=false


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8bc0cac666bbd613a1143c0e2c4f84d3b0ad003a
Gerrit-Change-Number: 17188
Gerrit-PatchSet: 3
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Tue, 16 Mar 2021 18:58:13 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10577: Add retrying of AdmitQuery

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17188 )

Change subject: IMPALA-10577: Add retrying of AdmitQuery
......................................................................


Patch Set 3: Code-Review+2


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8bc0cac666bbd613a1143c0e2c4f84d3b0ad003a
Gerrit-Change-Number: 17188
Gerrit-PatchSet: 3
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Tue, 16 Mar 2021 18:58:12 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10577: Add retrying of AdmitQuery

Posted by "Bikramjeet Vig (Code Review)" <ge...@cloudera.org>.
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/17188 )

Change subject: IMPALA-10577: Add retrying of AdmitQuery
......................................................................


Patch Set 2: Code-Review+2


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8bc0cac666bbd613a1143c0e2c4f84d3b0ad003a
Gerrit-Change-Number: 17188
Gerrit-PatchSet: 2
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Tue, 16 Mar 2021 18:52:02 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10577: Add retrying of AdmitQuery

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Hello Wenzhe Zhou, Bikramjeet Vig, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17188

to look at the new patch set (#2).

Change subject: IMPALA-10577: Add retrying of AdmitQuery
......................................................................

IMPALA-10577: Add retrying of AdmitQuery

This patch adds retries of the AdmitQuery rpc by coordinators.
This helps to ensure that if an admissiond goes down and is restarted
or is temporarily unreachable, queries won't fail.

The retries are done with backoff and jitter to avoid overloading the
admissiond in these scenarios.

A new flag, --admission_max_retry_time_s, is added to control how long
queries will continue retrying before giving up.

The AdmitQuery rpc is made idempotent - if a query is submitted with
the same query id as one the admissiond already knows about,
AdmitQuery will return OK without submitting the query to be scheduled
again.

Testing:
- Added a custom cluster test that checks that queries won't fail when
  the admissiond goes down.

Change-Id: I8bc0cac666bbd613a1143c0e2c4f84d3b0ad003a
---
M be/src/scheduling/admission-control-service.cc
M be/src/scheduling/remote-admission-control-client.cc
M be/src/scheduling/remote-admission-control-client.h
M common/protobuf/admission_control_service.proto
M tests/custom_cluster/test_admission_controller.py
5 files changed, 137 insertions(+), 28 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/88/17188/2
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8bc0cac666bbd613a1143c0e2c4f84d3b0ad003a
Gerrit-Change-Number: 17188
Gerrit-PatchSet: 2
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-10577: Add retrying of AdmitQuery

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17188 )

Change subject: IMPALA-10577: Add retrying of AdmitQuery
......................................................................


Patch Set 3: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6972/


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8bc0cac666bbd613a1143c0e2c4f84d3b0ad003a
Gerrit-Change-Number: 17188
Gerrit-PatchSet: 3
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Wed, 17 Mar 2021 00:41:54 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10577: Add retrying of AdmitQuery

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17188 )

Change subject: IMPALA-10577: Add retrying of AdmitQuery
......................................................................


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6977/ DRY_RUN=false


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8bc0cac666bbd613a1143c0e2c4f84d3b0ad003a
Gerrit-Change-Number: 17188
Gerrit-PatchSet: 3
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Wed, 17 Mar 2021 18:53:22 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10577: Add retrying of AdmitQuery

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17188 )

Change subject: IMPALA-10577: Add retrying of AdmitQuery
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8378/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8bc0cac666bbd613a1143c0e2c4f84d3b0ad003a
Gerrit-Change-Number: 17188
Gerrit-PatchSet: 2
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Tue, 16 Mar 2021 18:25:49 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10577: Add retrying of AdmitQuery

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/17188 )

Change subject: IMPALA-10577: Add retrying of AdmitQuery
......................................................................


Patch Set 2: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17188/1/be/src/scheduling/remote-admission-control-client.cc
File be/src/scheduling/remote-admission-control-client.cc:

http://gerrit.cloudera.org:8080/#/c/17188/1/be/src/scheduling/remote-admission-control-client.cc@40
PS1, Line 40: admission_max_retry_time_s
> Given the use of back off and jitter, it's hard to relate a maximum number 
fair enough



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8bc0cac666bbd613a1143c0e2c4f84d3b0ad003a
Gerrit-Change-Number: 17188
Gerrit-PatchSet: 2
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Tue, 16 Mar 2021 18:38:58 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10577: Add retrying of AdmitQuery

Posted by "Bikramjeet Vig (Code Review)" <ge...@cloudera.org>.
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/17188 )

Change subject: IMPALA-10577: Add retrying of AdmitQuery
......................................................................


Patch Set 1:

hadn't noticed Wenzhe posted comments earlier


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8bc0cac666bbd613a1143c0e2c4f84d3b0ad003a
Gerrit-Change-Number: 17188
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Tue, 16 Mar 2021 00:53:25 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10577: Add retrying of AdmitQuery

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17188 )

Change subject: IMPALA-10577: Add retrying of AdmitQuery
......................................................................


Patch Set 3: Verified+1


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8bc0cac666bbd613a1143c0e2c4f84d3b0ad003a
Gerrit-Change-Number: 17188
Gerrit-PatchSet: 3
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 18 Mar 2021 00:36:53 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10577: Add retrying of AdmitQuery

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/17188 )

Change subject: IMPALA-10577: Add retrying of AdmitQuery
......................................................................


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17188/1/be/src/scheduling/remote-admission-control-client.cc
File be/src/scheduling/remote-admission-control-client.cc:

http://gerrit.cloudera.org:8080/#/c/17188/1/be/src/scheduling/remote-admission-control-client.cc@40
PS1, Line 40: admission_max_retry_time_s
> I am wondering if it's good to set maximum number of times for retrying RPC
Given the use of back off and jitter, it's hard to relate a maximum number of retries to how long that number of retries will take, so I think it's easier for users to think about a maximum amount of time to retry instead.

It's a good point that I chose this number (60 seconds) basically arbitrarily. It might be good to set it experimentally, but the right number is potentially going to depend a lot on the configuration of the system that's monitoring the admissiond and restarting it (e.g. Kubernetes, CM, etc.), and there may not be a single value that's always appropriate.

I would also argue that if it's going to take a very long time to get the admissiond restarted, it may be the right thing to just let some queries fail rather than have them all keep retrying until it is back. In the time it takes the admissiond to come up, you could have a ton of queries sitting around retrying, which could cause other problems (e.g. the coordinator runs out of threads, or the new admissiond gets overwhelmed by all the requests and falls over again). So 60 seconds seems reasonable to me.

Happy to file a JIRA to examine this more if you want.


http://gerrit.cloudera.org:8080/#/c/17188/1/tests/custom_cluster/test_admission_controller.py
File tests/custom_cluster/test_admission_controller.py:

http://gerrit.cloudera.org:8080/#/c/17188/1/tests/custom_cluster/test_admission_controller.py@1369
PS1, Line 1369: e
> flake8: E203 whitespace before ':'
Done


http://gerrit.cloudera.org:8080/#/c/17188/1/tests/custom_cluster/test_admission_controller.py@1385
PS1, Line 1385:     result = self.client.fetch(query, after_kill_handle)
> Could you add another test case for which sleep more than admission_max_ret
Done



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8bc0cac666bbd613a1143c0e2c4f84d3b0ad003a
Gerrit-Change-Number: 17188
Gerrit-PatchSet: 2
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Tue, 16 Mar 2021 18:05:58 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10577: Add retrying of AdmitQuery

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17188 )

Change subject: IMPALA-10577: Add retrying of AdmitQuery
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8373/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8bc0cac666bbd613a1143c0e2c4f84d3b0ad003a
Gerrit-Change-Number: 17188
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 15 Mar 2021 22:14:26 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10577: Add retrying of AdmitQuery

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17188 )

Change subject: IMPALA-10577: Add retrying of AdmitQuery
......................................................................

IMPALA-10577: Add retrying of AdmitQuery

This patch adds retries of the AdmitQuery rpc by coordinators.
This helps to ensure that if an admissiond goes down and is restarted
or is temporarily unreachable, queries won't fail.

The retries are done with backoff and jitter to avoid overloading the
admissiond in these scenarios.

A new flag, --admission_max_retry_time_s, is added to control how long
queries will continue retrying before giving up.

The AdmitQuery rpc is made idempotent - if a query is submitted with
the same query id as one the admissiond already knows about,
AdmitQuery will return OK without submitting the query to be scheduled
again.

Testing:
- Added a custom cluster test that checks that queries won't fail when
  the admissiond goes down.

Change-Id: I8bc0cac666bbd613a1143c0e2c4f84d3b0ad003a
Reviewed-on: http://gerrit.cloudera.org:8080/17188
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/scheduling/admission-control-service.cc
M be/src/scheduling/remote-admission-control-client.cc
M be/src/scheduling/remote-admission-control-client.h
M common/protobuf/admission_control_service.proto
M tests/custom_cluster/test_admission_controller.py
5 files changed, 137 insertions(+), 28 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I8bc0cac666bbd613a1143c0e2c4f84d3b0ad003a
Gerrit-Change-Number: 17188
Gerrit-PatchSet: 4
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-10577: Add retrying of AdmitQuery

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17188 )

Change subject: IMPALA-10577: Add retrying of AdmitQuery
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17188/1/tests/custom_cluster/test_admission_controller.py
File tests/custom_cluster/test_admission_controller.py:

http://gerrit.cloudera.org:8080/#/c/17188/1/tests/custom_cluster/test_admission_controller.py@1369
PS1, Line 1369:  
flake8: E203 whitespace before ':'



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8bc0cac666bbd613a1143c0e2c4f84d3b0ad003a
Gerrit-Change-Number: 17188
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 15 Mar 2021 21:55:19 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10577: Add retrying of AdmitQuery

Posted by "Bikramjeet Vig (Code Review)" <ge...@cloudera.org>.
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/17188 )

Change subject: IMPALA-10577: Add retrying of AdmitQuery
......................................................................


Patch Set 1: Code-Review+2


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8bc0cac666bbd613a1143c0e2c4f84d3b0ad003a
Gerrit-Change-Number: 17188
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Tue, 16 Mar 2021 00:45:21 +0000
Gerrit-HasComments: No