Posted to reviews@impala.apache.org by "Thomas Marshall (Code Review)" <ge...@cloudera.org> on 2019/02/12 22:55:33 UTC

[Impala-ASF-CR] IMPALA-8183: fix test_reportexecstatus_retry flakiness

Thomas Marshall has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12461 )


Change subject: IMPALA-8183: fix test_reportexecstatus_retry flakiness
......................................................................

IMPALA-8183: fix test_reportexecstatus_retry flakiness

The test is designed to cause ReportExecStatus() rpcs to fail by
backing up the control service queue. Prior to IMPALA-4555, after a
failed ReportExecStatus() we would wait
'report_status_retry_interval_ms' between retries, which was 100ms by
default and wasn't touched by the test. That 100ms was right on the
edge of being enough time for the coordinator to keep up with
processing the reports, so that some would fail but most would
succeed. It was always possible that we could hit 2990 in this setup,
but it was unlikely.

Now, we wait 'status_report_interval_ms'. By default, this is 5000ms,
so it should give the coordinator even more time and make these issues
less likely. However, the test sets 'status_report_interval_ms' to
10ms, which isn't nearly enough time for the coordinator to do its
processing, causing lots of the ReportExecStatus() rpcs to fail and
making us hit 2990 pretty often.

The solution is to set 'status_report_interval_ms' to 100ms in the
test, which roughly achieves the same retry frequency as before. The
same change is made to a similar test, test_reportexecstatus_timeout.

Testing:
- Ran test_reportexecstatus_retry in a loop 400 times without seeing a
  failure. It previously repro-ed for me about once per 50 runs.
- Manually verified that both tests are still hitting the error paths
  that they are supposed to be testing.

Change-Id: I7027a6e099c543705e5845ee0e5268f1f9a3fb05
---
M tests/custom_cluster/test_rpc_timeout.py
1 file changed, 2 insertions(+), 2 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/61/12461/1
-- 
To view, visit http://gerrit.cloudera.org:8080/12461
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I7027a6e099c543705e5845ee0e5268f1f9a3fb05
Gerrit-Change-Number: 12461
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Marshall <th...@cmu.edu>

[Impala-ASF-CR] IMPALA-8183: fix test_reportexecstatus_retry flakiness

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12461 )

Change subject: IMPALA-8183: fix test_reportexecstatus_retry flakiness
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/2090/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7027a6e099c543705e5845ee0e5268f1f9a3fb05
Gerrit-Change-Number: 12461
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Comment-Date: Tue, 12 Feb 2019 23:36:51 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8183: fix test_reportexecstatus_retry flakiness

Posted by "Thomas Marshall (Code Review)" <ge...@cloudera.org>.
Hello Michael Ho, Andrew Sherman, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/12461

to look at the new patch set (#2).

Change subject: IMPALA-8183: fix test_reportexecstatus_retry flakiness
......................................................................

IMPALA-8183: fix test_reportexecstatus_retry flakiness

The test is designed to cause ReportExecStatus() rpcs to fail by
backing up the control service queue. Previously, after a failed
ReportExecStatus() we would wait 'report_status_retry_interval_ms'
between retries, which was 100ms by default and wasn't touched by the
test. That 100ms was right on the edge of being enough time for the
coordinator to keep up with processing the reports, so that some would
fail but most would succeed. It was always possible that we could hit
IMPALA-2990 in this setup, but it was unlikely.

Now, with IMPALA-4555, 'report_status_retry_interval_ms' was removed
and we instead wait 'status_report_interval_ms' between retries. By
default, this is 5000ms, so it should give the coordinator even more
time and make these issues less likely. However, the test sets
'status_report_interval_ms' to 10ms, which isn't nearly enough time
for the coordinator to do its processing, causing lots of the
ReportExecStatus() rpcs to fail and making us hit IMPALA-2990 pretty
often.

The solution is to set 'status_report_interval_ms' to 100ms in the
test, which roughly achieves the same retry frequency as before. The
same change is made to a similar test, test_reportexecstatus_timeout.

Testing:
- Ran test_reportexecstatus_retry in a loop 400 times without seeing a
  failure. It previously repro-ed for me about once per 50 runs.
- Manually verified that both tests are still hitting the error paths
  that they are supposed to be testing.

Change-Id: I7027a6e099c543705e5845ee0e5268f1f9a3fb05
---
M tests/custom_cluster/test_rpc_timeout.py
1 file changed, 2 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/61/12461/2

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7027a6e099c543705e5845ee0e5268f1f9a3fb05
Gerrit-Change-Number: 12461
Gerrit-PatchSet: 2
Gerrit-Owner: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>

[Impala-ASF-CR] IMPALA-8183: fix test_reportexecstatus_retry flakiness

Posted by "Thomas Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/12461 )

Change subject: IMPALA-8183: fix test_reportexecstatus_retry flakiness
......................................................................


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/12461/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12461/1//COMMIT_MSG@12
PS1, Line 12: between retries, which was 100ms b
> Please also mention that this flag is removed after IMPALA-4555.
Done


http://gerrit.cloudera.org:8080/#/c/12461/1//COMMIT_MSG@16
PS1, Line 16: IMPALA-2990 in this setup, but it was unlikely.
> I had to think for a while, if you say "IMPALA-2990" it would be clearer
Done


http://gerrit.cloudera.org:8080/#/c/12461/1//COMMIT_MSG@19
PS1, Line 19: and we instead wait 'status_report_interval_ms' between retries. By
> between retries
Done


http://gerrit.cloudera.org:8080/#/c/12461/1//COMMIT_MSG@24
PS1, Line 24: ReportExecStatus() rpcs to fail and making us hit IMPALA-2990 pretty
> IMPALA-2990
Done




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7027a6e099c543705e5845ee0e5268f1f9a3fb05
Gerrit-Change-Number: 12461
Gerrit-PatchSet: 2
Gerrit-Owner: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Comment-Date: Tue, 12 Feb 2019 23:20:20 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8183: fix test_reportexecstatus_retry flakiness

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12461 )

Change subject: IMPALA-8183: fix test_reportexecstatus_retry flakiness
......................................................................


Patch Set 3: Verified+1



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7027a6e099c543705e5845ee0e5268f1f9a3fb05
Gerrit-Change-Number: 12461
Gerrit-PatchSet: 3
Gerrit-Owner: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Comment-Date: Wed, 13 Feb 2019 03:20:54 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8183: fix test_reportexecstatus_retry flakiness

Posted by "Thomas Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/12461 )

Change subject: IMPALA-8183: fix test_reportexecstatus_retry flakiness
......................................................................


Patch Set 2: Code-Review+2

carrying forward



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7027a6e099c543705e5845ee0e5268f1f9a3fb05
Gerrit-Change-Number: 12461
Gerrit-PatchSet: 2
Gerrit-Owner: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Comment-Date: Tue, 12 Feb 2019 23:20:30 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8183: fix test_reportexecstatus_retry flakiness

Posted by "Andrew Sherman (Code Review)" <ge...@cloudera.org>.
Andrew Sherman has posted comments on this change. ( http://gerrit.cloudera.org:8080/12461 )

Change subject: IMPALA-8183: fix test_reportexecstatus_retry flakiness
......................................................................


Patch Set 1: Code-Review+1

(2 comments)

LGTM

http://gerrit.cloudera.org:8080/#/c/12461/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12461/1//COMMIT_MSG@16
PS1, Line 16: succeed. It was always possible that we could hit 2990 in this setup,
I had to think for a while, if you say "IMPALA-2990" it would be clearer


http://gerrit.cloudera.org:8080/#/c/12461/1//COMMIT_MSG@24
PS1, Line 24: making us hit 2990 pretty often.
IMPALA-2990




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7027a6e099c543705e5845ee0e5268f1f9a3fb05
Gerrit-Change-Number: 12461
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Comment-Date: Tue, 12 Feb 2019 23:00:38 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8183: fix test_reportexecstatus_retry flakiness

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12461 )

Change subject: IMPALA-8183: fix test_reportexecstatus_retry flakiness
......................................................................


Patch Set 3: Code-Review+2



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7027a6e099c543705e5845ee0e5268f1f9a3fb05
Gerrit-Change-Number: 12461
Gerrit-PatchSet: 3
Gerrit-Owner: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Comment-Date: Tue, 12 Feb 2019 23:21:00 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8183: fix test_reportexecstatus_retry flakiness

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12461 )

Change subject: IMPALA-8183: fix test_reportexecstatus_retry flakiness
......................................................................


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3767/ DRY_RUN=false



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7027a6e099c543705e5845ee0e5268f1f9a3fb05
Gerrit-Change-Number: 12461
Gerrit-PatchSet: 3
Gerrit-Owner: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>
Gerrit-Comment-Date: Tue, 12 Feb 2019 23:21:01 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8183: fix test_reportexecstatus_retry flakiness

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12461 )

Change subject: IMPALA-8183: fix test_reportexecstatus_retry flakiness
......................................................................


Patch Set 1: Code-Review+2

(2 comments)

http://gerrit.cloudera.org:8080/#/c/12461/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12461/1//COMMIT_MSG@12
PS1, Line 12: 'report_status_retry_interval_ms' 
Please also mention that this flag is removed after IMPALA-4555.


http://gerrit.cloudera.org:8080/#/c/12461/1//COMMIT_MSG@19
PS1, Line 19: Now, we wait 'status_report_interval_ms'. By default, this is 5000ms,
between retries




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7027a6e099c543705e5845ee0e5268f1f9a3fb05
Gerrit-Change-Number: 12461
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Comment-Date: Tue, 12 Feb 2019 23:15:56 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8183: fix test_reportexecstatus_retry flakiness

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/12461 )

Change subject: IMPALA-8183: fix test_reportexecstatus_retry flakiness
......................................................................

IMPALA-8183: fix test_reportexecstatus_retry flakiness

The test is designed to cause ReportExecStatus() rpcs to fail by
backing up the control service queue. Previously, after a failed
ReportExecStatus() we would wait 'report_status_retry_interval_ms'
between retries, which was 100ms by default and wasn't touched by the
test. That 100ms was right on the edge of being enough time for the
coordinator to keep up with processing the reports, so that some would
fail but most would succeed. It was always possible that we could hit
IMPALA-2990 in this setup, but it was unlikely.

Now, with IMPALA-4555, 'report_status_retry_interval_ms' was removed
and we instead wait 'status_report_interval_ms' between retries. By
default, this is 5000ms, so it should give the coordinator even more
time and make these issues less likely. However, the test sets
'status_report_interval_ms' to 10ms, which isn't nearly enough time
for the coordinator to do its processing, causing lots of the
ReportExecStatus() rpcs to fail and making us hit IMPALA-2990 pretty
often.

The solution is to set 'status_report_interval_ms' to 100ms in the
test, which roughly achieves the same retry frequency as before. The
same change is made to a similar test, test_reportexecstatus_timeout.
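
(Editor's illustration, not part of the commit.) The reasoning above can be
sketched with a toy queue model. This is not Impala code: the function name
simulate, the sender count, the 15ms service time, and the queue capacity of 4
are all made-up assumptions, chosen only to show how the interval between
report attempts changes the share of attempts rejected by a full queue.

```python
from collections import deque


def simulate(interval_ms, senders=8, service_ms=15, queue_cap=4,
             duration_ms=10_000):
    """Toy model (hypothetical numbers): fraction of report attempts
    rejected because the bounded queue was full."""
    # Stagger the senders so they do not all report in the same tick.
    next_send = [s * interval_ms // senders for s in range(senders)]
    queue = deque()
    busy_until = 0  # time at which the coordinator becomes idle again
    attempts = rejected = 0
    for now in range(duration_ms):  # advance in 1 ms ticks
        # Coordinator: when idle, take the next queued report.
        if queue and now >= busy_until:
            queue.popleft()
            busy_until = now + service_ms
        # Senders: attempt a report whenever their interval has elapsed,
        # whether or not the previous attempt was accepted.
        for s in range(senders):
            if now >= next_send[s]:
                attempts += 1
                if len(queue) < queue_cap:
                    queue.append(s)
                else:
                    rejected += 1
                next_send[s] = now + interval_ms
    return rejected / attempts


if __name__ == "__main__":
    for interval in (10, 100, 5000):
        print(interval, round(simulate(interval), 3))
```

Under these assumed numbers, the 10ms interval rejects the large majority of
attempts, 100ms rejects a minority, and 5000ms rejects none, which matches
the qualitative behaviour the commit message describes.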

Testing:
- Ran test_reportexecstatus_retry in a loop 400 times without seeing a
  failure. It previously repro-ed for me about once per 50 runs.
- Manually verified that both tests are still hitting the error paths
  that they are supposed to be testing.
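
(Editor's illustration, not part of the commit.) As a rough check on the
first testing bullet: if each run still failed about once per 50 runs, and
assuming runs are independent (an assumption, not something the thread
states), 400 consecutive passes would be very unlikely. The helper name
prob_all_pass is hypothetical.

```python
def prob_all_pass(runs, failure_rate):
    """Probability that `runs` independent runs all pass, given a fixed
    per-run failure rate (independence is an assumption)."""
    return (1.0 - failure_rate) ** runs


if __name__ == "__main__":
    p = prob_all_pass(runs=400, failure_rate=1 / 50)
    print(f"chance of 400 clean runs at the old failure rate: {p:.2e}")
```

The result is on the order of 0.03%, so 400 clean runs are reasonable
evidence that the failure rate dropped, though not proof that it is zero.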

Change-Id: I7027a6e099c543705e5845ee0e5268f1f9a3fb05
Reviewed-on: http://gerrit.cloudera.org:8080/12461
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M tests/custom_cluster/test_rpc_timeout.py
1 file changed, 2 insertions(+), 2 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I7027a6e099c543705e5845ee0e5268f1f9a3fb05
Gerrit-Change-Number: 12461
Gerrit-PatchSet: 4
Gerrit-Owner: Thomas Marshall <th...@cmu.edu>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <th...@cmu.edu>