Posted to reviews@impala.apache.org by "Quanlong Huang (Code Review)" <ge...@cloudera.org> on 2020/12/01 08:56:10 UTC

[Impala-ASF-CR] IMPALA-10369: Dump server stacktraces when test_concurrent_ddls.py timeout

Quanlong Huang has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16800 )


Change subject: IMPALA-10369: Dump server stacktraces when test_concurrent_ddls.py timeout
......................................................................

IMPALA-10369: Dump server stacktraces when test_concurrent_ddls.py timeout

Recently, we see many timeout failures of test_concurrent_ddls.py in S3
builds, e.g. IMPALA-10280, IMPALA-10301, IMPALA-10363. It'd be helpful
to dump the server stacktraces so we can understand why some RPCs are
slow/stuck.

This patch extracts the logic of dumping stacktraces in
script-timeout-check.sh to a separate script, script-timeout-check.sh.
The script also dumps jstacks of HMS and NameNode. Dumping all these
stacktraces is time-consuming so we do them in parallel, which also
helps to get consistent snapshots of all servers.
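
The parallel dumping described above can be sketched in Python. This is only an illustration of the idea, not the patch's code: the patch itself is a shell script, and the names run_dump and dump_all, as well as the placeholder commands, are assumptions made for this example. Starting every dump at once means the snapshots are taken at roughly the same moment and the total wall time is close to that of the slowest dump.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_dump(name, cmd):
    """Run one dump command; return (name, exit code, captured output)."""
    proc = subprocess.run(cmd, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, universal_newlines=True)
    return name, proc.returncode, proc.stdout


def dump_all(commands):
    """Start every dump at once and collect the results in submission order."""
    with ThreadPoolExecutor(max_workers=max(1, len(commands))) as pool:
        futures = [pool.submit(run_dump, name, cmd)
                   for name, cmd in commands.items()]
        return [f.result() for f in futures]


if __name__ == "__main__":
    # Placeholder commands (hypothetical). A real run would invoke a
    # stack-dumping tool against each server's PID instead.
    demo = {
        "catalogd": [sys.executable, "-c", "print('catalogd stack')"],
        "hms": [sys.executable, "-c", "print('hms stack')"],
    }
    for name, code, output in dump_all(demo):
        print(name, code, output.strip())
```

Threads are enough here because each worker only waits on a child process.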

When any tests in test_concurrent_ddls.py timeout, we use
dump-stacktraces.sh to dump the stacktraces before exit. Previously,
some tests depend on pytest.mark.timeout for detecting timeouts. It's
hard to add a customized callback for dumping server stacktraces. So
this patch refactors test_concurrent_ddls.py to only use timeout of
multiprocessing.
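
A minimal sketch of the timeout handling described above, assuming the tests run their work through a multiprocessing thread pool. The helper name run_with_timeout and its on_timeout parameter are hypothetical; in the patch the callback role would be played by something like dump_server_stacktraces in tests/util/shell_util.py. The point is that the test code itself catches the timeout, so it can run a custom hook before failing, which is hard to do with a pytest.mark.timeout marker.

```python
import multiprocessing
import time
from multiprocessing.pool import ThreadPool


def run_with_timeout(tasks, timeout_s, on_timeout):
    """Run callables in a thread pool; call on_timeout() if a result is late."""
    pool = ThreadPool(processes=max(1, len(tasks)))
    try:
        pending = [pool.apply_async(task) for task in tasks]
        # get() raises multiprocessing.TimeoutError when the result does
        # not arrive within timeout_s seconds.
        return [p.get(timeout=timeout_s) for p in pending]
    except multiprocessing.TimeoutError:
        on_timeout()  # e.g. dump server stacktraces before failing
        raise
    finally:
        pool.terminate()


if __name__ == "__main__":
    def slow_task():
        time.sleep(1)

    try:
        run_with_timeout([slow_task], 0.1, lambda: print("dumping stacks"))
    except multiprocessing.TimeoutError:
        print("test timed out")
```

Note that timeout_s applies to each get() call in turn, so the overall wait can exceed timeout_s when several tasks are slow.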

Tests:
 - Tested the scripts locally.
 - (WIP) Run jenkins jobs.

Change-Id: I514cf2d0ff842805c0abf7211f2a395151174173
---
A bin/dump-stacktraces.sh
M bin/script-timeout-check.sh
M tests/custom_cluster/test_concurrent_ddls.py
M tests/util/shell_util.py
4 files changed, 103 insertions(+), 38 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/16800/1
-- 
To view, visit http://gerrit.cloudera.org:8080/16800
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I514cf2d0ff842805c0abf7211f2a395151174173
Gerrit-Change-Number: 16800
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-10369: Dump server stacktraces when test_concurrent_ddls.py timeout

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16800 )

Change subject: IMPALA-10369: Dump server stacktraces when test_concurrent_ddls.py timeout
......................................................................

IMPALA-10369: Dump server stacktraces when test_concurrent_ddls.py timeout

Recently, we see many timeout failures of test_concurrent_ddls.py in S3
builds, e.g. IMPALA-10280, IMPALA-10301, IMPALA-10363. It'd be helpful
to dump the server stacktraces so we can understand why some RPCs are
slow/stuck.

This patch extracts the logic of dumping stacktraces in
script-timeout-check.sh to a separate script, dump-stacktraces.sh.
The script also dumps jstacks of HMS and NameNode. Dumping all these
stacktraces is time-consuming so we do them in parallel, which also
helps to get consistent snapshots of all servers.

When any tests in test_concurrent_ddls.py timeout, we use
dump-stacktraces.sh to dump the stacktraces before exit. Previously,
some tests depend on pytest.mark.timeout for detecting timeouts. It's
hard to add a customized callback for dumping server stacktraces. So
this patch refactors test_concurrent_ddls.py to only use timeout of
multiprocessing.

Tests:
 - Tested the scripts locally.
 - Verified the error handling of timeout logics in Jenkins jobs

Change-Id: I514cf2d0ff842805c0abf7211f2a395151174173
Reviewed-on: http://gerrit.cloudera.org:8080/16800
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
A bin/dump-stacktraces.sh
M bin/script-timeout-check.sh
M tests/custom_cluster/test_concurrent_ddls.py
M tests/util/shell_util.py
4 files changed, 105 insertions(+), 38 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I514cf2d0ff842805c0abf7211f2a395151174173
Gerrit-Change-Number: 16800
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-10369: Dump server stacktraces when test_concurrent_ddls.py timeout

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Hello Joe McDonnell, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16800

to look at the new patch set (#3).

Change subject: IMPALA-10369: Dump server stacktraces when test_concurrent_ddls.py timeout
......................................................................

IMPALA-10369: Dump server stacktraces when test_concurrent_ddls.py timeout

Recently, we see many timeout failures of test_concurrent_ddls.py in S3
builds, e.g. IMPALA-10280, IMPALA-10301, IMPALA-10363. It'd be helpful
to dump the server stacktraces so we can understand why some RPCs are
slow/stuck.

This patch extracts the logic of dumping stacktraces in
script-timeout-check.sh to a separate script, dump-stacktraces.sh.
The script also dumps jstacks of HMS and NameNode. Dumping all these
stacktraces is time-consuming so we do them in parallel, which also
helps to get consistent snapshots of all servers.

When any tests in test_concurrent_ddls.py timeout, we use
dump-stacktraces.sh to dump the stacktraces before exit. Previously,
some tests depend on pytest.mark.timeout for detecting timeouts. It's
hard to add a customized callback for dumping server stacktraces. So
this patch refactors test_concurrent_ddls.py to only use timeout of
multiprocessing.

Tests:
 - Tested the scripts locally.
 - Verified the error handling of timeout logics in Jenkins jobs

Change-Id: I514cf2d0ff842805c0abf7211f2a395151174173
---
A bin/dump-stacktraces.sh
M bin/script-timeout-check.sh
M tests/custom_cluster/test_concurrent_ddls.py
M tests/util/shell_util.py
4 files changed, 105 insertions(+), 38 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/16800/3

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I514cf2d0ff842805c0abf7211f2a395151174173
Gerrit-Change-Number: 16800
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-10369: Dump server stacktraces when test_concurrent_ddls.py timeout

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16800 )

Change subject: IMPALA-10369: Dump server stacktraces when test_concurrent_ddls.py timeout
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16800/1/bin/dump-stacktraces.sh
File bin/dump-stacktraces.sh:

http://gerrit.cloudera.org:8080/#/c/16800/1/bin/dump-stacktraces.sh@53
PS1, Line 53:   collect_gdb_backtraces catalogd $CATALOGD_PID && collect_jstacks catalogd $CATALOGD_PID &
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/16800/1/tests/util/shell_util.py
File tests/util/shell_util.py:

http://gerrit.cloudera.org:8080/#/c/16800/1/tests/util/shell_util.py@32
PS1, Line 32: def dump_server_stacktraces():
flake8: E302 expected 2 blank lines, found 1



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I514cf2d0ff842805c0abf7211f2a395151174173
Gerrit-Change-Number: 16800
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Tue, 01 Dec 2020 08:56:57 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10369: Dump server stacktraces when test_concurrent_ddls.py timeout

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/16800 )

Change subject: IMPALA-10369: Dump server stacktraces when test_concurrent_ddls.py timeout
......................................................................


Patch Set 3: Code-Review+2

(1 comment)

Thanks, Joe! Carrying over the +2.

http://gerrit.cloudera.org:8080/#/c/16800/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16800/2//COMMIT_MSG@15
PS2, Line 15: dump-stacktraces.sh.
> Nit: dump-stacktraces.sh
Done



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I514cf2d0ff842805c0abf7211f2a395151174173
Gerrit-Change-Number: 16800
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Thu, 03 Dec 2020 02:24:29 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10369: Dump server stacktraces when test_concurrent_ddls.py timeout

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16800 )

Change subject: IMPALA-10369: Dump server stacktraces when test_concurrent_ddls.py timeout
......................................................................


Patch Set 4: Verified+1


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I514cf2d0ff842805c0abf7211f2a395151174173
Gerrit-Change-Number: 16800
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Thu, 03 Dec 2020 08:05:22 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10369: Dump server stacktraces when test_concurrent_ddls.py timeout

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16800 )

Change subject: IMPALA-10369: Dump server stacktraces when test_concurrent_ddls.py timeout
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/7756/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I514cf2d0ff842805c0abf7211f2a395151174173
Gerrit-Change-Number: 16800
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Tue, 01 Dec 2020 09:17:39 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10369: Dump server stacktraces when test_concurrent_ddls.py timeout

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/16800 )

Change subject: IMPALA-10369: Dump server stacktraces when test_concurrent_ddls.py timeout
......................................................................


Patch Set 2: Code-Review+2

(1 comment)

This makes sense to me. Thanks for debugging this!

http://gerrit.cloudera.org:8080/#/c/16800/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16800/2//COMMIT_MSG@15
PS2, Line 15: script-timeout-check.sh
Nit: dump-stacktraces.sh



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I514cf2d0ff842805c0abf7211f2a395151174173
Gerrit-Change-Number: 16800
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Thu, 03 Dec 2020 01:12:57 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10369: Dump server stacktraces when test_concurrent_ddls.py timeout

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16800 )

Change subject: IMPALA-10369: Dump server stacktraces when test_concurrent_ddls.py timeout
......................................................................


Patch Set 4: Code-Review+2


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I514cf2d0ff842805c0abf7211f2a395151174173
Gerrit-Change-Number: 16800
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Thu, 03 Dec 2020 02:30:42 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10369: Dump server stacktraces when test_concurrent_ddls.py timeout

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Hello Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16800

to look at the new patch set (#2).

Change subject: IMPALA-10369: Dump server stacktraces when test_concurrent_ddls.py timeout
......................................................................

IMPALA-10369: Dump server stacktraces when test_concurrent_ddls.py timeout

Recently, we see many timeout failures of test_concurrent_ddls.py in S3
builds, e.g. IMPALA-10280, IMPALA-10301, IMPALA-10363. It'd be helpful
to dump the server stacktraces so we can understand why some RPCs are
slow/stuck.

This patch extracts the logic of dumping stacktraces in
script-timeout-check.sh to a separate script, script-timeout-check.sh.
The script also dumps jstacks of HMS and NameNode. Dumping all these
stacktraces is time-consuming so we do them in parallel, which also
helps to get consistent snapshots of all servers.

When any tests in test_concurrent_ddls.py timeout, we use
dump-stacktraces.sh to dump the stacktraces before exit. Previously,
some tests depend on pytest.mark.timeout for detecting timeouts. It's
hard to add a customized callback for dumping server stacktraces. So
this patch refactors test_concurrent_ddls.py to only use timeout of
multiprocessing.

Tests:
 - Tested the scripts locally.
 - Verified the error handling of timeout logics in Jenkins jobs

Change-Id: I514cf2d0ff842805c0abf7211f2a395151174173
---
A bin/dump-stacktraces.sh
M bin/script-timeout-check.sh
M tests/custom_cluster/test_concurrent_ddls.py
M tests/util/shell_util.py
4 files changed, 105 insertions(+), 38 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/16800/2

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I514cf2d0ff842805c0abf7211f2a395151174173
Gerrit-Change-Number: 16800
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-10369: Dump server stacktraces when test_concurrent_ddls.py timeout

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16800 )

Change subject: IMPALA-10369: Dump server stacktraces when test_concurrent_ddls.py timeout
......................................................................


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6725/ DRY_RUN=false


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I514cf2d0ff842805c0abf7211f2a395151174173
Gerrit-Change-Number: 16800
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Thu, 03 Dec 2020 02:30:43 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10369: Dump server stacktraces when test_concurrent_ddls.py timeout

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16800 )

Change subject: IMPALA-10369: Dump server stacktraces when test_concurrent_ddls.py timeout
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/7759/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I514cf2d0ff842805c0abf7211f2a395151174173
Gerrit-Change-Number: 16800
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 01 Dec 2020 12:02:26 +0000
Gerrit-HasComments: No