Posted to reviews@impala.apache.org by "Joe McDonnell (Code Review)" <ge...@cloudera.org> on 2020/11/02 23:38:33 UTC

[Impala-ASF-CR] IMPALA-9864: Produce a minidump when TestValidateMetrics fails

Joe McDonnell has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16690 )


Change subject: IMPALA-9864: Produce a minidump when TestValidateMetrics fails
......................................................................

IMPALA-9864: Produce a minidump when TestValidateMetrics fails

After running end-to-end tests, run-tests.py runs verifiers to
check that a set of metrics are zero. When this fails, it can
indicate a hung query fragment or other resource leak (see
IMPALA-9842 for example). To track this down, it is useful to
have a minidump, so this adds a step to have every Impalad
generate a minidump (by sending SIGUSR1) when we hit the timeout.
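
For readers unfamiliar with the mechanism, the signalling step might look
roughly like the sketch below. This is not the code from this change:
request_minidump and its injectable kill parameter are hypothetical names
used for illustration. It assumes only what the message above states, that
an Impala daemon writes a minidump when it receives SIGUSR1.

```python
import os
import signal


def request_minidump(pid, kill=os.kill):
    """Ask one process to write a minidump by sending it SIGUSR1.

    Returns a one-line status string that a caller could append to an
    assertion message. `kill` is injectable (hypothetical design choice)
    so the function can be exercised without signalling a real process.
    """
    try:
        kill(pid, signal.SIGUSR1)
    except OSError as e:
        # For example, the process exited before the signal was sent.
        return "Failed to signal PID {0}: {1}".format(pid, e)
    return "Requested minidump for PID {0}".format(pid)
```

os.kill only delivers the signal; the daemon writes the dump itself, so a
caller that needs the file should check the minidump directory afterwards.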

Also, the current error message dumps a bunch of unformatted
JSON from our Web UI. This is hard to read and painful to
cut/paste. This now dumps that JSON to files in a diagnostic
directory under the logs directory. The JSON is formatted
in a readable way. These files would be preserved along with
the rest of the logs directory for automated runs.
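
A minimal sketch of the JSON-dumping idea, assuming the page content has
already been fetched and parsed into a Python object. The helper name
dump_json_page and the indent/sort_keys settings are illustrative
assumptions, not taken from the patch; only the <diag_dir>/json/<page>.json
layout comes from the example output in this message.

```python
import json
import os


def dump_json_page(diag_dir, page_name, page_data):
    """Write one debug page as indented JSON under <diag_dir>/json/."""
    json_dir = os.path.join(diag_dir, "json")
    if not os.path.exists(json_dir):
        os.makedirs(json_dir)
    path = os.path.join(json_dir, page_name + ".json")
    with open(path, "w") as f:
        # indent and sort_keys make the file readable and stable to diff.
        json.dump(page_data, f, indent=2, sort_keys=True)
    return "Dumped {0} JSON to {1}".format(page_name, path)
```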

The new error message looks like this:
E   AssertionError: Metric impala-server.num-queries-registered did not reach value 0 in 60s.
E   Dumping debug webpages in JSON format...
E   Dumped memz JSON to $IMPALA_HOME/logs/metric_timeout_diags_1604359071/json/memz.json
E   Dumped metrics JSON to $IMPALA_HOME/logs/metric_timeout_diags_1604359071/json/metrics.json
E   Dumped queries JSON to $IMPALA_HOME/logs/metric_timeout_diags_1604359071/json/queries.json
E   Dumped sessions JSON to $IMPALA_HOME/logs/metric_timeout_diags_1604359071/json/sessions.json
E   Dumped threadz JSON to $IMPALA_HOME/logs/metric_timeout_diags_1604359071/json/threadz.json
E   Dumped rpcz JSON to $IMPALA_HOME/logs/metric_timeout_diags_1604359071/json/rpcz.json
E   Dumping minidumps for 3 running impalads...
E   Dumped minidump for PID 2709
E   Dumped minidump for PID 2714
E   Dumped minidump for PID 2721

Testing:
 - Tried out the dump function on my developer machine
 - Verified the minidumps exist
 - Verified the JSON is readable

Change-Id: I16d26052d0664ee0b115e3611cd96047d8ada19d
---
M tests/common/impala_service.py
1 file changed, 62 insertions(+), 10 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/16690/1
-- 
To view, visit http://gerrit.cloudera.org:8080/16690
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I16d26052d0664ee0b115e3611cd96047d8ada19d
Gerrit-Change-Number: 16690
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>

[Impala-ASF-CR] IMPALA-9864: Produce a minidump when TestValidateMetrics fails

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16690 )

Change subject: IMPALA-9864: Produce a minidump when TestValidateMetrics fails
......................................................................


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6631/ DRY_RUN=true


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16d26052d0664ee0b115e3611cd96047d8ada19d
Gerrit-Change-Number: 16690
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Tue, 03 Nov 2020 22:06:28 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9864: Produce a minidump when TestValidateMetrics fails

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/16690 )

Change subject: IMPALA-9864: Produce a minidump when TestValidateMetrics fails
......................................................................


Patch Set 3: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16690/3/tests/common/impala_service.py
File tests/common/impala_service.py:

http://gerrit.cloudera.org:8080/#/c/16690/3/tests/common/impala_service.py@181
PS3, Line 181: "-f",
hmm, this sounds weird, I don't remember such issues on my desktop



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16d26052d0664ee0b115e3611cd96047d8ada19d
Gerrit-Change-Number: 16690
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Tue, 03 Nov 2020 22:57:38 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9864: Produce a minidump when TestValidateMetrics fails

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16690 )

Change subject: IMPALA-9864: Produce a minidump when TestValidateMetrics fails
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/7604/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16d26052d0664ee0b115e3611cd96047d8ada19d
Gerrit-Change-Number: 16690
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Comment-Date: Mon, 02 Nov 2020 23:59:44 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9864: Produce a minidump when TestValidateMetrics fails

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Hello Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16690

to look at the new patch set (#2).

Change subject: IMPALA-9864: Produce a minidump when TestValidateMetrics fails
......................................................................

IMPALA-9864: Produce a minidump when TestValidateMetrics fails

After running end-to-end tests, run-tests.py runs verifiers to
check that a set of metrics are zero. When this fails, it can
indicate a hung query fragment or other resource leak (see
IMPALA-9842 for example). To track this down, it is useful to
have a minidump, so this adds a step to have every Impalad
generate a minidump (by sending SIGUSR1) when we hit the timeout.

Also, the current error message dumps a bunch of unformatted
JSON from our Web UI. This is hard to read and painful to
cut/paste. This now dumps that JSON to files in a diagnostic
directory under the logs directory. The JSON is formatted
in a readable way. These files would be preserved along with
the rest of the logs directory for automated runs.

The new error message looks like this:
E   AssertionError: Metric impala-server.num-queries-registered did not reach value 0 in 60s.
E   Dumping debug webpages in JSON format...
E   Dumped memz JSON to $IMPALA_HOME/logs/metric_timeout_diags_1604359071/json/memz.json
E   Dumped metrics JSON to $IMPALA_HOME/logs/metric_timeout_diags_1604359071/json/metrics.json
E   Dumped queries JSON to $IMPALA_HOME/logs/metric_timeout_diags_1604359071/json/queries.json
E   Dumped sessions JSON to $IMPALA_HOME/logs/metric_timeout_diags_1604359071/json/sessions.json
E   Dumped threadz JSON to $IMPALA_HOME/logs/metric_timeout_diags_1604359071/json/threadz.json
E   Dumped rpcz JSON to $IMPALA_HOME/logs/metric_timeout_diags_1604359071/json/rpcz.json
E   Dumping minidumps for 3 running impalads...
E   Dumped minidump for PID 2709
E   Dumped minidump for PID 2714
E   Dumped minidump for PID 2721

This also fixes various flake8 errors (unnecessary imports, etc), so
now impala_service.py is flake8 clean.

Testing:
 - Tried out the dump function on my developer machine
 - Verified the minidumps exist
 - Verified the JSON is readable

Change-Id: I16d26052d0664ee0b115e3611cd96047d8ada19d
---
M tests/common/impala_service.py
1 file changed, 70 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/16690/2

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I16d26052d0664ee0b115e3611cd96047d8ada19d
Gerrit-Change-Number: 16690
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-9864: Produce a minidump when TestValidateMetrics fails

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16690 )

Change subject: IMPALA-9864: Produce a minidump when TestValidateMetrics fails
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16690/1/tests/common/impala_service.py
File tests/common/impala_service.py:

http://gerrit.cloudera.org:8080/#/c/16690/1/tests/common/impala_service.py@136
PS1, Line 136: d
flake8: E303 too many blank lines (2)



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16d26052d0664ee0b115e3611cd96047d8ada19d
Gerrit-Change-Number: 16690
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Mon, 02 Nov 2020 23:39:20 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9864: Produce a minidump when TestValidateMetrics fails

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16690 )

Change subject: IMPALA-9864: Produce a minidump when TestValidateMetrics fails
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/7611/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16d26052d0664ee0b115e3611cd96047d8ada19d
Gerrit-Change-Number: 16690
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Tue, 03 Nov 2020 22:27:17 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9864: Produce a minidump when TestValidateMetrics fails

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16690 )

Change subject: IMPALA-9864: Produce a minidump when TestValidateMetrics fails
......................................................................


Patch Set 2: Verified+1


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16d26052d0664ee0b115e3611cd96047d8ada19d
Gerrit-Change-Number: 16690
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Comment-Date: Tue, 03 Nov 2020 05:28:51 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9864: Produce a minidump when TestValidateMetrics fails

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Hello Qifan Chen, Csaba Ringhofer, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16690

to look at the new patch set (#3).

Change subject: IMPALA-9864: Produce a minidump when TestValidateMetrics fails
......................................................................

IMPALA-9864: Produce a minidump when TestValidateMetrics fails

After running end-to-end tests, run-tests.py runs verifiers to
check that a set of metrics are zero. When this fails, it can
indicate a hung query fragment or other resource leak (see
IMPALA-9842 for example). To track this down, it is useful to
have a minidump, so this adds a step to have every Impalad/Catalogd
generate a minidump (by sending SIGUSR1) when we hit the timeout.

Also, the current error message dumps a bunch of unformatted
JSON from our Web UI. This is hard to read and painful to
cut/paste. This now dumps that JSON to files in a diagnostic
directory under the logs directory. The JSON is formatted
in a readable way. These files would be preserved along with
the rest of the logs directory for automated runs.

The new error message looks like this:
E   AssertionError: Metric impala-server.num-queries-registered did not reach value 0 in 60s.
E   Dumping debug webpages in JSON format...
E   Dumped memz JSON to $IMPALA_HOME/logs/metric_timeout_diags_20201103_13:51:02/json/memz.json
E   Dumped metrics JSON to $IMPALA_HOME/logs/metric_timeout_diags_20201103_13:51:02/json/metrics.json
E   Dumped queries JSON to $IMPALA_HOME/logs/metric_timeout_diags_20201103_13:51:02/json/queries.json
E   Dumped sessions JSON to $IMPALA_HOME/logs/metric_timeout_diags_20201103_13:51:02/json/sessions.json
E   Dumped threadz JSON to $IMPALA_HOME/logs/metric_timeout_diags_20201103_13:51:02/json/threadz.json
E   Dumped rpcz JSON to $IMPALA_HOME/logs/metric_timeout_diags_20201103_13:51:02/json/rpcz.json
E   Dumping minidumps for impalads/catalogds...
E   Dumped minidump for Impalad PID 2709
E   Dumped minidump for Impalad PID 2714
E   Dumped minidump for Impalad PID 2721
E   Dumped minidump for Catalogd PID 2627

This also fixes various flake8 errors (unnecessary imports, etc), so
now impala_service.py is flake8 clean.

Testing:
 - Tried out the dump function on my developer machine
 - Verified the minidumps exist
 - Verified the JSON is readable

Change-Id: I16d26052d0664ee0b115e3611cd96047d8ada19d
---
M tests/common/impala_service.py
1 file changed, 89 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/16690/3

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I16d26052d0664ee0b115e3611cd96047d8ada19d
Gerrit-Change-Number: 16690
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>

[Impala-ASF-CR] IMPALA-9864: Produce a minidump when TestValidateMetrics fails

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/16690 )

Change subject: IMPALA-9864: Produce a minidump when TestValidateMetrics fails
......................................................................


Patch Set 3: Code-Review+1

(2 comments)

Thanks a lot for addressing the comments!

http://gerrit.cloudera.org:8080/#/c/16690/2/tests/common/impala_service.py
File tests/common/impala_service.py:

http://gerrit.cloudera.org:8080/#/c/16690/2/tests/common/impala_service.py@139
PS2, Line 139: 
> Changed this to use datetime, so the directory would have a name like:
Done


http://gerrit.cloudera.org:8080/#/c/16690/2/tests/common/impala_service.py@159
PS2, Line 159: "metric
> Statestore is usually less interesting, because it is mostly a publisher/su
Done



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16d26052d0664ee0b115e3611cd96047d8ada19d
Gerrit-Change-Number: 16690
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Tue, 03 Nov 2020 22:28:23 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9864: Produce a minidump when TestValidateMetrics fails

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16690 )

Change subject: IMPALA-9864: Produce a minidump when TestValidateMetrics fails
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/7605/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16d26052d0664ee0b115e3611cd96047d8ada19d
Gerrit-Change-Number: 16690
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Comment-Date: Tue, 03 Nov 2020 00:15:42 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9864: Produce a minidump when TestValidateMetrics fails

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/16690 )

Change subject: IMPALA-9864: Produce a minidump when TestValidateMetrics fails
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16690/1/tests/common/impala_service.py
File tests/common/impala_service.py:

http://gerrit.cloudera.org:8080/#/c/16690/1/tests/common/impala_service.py@136
PS1, Line 136: d
> flake8: E303 too many blank lines (2)
Done



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16d26052d0664ee0b115e3611cd96047d8ada19d
Gerrit-Change-Number: 16690
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Comment-Date: Mon, 02 Nov 2020 23:54:48 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9864: Produce a minidump when TestValidateMetrics fails

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/16690 )

Change subject: IMPALA-9864: Produce a minidump when TestValidateMetrics fails
......................................................................


Patch Set 2:

(2 comments)

Looks good to me!

http://gerrit.cloudera.org:8080/#/c/16690/2/tests/common/impala_service.py
File tests/common/impala_service.py:

http://gerrit.cloudera.org:8080/#/c/16690/2/tests/common/impala_service.py@139
PS2, Line 139: format(int(time()))
It may help with diagnosis later on to put the date and the timestamp (in a format close to the one shown in *INFO and similar log files) into the path.

Example:  $IMPALA_HOME/logs/metric_timeout_diags_10302020_09:00:39.258774/


From impalad.INFO
I1103 09:00:39.258778 227319 runtime-state.cc:196] 70495a87ff023170:9758637200000003] Error from query 70495a87ff023170:9758637200000000: Row with null value violates nullability constraint on table 'impala::default.table_kudu'.


http://gerrit.cloudera.org:8080/#/c/16690/2/tests/common/impala_service.py@159
PS2, Line 159: impalad
May be useful to get a minidump for statestored? I was not able to see catalogd running on my box, although the startup message lists it.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16d26052d0664ee0b115e3611cd96047d8ada19d
Gerrit-Change-Number: 16690
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Tue, 03 Nov 2020 15:51:30 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9864: Produce a minidump when TestValidateMetrics fails

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16690 )

Change subject: IMPALA-9864: Produce a minidump when TestValidateMetrics fails
......................................................................

IMPALA-9864: Produce a minidump when TestValidateMetrics fails

After running end-to-end tests, run-tests.py runs verifiers to
check that a set of metrics are zero. When this fails, it can
indicate a hung query fragment or other resource leak (see
IMPALA-9842 for example). To track this down, it is useful to
have a minidump, so this adds a step to have every Impalad/Catalogd
generate a minidump (by sending SIGUSR1) when we hit the timeout.

Also, the current error message dumps a bunch of unformatted
JSON from our Web UI. This is hard to read and painful to
cut/paste. This now dumps that JSON to files in a diagnostic
directory under the logs directory. The JSON is formatted
in a readable way. These files would be preserved along with
the rest of the logs directory for automated runs.

The new error message looks like this:
E   AssertionError: Metric impala-server.num-queries-registered did not reach value 0 in 60s.
E   Dumping debug webpages in JSON format...
E   Dumped memz JSON to $IMPALA_HOME/logs/metric_timeout_diags_20201103_13:51:02/json/memz.json
E   Dumped metrics JSON to $IMPALA_HOME/logs/metric_timeout_diags_20201103_13:51:02/json/metrics.json
E   Dumped queries JSON to $IMPALA_HOME/logs/metric_timeout_diags_20201103_13:51:02/json/queries.json
E   Dumped sessions JSON to $IMPALA_HOME/logs/metric_timeout_diags_20201103_13:51:02/json/sessions.json
E   Dumped threadz JSON to $IMPALA_HOME/logs/metric_timeout_diags_20201103_13:51:02/json/threadz.json
E   Dumped rpcz JSON to $IMPALA_HOME/logs/metric_timeout_diags_20201103_13:51:02/json/rpcz.json
E   Dumping minidumps for impalads/catalogds...
E   Dumped minidump for Impalad PID 2709
E   Dumped minidump for Impalad PID 2714
E   Dumped minidump for Impalad PID 2721
E   Dumped minidump for Catalogd PID 2627

This also fixes various flake8 errors (unnecessary imports, etc), so
now impala_service.py is flake8 clean.

Testing:
 - Tried out the dump function on my developer machine
 - Verified the minidumps exist
 - Verified the JSON is readable

Change-Id: I16d26052d0664ee0b115e3611cd96047d8ada19d
Reviewed-on: http://gerrit.cloudera.org:8080/16690
Reviewed-by: Qifan Chen <qc...@cloudera.com>
Reviewed-by: Csaba Ringhofer <cs...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M tests/common/impala_service.py
1 file changed, 89 insertions(+), 21 deletions(-)

Approvals:
  Qifan Chen: Looks good to me, but someone else must approve
  Csaba Ringhofer: Looks good to me, approved
  Impala Public Jenkins: Verified

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I16d26052d0664ee0b115e3611cd96047d8ada19d
Gerrit-Change-Number: 16690
Gerrit-PatchSet: 4
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>

[Impala-ASF-CR] IMPALA-9864: Produce a minidump when TestValidateMetrics fails

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16690 )

Change subject: IMPALA-9864: Produce a minidump when TestValidateMetrics fails
......................................................................


Patch Set 3: Verified+1


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16d26052d0664ee0b115e3611cd96047d8ada19d
Gerrit-Change-Number: 16690
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Wed, 04 Nov 2020 03:31:07 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9864: Produce a minidump when TestValidateMetrics fails

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/16690 )

Change subject: IMPALA-9864: Produce a minidump when TestValidateMetrics fails
......................................................................


Patch Set 2: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16690/2/tests/common/impala_service.py
File tests/common/impala_service.py:

http://gerrit.cloudera.org:8080/#/c/16690/2/tests/common/impala_service.py@140
PS2, Line 140:     if not os.path.exists(diag_dir):
optional: I think it would improve readability to extract some parts to separate functions, e.g. dump_debug_pages(self, diag_dir) and maybe trigger_impalad_minidumps()



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16d26052d0664ee0b115e3611cd96047d8ada19d
Gerrit-Change-Number: 16690
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Comment-Date: Tue, 03 Nov 2020 08:56:02 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9864: Produce a minidump when TestValidateMetrics fails

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16690 )

Change subject: IMPALA-9864: Produce a minidump when TestValidateMetrics fails
......................................................................


Patch Set 2:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6629/ DRY_RUN=true


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16d26052d0664ee0b115e3611cd96047d8ada19d
Gerrit-Change-Number: 16690
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Comment-Date: Tue, 03 Nov 2020 00:03:52 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9864: Produce a minidump when TestValidateMetrics fails

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/16690 )

Change subject: IMPALA-9864: Produce a minidump when TestValidateMetrics fails
......................................................................


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/16690/2/tests/common/impala_service.py
File tests/common/impala_service.py:

http://gerrit.cloudera.org:8080/#/c/16690/2/tests/common/impala_service.py@139
PS2, Line 139: 
> It may help diagnose later on by putting the date and the timestamp (in a f
Changed this to use datetime, so the directory would have a name like:
metric_timeout_diags_20201103_13:51:32
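
For illustration, a name in that shape can be produced with strftime. The
function diag_dir_name and its now parameter are hypothetical; only the
"metric_timeout_diags_" prefix and the date/time layout are taken from the
example name above.

```python
import os
from datetime import datetime


def diag_dir_name(logs_dir, now=None):
    """Build a diagnostics directory path with a readable timestamp."""
    # Accepting `now` as an argument keeps the function deterministic
    # when a caller (or a test) supplies a fixed time.
    if now is None:
        now = datetime.now()
    stamp = now.strftime("%Y%m%d_%H:%M:%S")
    return os.path.join(logs_dir, "metric_timeout_diags_" + stamp)
```

Compared with an integer epoch value, this form sorts the same way but can
be matched against log timestamps by eye.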


http://gerrit.cloudera.org:8080/#/c/16690/2/tests/common/impala_service.py@140
PS2, Line 140:     Impala processes (impalad, catalogd, statestored) have a signal handler for
> optional: I think it would improve readability to extract some parts to sep
I split out the logic for dumping JSON to a file and the logic for requesting a minidump.

One hurdle is that I'm constructing the assert message as I go, so I'm leaving this as one big function for the time being.
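
One possible way to keep the message accumulation while still splitting
out helpers is to have each diagnostic step return its own line. The sketch
below is an assumption about how that could be arranged, not the structure
of the patch; wait_for_metric, diag_steps, clock and sleep are all
hypothetical names.

```python
import time


def wait_for_metric(get_value, expected, timeout_s, diag_steps=(),
                    interval_s=1.0, clock=time.time, sleep=time.sleep):
    """Poll get_value() until it equals `expected` or the timeout expires.

    On timeout, call each function in diag_steps, append the line it
    returns to the message, and raise AssertionError with the result.
    """
    deadline = clock() + timeout_s
    while clock() < deadline:
        if get_value() == expected:
            return
        sleep(interval_s)
    lines = ["Metric did not reach value {0} in {1}s.".format(
        expected, timeout_s)]
    for step in diag_steps:
        # Each step performs one diagnostic action and reports it.
        lines.append(step())
    raise AssertionError("\n".join(lines))
```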


http://gerrit.cloudera.org:8080/#/c/16690/2/tests/common/impala_service.py@159
PS2, Line 159: "metric
> May be useful to get a minidump for statestored? I was not be able to see c
Statestore is usually less interesting, because it is mostly a publisher/subscriber daemon. There aren't per-query resources on statestored.

I changed the code to also dump catalogd. Interestingly, pgrep has trouble finding it unless I look at the whole commandline (e.g. pgrep -f). Seems like a minor bug.
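
A hedged sketch of the lookup described here: pgrep's -f flag matches the
pattern against the full command line rather than just the process name.
The function find_pids and its injectable run parameter are hypothetical
and are not the code in this patch set.

```python
import subprocess


def find_pids(pattern, run=subprocess.check_output):
    """Return PIDs whose full command line matches `pattern`."""
    try:
        out = run(["pgrep", "-f", pattern])
    except subprocess.CalledProcessError:
        # pgrep exits with a non-zero status when nothing matches; this
        # sketch treats any non-zero exit as "no processes found".
        return []
    if isinstance(out, bytes):
        out = out.decode("utf-8")
    # pgrep prints one PID per line.
    return [int(pid) for pid in out.split()]
```

Because -f matches anywhere in the command line, a short pattern can also
match unrelated processes, so a more specific pattern may be safer.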



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16d26052d0664ee0b115e3611cd96047d8ada19d
Gerrit-Change-Number: 16690
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Tue, 03 Nov 2020 22:06:08 +0000
Gerrit-HasComments: Yes