Posted to reviews@impala.apache.org by "Joe McDonnell (Code Review)" <ge...@cloudera.org> on 2020/11/02 23:38:33 UTC
[Impala-ASF-CR] IMPALA-9864: Produce a minidump when TestValidateMetrics fails
Joe McDonnell has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16690 )
Change subject: IMPALA-9864: Produce a minidump when TestValidateMetrics fails
......................................................................
IMPALA-9864: Produce a minidump when TestValidateMetrics fails
After running end-to-end tests, run-tests.py runs verifiers to
check that a set of metrics are zero. When this fails, it can
indicate a hung query fragment or other resource leak (see
IMPALA-9842 for example). To track this down, it is useful to
have a minidump, so this adds a step to have every Impalad
generate a minidump (by sending SIGUSR1) when we hit the timeout.
Also, the current error message dumps a bunch of unformatted
JSON from our Web UI. This is hard to read and painful to
cut/paste. This now dumps that JSON to files in a diagnostic
directory under the logs directory. The JSON is formatted
in a readable way. These files would be preserved along with
the rest of the logs directory for automated runs.
The new error message looks like this:
E AssertionError: Metric impala-server.num-queries-registered did not reach value 0 in 60s.
E Dumping debug webpages in JSON format...
E Dumped memz JSON to $IMPALA_HOME/logs/metric_timeout_diags_1604359071/json/memz.json
E Dumped metrics JSON to $IMPALA_HOME/logs/metric_timeout_diags_1604359071/json/metrics.json
E Dumped queries JSON to $IMPALA_HOME/logs/metric_timeout_diags_1604359071/json/queries.json
E Dumped sessions JSON to $IMPALA_HOME/logs/metric_timeout_diags_1604359071/json/sessions.json
E Dumped threadz JSON to $IMPALA_HOME/logs/metric_timeout_diags_1604359071/json/threadz.json
E Dumped rpcz JSON to $IMPALA_HOME/logs/metric_timeout_diags_1604359071/json/rpcz.json
E Dumping minidumps for 3 running impalads...
E Dumped minidump for PID 2709
E Dumped minidump for PID 2714
E Dumped minidump for PID 2721
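As an illustration of the JSON dump step described above, here is a minimal sketch. It is not the code from the patch: the function name dump_debug_page_json and its parameters are hypothetical, and it assumes the debug page has already been fetched and parsed into a Python object. It creates the json directory if needed, writes the page with indentation so it is readable, and returns a line in the style of the sample error message.

```python
import json
import os


def dump_debug_page_json(page_name, page_json, json_dir):
    """Write one debug page, pretty-printed, to <json_dir>/<page_name>.json.

    page_json is assumed to be an already-parsed JSON object (hypothetical
    input; fetching it from the Web UI is not shown here).
    """
    if not os.path.exists(json_dir):
        os.makedirs(json_dir)
    path = os.path.join(json_dir, page_name + ".json")
    with open(path, "w") as f:
        # indent and sort_keys make the file easy to read and to diff.
        json.dump(page_json, f, indent=2, sort_keys=True)
    return "Dumped {0} JSON to {1}".format(page_name, path)
```

The caller would append the returned line to the assertion message for each page (memz, metrics, queries, and so on).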
Testing:
- Tried out the dump function on my developer machine
- Verified the minidumps exist
- Verified the JSON is readable
Change-Id: I16d26052d0664ee0b115e3611cd96047d8ada19d
---
M tests/common/impala_service.py
1 file changed, 62 insertions(+), 10 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/16690/1
--
To view, visit http://gerrit.cloudera.org:8080/16690
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I16d26052d0664ee0b115e3611cd96047d8ada19d
Gerrit-Change-Number: 16690
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
[Impala-ASF-CR] IMPALA-9864: Produce a minidump when TestValidateMetrics fails
Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16690 )
Change subject: IMPALA-9864: Produce a minidump when TestValidateMetrics fails
......................................................................
Patch Set 3:
Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6631/ DRY_RUN=true
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16d26052d0664ee0b115e3611cd96047d8ada19d
Gerrit-Change-Number: 16690
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Tue, 03 Nov 2020 22:06:28 +0000
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9864: Produce a minidump when TestValidateMetrics fails
Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/16690 )
Change subject: IMPALA-9864: Produce a minidump when TestValidateMetrics fails
......................................................................
Patch Set 3: Code-Review+2
(1 comment)
http://gerrit.cloudera.org:8080/#/c/16690/3/tests/common/impala_service.py
File tests/common/impala_service.py:
http://gerrit.cloudera.org:8080/#/c/16690/3/tests/common/impala_service.py@181
PS3, Line 181: "-f",
hmm, this sounds weird, I don't remember such issues on my desktop
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16d26052d0664ee0b115e3611cd96047d8ada19d
Gerrit-Change-Number: 16690
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Tue, 03 Nov 2020 22:57:38 +0000
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9864: Produce a minidump when TestValidateMetrics fails
Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16690 )
Change subject: IMPALA-9864: Produce a minidump when TestValidateMetrics fails
......................................................................
Patch Set 1:
Build Successful
https://jenkins.impala.io/job/gerrit-code-review-checks/7604/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16d26052d0664ee0b115e3611cd96047d8ada19d
Gerrit-Change-Number: 16690
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Comment-Date: Mon, 02 Nov 2020 23:59:44 +0000
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9864: Produce a minidump when TestValidateMetrics fails
Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Hello Impala Public Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/16690
to look at the new patch set (#2).
Change subject: IMPALA-9864: Produce a minidump when TestValidateMetrics fails
......................................................................
IMPALA-9864: Produce a minidump when TestValidateMetrics fails
After running end-to-end tests, run-tests.py runs verifiers to
check that a set of metrics are zero. When this fails, it can
indicate a hung query fragment or other resource leak (see
IMPALA-9842 for example). To track this down, it is useful to
have a minidump, so this adds a step to have every Impalad
generate a minidump (by sending SIGUSR1) when we hit the timeout.
Also, the current error message dumps a bunch of unformatted
JSON from our Web UI. This is hard to read and painful to
cut/paste. This now dumps that JSON to files in a diagnostic
directory under the logs directory. The JSON is formatted
in a readable way. These files would be preserved along with
the rest of the logs directory for automated runs.
The new error message looks like this:
E AssertionError: Metric impala-server.num-queries-registered did not reach value 0 in 60s.
E Dumping debug webpages in JSON format...
E Dumped memz JSON to $IMPALA_HOME/logs/metric_timeout_diags_1604359071/json/memz.json
E Dumped metrics JSON to $IMPALA_HOME/logs/metric_timeout_diags_1604359071/json/metrics.json
E Dumped queries JSON to $IMPALA_HOME/logs/metric_timeout_diags_1604359071/json/queries.json
E Dumped sessions JSON to $IMPALA_HOME/logs/metric_timeout_diags_1604359071/json/sessions.json
E Dumped threadz JSON to $IMPALA_HOME/logs/metric_timeout_diags_1604359071/json/threadz.json
E Dumped rpcz JSON to $IMPALA_HOME/logs/metric_timeout_diags_1604359071/json/rpcz.json
E Dumping minidumps for 3 running impalads...
E Dumped minidump for PID 2709
E Dumped minidump for PID 2714
E Dumped minidump for PID 2721
This also fixes various flake8 errors (unnecessary imports, etc), so
now impala_service.py is flake8 clean.
Testing:
- Tried out the dump function on my developer machine
- Verified the minidumps exist
- Verified the JSON is readable
Change-Id: I16d26052d0664ee0b115e3611cd96047d8ada19d
---
M tests/common/impala_service.py
1 file changed, 70 insertions(+), 21 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/16690/2
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I16d26052d0664ee0b115e3611cd96047d8ada19d
Gerrit-Change-Number: 16690
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
[Impala-ASF-CR] IMPALA-9864: Produce a minidump when TestValidateMetrics fails
Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16690 )
Change subject: IMPALA-9864: Produce a minidump when TestValidateMetrics fails
......................................................................
Patch Set 1:
(1 comment)
http://gerrit.cloudera.org:8080/#/c/16690/1/tests/common/impala_service.py
File tests/common/impala_service.py:
http://gerrit.cloudera.org:8080/#/c/16690/1/tests/common/impala_service.py@136
PS1, Line 136: d
flake8: E303 too many blank lines (2)
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16d26052d0664ee0b115e3611cd96047d8ada19d
Gerrit-Change-Number: 16690
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Mon, 02 Nov 2020 23:39:20 +0000
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9864: Produce a minidump when TestValidateMetrics fails
Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16690 )
Change subject: IMPALA-9864: Produce a minidump when TestValidateMetrics fails
......................................................................
Patch Set 3:
Build Successful
https://jenkins.impala.io/job/gerrit-code-review-checks/7611/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16d26052d0664ee0b115e3611cd96047d8ada19d
Gerrit-Change-Number: 16690
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Tue, 03 Nov 2020 22:27:17 +0000
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9864: Produce a minidump when TestValidateMetrics fails
Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16690 )
Change subject: IMPALA-9864: Produce a minidump when TestValidateMetrics fails
......................................................................
Patch Set 2: Verified+1
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16d26052d0664ee0b115e3611cd96047d8ada19d
Gerrit-Change-Number: 16690
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Comment-Date: Tue, 03 Nov 2020 05:28:51 +0000
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9864: Produce a minidump when TestValidateMetrics fails
Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Hello Qifan Chen, Csaba Ringhofer, Impala Public Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/16690
to look at the new patch set (#3).
Change subject: IMPALA-9864: Produce a minidump when TestValidateMetrics fails
......................................................................
IMPALA-9864: Produce a minidump when TestValidateMetrics fails
After running end-to-end tests, run-tests.py runs verifiers to
check that a set of metrics are zero. When this fails, it can
indicate a hung query fragment or other resource leak (see
IMPALA-9842 for example). To track this down, it is useful to
have a minidump, so this adds a step to have every Impalad/Catalogd
generate a minidump (by sending SIGUSR1) when we hit the timeout.
Also, the current error message dumps a bunch of unformatted
JSON from our Web UI. This is hard to read and painful to
cut/paste. This now dumps that JSON to files in a diagnostic
directory under the logs directory. The JSON is formatted
in a readable way. These files would be preserved along with
the rest of the logs directory for automated runs.
The new error message looks like this:
E AssertionError: Metric impala-server.num-queries-registered did not reach value 0 in 60s.
E Dumping debug webpages in JSON format...
E Dumped memz JSON to $IMPALA_HOME/logs/metric_timeout_diags_20201103_13:51:02/json/memz.json
E Dumped metrics JSON to $IMPALA_HOME/logs/metric_timeout_diags_20201103_13:51:02/json/metrics.json
E Dumped queries JSON to $IMPALA_HOME/logs/metric_timeout_diags_20201103_13:51:02/json/queries.json
E Dumped sessions JSON to $IMPALA_HOME/logs/metric_timeout_diags_20201103_13:51:02/json/sessions.json
E Dumped threadz JSON to $IMPALA_HOME/logs/metric_timeout_diags_20201103_13:51:02/json/threadz.json
E Dumped rpcz JSON to $IMPALA_HOME/logs/metric_timeout_diags_20201103_13:51:02/json/rpcz.json
E Dumping minidumps for impalads/catalogds...
E Dumped minidump for Impalad PID 2709
E Dumped minidump for Impalad PID 2714
E Dumped minidump for Impalad PID 2721
E Dumped minidump for Catalogd PID 2627
This also fixes various flake8 errors (unnecessary imports, etc), so
now impala_service.py is flake8 clean.
Testing:
- Tried out the dump function on my developer machine
- Verified the minidumps exist
- Verified the JSON is readable
Change-Id: I16d26052d0664ee0b115e3611cd96047d8ada19d
---
M tests/common/impala_service.py
1 file changed, 89 insertions(+), 21 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/16690/3
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I16d26052d0664ee0b115e3611cd96047d8ada19d
Gerrit-Change-Number: 16690
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
[Impala-ASF-CR] IMPALA-9864: Produce a minidump when TestValidateMetrics fails
Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/16690 )
Change subject: IMPALA-9864: Produce a minidump when TestValidateMetrics fails
......................................................................
Patch Set 3: Code-Review+1
(2 comments)
Thanks a lot for addressing the comments!
http://gerrit.cloudera.org:8080/#/c/16690/2/tests/common/impala_service.py
File tests/common/impala_service.py:
http://gerrit.cloudera.org:8080/#/c/16690/2/tests/common/impala_service.py@139
PS2, Line 139:
> Changed this to use datetime, so the directory would have a name like:
Done
http://gerrit.cloudera.org:8080/#/c/16690/2/tests/common/impala_service.py@159
PS2, Line 159: "metric
> Statestore is usually less interesting, because it is mostly a publisher/su
Done
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16d26052d0664ee0b115e3611cd96047d8ada19d
Gerrit-Change-Number: 16690
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Tue, 03 Nov 2020 22:28:23 +0000
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9864: Produce a minidump when TestValidateMetrics fails
Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16690 )
Change subject: IMPALA-9864: Produce a minidump when TestValidateMetrics fails
......................................................................
Patch Set 2:
Build Successful
https://jenkins.impala.io/job/gerrit-code-review-checks/7605/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16d26052d0664ee0b115e3611cd96047d8ada19d
Gerrit-Change-Number: 16690
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Comment-Date: Tue, 03 Nov 2020 00:15:42 +0000
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9864: Produce a minidump when TestValidateMetrics fails
Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/16690 )
Change subject: IMPALA-9864: Produce a minidump when TestValidateMetrics fails
......................................................................
Patch Set 1:
(1 comment)
http://gerrit.cloudera.org:8080/#/c/16690/1/tests/common/impala_service.py
File tests/common/impala_service.py:
http://gerrit.cloudera.org:8080/#/c/16690/1/tests/common/impala_service.py@136
PS1, Line 136: d
> flake8: E303 too many blank lines (2)
Done
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16d26052d0664ee0b115e3611cd96047d8ada19d
Gerrit-Change-Number: 16690
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Comment-Date: Mon, 02 Nov 2020 23:54:48 +0000
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9864: Produce a minidump when TestValidateMetrics fails
Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/16690 )
Change subject: IMPALA-9864: Produce a minidump when TestValidateMetrics fails
......................................................................
Patch Set 2:
(2 comments)
Looks good to me!
http://gerrit.cloudera.org:8080/#/c/16690/2/tests/common/impala_service.py
File tests/common/impala_service.py:
http://gerrit.cloudera.org:8080/#/c/16690/2/tests/common/impala_service.py@139
PS2, Line 139: format(int(time()))
It may help later diagnosis to put the date and the timestamp (in a format close to the one used in the *INFO log files) into the path.
Example: $IMPALA_HOME/logs/metric_timeout_diags_10302020_09:00:39.258774/
From impalad.INFO
I1103 09:00:39.258778 227319 runtime-state.cc:196] 70495a87ff023170:9758637200000003] Error from query 70495a87ff023170:9758637200000000: Row with null value violates nullability constraint on table 'impala::default.table_kudu'.
http://gerrit.cloudera.org:8080/#/c/16690/2/tests/common/impala_service.py@159
PS2, Line 159: impalad
Might it be useful to also get a minidump for statestored? I was not able to see catalogd running on my box, although the startup message lists it.
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16d26052d0664ee0b115e3611cd96047d8ada19d
Gerrit-Change-Number: 16690
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Tue, 03 Nov 2020 15:51:30 +0000
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9864: Produce a minidump when TestValidateMetrics fails
Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16690 )
Change subject: IMPALA-9864: Produce a minidump when TestValidateMetrics fails
......................................................................
IMPALA-9864: Produce a minidump when TestValidateMetrics fails
After running end-to-end tests, run-tests.py runs verifiers to
check that a set of metrics are zero. When this fails, it can
indicate a hung query fragment or other resource leak (see
IMPALA-9842 for example). To track this down, it is useful to
have a minidump, so this adds a step to have every Impalad/Catalogd
generate a minidump (by sending SIGUSR1) when we hit the timeout.
Also, the current error message dumps a bunch of unformatted
JSON from our Web UI. This is hard to read and painful to
cut/paste. This now dumps that JSON to files in a diagnostic
directory under the logs directory. The JSON is formatted
in a readable way. These files would be preserved along with
the rest of the logs directory for automated runs.
The new error message looks like this:
E AssertionError: Metric impala-server.num-queries-registered did not reach value 0 in 60s.
E Dumping debug webpages in JSON format...
E Dumped memz JSON to $IMPALA_HOME/logs/metric_timeout_diags_20201103_13:51:02/json/memz.json
E Dumped metrics JSON to $IMPALA_HOME/logs/metric_timeout_diags_20201103_13:51:02/json/metrics.json
E Dumped queries JSON to $IMPALA_HOME/logs/metric_timeout_diags_20201103_13:51:02/json/queries.json
E Dumped sessions JSON to $IMPALA_HOME/logs/metric_timeout_diags_20201103_13:51:02/json/sessions.json
E Dumped threadz JSON to $IMPALA_HOME/logs/metric_timeout_diags_20201103_13:51:02/json/threadz.json
E Dumped rpcz JSON to $IMPALA_HOME/logs/metric_timeout_diags_20201103_13:51:02/json/rpcz.json
E Dumping minidumps for impalads/catalogds...
E Dumped minidump for Impalad PID 2709
E Dumped minidump for Impalad PID 2714
E Dumped minidump for Impalad PID 2721
E Dumped minidump for Catalogd PID 2627
This also fixes various flake8 errors (unnecessary imports, etc), so
now impala_service.py is flake8 clean.
Testing:
- Tried out the dump function on my developer machine
- Verified the minidumps exist
- Verified the JSON is readable
Change-Id: I16d26052d0664ee0b115e3611cd96047d8ada19d
Reviewed-on: http://gerrit.cloudera.org:8080/16690
Reviewed-by: Qifan Chen <qc...@cloudera.com>
Reviewed-by: Csaba Ringhofer <cs...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M tests/common/impala_service.py
1 file changed, 89 insertions(+), 21 deletions(-)
Approvals:
Qifan Chen: Looks good to me, but someone else must approve
Csaba Ringhofer: Looks good to me, approved
Impala Public Jenkins: Verified
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I16d26052d0664ee0b115e3611cd96047d8ada19d
Gerrit-Change-Number: 16690
Gerrit-PatchSet: 4
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
[Impala-ASF-CR] IMPALA-9864: Produce a minidump when TestValidateMetrics fails
Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16690 )
Change subject: IMPALA-9864: Produce a minidump when TestValidateMetrics fails
......................................................................
Patch Set 3: Verified+1
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16d26052d0664ee0b115e3611cd96047d8ada19d
Gerrit-Change-Number: 16690
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Wed, 04 Nov 2020 03:31:07 +0000
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9864: Produce a minidump when TestValidateMetrics fails
Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/16690 )
Change subject: IMPALA-9864: Produce a minidump when TestValidateMetrics fails
......................................................................
Patch Set 2: Code-Review+2
(1 comment)
http://gerrit.cloudera.org:8080/#/c/16690/2/tests/common/impala_service.py
File tests/common/impala_service.py:
http://gerrit.cloudera.org:8080/#/c/16690/2/tests/common/impala_service.py@140
PS2, Line 140: if not os.path.exists(diag_dir):
optional: I think it would improve readability to extract some parts to separate functions, e.g. dump_debug_pages(self, diag_dir) and maybe trigger_impalad_minidumps()
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16d26052d0664ee0b115e3611cd96047d8ada19d
Gerrit-Change-Number: 16690
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Comment-Date: Tue, 03 Nov 2020 08:56:02 +0000
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9864: Produce a minidump when TestValidateMetrics fails
Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16690 )
Change subject: IMPALA-9864: Produce a minidump when TestValidateMetrics fails
......................................................................
Patch Set 2:
Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6629/ DRY_RUN=true
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16d26052d0664ee0b115e3611cd96047d8ada19d
Gerrit-Change-Number: 16690
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Comment-Date: Tue, 03 Nov 2020 00:03:52 +0000
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9864: Produce a minidump when TestValidateMetrics fails
Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/16690 )
Change subject: IMPALA-9864: Produce a minidump when TestValidateMetrics fails
......................................................................
Patch Set 3:
(3 comments)
http://gerrit.cloudera.org:8080/#/c/16690/2/tests/common/impala_service.py
File tests/common/impala_service.py:
http://gerrit.cloudera.org:8080/#/c/16690/2/tests/common/impala_service.py@139
PS2, Line 139:
> It may help diagnose later on by putting the date and the timestamp (in a f
Changed this to use datetime, so the directory would have a name like:
metric_timeout_diags_20201103_13:51:32
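For reference, a directory name in that shape can be produced with datetime.strftime. This is a minimal sketch rather than the patch itself; the helper names diag_dir_name and diag_dir_path and the logs_dir parameter are hypothetical.

```python
import os
from datetime import datetime


def diag_dir_name(now=None, prefix="metric_timeout_diags"):
    """Return a name like metric_timeout_diags_20201103_13:51:32."""
    now = now or datetime.now()
    return "{0}_{1}".format(prefix, now.strftime("%Y%m%d_%H:%M:%S"))


def diag_dir_path(logs_dir, now=None):
    """Join the diagnostic directory name onto the logs directory."""
    return os.path.join(logs_dir, diag_dir_name(now))
```

Passing a fixed datetime makes the name deterministic for testing. Note that the colons in the name are fine on Linux filesystems but are not valid in Windows file names.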
http://gerrit.cloudera.org:8080/#/c/16690/2/tests/common/impala_service.py@140
PS2, Line 140: Impala processes (impalad, catalogd, statestored) have a signal handler for
> optional: I think it would improve readability to extract some parts to sep
I split out the logic for dumping JSON to a file and the logic for requesting a minidump.
One hurdle is that I'm constructing the assert message as I go, so I'm leaving this as one big function for the time being.
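One possible way around that hurdle, sketched here only as an assumption about how it could be structured and not as what the patch does: each helper returns the lines it wants to report, and the caller folds them into the assertion message. The name build_timeout_message and the shape of the steps argument are hypothetical.

```python
def build_timeout_message(metric_name, expected_value, timeout_s, steps):
    """Run each diagnostic step and fold its output lines into one message.

    steps is a list of (header, callable) pairs. Each callable performs one
    diagnostic action and returns a list of lines describing what it did.
    Both are hypothetical stand-ins for the real dump helpers.
    """
    lines = ["Metric {0} did not reach value {1} in {2}s.".format(
        metric_name, expected_value, timeout_s)]
    for header, step in steps:
        lines.append(header)
        lines.extend(step())
    return "\n".join(lines)
```

With this shape the JSON dump and the minidump request can live in separate functions while the message is still built in one place.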
http://gerrit.cloudera.org:8080/#/c/16690/2/tests/common/impala_service.py@159
PS2, Line 159: "metric
> May be useful to get a minidump for statestored? I was not able to see c
Statestore is usually less interesting, because it is mostly a publisher/subscriber daemon. There aren't per-query resources on statestored.
I changed the code to also dump catalogd. Interestingly, pgrep has trouble finding it unless I look at the whole commandline (e.g. pgrep -f). Seems like a minor bug.
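A rough sketch of the approach described above, assuming pgrep is on the PATH: look up PIDs with pgrep -f (match against the full command line) and send each one SIGUSR1. The function names are hypothetical and this is not the code in impala_service.py.

```python
import os
import signal
import subprocess


def pgrep_command(process_name, match_full_cmdline=True):
    """Build the pgrep argv; -f matches against the full command line."""
    cmd = ["pgrep"]
    if match_full_cmdline:
        cmd.append("-f")
    cmd.append(process_name)
    return cmd


def parse_pids(pgrep_output):
    """pgrep prints one PID per line."""
    return [int(tok) for tok in pgrep_output.split()]


def find_pids(process_name):
    try:
        out = subprocess.check_output(pgrep_command(process_name))
    except subprocess.CalledProcessError as e:
        if e.returncode == 1:
            # pgrep exits with status 1 when no process matched.
            return []
        raise
    return parse_pids(out.decode())


def request_minidumps(process_names=("impalad", "catalogd")):
    """Send SIGUSR1 to every matching process and report what was signalled.

    Caution: -f can match unrelated processes whose command line merely
    contains the name, and SIGUSR1 terminates a process that has no handler
    installed for it.
    """
    lines = []
    for name in process_names:
        for pid in find_pids(name):
            os.kill(pid, signal.SIGUSR1)
            lines.append("Dumped minidump for {0} PID {1}".format(
                name.capitalize(), pid))
    return lines
```

The returned lines follow the style of the sample error message and can be appended to the assertion text.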
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16d26052d0664ee0b115e3611cd96047d8ada19d
Gerrit-Change-Number: 16690
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Tue, 03 Nov 2020 22:06:08 +0000
Gerrit-HasComments: Yes