You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Tim Armstrong (Code Review)" <ge...@cloudera.org> on 2018/11/02 23:44:43 UTC

[Impala-ASF-CR] IMPALA-6436: exit instead of abort for catalog startup failure

Tim Armstrong has uploaded this change for review. ( http://gerrit.cloudera.org:8080/11871


Change subject: IMPALA-6436: exit instead of abort for catalog startup failure
......................................................................

IMPALA-6436: exit instead of abort for catalog startup failure

Rename EXIT_WITH_EXC to ABORT_WITH_EXC to make the behaviour more
obvious at callsites.

Handle exceptions from Catalog constructor by logging the backtrace and
exiting cleanly, rather than aborting. This will prevent generation of a
coredump or minidump.

Testing:
Tested starting the catalogd locally without the HMS running and a
low connection timeout:

  start-impala-cluster.py --catalogd_args=--initial_hms_cnxn_timeout_s=2

Confirmed that the backtrace was logged to catalogd.ERROR and that no
core or minidump was generated.

Change-Id: I4026dccb39843b847426112fc0fe9ba897e48dcc
---
M be/src/catalog/catalog.cc
M be/src/scheduling/request-pool-service.cc
M be/src/service/fe-support.cc
M be/src/service/frontend.cc
M be/src/util/jni-util.h
M be/src/util/logging-support.cc
M be/src/util/zip-util.cc
7 files changed, 40 insertions(+), 21 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/11871/1
-- 
To view, visit http://gerrit.cloudera.org:8080/11871
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I4026dccb39843b847426112fc0fe9ba897e48dcc
Gerrit-Change-Number: 11871
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-6436: exit instead of abort for catalog startup failure

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/11871 )

Change subject: IMPALA-6436: exit instead of abort for catalog startup failure
......................................................................


Patch Set 1:

(1 comment)

What is the criterion that you used to decide between clean_exit or abort? I feel that most aborts() can be can be converted to clean exits. Like

- Anything that runs in the process setup phase like FindClass()/GetMethodID() etc, the stack trace should be helpful enough to figure out the problem.

- All New*() methods (like NewString()) return a NULL / throw exception. We catch the exception anyway (like OOM) and dump it (but interestingly we are not doing a null check anywhere)

Wondering where exactly we might need core dump for debugging.

http://gerrit.cloudera.org:8080/#/c/11871/1/be/src/util/jni-util.h
File be/src/util/jni-util.h:

http://gerrit.cloudera.org:8080/#/c/11871/1/be/src/util/jni-util.h@89
PS1, Line 89: log the backtrace at FATAL level and abort the
            : // process
nit: I think LOG(FATAL) calls the abort. Maybe clarify that?



-- 
To view, visit http://gerrit.cloudera.org:8080/11871
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4026dccb39843b847426112fc0fe9ba897e48dcc
Gerrit-Change-Number: 11871
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Sat, 03 Nov 2018 00:18:27 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6436: exit instead of abort for catalog startup failure

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/11871 )

Change subject: IMPALA-6436: exit instead of abort for catalog startup failure
......................................................................

IMPALA-6436: exit instead of abort for catalog startup failure

Rename EXIT_WITH_EXC to ABORT_WITH_EXC to make the behaviour more
obvious at callsites.

Handle exceptions from Catalog constructor by logging the backtrace and
exiting cleanly, rather than aborting. This will prevent generation of a
coredump or minidump.

Testing:
Tested starting the catalogd locally without the HMS running and a
low connection timeout:

  start-impala-cluster.py --catalogd_args=--initial_hms_cnxn_timeout_s=2

Confirmed that the backtrace was logged to catalogd.ERROR and that no
core or minidump was generated.

Change-Id: I4026dccb39843b847426112fc0fe9ba897e48dcc
Reviewed-on: http://gerrit.cloudera.org:8080/11871
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/catalog/catalog.cc
M be/src/scheduling/request-pool-service.cc
M be/src/service/fe-support.cc
M be/src/service/frontend.cc
M be/src/util/jni-util.h
M be/src/util/logging-support.cc
M be/src/util/zip-util.cc
7 files changed, 40 insertions(+), 21 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/11871
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I4026dccb39843b847426112fc0fe9ba897e48dcc
Gerrit-Change-Number: 11871
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-6436: exit instead of abort for catalog startup failure

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11871 )

Change subject: IMPALA-6436: exit instead of abort for catalog startup failure
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/1268/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/11871
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4026dccb39843b847426112fc0fe9ba897e48dcc
Gerrit-Change-Number: 11871
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Sat, 03 Nov 2018 00:35:24 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6436: exit instead of abort for catalog startup failure

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11871 )

Change subject: IMPALA-6436: exit instead of abort for catalog startup failure
......................................................................


Patch Set 2:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3419/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/11871
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4026dccb39843b847426112fc0fe9ba897e48dcc
Gerrit-Change-Number: 11871
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 05 Nov 2018 17:00:45 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6436: exit instead of abort for catalog startup failure

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11871 )

Change subject: IMPALA-6436: exit instead of abort for catalog startup failure
......................................................................


Patch Set 2:

Yup, just trying to seize an easy opportunity to make things easier fo everyone.


-- 
To view, visit http://gerrit.cloudera.org:8080/11871
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4026dccb39843b847426112fc0fe9ba897e48dcc
Gerrit-Change-Number: 11871
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 05 Nov 2018 17:01:40 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6436: exit instead of abort for catalog startup failure

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/11871 )

Change subject: IMPALA-6436: exit instead of abort for catalog startup failure
......................................................................


Patch Set 1: Code-Review+2

Oh, you just changed the problematic call site mentioned in the jira. We can probably defer fixing the remaining call sites later since they've almost never been problematic in the user setups. Thanks for fixing this.


-- 
To view, visit http://gerrit.cloudera.org:8080/11871
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4026dccb39843b847426112fc0fe9ba897e48dcc
Gerrit-Change-Number: 11871
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 05 Nov 2018 01:40:51 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6436: exit instead of abort for catalog startup failure

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11871 )

Change subject: IMPALA-6436: exit instead of abort for catalog startup failure
......................................................................


Patch Set 2: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/11871
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4026dccb39843b847426112fc0fe9ba897e48dcc
Gerrit-Change-Number: 11871
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 05 Nov 2018 17:00:44 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6436: exit instead of abort for catalog startup failure

Posted by "Philip Zeyliger (Code Review)" <ge...@cloudera.org>.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/11871 )

Change subject: IMPALA-6436: exit instead of abort for catalog startup failure
......................................................................


Patch Set 1: Code-Review+2

Thanks for tackling this. I think this is strictly better than what we had.

At some point Dimitris suggested that we be more explicit about checking that the error actually is "Couldn't talk to HMS" rather than some other error, but, I think in practice it doesn't matter.

I think we could also reasonably have just looped forever and kept trying to talk to HMS, but that's a different change, and this one is here now :)


-- 
To view, visit http://gerrit.cloudera.org:8080/11871
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4026dccb39843b847426112fc0fe9ba897e48dcc
Gerrit-Change-Number: 11871
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Sat, 03 Nov 2018 21:16:07 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6436: exit instead of abort for catalog startup failure

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11871 )

Change subject: IMPALA-6436: exit instead of abort for catalog startup failure
......................................................................


Patch Set 1:

(1 comment)

I only converted the one case that we knew to be problematic.

The other cases looked like they should probably only fail if something was serious broken so maybe we'd want a core dump? I don't really know since I couldn't think of plausible scenarios where they'd fail.

http://gerrit.cloudera.org:8080/#/c/11871/1/be/src/util/jni-util.h
File be/src/util/jni-util.h:

http://gerrit.cloudera.org:8080/#/c/11871/1/be/src/util/jni-util.h@89
PS1, Line 89: log the backtrace at FATAL level and abort the
            : // process
> nit: I think LOG(FATAL) calls the abort. Maybe clarify that?
I intended the comment to document the behaviour from the callers point of view, I think that's more of an implementation detail.



-- 
To view, visit http://gerrit.cloudera.org:8080/11871
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4026dccb39843b847426112fc0fe9ba897e48dcc
Gerrit-Change-Number: 11871
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Sat, 03 Nov 2018 00:37:59 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6436: exit instead of abort for catalog startup failure

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11871 )

Change subject: IMPALA-6436: exit instead of abort for catalog startup failure
......................................................................


Patch Set 2: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/11871
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4026dccb39843b847426112fc0fe9ba897e48dcc
Gerrit-Change-Number: 11871
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 05 Nov 2018 21:19:04 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6436: exit instead of abort for catalog startup failure

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/11871 )

Change subject: IMPALA-6436: exit instead of abort for catalog startup failure
......................................................................


Patch Set 1:

Didn't know there was a Jira for this. I was planning to fix it. Will review this change.


-- 
To view, visit http://gerrit.cloudera.org:8080/11871
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4026dccb39843b847426112fc0fe9ba897e48dcc
Gerrit-Change-Number: 11871
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Fri, 02 Nov 2018 23:49:41 +0000
Gerrit-HasComments: No