You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@storm.apache.org by "Jungtaek Lim (JIRA)" <ji...@apache.org> on 2017/07/03 15:09:00 UTC
[jira] [Commented] (STORM-2194) ReportErrorAndDie doesn't always
die
[ https://issues.apache.org/jira/browse/STORM-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16072595#comment-16072595 ]
Jungtaek Lim commented on STORM-2194:
-------------------------------------
Miss porting back to 1.1.x-branch. Will do.
> ReportErrorAndDie doesn't always die
> ------------------------------------
>
> Key: STORM-2194
> URL: https://issues.apache.org/jira/browse/STORM-2194
> Project: Apache Storm
> Issue Type: Bug
> Components: storm-core
> Affects Versions: 2.0.0, 1.0.2
> Reporter: Craig Hawco
> Assignee: Paul Poulosky
> Fix For: 2.0.0, 1.0.4, 1.2.0
>
> Attachments: scrubbed-thread-dump.txt
>
> Time Spent: 2h 40m
> Remaining Estimate: 0h
>
> I've been trying to track down a cause of some of our issues with some exceptions leaving Storm workers in a zombified state for some time. I believe I've isolated the bug to the behaviour in :report-error-and-die/reportErrorAndDie in the executor. Essentially:
> {code}
> :report-error-and-die (fn [error]
> (try
> ((:report-error <>) error)
> (catch Exception e
> (log-message "Error while reporting error to cluster, proceeding with shutdown")))
> (if (or
> (exception-cause? InterruptedException error)
> (exception-cause? java.io.InterruptedIOException error))
> (log-message "Got interrupted excpetion shutting thread down...")
> ((:suicide-fn <>))))
> {code}
> has the grouping for the if statement slightly wrong. It shouldn't log OR die from InterruptedException/InterruptedIOException, but it should log under that condition, and ALWAYS die.
> Basically:
> {code}
> :report-error-and-die (fn [error]
> (try
> ((:report-error <>) error)
> (catch Exception e
> (log-message "Error while reporting error to cluster, proceeding with shutdown")))
> (if (or
> (exception-cause? InterruptedException error)
> (exception-cause? java.io.InterruptedIOException error))
> (log-message "Got interrupted excpetion shutting thread down..."))
> ((:suicide-fn <>)))
> {code}
> After digging into the Java port of this code, it looks like a different bug was introduced while porting:
> {code}
> if (Utils.exceptionCauseIsInstanceOf(InterruptedException.class, e)
> || Utils.exceptionCauseIsInstanceOf(java.io.InterruptedIOException.class, e)) {
> LOG.info("Got interrupted exception shutting thread down...");
> suicideFn.run();
> }
> {code}
> Was how this was initially ported, and STORM-2142 changed this to:
> {code}
> if (Utils.exceptionCauseIsInstanceOf(InterruptedException.class, e)
> || Utils.exceptionCauseIsInstanceOf(java.io.InterruptedIOException.class, e)) {
> LOG.info("Got interrupted exception shutting thread down...");
> } else {
> suicideFn.run();
> }
> {code}
> However, I believe the correct port is as described above:
> {code}
> if (Utils.exceptionCauseIsInstanceOf(InterruptedException.class, e)
> || Utils.exceptionCauseIsInstanceOf(java.io.InterruptedIOException.class, e)) {
> LOG.info("Got interrupted exception shutting thread down...");
> }
> suicideFn.run();
> {code}
> I'll look into providing patches for the 1.x and 2.x branches shortly.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)