You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@storm.apache.org by "Craig Hawco (JIRA)" <ji...@apache.org> on 2016/11/09 19:33:58 UTC

[jira] [Updated] (STORM-2194) ReportErrorAndDie doesn't always die

     [ https://issues.apache.org/jira/browse/STORM-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Craig Hawco updated STORM-2194:
-------------------------------
    Description: 
I've been trying to track down a cause of some of our issues with some exceptions leaving Storm workers in a zombified state for some time. I believe I've isolated the bug to the behaviour in `:report-error-and-die`/reportErrorAndDie in the executor. Essentially:

{code}
     :report-error-and-die (fn [error]
                             (try
                               ((:report-error <>) error)
                               (catch Exception e
                                 (log-message "Error while reporting error to cluster, proceeding with shutdown")))
                             (if (or
                                    (exception-cause? InterruptedException error)
                                    (exception-cause? java.io.InterruptedIOException error))
                               (log-message "Got interrupted excpetion shutting thread down...")
                               ((:suicide-fn <>))))
{code}

has the grouping for the if statement slightly wrong. It shouldn't log OR die from InterruptedException/InterruptedIOException, but it should log under that condition, and ALWAYS die. 

Basically:

{code}
     :report-error-and-die (fn [error]
                             (try
                               ((:report-error <>) error)
                               (catch Exception e
                                 (log-message "Error while reporting error to cluster, proceeding with shutdown")))
                             (if (or
                                    (exception-cause? InterruptedException error)
                                    (exception-cause? java.io.InterruptedIOException error))
                               (log-message "Got interrupted excpetion shutting thread down..."))
                             ((:suicide-fn <>)))
{code}

After digging into the Java port of this code, it looks like a different bug was introduced while porting:

{code}
        if (Utils.exceptionCauseIsInstanceOf(InterruptedException.class, e)
                || Utils.exceptionCauseIsInstanceOf(java.io.InterruptedIOException.class, e)) {
            LOG.info("Got interrupted exception shutting thread down...");
            suicideFn.run();
        }
{code}

Was how this was initially ported, and STORM-2142 changed this to:

{code}
        if (Utils.exceptionCauseIsInstanceOf(InterruptedException.class, e)
                || Utils.exceptionCauseIsInstanceOf(java.io.InterruptedIOException.class, e)) {
            LOG.info("Got interrupted exception shutting thread down...");
        } else {
            suicideFn.run();
        }
    }
{code}

However, I believe the correct port is as described above:

{code}
        if (Utils.exceptionCauseIsInstanceOf(InterruptedException.class, e)
                || Utils.exceptionCauseIsInstanceOf(java.io.InterruptedIOException.class, e)) {
            LOG.info("Got interrupted exception shutting thread down...");
        }
        suicideFn.run();
    }
{code}

I'll look into providing patches for the 1.x and 2.x branches shortly.

  was:
I've been trying to track down a cause of some of our issues with some exceptions leaving Storm workers in a zombified state for some time. I believe I've isolated the bug to the behaviour in `:report-error-and-die`/reportErrorAndDie in the executor. Essentially:

{code}
     :report-error-and-die (fn [error]
                             (try
                               ((:report-error <>) error)
                               (catch Exception e
                                 (log-message "Error while reporting error to cluster, proceeding with shutdown")))
                             (if (or
                                    (exception-cause? InterruptedException error)
                                    (exception-cause? java.io.InterruptedIOException error))
                               (log-message "Got interrupted excpetion shutting thread down...")
                               ((:suicide-fn <>))))
{code}

has the grouping for the if statement slightly wrong. It shouldn't log OR die from InterruptedException/InterruptedIOException, but it should log under that condition, and ALWAYS die. 

Basically:

{code}
     :report-error-and-die (fn [error]
                             (try
                               ((:report-error <>) error)
                               (catch Exception e
                                 (log-message "Error while reporting error to cluster, proceeding with shutdown")))
                             (if (or
                                    (exception-cause? InterruptedException error)
                                    (exception-cause? java.io.InterruptedIOException error))
                               (log-message "Got interrupted excpetion shutting thread down..."))
                               ((:suicide-fn <>)))
{code}

After digging into the Java port of this code, it looks like a different bug was introduced while porting:

{code}
        if (Utils.exceptionCauseIsInstanceOf(InterruptedException.class, e)
                || Utils.exceptionCauseIsInstanceOf(java.io.InterruptedIOException.class, e)) {
            LOG.info("Got interrupted exception shutting thread down...");
            suicideFn.run();
        }
{code}

Was how this was initially ported, and STORM-2142 changed this to:

{code}
        if (Utils.exceptionCauseIsInstanceOf(InterruptedException.class, e)
                || Utils.exceptionCauseIsInstanceOf(java.io.InterruptedIOException.class, e)) {
            LOG.info("Got interrupted exception shutting thread down...");
        } else {
            suicideFn.run();
        }
    }
{code}

However, I believe the correct port is as described above:

{code}
        if (Utils.exceptionCauseIsInstanceOf(InterruptedException.class, e)
                || Utils.exceptionCauseIsInstanceOf(java.io.InterruptedIOException.class, e)) {
            LOG.info("Got interrupted exception shutting thread down...");
        }
        suicideFn.run();
    }
{code}

I'll look into providing patches for the 1.x and 2.x branches shortly.


> ReportErrorAndDie doesn't always die
> ------------------------------------
>
>                 Key: STORM-2194
>                 URL: https://issues.apache.org/jira/browse/STORM-2194
>             Project: Apache Storm
>          Issue Type: Bug
>          Components: storm-core
>    Affects Versions: 2.0.0, 1.0.2
>            Reporter: Craig Hawco
>
> I've been trying to track down a cause of some of our issues with some exceptions leaving Storm workers in a zombified state for some time. I believe I've isolated the bug to the behaviour in `:report-error-and-die`/reportErrorAndDie in the executor. Essentially:
> {code}
>      :report-error-and-die (fn [error]
>                              (try
>                                ((:report-error <>) error)
>                                (catch Exception e
>                                  (log-message "Error while reporting error to cluster, proceeding with shutdown")))
>                              (if (or
>                                     (exception-cause? InterruptedException error)
>                                     (exception-cause? java.io.InterruptedIOException error))
>                                (log-message "Got interrupted excpetion shutting thread down...")
>                                ((:suicide-fn <>))))
> {code}
> has the grouping for the if statement slightly wrong. It shouldn't log OR die from InterruptedException/InterruptedIOException, but it should log under that condition, and ALWAYS die. 
> Basically:
> {code}
>      :report-error-and-die (fn [error]
>                              (try
>                                ((:report-error <>) error)
>                                (catch Exception e
>                                  (log-message "Error while reporting error to cluster, proceeding with shutdown")))
>                              (if (or
>                                     (exception-cause? InterruptedException error)
>                                     (exception-cause? java.io.InterruptedIOException error))
>                                (log-message "Got interrupted excpetion shutting thread down..."))
>                              ((:suicide-fn <>)))
> {code}
> After digging into the Java port of this code, it looks like a different bug was introduced while porting:
> {code}
>         if (Utils.exceptionCauseIsInstanceOf(InterruptedException.class, e)
>                 || Utils.exceptionCauseIsInstanceOf(java.io.InterruptedIOException.class, e)) {
>             LOG.info("Got interrupted exception shutting thread down...");
>             suicideFn.run();
>         }
> {code}
> Was how this was initially ported, and STORM-2142 changed this to:
> {code}
>         if (Utils.exceptionCauseIsInstanceOf(InterruptedException.class, e)
>                 || Utils.exceptionCauseIsInstanceOf(java.io.InterruptedIOException.class, e)) {
>             LOG.info("Got interrupted exception shutting thread down...");
>         } else {
>             suicideFn.run();
>         }
>     }
> {code}
> However, I believe the correct port is as described above:
> {code}
>         if (Utils.exceptionCauseIsInstanceOf(InterruptedException.class, e)
>                 || Utils.exceptionCauseIsInstanceOf(java.io.InterruptedIOException.class, e)) {
>             LOG.info("Got interrupted exception shutting thread down...");
>         }
>         suicideFn.run();
>     }
> {code}
> I'll look into providing patches for the 1.x and 2.x branches shortly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)