Posted to common-dev@hadoop.apache.org by "Konstantin Boudnik (JIRA)" <ji...@apache.org> on 2009/06/17 23:42:07 UTC

[jira] Created: (HADOOP-6073) Unchecked exception thrown inside of BlockReceiver cause some threads hang

Unchecked exception thrown inside of BlockReceiver cause some threads hang
--------------------------------------------------------------------------

                 Key: HADOOP-6073
                 URL: https://issues.apache.org/jira/browse/HADOOP-6073
             Project: Hadoop Core
          Issue Type: Bug
            Reporter: Konstantin Boudnik
         Attachments: copy.txt.log, x2

One can inject all sorts of faults into Hadoop's classes using the new fault injection framework (HADOOP-6003).
I've been injecting an unchecked exception (RuntimeException) into the BlockReceiver.receivePacket() method before any
  of the write() operations (e.g. lines 401, 449, 463, 529) and running some of the existing HDFS tests. The injected unchecked exception causes DataXceiver to die silently, without leaving any trace.

From a debugger run, it seems that some threads are left alive or are not notified about the exception.
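
The failure mode described above can be reproduced in miniature. The sketch below is not the BlockReceiver code: the class SilentFailureDemo, its methods, and the sleeping "responder" thread are hypothetical stand-ins, assuming only that cleanup (interrupting the responder) happens in a catch (IOException) block. An IOException stops the responder, while a RuntimeException skips the catch and leaves the thread alive.

```java
import java.io.IOException;

// Hypothetical stand-in for a receiver/responder pair; not the real BlockReceiver.
class SilentFailureDemo {

    /** Simulated packet handler that fails with a checked or an unchecked exception. */
    static void receivePacket(boolean unchecked) throws IOException {
        if (unchecked) {
            throw new RuntimeException("injected fault");
        }
        throw new IOException("simulated I/O failure");
    }

    /**
     * Starts a responder thread, then fails while receiving.
     * Returns true if the responder is still alive after the failure.
     */
    static boolean responderSurvives(boolean unchecked) {
        Thread responder = new Thread(() -> {
            try {
                Thread.sleep(Long.MAX_VALUE); // waits for acks that never arrive
            } catch (InterruptedException e) {
                // interrupted by the receiver: fall through and exit
            }
        });
        responder.setDaemon(true); // lets the demo JVM exit even if the thread hangs
        responder.start();

        try {
            try {
                receivePacket(unchecked);
            } catch (IOException e) {
                responder.interrupt(); // cleanup happens only on the checked path
            }
        } catch (RuntimeException e) {
            // Swallowed here only so the demo can inspect the responder afterwards.
        }

        try {
            responder.join(500); // give an interrupted responder time to exit
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return responder.isAlive();
    }
}
```

Calling SilentFailureDemo.responderSurvives(false) exercises the checked path, where the responder is interrupted; responderSurvives(true) exercises the unchecked path, where it is not.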

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-6073) Unchecked exception thrown inside of BlockReceiver cause some threads hang

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720950#action_12720950 ] 

Konstantin Boudnik commented on HADOOP-6073:
--------------------------------------------

I also suggest changing line 555 to
} catch (Exception ioe) {

to make sure that an affected block is properly cleaned up.





[jira] Updated: (HADOOP-6073) Unchecked exception thrown inside of BlockReceiver cause some threads hang

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-6073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Boudnik updated HADOOP-6073:
---------------------------------------

    Status: Patch Available  (was: Open)

Maybe something along these lines (see attachment)? Moving the responder.interrupt() invocation into the finally {...} block doesn't do us much good, because the responder thread has to be stopped only in case of an error.

Catching Throwable and wrapping it in an IOException (as per Raghu's suggestion) seems to do the trick. I can confirm that I no longer see any of the 'hanging' behavior in the tests with this patch applied.
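
As a rough illustration of the wrapping approach (not the contents of the attached patch, which is not reproduced in this thread), the sketch below converts anything thrown by a simulated receivePacket() into an IOException so that an existing catch (IOException) handler still runs. The class WrapThrowableDemo, its method names, and the sleeping responder thread are assumptions made for the example.

```java
import java.io.IOException;

// Hypothetical illustration of "catch Throwable, rethrow as IOException".
class WrapThrowableDemo {

    /** Simulated packet handler that fails with an unchecked exception. */
    static void receivePacket() throws IOException {
        throw new IllegalStateException("injected fault");
    }

    /** Funnels every failure into IOException, keeping the original as the cause. */
    static void receivePacketSafely() throws IOException {
        try {
            receivePacket();
        } catch (IOException e) {
            throw e; // already the expected type
        } catch (Throwable t) {
            throw new IOException("Unexpected failure while receiving packet", t);
        }
    }

    /** Returns true if the existing IOException handler stopped the responder. */
    static boolean responderStoppedOnFault() {
        Thread responder = new Thread(() -> {
            try {
                Thread.sleep(Long.MAX_VALUE); // waits until interrupted
            } catch (InterruptedException e) {
                // interrupted: exit
            }
        });
        responder.setDaemon(true);
        responder.start();

        try {
            receivePacketSafely();
        } catch (IOException e) {
            responder.interrupt(); // unchanged error-only cleanup path
        }

        try {
            responder.join(500);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !responder.isAlive();
    }
}
```

One trade-off worth keeping in mind: catch (Throwable) also captures Error subclasses such as OutOfMemoryError, which then reach callers as an IOException with the original error attached as the cause.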



[jira] Updated: (HADOOP-6073) Unchecked exception thrown inside of BlockReceiver cause some threads hang

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-6073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Boudnik updated HADOOP-6073:
---------------------------------------

    Attachment: HADOOP-6073.patch



[jira] Updated: (HADOOP-6073) Unchecked exception thrown inside of BlockReceiver cause some threads hang

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-6073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Boudnik updated HADOOP-6073:
---------------------------------------

    Attachment: x2
                copy.txt.log

A unit test log file and a jstack dump of the VM running the tests.



[jira] Commented: (HADOOP-6073) Unchecked exception thrown inside of BlockReceiver cause some threads hang

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720923#action_12720923 ] 

Raghu Angadi commented on HADOOP-6073:
--------------------------------------


BlockReceiver.receiveBlock() does not clean up properly in case of a runtime exception. It should interrupt the responder inside a finally clause rather than only when an IOException is caught. The DN functions normally for other IO requests.
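
A minimal sketch of cleanup in a finally clause follows. It is a variant rather than a literal rendering of the suggestion: it guards the interrupt with a hypothetical completed flag so that the responder is stopped only when the receive loop did not finish normally. The class FinallyCleanupDemo and its methods are illustrative assumptions, not the real receiveBlock().

```java
// Hypothetical illustration of cleanup in a finally clause; not the real receiveBlock().
class FinallyCleanupDemo {

    /** Simulated receive loop: interrupts the responder on any abnormal exit. */
    static void receiveBlock(Thread responder, boolean fail) {
        boolean completed = false;
        try {
            if (fail) {
                throw new RuntimeException("injected fault");
            }
            completed = true; // reached only on a normal run
        } finally {
            if (!completed) {
                responder.interrupt(); // runs for checked and unchecked failures alike
            }
        }
    }

    /** Returns true if the responder has exited shortly after receiveBlock() returns or throws. */
    static boolean responderStoppedAfter(boolean fail) {
        Thread responder = new Thread(() -> {
            try {
                Thread.sleep(Long.MAX_VALUE); // waits until interrupted
            } catch (InterruptedException e) {
                // interrupted: exit
            }
        });
        responder.setDaemon(true);
        responder.start();

        try {
            receiveBlock(responder, fail);
        } catch (RuntimeException e) {
            // Swallowed here only so the demo can inspect the responder afterwards.
        }

        try {
            responder.join(500);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !responder.isAlive();
    }
}
```

With fail set to true the responder is interrupted even though no IOException was involved; with fail set to false it is left running, which in this demo simply means it keeps sleeping.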
