Posted to common-dev@hadoop.apache.org by "Nigel Daley (JIRA)" <ji...@apache.org> on 2007/03/23 17:46:32 UTC

[jira] Created: (HADOOP-1153) DataNode and FSNamesystem don't shutdown cleanly

DataNode and FSNamesystem don't shutdown cleanly
------------------------------------------------

                 Key: HADOOP-1153
                 URL: https://issues.apache.org/jira/browse/HADOOP-1153
             Project: Hadoop
          Issue Type: Bug
          Components: dfs
    Affects Versions: 0.12.1
            Reporter: Nigel Daley
             Fix For: 0.13.0


The DataNode and FSNamesystem don't interrupt their threads when shutting down.  This leaves threads running, which is a problem if tests start and stop these servers many times in the same process.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-1153) DataNode and FSNamesystem don't shutdown cleanly

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12484818 ] 

Hadoop QA commented on HADOOP-1153:
-----------------------------------

Integrated in Hadoop-Nightly #39 (See http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/39/)



Re: [jira] Commented: (HADOOP-1153) DataNode and FSNamesystem don't shutdown cleanly

Posted by Nigel Daley <nd...@yahoo-inc.com>.
> Konstantin Shvachko commented on HADOOP-1153:
> ---------------------------------------------
>
> DataXceiveServer should either declare
>         boolean shouldListen = true;
> as volatile or use DataNode.shouldRun instead

Yup.

> ==========
> DataNode.register() should loop on
>       while( shouldRun ) {
> instead of
>       while( true ) {

Yup.

> ==========
> The DataNode thread itself is interrupted in shutdownAll(), but we  
> never call it.
> Who is interrupting the main data-node thread?

shutdownAll and other static methods on DataNode are called by  
MiniDFSCluster.  They need to be reworked once 1085 is committed.

> ==========
> Even if it is interrupted, the RPC will ignore this interrupt
> RPC.waitForProxy()
>     while (true) {
>       try {
> .................
>       } catch (InterruptedException ie) {
>         // IGNORE
>       }
>     }
> Maybe this is one of the main problems with all our Mini clusters?

This could involve a much larger change.  I haven't seen this wait as  
a problem in practice.  Perhaps the method should declare that it  
throws InterruptedException.  I'm not making this change part of this  
patch.
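
A rough sketch of what "declare that it throws InterruptedException" could look like. This is not Hadoop's RPC.waitForProxy(); the class WaitForProxySketch, the waitFor helper, and the always-failing neverUp callable are all hypothetical names made up for illustration. The retry loop still swallows connection failures, but an interrupt ends the wait instead of being ignored.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical retry helper; this is not Hadoop's RPC.waitForProxy().
class WaitForProxySketch {
    // Retries connect until it succeeds, but lets an interrupt end the wait.
    static <T> T waitFor(Callable<T> connect, long retryMillis) throws InterruptedException {
        while (true) {
            try {
                return connect.call();
            } catch (InterruptedException ie) {
                throw ie;  // propagate instead of ignoring
            } catch (Exception e) {
                // Server not reachable yet: fall through and retry.
            }
            Thread.sleep(retryMillis);  // also throws InterruptedException when interrupted
        }
    }

    // Starts a waiter against an endpoint that never comes up, interrupts it,
    // and reports whether the waiter thread exited.
    static boolean interruptStopsWaiting() {
        Callable<String> neverUp = () -> { throw new IOException("server not up yet"); };
        Thread waiter = new Thread(() -> {
            try {
                waitFor(neverUp, 10);
            } catch (InterruptedException expected) {
                // The caller asked us to stop waiting.
            }
        }, "waiter");
        waiter.start();
        waiter.interrupt();
        try {
            waiter.join(2000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();  // preserve the caller's interrupt
        }
        return !waiter.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("waiter exited after interrupt: " + interruptStopsWaiting());
    }
}
```

The cost of this approach is the one noted above: every caller has to handle or redeclare the checked exception, which is why it would be a larger change.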

> ==========
> DataNode.runAndWait() calls join() and catches InterruptedException
>       try {
>         t.join();
>       } catch (InterruptedException e) {
>         if (Thread.currentThread().isInterrupted()) {
>           // did someone knock?
>           return;
>         }
>       }
> Here is what documentation on join says:
> void java.lang.Thread.join()
>
> Waits for this thread to die.
>
> Throws: InterruptedException if another thread has interrupted the  
> current thread.
> The interrupted status of the current thread is cleared when this  
> exception is thrown.
>
> Does it make any sense to check isInterrupted()?

This code has been there a long time...it makes no sense so I'll  
remove it.
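
A small self-contained check of the documented behaviour, for anyone who wants to see it. The class and method names are hypothetical and unrelated to DataNode.runAndWait(). The current thread interrupts itself so that join() throws immediately, then reads its own interrupted status inside the catch block.

```java
// Hypothetical demo class; not taken from the DataNode code.
class JoinInterruptSketch {
    // Returns the interrupted status observed right after join() throws.
    static boolean statusAfterJoinThrows() {
        Thread sleeper = new Thread(() -> {
            try {
                Thread.sleep(60_000);  // stays alive long enough for join() to block
            } catch (InterruptedException e) {
                // Woken up by the cleanup below; just exit.
            }
        }, "sleeper");
        sleeper.setDaemon(true);
        sleeper.start();

        Thread.currentThread().interrupt();  // a pending interrupt makes join() throw
        try {
            sleeper.join();
            throw new IllegalStateException("join() returned without throwing");
        } catch (InterruptedException e) {
            // The status was cleared when the exception was thrown.
            return Thread.currentThread().isInterrupted();
        } finally {
            sleeper.interrupt();  // let the helper thread finish
        }
    }

    public static void main(String[] args) {
        System.out.println("isInterrupted() inside catch: " + statusAfterJoinThrows());
    }
}
```

Because the status is already cleared, the isInterrupted() branch in the quoted code can never be taken. Code that wants callers to still see the interrupt would call Thread.currentThread().interrupt() in the catch block.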



[jira] Commented: (HADOOP-1153) DataNode and FSNamesystem don't shutdown cleanly

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12483788 ] 

Konstantin Shvachko commented on HADOOP-1153:
---------------------------------------------

DataXceiveServer should either declare
        boolean shouldListen = true;
as volatile or use DataNode.shouldRun instead
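
A minimal sketch of the pattern, assuming a hypothetical listener class (VolatileFlagSketch and its methods are illustrative, not the actual DataXceiveServer code). Without volatile, the Java memory model does not guarantee that the listener thread ever sees the write made by the thread that requests shutdown.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a listener thread; not the actual DataXceiveServer.
class VolatileFlagSketch {
    // volatile makes the write in stop() visible to the looping thread.
    private volatile boolean shouldListen = true;
    private final Thread listener = new Thread(this::listenLoop, "listener");

    private void listenLoop() {
        while (shouldListen) {
            try {
                TimeUnit.MILLISECONDS.sleep(10);  // stands in for blocking work
            } catch (InterruptedException e) {
                // Interrupted during shutdown: loop around and re-check the flag.
            }
        }
    }

    void start() {
        listener.start();
    }

    // Clears the flag, interrupts the thread, and reports whether it exited in time.
    boolean stop(long timeoutMillis) {
        shouldListen = false;
        listener.interrupt();
        try {
            listener.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();  // preserve the caller's interrupt
        }
        return !listener.isAlive();
    }

    public static void main(String[] args) {
        VolatileFlagSketch server = new VolatileFlagSketch();
        server.start();
        System.out.println("stopped cleanly: " + server.stop(2000));
    }
}
```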

==========
DataNode.register() should loop on
      while( shouldRun ) {
instead of
      while( true ) {
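
To illustrate the difference, here is a sketch of a retry loop guarded by the flag. Only the name shouldRun comes from the code under discussion; RegisterLoopSketch, tryRegister(), and the other members are hypothetical. With while( true ), a node that can never register keeps retrying after shutdown has been requested.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a registration retry loop; only shouldRun is a real name.
class RegisterLoopSketch {
    private volatile boolean shouldRun = true;
    private final AtomicInteger attempts = new AtomicInteger();

    // Stand-in for the remote registration call; it always fails here
    // so that the loop keeps retrying until shutdown is requested.
    private boolean tryRegister() {
        attempts.incrementAndGet();
        return false;
    }

    // Returns true once registered, or false if shutdown was requested first.
    boolean register() {
        while (shouldRun) {
            if (tryRegister()) {
                return true;
            }
            try {
                Thread.sleep(10);  // back off before the next attempt
            } catch (InterruptedException e) {
                // Fall through and re-check shouldRun.
            }
        }
        return false;
    }

    void shutdown() {
        shouldRun = false;
    }

    int attempts() {
        return attempts.get();
    }

    public static void main(String[] args) throws InterruptedException {
        RegisterLoopSketch node = new RegisterLoopSketch();
        Thread t = new Thread(node::register, "register");
        t.start();
        Thread.sleep(50);
        node.shutdown();
        t.interrupt();  // wake the thread if it is sleeping between attempts
        t.join(2000);
        System.out.println("register thread alive: " + t.isAlive());
    }
}
```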

==========
The DataNode thread itself is interrupted in shutdownAll(), but we never call it.
Who is interrupting the main data-node thread?

==========
Even if it is interrupted, the RPC will ignore this interrupt
RPC.waitForProxy()
    while (true) {
      try {
.................
      } catch (InterruptedException ie) {
        // IGNORE
      }
    }
Maybe this is one of the main problems with all our Mini clusters?

==========
DataNode.runAndWait() calls join() and catches InterruptedException
      try {
        t.join();
      } catch (InterruptedException e) {
        if (Thread.currentThread().isInterrupted()) {
          // did someone knock?
          return;
        }
      }
Here is what documentation on join says:
void java.lang.Thread.join()

Waits for this thread to die.

Throws: InterruptedException if another thread has interrupted the current thread.
The interrupted status of the current thread is cleared when this exception is thrown.

Does it make any sense to check isInterrupted()?

==========
The NameNode should also be checked to confirm that it
- closes all files
- closes all sockets
- correctly handles InterruptedException
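
One reason the socket item matters for shutdown: a thread blocked in java.net.ServerSocket.accept() does not react to Thread.interrupt(), so closing the socket is what releases it. The sketch below is a generic illustration under that assumption. SocketShutdownSketch is a hypothetical class, not NameNode code, and it binds an ephemeral port on the loopback address.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;

// Hypothetical demo class; not taken from the NameNode code.
class SocketShutdownSketch {
    // Blocks a thread in accept(), closes the socket, and reports
    // whether the blocked thread exited.
    static boolean closeUnblocksAccept() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread acceptor = new Thread(() -> {
                try {
                    server.accept();  // blocks until a client connects or the socket closes
                } catch (IOException e) {
                    // Thrown when the socket is closed underneath us; just exit.
                }
            }, "acceptor");
            acceptor.start();
            server.close();       // releases the thread blocked in accept()
            acceptor.join(2000);
            return !acceptor.isAlive();
        } catch (IOException e) {
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();  // preserve the caller's interrupt
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("acceptor exited after close: " + closeUnblocksAccept());
    }
}
```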





[jira] Updated: (HADOOP-1153) DataNode and FSNamesystem don't shutdown cleanly

Posted by "Nigel Daley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nigel Daley updated HADOOP-1153:
--------------------------------

    Attachment: 1153.patch

A patch for review.  This passes all unit tests.



[jira] Commented: (HADOOP-1153) DataNode and FSNamesystem don't shutdown cleanly

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12484522 ] 

Hadoop QA commented on HADOOP-1153:
-----------------------------------

+1, because http://issues.apache.org/jira/secure/attachment/12354285/Interrupts.patch applied and successfully tested against trunk revision http://svn.apache.org/repos/asf/lucene/hadoop/trunk/522597. Results are at http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch



[jira] Assigned: (HADOOP-1153) DataNode and FSNamesystem don't shutdown cleanly

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko reassigned HADOOP-1153:
-------------------------------------------

    Assignee: Konstantin Shvachko



[jira] Updated: (HADOOP-1153) DataNode and FSNamesystem don't shutdown cleanly

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated HADOOP-1153:
---------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

I just committed this.  Thanks, Konstantin!



[jira] Updated: (HADOOP-1153) DataNode and FSNamesystem don't shutdown cleanly

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated HADOOP-1153:
----------------------------------------

    Status: Patch Available  (was: Open)



[jira] Commented: (HADOOP-1153) DataNode and FSNamesystem don't shutdown cleanly

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12484512 ] 

dhruba borthakur commented on HADOOP-1153:
------------------------------------------

+1 Code reviewed.




[jira] Updated: (HADOOP-1153) DataNode and FSNamesystem don't shutdown cleanly

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated HADOOP-1153:
----------------------------------------

    Attachment: Interrupts.patch

Fixed the problems mentioned above, except for the one with RPC.waitForProxy().
I decided not to make changes in RPC, since this would be a rather big change, and also
since the problem is not as bad as I initially thought:
waitForProxy is called in constructors, and shutdown cannot be called until the object is instantiated.
I also checked the name-node code.
All unit tests pass, and I see some improvement in the execution time.
Please review.
