Posted to common-dev@hadoop.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2009/05/22 02:31:45 UTC

[jira] Created: (HADOOP-5890) Use exponential backoff on Thread.sleep during DN shutdown

Use exponential backoff on Thread.sleep during DN shutdown
----------------------------------------------------------

                 Key: HADOOP-5890
                 URL: https://issues.apache.org/jira/browse/HADOOP-5890
             Project: Hadoop Core
          Issue Type: Improvement
          Components: dfs
            Reporter: Todd Lipcon
            Assignee: Todd Lipcon


Tests waste a lot of time in DataNode.shutdown. Typical logs look like:

{code}
2009-05-21 17:13:20,177 INFO  datanode.DataNode (DataNode.java:shutdown(637)) - Waiting for threadgroup to exit, active threads is 0
2009-05-21 17:13:20,177 INFO  datanode.DataBlockScanner (DataBlockScanner.java:run(620)) - Exiting DataBlockScanner thread.
2009-05-21 17:13:21,117 INFO  datanode.DataNode (DataNode.java:shutdown(637)) - Waiting for threadgroup to exit, active threads is 0
{code}

In this example (and very commonly) the DataBlockScanner thread exits within 5-10ms after the first wait. The DN then sleeps an entire second before succeeding in shutting down.

Using exponential backoff from a short value like 2ms up to a maximum of 1000ms would solve this.
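A minimal sketch of the proposed wait loop follows. It is not the attached patch: the class name BackoffWait, the method waitWithBackoff, and the doubling factor in the demo are hypothetical choices for illustration. Only the 2ms starting value and the 1000ms cap come from the proposal above.

```java
import java.util.function.BooleanSupplier;

public class BackoffWait {
    /**
     * Polls `done` until it returns true, sleeping between polls.
     * The sleep starts at initialMs and is multiplied by `factor` after
     * each poll, capped at maxMs. Returns the number of sleeps performed.
     * If the calling thread is interrupted, the interrupt flag is restored
     * and the method returns early.
     */
    public static int waitWithBackoff(BooleanSupplier done, long initialMs,
                                      long maxMs, double factor) {
        long sleepMs = initialMs;
        int sleeps = 0;
        while (!done.getAsBoolean()) {
            try {
                Thread.sleep(sleepMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            sleeps++;
            // Grow the interval, but never beyond the cap.
            sleepMs = Math.min(maxMs, (long) Math.ceil(sleepMs * factor));
        }
        return sleeps;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the DataNode's thread group.
        ThreadGroup group = new ThreadGroup("demo");
        Thread worker = new Thread(group, () -> {
            try {
                Thread.sleep(8); // exits a few ms after shutdown begins
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
        int sleeps = waitWithBackoff(() -> group.activeCount() == 0, 2, 1000, 2.0);
        System.out.println("thread group exited after " + sleeps + " sleeps");
    }
}
```

A thread that exits a few milliseconds into the wait is then noticed after a handful of short sleeps rather than after a full second, while a slow shutdown still settles at one poll per second.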

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-5890) Use exponential backoff on Thread.sleep during DN shutdown

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HADOOP-5890:
--------------------------------

    Attachment: hadoop-5890.txt

The attached patch starts off sleeping 2ms and then increases the sleep exponentially, by a factor of 1.5x each time. Feel free to twiddle the ratio before committing if you so desire.

No additional tests, since the change is trivial and sits in the code path of every MiniMR test.



[jira] Commented: (HADOOP-5890) Use exponential backoff on Thread.sleep during DN shutdown

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712195#action_12712195 ] 

Raghu Angadi commented on HADOOP-5890:
--------------------------------------

Looking at trunk, there should not have been any wait since 'active threads is 0', right?



[jira] Commented: (HADOOP-5890) Use exponential backoff on Thread.sleep during DN shutdown

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712198#action_12712198 ] 

Raghu Angadi commented on HADOOP-5890:
--------------------------------------

If this is from a junit test, the two logs are likely from different datanodes.



[jira] Commented: (HADOOP-5890) Use exponential backoff on Thread.sleep during DN shutdown

Posted by "Steve Loughran (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712251#action_12712251 ] 

Steve Loughran commented on HADOOP-5890:
----------------------------------------

+1 to anything that boosts setup/teardown speed



[jira] Commented: (HADOOP-5890) Use exponential backoff on Thread.sleep during DN shutdown

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712654#action_12712654 ] 

Hadoop QA commented on HADOOP-5890:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12408764/hadoop-5890.txt
  against trunk revision 778182.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    -1 contrib tests.  The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/395/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/395/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/395/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/395/console

This message is automatically generated.



[jira] Updated: (HADOOP-5890) Use exponential backoff on Thread.sleep during DN shutdown

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HADOOP-5890:
--------------------------------

    Status: Patch Available  (was: Open)



[jira] Updated: (HADOOP-5890) Use exponential backoff on Thread.sleep during DN shutdown

Posted by "Steve Loughran (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran updated HADOOP-5890:
-----------------------------------

       Resolution: Fixed
    Fix Version/s: 0.21.0
           Status: Resolved  (was: Patch Available)

Committed! Thanks Todd. 



[jira] Commented: (HADOOP-5890) Use exponential backoff on Thread.sleep during DN shutdown

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712205#action_12712205 ] 

Todd Lipcon commented on HADOOP-5890:
-------------------------------------

Whoops, I pasted a bad example from the log. Here is an example that actually demonstrates the behavior discussed:

{code}
2009-05-21 22:43:21,259 INFO  datanode.DataNode (DataNode.java:shutdown(637)) - Waiting for threadgroup to exit, active threads is 1
2009-05-21 22:43:21,259 WARN  datanode.DataNode (DataXceiverServer.java:run(137)) - DatanodeRegistration(127.0.0.1:40197, storageID=DS-2052133204-127.0.1.1-40197-1242971000238, infoPort=52207, ipcPort=52592):DataXceiveServer: java.nio.channels.AsynchronousCloseException
        at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:185)
        at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:152)
        at sun.nio.ch.ServerSocketAdaptor.accept(ServerSocketAdaptor.java:84)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:130)
        at java.lang.Thread.run(Thread.java:619)

2009-05-21 22:43:21,315 INFO  datanode.DataBlockScanner (DataBlockScanner.java:run(620)) - Exiting DataBlockScanner thread.
2009-05-21 22:43:22,259 INFO  datanode.DataNode (DataNode.java:shutdown(637)) - Waiting for threadgroup to exit, active threads is 0
{code}

Note the exact 1-second offset between 22:43:21,259 and 22:43:22,259, even though the DataBlockScanner thread exited at 22:43:21,315, only 56ms after the first check. This patch reduces that significantly.
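
To put a rough number on that reduction, here is an idealized model of when a polling loop notices a thread that exits 56ms after the first check, as in the log above. The class BackoffLatency and the method detectionTimeMs are hypothetical names. The model ignores scheduling jitter and millisecond rounding, and it assumes the 1000ms cap from the issue description, so treat it as a sketch and not as a measurement of the patch.

```java
public class BackoffLatency {
    /**
     * Idealized model: returns the time (in ms after the first check) at
     * which a poller first wakes up at or after exitAtMs, given a sleep
     * that starts at initialMs and is multiplied by `factor` after every
     * poll, capped at maxMs.
     */
    public static double detectionTimeMs(double exitAtMs, double initialMs,
                                         double factor, double maxMs) {
        double now = 0.0;
        double sleepMs = initialMs;
        while (now < exitAtMs) {
            now += sleepMs;                               // sleep, then poll again
            sleepMs = Math.min(maxMs, sleepMs * factor);  // back off, up to the cap
        }
        return now;
    }

    public static void main(String[] args) {
        double exitAtMs = 56.0; // 22:43:21,315 minus 22:43:21,259
        System.out.println("fixed 1000ms sleep: "
                + detectionTimeMs(exitAtMs, 1000, 1.0, 1000) + " ms");
        System.out.println("2ms start, 1.5x:    "
                + detectionTimeMs(exitAtMs, 2, 1.5, 1000) + " ms");
    }
}
```

Under this model the fixed sleep notices the exit at 1000ms, while the 2ms/1.5x schedule notices it at roughly 64ms.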



[jira] Commented: (HADOOP-5890) Use exponential backoff on Thread.sleep during DN shutdown

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712637#action_12712637 ] 

dhruba borthakur commented on HADOOP-5890:
------------------------------------------

+1. Code looks good. 
