Posted to common-dev@hadoop.apache.org by "Derek Wollenstein (JIRA)" <ji...@apache.org> on 2009/03/26 17:50:03 UTC

[jira] Created: (HADOOP-5584) DFSClient continues to retry indefinitely

DFSClient continues to retry indefinitely
-----------------------------------------

                 Key: HADOOP-5584
                 URL: https://issues.apache.org/jira/browse/HADOOP-5584
             Project: Hadoop Core
          Issue Type: Bug
          Components: dfs
    Affects Versions: 0.19.1
            Reporter: Derek Wollenstein
            Priority: Minor


I encountered a bug when trying to upload data using the Hadoop DFS Client.  
After receiving a NotReplicatedYetException, the DFSClient will normally retry its upload up to some limited number of times.  In this case, I found that this retry loop continued indefinitely, to the point that the number of tries remaining was negative:
2009-03-25 16:20:02 [INFO] 
2009-03-25 16:20:02 [INFO] 09/03/25 16:20:02 INFO hdfs.DFSClient: Waiting for replication for 21 seconds
2009-03-25 16:20:03 [INFO] 09/03/25 16:20:02 WARN hdfs.DFSClient: NotReplicatedYetException sleeping /apollo/env/SummaryMySQL/var/logstore/fiorello_logs_20090325_us/logs_20090325_us_13 retries left -1


The stack trace for the failure that's retrying is:
2009-03-25 16:20:02 [INFO] 09/03/25 16:20:02 INFO hdfs.DFSClient: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not replicated yet:<filename>
2009-03-25 16:20:02 [INFO]      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1266)
2009-03-25 16:20:02 [INFO]      at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:351)
2009-03-25 16:20:02 [INFO]      at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
2009-03-25 16:20:02 [INFO]      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
2009-03-25 16:20:02 [INFO]      at java.lang.reflect.Method.invoke(Method.java:597)
2009-03-25 16:20:02 [INFO]      at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
2009-03-25 16:20:02 [INFO]      at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)
2009-03-25 16:20:02 [INFO] 
2009-03-25 16:20:02 [INFO]      at org.apache.hadoop.ipc.Client.call(Client.java:697)
2009-03-25 16:20:02 [INFO]      at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
2009-03-25 16:20:02 [INFO]      at $Proxy0.addBlock(Unknown Source)
2009-03-25 16:20:02 [INFO]      at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
2009-03-25 16:20:02 [INFO]      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
2009-03-25 16:20:02 [INFO]      at java.lang.reflect.Method.invoke(Method.java:597)
2009-03-25 16:20:02 [INFO]      at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
2009-03-25 16:20:02 [INFO]      at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
2009-03-25 16:20:02 [INFO]      at $Proxy0.addBlock(Unknown Source)
2009-03-25 16:20:02 [INFO]      at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2814)
2009-03-25 16:20:02 [INFO]      at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2696)
2009-03-25 16:20:02 [INFO]      at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1996)
2009-03-25 16:20:02 [INFO]      at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5584) DFSClient continues to retry indefinitely

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695827#action_12695827 ] 

dhruba borthakur commented on HADOOP-5584:
------------------------------------------

There are portions of the client code that retry indefinitely, especially when the file is closed. I agree completely that this portion of the client code should be changed to retry for a large but finite amount of time.
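
To make "a large but finite amount of time" concrete, here is a minimal sketch of a deadline-bounded retry loop. It is an illustration only, not the DFSClient code and not a proposed patch: the class DeadlineRetry, the method retryUntilDeadline, its parameters, and the 60-second backoff cap are all hypothetical names and values chosen for this example.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical helper for illustration; not part of Hadoop.
final class DeadlineRetry {
    /**
     * Calls op until it succeeds or maxWaitMillis has elapsed.
     * Only IOException is treated as retryable; anything else propagates.
     */
    static <T> T retryUntilDeadline(Callable<T> op, long maxWaitMillis,
                                    long initialSleepMillis) throws Exception {
        if (initialSleepMillis <= 0) {
            throw new IllegalArgumentException("initialSleepMillis must be > 0");
        }
        long deadline = System.nanoTime() + maxWaitMillis * 1000000L;
        long sleep = initialSleepMillis;
        while (true) {
            try {
                return op.call();
            } catch (IOException e) {
                long remainingMillis = (deadline - System.nanoTime()) / 1000000L;
                if (remainingMillis <= 0) {
                    // Time budget exhausted: surface the last failure to the caller.
                    throw new IOException("Gave up after " + maxWaitMillis + " ms", e);
                }
                // Never sleep past the deadline; double the wait, capped at 60 s.
                Thread.sleep(Math.min(sleep, remainingMillis));
                sleep = Math.min(sleep * 2, 60000L);
            }
        }
    }
}
```

A caller would wrap the retryable operation in a Callable and choose the time budget, for example several minutes for a close that is waiting on replication. Bounding by elapsed time rather than by attempt count keeps the worst-case wait predictable even when the sleep interval grows.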



[jira] Commented: (HADOOP-5584) DFSClient continues to retry indefinitely

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689566#action_12689566 ] 

Hairong Kuang commented on HADOOP-5584:
---------------------------------------

Although the negative retry number is misleading, it seems to me that retrying forever when allocating a new block is the intended policy when a previous block in the file being created does not meet the minimum replication factor. DFS does the same for close: close retries forever if any block in the file does not meet the minimum replication factor. Retrying forever may not be a good policy, though. A DFS client should retry a finite number of times and then report the failure to the calling application.
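
As a rough sketch of "retry a finite number of times and then report the failure", the loop below checks the remaining budget before decrementing it, so the counter can never be logged as negative. This is an assumption-laden illustration rather than the actual DFSClient logic: BoundedRetry, callWithRetries, and the parameter names are hypothetical, and the real client may classify retryable exceptions differently.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical helper for illustration; not part of Hadoop.
final class BoundedRetry {
    /**
     * Calls op once, then retries at most maxRetries more times on IOException.
     * Throws to the caller once the retry budget is used up.
     */
    static <T> T callWithRetries(Callable<T> op, int maxRetries,
                                 long initialSleepMillis) throws Exception {
        if (maxRetries < 0 || initialSleepMillis < 0) {
            throw new IllegalArgumentException("arguments must be non-negative");
        }
        int retriesLeft = maxRetries;
        long sleep = initialSleepMillis;
        while (true) {
            try {
                return op.call();
            } catch (IOException e) {
                // Check the budget before decrementing so it never drops below zero.
                if (retriesLeft == 0) {
                    throw new IOException(
                        "Giving up after " + (maxRetries + 1) + " attempts", e);
                }
                retriesLeft--;
                System.err.println("Retryable failure, retries left " + retriesLeft);
                Thread.sleep(sleep);
                sleep = Math.min(sleep * 2, 60000L); // exponential backoff, capped
            }
        }
    }
}
```

With maxRetries set to 5 the operation runs at most six times in total, and the final IOException carries the last underlying failure as its cause so the application can see why the write did not complete.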
