Posted to common-dev@hadoop.apache.org by "Raghu Angadi (JIRA)" <ji...@apache.org> on 2009/05/21 22:53:45 UTC

[jira] Commented: (HADOOP-5886) Error when 2 of 3 DataNodes are full: "Could only be replicated to 0 nodes, instead of 1"

    [ https://issues.apache.org/jira/browse/HADOOP-5886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12711796#action_12711796 ] 

Raghu Angadi commented on HADOOP-5886:
--------------------------------------

From the core-user thread at http://www.nabble.com/Could-only-be-replicated-to-0-nodes%2C-instead-of-1-td23650042.html :

Most likely this is what is happening:

  * Two of the 3 DataNodes cannot take any more blocks.
  * While picking nodes for a new block, the NameNode usually skips the
third DataNode as well, since the '# active writes' on it is larger
than '2 * avg' (see the sketch after this list).
  * Even if just one other block is being written on the third node,
that single active write is still greater than 2 * (1/3), twice the
cluster average.
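
As a sketch of that check (the class and method names below are
hypothetical, not the actual 0.18 source; the real logic lives in the
NameNode's replication target chooser), the decision comes down to:

    // Hedged sketch of the NameNode's per-node load check when picking
    // a target DataNode for a new block. Names are illustrative only.
    class TargetLoadCheck {
        // True if the node should be skipped: its active-write count
        // exceeds twice the cluster-wide average.
        static boolean tooBusy(int writesOnNode, int totalWrites, int numNodes) {
            double avg = (double) totalWrites / numNodes;
            return writesOnNode > 2.0 * avg;
        }

        public static void main(String[] args) {
            // The scenario above: 3 DataNodes, 2 full, 1 write active
            // on the third node. avg = 1/3, and 1 > 2 * (1/3), so even
            // the only node with free space gets skipped.
            System.out.println(tooBusy(1, 1, 3)); // prints: true
        }
    }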

To test this, write just one block to an idle cluster; it should
succeed.
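
For instance, a minimal single-block write through the Java API (the
path and size below are arbitrary; anything under the default 64MB
block size needs only one block placed):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class SingleBlockWriteTest {
        public static void main(String[] args) throws Exception {
            // Picks up the cluster from the local Hadoop configuration.
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // 1MB of zeros: well under one block, so the NameNode only
            // has to place a single block. The path is hypothetical.
            Path p = new Path("/test/one-block.bin");
            FSDataOutputStream out = fs.create(p);
            out.write(new byte[1024 * 1024]);
            out.close();
            System.out.println("wrote " + fs.getFileStatus(p).getLen() + " bytes");
        }
    }
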
[...]

This particular problem is not that severe on a large cluster, but
HDFS should still do the sensible thing.

Raghu.


> Error when 2 of 3 DataNodes are full: "Could only be replicated to 0 nodes, instead of 1"
> -----------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5886
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5886
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.3
>         Environment: * 3 machines, 2 of them with only 80GB of space, and 1 with 1.5GB
> * Two clients are copying files all the time (one of them is the 1.5GB machine)
> * The replication is set to 2
>            Reporter: Stas Oskin
>
> I let the space on the 2 smaller machines run out, to test the behavior.
> Now, one of the clients (the one located on the 1.5GB machine) works fine, while the other one, the external client, is unable to copy and displays the error and the exception below:
> 10:51:03 WARN dfs.DFSClient: NotReplicatedYetException sleeping /test/test.bin retries left 1
> 09/05/21 10:51:06 WARN dfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /test/test.bin could only be replicated to 0 nodes, instead of 1
>             at org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1123)
>             at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:330)
>             at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
>             at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>             at java.lang.reflect.Method.invoke(Method.java:597)
>             at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
>             at org.apache.hadoop.ipc.Server$Handler.run(Server.java:890)
>  
>             at org.apache.hadoop.ipc.Client.call(Client.java:716)
>             at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
>             at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source)
>             at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>             at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>             at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>             at java.lang.reflect.Method.invoke(Method.java:597)
>             at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>             at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>             at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source)
>             at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2450)
>             at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2333)
>             at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1800(DFSClient.java:1745)
>             at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1922)
>  
> 09/05/21 10:51:06 WARN dfs.DFSClient: Error Recovery for block null bad datanode[0]
> java.io.IOException: Could not get block locations. Aborting...
>             at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2153)
>             at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1745)
>             at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1899)
