Posted to common-dev@hadoop.apache.org by "Stas Oskin (JIRA)" <ji...@apache.org> on 2009/05/21 22:39:45 UTC
[jira] Created: (HADOOP-5886) Could only be replicated to 0 nodes,
instead of 1 when 2 of 3 DataNodes are full
Could only be replicated to 0 nodes, instead of 1 when 2 of 3 DataNodes are full
--------------------------------------------------------------------------------
Key: HADOOP-5886
URL: https://issues.apache.org/jira/browse/HADOOP-5886
Project: Hadoop Core
Issue Type: Bug
Components: dfs
Affects Versions: 0.18.3
Environment: * 3 machines, 2 of them with only 80GB of space, and 1 with 1.5GB
* Two clients are copying files all the time (one of them is the 1.5GB machine)
* Replication is set to 2
Reporter: Stas Oskin
I let the 2 smaller machines run out of space, to test the behavior.
Now one of the clients (the one located on the 1.5GB machine) works fine, while the other one, the external client, is unable to copy and displays the error and exception below:
10:51:03 WARN dfs.DFSClient: NotReplicatedYetException sleeping /test/test.bin retries left 1
09/05/21 10:51:06 WARN dfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /test/test.bin could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1123)
at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:330)
at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:890)
at org.apache.hadoop.ipc.Client.call(Client.java:716)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source)
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2450)
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2333)
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1800(DFSClient.java:1745)
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1922)
09/05/21 10:51:06 WARN dfs.DFSClient: Error Recovery for block null bad datanode[0]
java.io.IOException: Could not get block locations. Aborting...
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2153)
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1745)
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1899)
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-5886) Error when 2 of 3 DataNodes are
full: "Could only be replicated to 0 nodes, instead of 1"
Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12711796#action_12711796 ]
Raghu Angadi commented on HADOOP-5886:
--------------------------------------
From the core-user thread at http://www.nabble.com/Could-only-be-replicated-to-0-nodes%2C-instead-of-1-td23650042.html :
Most likely this is what is happening:
* Two out of the 3 DNs cannot take any more blocks.
* While picking nodes for a new block, the NN mostly skips the third DN as
well, since '# active writes' on it is larger than '2 * avg'.
* Even if only one other block is being written on the third DN, that count
(1) is still greater than (2 * 1/3).
To test this: if you write just one block to an idle cluster, it should
succeed.
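The arithmetic in the bullets above can be sketched as follows. This is only an illustration of the '2 * avg' check as described in this comment, not the NameNode's actual placement code: the class LoadCheckSketch, the Node type, the helper names, and the 64 MB block size are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the load check described above; not Hadoop code.
final class LoadCheckSketch {
    // Assumed block size for the example (64 MB).
    static final long BLOCK_SIZE = 64L * 1024 * 1024;

    /** Hypothetical view of a DataNode: free space and in-flight block writes. */
    static final class Node {
        final String name;
        final long freeBytes;
        final int activeWrites;

        Node(String name, long freeBytes, int activeWrites) {
            this.name = name;
            this.freeBytes = freeBytes;
            this.activeWrites = activeWrites;
        }
    }

    /** Keeps nodes that have room for a block and at most 2 * avg active writes. */
    static List<Node> eligibleTargets(List<Node> nodes, long blockSize) {
        double totalWrites = 0;
        for (Node n : nodes) {
            totalWrites += n.activeWrites;
        }
        double avgWrites = nodes.isEmpty() ? 0 : totalWrites / nodes.size();

        List<Node> eligible = new ArrayList<>();
        for (Node n : nodes) {
            boolean hasSpace = n.freeBytes >= blockSize;
            boolean notOverloaded = n.activeWrites <= 2.0 * avgWrites;
            if (hasSpace && notOverloaded) {
                eligible.add(n);
            }
        }
        return eligible;
    }

    /** Two full nodes plus one large node with the given number of active writes. */
    static List<Node> cluster(int writesOnLargeNode) {
        return Arrays.asList(
                new Node("small-1", 0L, 0),
                new Node("small-2", 0L, 0),
                new Node("large", 1L << 40, writesOnLargeNode));
    }

    public static void main(String[] args) {
        // One write in flight on the large node: avg = 1/3, so 1 > 2 * 1/3.
        System.out.println("busy: " + eligibleTargets(cluster(1), BLOCK_SIZE).size());
        // Idle cluster: avg = 0, and 0 <= 0, so the large node is accepted.
        System.out.println("idle: " + eligibleTargets(cluster(0), BLOCK_SIZE).size());
    }
}
```

Under these assumptions, a single write in flight on the large node is enough to exclude it, which leaves zero candidate nodes, while the idle cluster still offers one.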
[...]
This particular problem is not that severe on a large cluster but HDFS
should do the sensible thing.
Raghu.
[jira] Updated: (HADOOP-5886) Error when 2 of 3 DataNodes are full:
"Could only be replicated to 0 nodes, instead of 1"
Posted by "Stas Oskin (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Stas Oskin updated HADOOP-5886:
-------------------------------
Summary: Error when 2 of 3 DataNodes are full: "Could only be replicated to 0 nodes, instead of 1" (was: Could only be replicated to 0 nodes, instead of 1 when 2 of 3 DataNodes are full)
[jira] Commented: (HADOOP-5886) Error when 2 of 3 DataNodes are
full: "Could only be replicated to 0 nodes, instead of 1"
Posted by "Stas Oskin (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715067#action_12715067 ]
Stas Oskin commented on HADOOP-5886:
------------------------------------
Any idea whether the fixes for this will go into the latest trunk?
Will they be back-portable to 0.18.3?
Regards.