Posted to common-dev@hadoop.apache.org by "Hairong Kuang (JIRA)" <ji...@apache.org> on 2008/08/20 23:46:44 UTC

[jira] Commented: (HADOOP-1938) NameNode.create failed

    [ https://issues.apache.org/jira/browse/HADOOP-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12624152#action_12624152 ] 

Hairong Kuang commented on HADOOP-1938:
---------------------------------------

The error occurs in an almost full cluster. Block placement needs to traverse the network topology to find good candidates. When a cluster is almost full, it has to traverse many more datanodes before it can declare failure, so it fails very slowly and wastes a lot of CPU time.
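For illustration only, here is a minimal Java sketch of why that is slow. It is not the actual namenode code; DatanodeInfo, NaiveBlockPlacer, and chooseTarget are invented names. When most nodes are full, nearly every candidate drawn from the topology is rejected, so the loop burns CPU before it can give up.

    import java.util.List;
    import java.util.Random;

    // Hypothetical datanode descriptor used only by this sketch.
    interface DatanodeInfo {
        long remainingBytes();
    }

    class NaiveBlockPlacer {
        static final long BLOCK_SIZE = 64L * 1024 * 1024;
        private final Random rand = new Random();

        // Draw random candidates from the whole topology and reject full ones.
        DatanodeInfo chooseTarget(List<DatanodeInfo> allNodes, int maxAttempts) {
            for (int i = 0; i < maxAttempts; i++) {
                DatanodeInfo candidate = allNodes.get(rand.nextInt(allNodes.size()));
                if (candidate.remainingBytes() >= BLOCK_SIZE) {
                    return candidate;   // found a node with room for the block
                }
                // candidate is full: retry against the same large node list
            }
            return null;                // slow failure after many wasted probes
        }
    }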

What can be done is to remove a datanode from the network topology when it becomes full and add it back when it has some space available. When a cluster becomes almost full, the network topology then contains far fewer datanodes, so placing a block stays fast.
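As a hedged sketch of that bookkeeping, again not the real NetworkTopology API (PlacementPool and updateNode are invented names, reusing the DatanodeInfo interface from the sketch above): a node is dropped from the placement pool when its free space falls below a block and re-added once space frees up, so the chooser only ever iterates over nodes that can actually take a replica.

    import java.util.HashSet;
    import java.util.Set;

    // Hypothetical pool of datanodes that still have room for a block.
    class PlacementPool {
        private final Set<DatanodeInfo> available = new HashSet<>();

        // Called whenever a datanode reports its free space (e.g., on heartbeat).
        synchronized void updateNode(DatanodeInfo node, long blockSize) {
            if (node.remainingBytes() < blockSize) {
                available.remove(node);   // full: drop it from the placement topology
            } else {
                available.add(node);      // space freed up: put it back
            }
        }

        // Block placement now only sees usable nodes, so an almost-full
        // cluster succeeds or fails quickly instead of scanning everything.
        synchronized Set<DatanodeInfo> candidates() {
            return new HashSet<>(available);
        }
    }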

> NameNode.create failed 
> -----------------------
>
>                 Key: HADOOP-1938
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1938
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.13.1
>            Reporter: Runping Qi
>
> Under heavy load, the DFS namenode fails to create a file:
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: Failed to create file /xxx/xxx/_task_0001_r_000001_0/part-00001 on client xxx.xxx.xxx.xxx because there were not enough datanodes available. Found 0 datanodes but MIN_REPLICATION for the cluster is configured to be 1.
> 	at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:651)
> 	at org.apache.hadoop.dfs.NameNode.create(NameNode.java:294)
> 	at sun.reflect.GeneratedMethodAccessor92.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:341)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:573)
> The above problem occurred when I ran a well-tuned map/reduce program on a large cluster.
> The program is well tuned in the sense that the map output data are evenly partitioned among 180 reducers.
> The shuffling and sorting was completed at about the same time on all the reducers.
> The reducers started reduce work at about the same time and were expected to produce about the same amount of output (2GB).
> This "synchronized" behavior caused  the reducers to try to create output dfs files at about the same time.
> The namenode seemed to have difficulty to handle that situation, causing the reducers waiting on file creation for long period of time.
> Eeventually, they failed with the above exception.
>  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.