Posted to common-dev@hadoop.apache.org by "Runping Qi (JIRA)" <ji...@apache.org> on 2007/09/24 18:59:50 UTC

[jira] Created: (HADOOP-1938) NameNode.create failed

NameNode.create failed
-----------------------

Key: HADOOP-1938
URL: https://issues.apache.org/jira/browse/HADOOP-1938
Project: Hadoop
Issue Type: Bug
Components: dfs
Affects Versions: 0.13.1
Reporter: Runping Qi

Under heavy load, the DFS namenode fails to create a file:

org.apache.hadoop.ipc.RemoteException: java.io.IOException: Failed to create file /xxx/xxx/_task_0001_r_000001_0/part-00001 on client xxx.xxx.xxx.xxx because there were not enough datanodes available. Found 0 datanodes but MIN_REPLICATION for the cluster is configured to be 1.
at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:651)
at org.apache.hadoop.dfs.NameNode.create(NameNode.java:294)
at sun.reflect.GeneratedMethodAccessor92.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:341)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:573)

The above problem occurred when I ran a well-tuned map/reduce program on a hood node cluster.
The program is well tuned in the sense that the map output data are evenly partitioned among 180 reducers.
Shuffling and sorting completed at about the same time on all the reducers.
The reducers started their reduce work at about the same time and were expected to produce about the same amount of output (2GB).
This "synchronized" behavior caused the reducers to try to create their output dfs files at about the same time.
The namenode seemed to have difficulty handling that situation, leaving the reducers waiting on file creation for a long period of time.
Eventually, they failed with the above exception.
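
For illustration only: one common way to soften this kind of synchronized burst is for each client to retry a failed create after a randomized, exponentially growing delay, so that the retries spread out instead of arriving together again. The sketch below is not Hadoop code and is not something the report proposes. The class JitteredRetry, its methods, and all parameter names are hypothetical, and the operation being retried is passed in as a plain Callable rather than a real DFS call.

```java
import java.io.IOException;
import java.util.Random;
import java.util.concurrent.Callable;

// Hypothetical helper, not part of Hadoop: retries an operation that
// fails with IOException, sleeping a random time between attempts.
final class JitteredRetry {

    // Picks a random delay in [0, min(capMillis, baseMillis * 2^attempt)).
    static long backoffMillis(int attempt, long baseMillis, long capMillis, Random rng) {
        // Clamp the shift so the doubling cannot overflow for large attempt counts.
        long ceiling = Math.min(capMillis, baseMillis << Math.min(attempt, 20));
        return (long) (rng.nextDouble() * ceiling);
    }

    // Calls op up to maxAttempts times; rethrows the last IOException if all fail.
    static <T> T callWithRetry(Callable<T> op, int maxAttempts,
                               long baseMillis, long capMillis, Random rng) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IOException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (IOException e) {
                last = e;
                if (attempt + 1 < maxAttempts) {
                    // Random delay so clients that failed together do not retry together.
                    Thread.sleep(backoffMillis(attempt, baseMillis, capMillis, rng));
                }
            }
        }
        throw last;
    }
}
```

A reducer-side caller would wrap its own file-creation call in the Callable. Whether this would have helped in the situation described here is untested.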

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-1938) NameNode.create failed

Posted by "Koji Noguchi (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12530133 ] 

Koji Noguchi commented on HADOOP-1938:
--------------------------------------

bq.  Found 0 datanodes but MIN_REPLICATION for the cluster is configured to be 1.

I've seen this exception come up when the cluster was (semi-) full.  
If this is the case, maybe better error messages would help?
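
To make the suggestion concrete, here is a rough sketch of what a more descriptive message could look like: it separates "no datanodes are alive" from "datanodes are alive but full". This is not the actual FSNamesystem code. PlacementDiagnostics, NodeInfo, describeFailure and their fields are hypothetical names, and the assumption that the namenode has per-node liveness and free-space figures at hand is mine.

```java
import java.util.List;

// Hypothetical sketch, not Hadoop's API: builds an error message that says
// why no datanode could be chosen for a new file.
final class PlacementDiagnostics {

    // Minimal stand-in for what a namenode might track per datanode.
    static final class NodeInfo {
        final String name;
        final boolean alive;
        final long remainingBytes;

        NodeInfo(String name, boolean alive, long remainingBytes) {
            this.name = name;
            this.alive = alive;
            this.remainingBytes = remainingBytes;
        }
    }

    // Assumed to be called only after block placement has already failed.
    static String describeFailure(String path, List<NodeInfo> nodes,
                                  long blockSize, int minReplication) {
        long alive = nodes.stream().filter(n -> n.alive).count();
        long withSpace = nodes.stream()
                .filter(n -> n.alive && n.remainingBytes >= blockSize)
                .count();

        String reason;
        if (alive == 0) {
            reason = "no live datanodes are registered";
        } else if (withSpace == 0) {
            reason = String.format(
                    "all %d live datanodes lack space for a %d-byte block", alive, blockSize);
        } else {
            reason = String.format(
                    "%d of %d live datanodes have enough free space", withSpace, alive);
        }
        return String.format("Failed to create file %s: %s (minimum replication is %d).",
                path, reason, minReplication);
    }
}
```

With a message along these lines, a (semi-) full cluster would be reported differently from a cluster whose datanodes are down.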




