Posted to hdfs-user@hadoop.apache.org by Stream Click <hd...@gmail.com> on 2010/01/09 02:17:57 UTC

hadoop NotReplicatedYetException

Hi HDFS and Hadoop users,

We run many different jobs continuously on our Hadoop fleet, 24/7. The job
that loads data into HDFS occasionally gets the NotReplicatedYetException
shown below when a task worker tries to write data to HDFS blocks. Most loader
jobs, however, finish without problems, and we can't figure out why this is
happening. We read the code around the error, but it didn't give us much of a
clue. The only possibly related factor is file deletion. Since we run so many
jobs, we need to delete intermediate files every hour, and the exception is
usually thrown while the hourly deletion job is running. Again, we only see
the exception occasionally; most loader jobs finish without any problem even
while the deletion job is running.

We looked at the logs for the failed tasks. They contain this error message:
"Task#task_200912312116_3288_m_000000 message: Task
attempt_200912312116_3288_m_000000_0 failed to report status for 240
seconds. Killing!" (we set the task timeout to 4 minutes)

We assume there is a race condition between the writing task and the deletion
task, and that the writing task somehow gets blocked and eventually times out.
We increased the sleep time between deleting each folder, but that didn't fix
the problem.
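
To make the suspected blocking concrete, here is a rough sketch of the kind of
retry-with-backoff loop we assume sits around a call like addBlock on the
writing side. It is only an illustration: BackoffRetry, TransientException,
callWithBackoff and worstCaseSleepMs are names made up for this sketch, not
Hadoop APIs, and the attempt count and sleep values are placeholders rather
than numbers taken from DFSClient.

```java
import java.util.concurrent.Callable;

public class BackoffRetry {

    /** Hypothetical stand-in for a transient failure such as NotReplicatedYetException. */
    public static class TransientException extends Exception {
        public TransientException(String msg) {
            super(msg);
        }
    }

    /**
     * Calls op until it succeeds or maxAttempts is reached. After each
     * transient failure it sleeps, then doubles the sleep for the next round.
     */
    public static <T> T callWithBackoff(Callable<T> op, int maxAttempts, long initialSleepMs)
            throws Exception {
        long sleepMs = initialSleepMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return op.call();
            } catch (TransientException e) {
                if (attempt >= maxAttempts) {
                    throw e; // out of attempts: surface the last failure
                }
                Thread.sleep(sleepMs);
                sleepMs *= 2;
            }
        }
    }

    /** Total time spent sleeping if every attempt fails (no sleep after the last one). */
    public static long worstCaseSleepMs(int maxAttempts, long initialSleepMs) {
        long total = 0;
        long sleepMs = initialSleepMs;
        for (int i = 1; i < maxAttempts; i++) {
            total += sleepMs;
            sleepMs *= 2;
        }
        return total;
    }
}
```

With placeholder values of 5 attempts and a 400 ms initial sleep,
worstCaseSleepMs(5, 400) comes to 6000 ms. If the real client behaves anything
like this, the backoff alone would not account for 240 seconds without a
status report, so something else may be holding the writer up.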

Has anyone seen this problem before? Is there any way to fix it?

Thanks
Stream

"Task#task_200912312116_3288_m_000000 message:
org.apache.hadoop.ipc.RemoteException:
org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
replicated yet:/output/datat/2010-01-01T19-10Z/file.json
2010-01-01 11:49:15,651 (INFO) -     at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1253)
2010-01-01 11:49:15,651 (INFO) -     at
org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
2010-01-01 11:49:15,651 (INFO) -     at
sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
2010-01-01 11:49:15,651 (INFO) -     at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
2010-01-01 11:49:15,651 (INFO) -     at
java.lang.reflect.Method.invoke(Method.java:597)
2010-01-01 11:49:15,651 (INFO) -     at
org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
2010-01-01 11:49:15,651 (INFO) -     at
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
2010-01-01 11:49:15,651 (INFO) -     at
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
2010-01-01 11:49:15,651 (INFO) -     at
java.security.AccessController.doPrivileged(Native Method)
2010-01-01 11:49:15,651 (INFO) -     at
javax.security.auth.Subject.doAs(Subject.java:396)
2010-01-01 11:49:15,651 (INFO) -     at
org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
2010-01-01 11:49:15,651 (INFO) -     at
org.apache.hadoop.ipc.Client.call(Client.java:739)
2010-01-01 11:49:15,651 (INFO) -     at
org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
2010-01-01 11:49:15,651 (INFO) -     at $Proxy1.addBlock(Unknown Source)
2010-01-01 11:49:15,651 (INFO) -     at
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
2010-01-01 11:49:15,651 (INFO) -     at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
2010-01-01 11:49:15,651 (INFO) -     at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
2010-01-01 11:49:15,651 (INFO) -     at
java.lang.reflect.Method.invoke(Method.java:597)
2010-01-01 11:49:15,651 (INFO) -     at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
2010-01-01 11:49:15,651 (INFO) -     at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
2010-01-01 11:49:15,651 (INFO) -     at $Proxy1.addBlock(Unknown Source)
2010-01-01 11:49:15,651 (INFO) -     at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2904)
2010-01-01 11:49:15,651 (INFO) -     at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2786)
2010-01-01 11:49:15,651 (INFO) -     at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2076)
2010-01-01 11:49:15,651 (INFO) -     at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2262)"