Posted to common-user@hadoop.apache.org by Oscar Gothberg <os...@gmail.com> on 2010/05/10 20:23:32 UTC

job executions fail with NotReplicatedYetException

Hi,

I keep having jobs fail at the very end, with both "map" and "reduce"
at 100% complete, due to a NotReplicatedYetException on a file under
the _temporary subdirectory of the job output directory.

It doesn't happen 100% of the time, so it's not trivially
reproducible, but it happens enough
(10-20% of runs) to make it a real pain.

Any ideas? Has anyone seen something similar? Part of the stack trace:

NotReplicatedYetException: Not replicated yet:/test/out/dayperiod=14731/_temporary/_attempt_201005052338_0194_r_000001_0/part-00001
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1253)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
...

Thanks,
/ Oscar

Re: job executions fail with NotReplicatedYetException

Posted by Oscar Gothberg <os...@gmail.com>.
For anyone else out there seeing this problem: it was alleviated for
me by increasing the dfs.namenode.handler.count and
dfs.datanode.handler.count settings.

/ Oscar
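
For readers who want to try the same change, here is a minimal sketch of
how the two properties named above might look in hdfs-site.xml. The
values are illustrative assumptions only; the thread does not say which
values were used, and suitable numbers depend on cluster size and load.

```xml
<!-- hdfs-site.xml fragment (sketch). Both values are hypothetical
     examples, not the ones used in this thread. -->
<configuration>
  <!-- Number of server threads the NameNode uses to handle RPC calls -->
  <property>
    <name>dfs.namenode.handler.count</name>
    <value>40</value>
  </property>
  <!-- Number of server threads each DataNode uses -->
  <property>
    <name>dfs.datanode.handler.count</name>
    <value>10</value>
  </property>
</configuration>
```

These are daemon-side settings, so the NameNode and DataNodes would
presumably need a restart before new values take effect.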
