Posted to user@pig.apache.org by Michael Lok <fu...@gmail.com> on 2012/01/10 01:18:06 UTC
NotReplicatedYetException error
Hi folks,
Not sure if this is related to Pig or to Hadoop in general, but I'm
posting here since I'm running Pig scripts :)
Anyway, I've been trying to perform a CROSS (cross product) of two
files, which produces ~1 billion records. My Hadoop cluster has 4 data
nodes, and the namenode doubles as one of the data nodes (not
recommended, but I haven't had time to reconfigure this yet :P). After
executing the Pig script, it threw the following exception at around
80+% progress:
java.io.IOException: org.apache.hadoop.ipc.RemoteException:
org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
replicated yet:/user/root/out/_temporary/_attempt_201201091651_0001_r_000001_3/part-r-00001
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1517)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:685)
at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:464)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:427)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:399)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:261)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:417)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Pig script shown below:
============================================================
set job.name 'vac cross 2';
set default_parallel 10;
register lib/*.jar;
define DIST com.pig.udf.Distance();
js = load 'js.csv' using PigStorage(',') as (ic:chararray, jsstate:chararray);
vac = load 'vac.csv' using PigStorage(',') as (id:chararray, vacstate:chararray);
cx = cross js, vac;
d = foreach cx generate ic, jsstate, id, vacstate, DIST(jsstate, vacstate);
store d into 'out' using PigStorage(',');
============================================================
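Before running a CROSS of this size, it can help to estimate the output up front, since the result has rows(js) × rows(vac) records. A minimal sketch; the row counts and record width below are illustrative assumptions, not numbers from this thread:

```python
# Estimate the output of a Pig CROSS: rows(out) = rows(js) * rows(vac).
# All numbers here are assumed for illustration only.
js_rows = 40_000           # hypothetical row count of js.csv
vac_rows = 25_000          # hypothetical row count of vac.csv
avg_record_bytes = 64      # assumed average width of one output record

out_rows = js_rows * vac_rows
out_gib = out_rows * avg_record_bytes / 2**30

print(f"{out_rows:,} rows")              # the ~1 billion mentioned above
print(f"~{out_gib:.0f} GiB of reducer output before replication")
```

Note that HDFS then multiplies the on-disk footprint by the replication factor (typically 3), which is worth checking against the free space reported by the datanodes before the job runs.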
Any help is greatly appreciated.
Thanks!
Re: NotReplicatedYetException error
Posted by Dmitriy Ryaboy <dv...@gmail.com>.
It's a Hadoop error, and usually a transient one. The HDFS client
retries and the job tends to keep chugging along when this happens; it
only becomes a problem if the error keeps recurring.
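If the failures are only intermittent, one knob worth trying is the client-side block write retry count in hdfs-site.xml, which the DFS client exhausts before surfacing NotReplicatedYetException. This is a sketch assuming a Hadoop 1.x-era setup; verify the property and its default against your version's hdfs-default.xml:

```xml
<!-- hdfs-site.xml: how many times the HDFS client retries asking the
     namenode for a new block before giving up (default is 3). -->
<property>
  <name>dfs.client.block.write.retries</name>
  <value>10</value>
</property>
```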
On Mon, Jan 9, 2012 at 4:31 PM, Daniel Dai <da...@hortonworks.com> wrote:
> This is more like a hadoop issue. Check the dfs UI to see if data nodes are
> up.
>
Re: NotReplicatedYetException error
Posted by Daniel Dai <da...@hortonworks.com>.
This looks more like a Hadoop issue. Check the DFS web UI to see
whether all the data nodes are up.
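For a quick check from the command line, the standard Hadoop 1.x admin tools report the same information as the DFS web UI (hostnames, paths, and ports below are placeholders for your cluster):

```shell
# Summarize datanode status: live/dead nodes, capacity, remaining space.
hadoop dfsadmin -report

# Check the job's output directory for missing or under-replicated blocks.
hadoop fsck /user/root/out -files -blocks

# The namenode web UI (default port 50070) shows the same per-node view:
#   http://<namenode-host>:50070/dfshealth.jsp
```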