Posted to user@pig.apache.org by Mix Nin <pi...@gmail.com> on 2013/03/27 22:58:18 UTC

Commands not working properly when stored in pig file

Sorry for posting the same issue multiple times.

I wrote a Pig script as follows and stored it in the file x.pig:

Data = LOAD '/....' as (,,,, )
NoNullData= FILTER Data by qe is not null;
STORE (foreach (group NoNullData all) generate flatten($1))  into
'exp/$inputDatePig';


evnt_dtl =LOAD 'exp/$inputDatePig/part-r-00000' AS (cust,,,,,)



I executed the command as follows:

pig  -f x.pig -param inputDatePig=03272013


And finally it says the input path exp/03272013 does not exist, though the
directory does exist, as it gets created by the STORE command.

What is wrong in this?


This is the error I get

org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 32% complete
2013-03-27 14:38:35,568 [main] INFO
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 50% complete
2013-03-27 14:38:45,731 [main] INFO
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- job null has failed! Stop running all dependent jobs
2013-03-27 14:38:45,731 [main] INFO
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 100% complete
2013-03-27 14:38:45,734 [main] ERROR
org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to
recreate exception from backend error:
org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input
path does not exist: hdfs://user/lnindrakrishna/exp/03272013/part-r-00000



But when I remove the second LOAD command, everything runs fine. Why does
it say "job null has failed! Stop running all dependent jobs"?

Re: Commands not working properly when stored in pig file

Posted by Johnny Zhang <xi...@cloudera.com>.
Thanks for the explanation, Marcos!


On Thu, Mar 28, 2013 at 3:08 AM, MARCOS MEDRADO RUBINELLI <marcosm@buscapecompany.com> wrote:

>
> Hi, Mix:
> " second map reduce started executing before first one got completed"
> Interesting. Since you just LOAD evnt_dtl without a DUMP or STORE, Pig
> shouldn't do anything with it, especially not before the STORE command
> completes.
>
> I tried the script below and it works fine, so I think the root cause is
> something else. Unless your data is very big?
> a = load 'words_and_numbers' as (f1:chararray, f2:chararray);
> b = filter a by f1 is not null;
> store (foreach (group b all) generate flatten($1)) into 'multipleload/tmp';
> c = load 'multipleload/tmp/part-r-00000' as (f3:chararray, f4:chararray);
> dump c;
>
> Johnny
>
>
>
> It's the multi-query execution optimization. Pig doesn't know it should
> wait for the STORE before the second LOAD, so it tries to run it in
> parallel. You have three options:
>
> 1. Name the relation you stored and use it instead of loading a new
> relation:
>
> Data = LOAD '/....' as (,,,, )
> NoNullData= FILTER Data by qe is not null;
> exp = foreach (group NoNullData all) generate flatten($1);
> STORE exp  into 'exp/$inputDatePig';
>
> evnt_dtl = FOREACH exp GENERATE $0 as cust ...
>
> 2. Use the EXEC keyword to tell Pig to finish the commands up to that
> point before running the rest:
>
> Data = LOAD '/....' as (,,,, )
> NoNullData= FILTER Data by qe is not null;
> STORE (foreach (group NoNullData all) generate flatten($1))  into
> 'exp/$inputDatePig';
> EXEC;
> evnt_dtl =LOAD 'exp/$inputDatePig/part-r-00000' AS (cust,,,,,)
>
> 3. Disable multi-query execution:
> $ pig -no_multiquery x.pig
>
>
> - Marcos
>

Re: Commands not working properly when stored in pig file

Posted by MARCOS MEDRADO RUBINELLI <ma...@buscapecompany.com>.
Hi, Mix:
" second map reduce started executing before first one got completed"
Interesting. Since you just LOAD evnt_dtl without a DUMP or STORE, Pig
shouldn't do anything with it, especially not before the STORE command
completes.

I tried the script below and it works fine, so I think the root cause is
something else. Unless your data is very big?
a = load 'words_and_numbers' as (f1:chararray, f2:chararray);
b = filter a by f1 is not null;
store (foreach (group b all) generate flatten($1)) into 'multipleload/tmp';
c = load 'multipleload/tmp/part-r-00000' as (f3:chararray, f4:chararray);
dump c;

Johnny



It's the multi-query execution optimization. Pig doesn't know it should wait for the STORE before the second LOAD, so it tries to run it in parallel. You have three options:

1. Name the relation you stored and use it instead of loading a new relation:

Data = LOAD '/....' as (,,,, )
NoNullData= FILTER Data by qe is not null;
exp = foreach (group NoNullData all) generate flatten($1);
STORE exp  into 'exp/$inputDatePig';

evnt_dtl = FOREACH exp GENERATE $0 as cust ...

2. Use the EXEC keyword to tell Pig to finish the commands up to that point before running the rest:

Data = LOAD '/....' as (,,,, )
NoNullData= FILTER Data by qe is not null;
STORE (foreach (group NoNullData all) generate flatten($1))  into
'exp/$inputDatePig';
EXEC;
evnt_dtl =LOAD 'exp/$inputDatePig/part-r-00000' AS (cust,,,,,)

3. Disable multi-query execution:
$ pig -no_multiquery x.pig


- Marcos
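Spelled out end to end, option 1 might look like the sketch below. Only the
qe and cust field names and the exp/$inputDatePig output path come from the
thread; the input path and the remaining field names are placeholder
assumptions, since the original schema was elided.

```pig
-- Sketch of option 1: reuse the relation instead of re-loading its files.
-- Placeholder input path and schema; only qe, cust, and the output path
-- come from the original script.
Data = LOAD '/path/to/input' AS (cust:chararray, qe:chararray, amt:int);
NoNullData = FILTER Data BY qe IS NOT NULL;
exp = FOREACH (GROUP NoNullData ALL) GENERATE FLATTEN($1);
STORE exp INTO 'exp/$inputDatePig';

-- No second LOAD, so multi-query execution has no read-after-write
-- dependency to get wrong: evnt_dtl is derived from the in-flight relation.
evnt_dtl = FOREACH exp GENERATE $0 AS cust;
```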

Re: Commands not working properly when stored in pig file

Posted by Johnny Zhang <xi...@cloudera.com>.
Hi, Mix:
" second map reduce started executing before first one got completed"
Interesting. Since you just LOAD evnt_dtl without a DUMP or STORE, Pig
shouldn't do anything with it, especially not before the STORE command
completes.

I tried the script below and it works fine, so I think the root cause is
something else. Unless your data is very big?
a = load 'words_and_numbers' as (f1:chararray, f2:chararray);
b = filter a by f1 is not null;
store (foreach (group b all) generate flatten($1)) into 'multipleload/tmp';
c = load 'multipleload/tmp/part-r-00000' as (f3:chararray, f4:chararray);
dump c;

Johnny


On Wed, Mar 27, 2013 at 4:07 PM, Mix Nin <pi...@gmail.com> wrote:

> I guess the second map-reduce job started executing before the first one
> completed. Below is the error log:
>
> 2013-03-27 15:48:08,902 [main] INFO
>
>  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
> - creating jar file Job4695026384513564120.jar
> 2013-03-27 15:48:13,983 [main] INFO
>
>  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
> - jar file Job4695026384513564120.jar created
> 2013-03-27 15:48:13,993 [main] INFO
>
>  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
> - Setting up single store job
> 2013-03-27 15:48:14,052 [main] INFO
>
>  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 2 map-reduce job(s) waiting for submission.
>
> Failed Jobs:
> JobId   Alias   Feature Message Outputs
> N/A     1-18,1-19,FACT_PXP_EVNT_DTL,evnt_dtl    GROUP_BY        Message:
> org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input
> path does not exist: hdfs:///user/lnindrakrishna/exp/part-r-00000
>
> When I run the statements one by one in the grunt shell, I don't see this
> problem.
>
>
> On Wed, Mar 27, 2013 at 3:45 PM, Mix Nin <pi...@gmail.com> wrote:
>
> > yes the file exists in HDFS.
> >
> >
> > On Wed, Mar 27, 2013 at 3:16 PM, Johnny Zhang <xiaoyuz@cloudera.com>
> > wrote:
> >
> >> Mix,
> >> 'null' is the failed job ID. From what I can tell, there is only one
> >> STORE command and it actually fails, so MapReduceLauncher tries to stop
> >> all dependent jobs; that's why the message is thrown. Can you double
> >> check if the file exists in HDFS?
> >>
> >> Johnny

Re: Commands not working properly when stored in pig file

Posted by Mix Nin <pi...@gmail.com>.
I guess the second map-reduce job started executing before the first one
completed. Below is the error log:

2013-03-27 15:48:08,902 [main] INFO
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- creating jar file Job4695026384513564120.jar
2013-03-27 15:48:13,983 [main] INFO
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- jar file Job4695026384513564120.jar created
2013-03-27 15:48:13,993 [main] INFO
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- Setting up single store job
2013-03-27 15:48:14,052 [main] INFO
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 2 map-reduce job(s) waiting for submission.

Failed Jobs:
JobId   Alias   Feature Message Outputs
N/A     1-18,1-19,FACT_PXP_EVNT_DTL,evnt_dtl    GROUP_BY        Message:
org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input
path does not exist: hdfs:///user/lnindrakrishna/exp/part-r-00000

When I run the statements one by one in the grunt shell, I don't see this
problem.


On Wed, Mar 27, 2013 at 3:45 PM, Mix Nin <pi...@gmail.com> wrote:

> yes the file exists in HDFS.
>
>
> On Wed, Mar 27, 2013 at 3:16 PM, Johnny Zhang <xi...@cloudera.com> wrote:
>
>> Mix,
>> 'null' is the failed job ID. From what I can tell, there is only one STORE
>> command and it actually fails, so MapReduceLauncher tries to stop
>> all dependent jobs; that's why the message is thrown. Can you double check
>> if the file exists in HDFS?
>>
>> Johnny

Re: Commands not working properly when stored in pig file

Posted by Mix Nin <pi...@gmail.com>.
yes the file exists in HDFS.


On Wed, Mar 27, 2013 at 3:16 PM, Johnny Zhang <xi...@cloudera.com> wrote:

> Mix,
> 'null' is the failed job ID. From what I can tell, there is only one STORE
> command and it actually fails, so MapReduceLauncher tries to stop
> all dependent jobs; that's why the message is thrown. Can you double check
> if the file exists in HDFS?
>
> Johnny

Re: Commands not working properly when stored in pig file

Posted by Johnny Zhang <xi...@cloudera.com>.
Mix,
'null' is the failed job ID. From what I can tell, there is only one STORE
command and it actually fails, so MapReduceLauncher tries to stop
all dependent jobs; that's why the message is thrown. Can you double check
if the file exists in HDFS?
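One way to run that check, assuming the standard hadoop fs CLI and the
script's relative output path (which resolves against the user's HDFS home
directory):

```shell
# List the directory the STORE command should have created; the date
# value matches the -param from the original run.
hadoop fs -ls exp/03272013

# Confirm the specific part file the second LOAD expects is present.
hadoop fs -ls exp/03272013/part-r-00000
```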

Johnny

