Posted to user@pig.apache.org by Rodrigo Ferreira <we...@gmail.com> on 2014/07/20 14:40:27 UTC

Is it possible to fix MR jobs order in Pig?

I have a Pig script that the Pig framework divided into two MapReduce
jobs. So far so good.

One of these jobs was always failing. When I checked the logs I realized
that Pig is executing the "2nd" job before the "1st".

Well, I think this is happening because the second part of my script
doesn't depend explicitly on the first part. But I'd like the first part to
be executed before the second. Is that possible?

I know Pig tries to optimize several things, but changing the order of the
MR jobs is not a nice thing to do. Are pigs "domestic animals" after all?

By the way, how much control do we really have over Pig's internal DAG?

Thanks,
Rodrigo Ferreira.

Re: Is it possible to fix MR jobs order in Pig?

Posted by Jacob Perkins <ja...@gmail.com>.
Rodrigo,

I see you're using Pig 0.9? The latest code (Pig 0.13) is better about preserving order when building the execution plan. See PIG-3902 (https://issues.apache.org/jira/browse/PIG-3902). You might try it without an exec on Pig 0.13 if you can. Are you able to share (at least a skeleton of) your script?

--jacob
@thedatachef

On Jul 20, 2014, at 6:43 AM, Rodrigo Ferreira <we...@gmail.com> wrote:

> Hi everyone,
> 
> I found the answer here:
> http://pig.apache.org/docs/r0.9.1/perf.html#Implicit-Dependencies
> 
> It seems that when you have implicit dependencies you have to use the EXEC
> command in order to help Pig execute your jobs in the right order.
> 
> Rodrigo.


Re: Is it possible to fix MR jobs order in Pig?

Posted by Bertrand Dechoux <de...@gmail.com>.
Well, a user doesn't really know how many jobs will be scheduled, so their
order is not something that should matter. A Pig script should really be
seen as a graph of operators. Your problem was that a dependency between
two operators was implicit. Exec lets you 'flush' the existing graph and
make sure it has been realised before the operators below it are executed.
I would rather try to make that dependency explicit or, if that is not
possible, split the script into two separate files to be more explicit.
The exec is also a fix, but it will limit how much Pig can optimize the
global workflow.
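
As a rough illustration of the 'flush' described above, here is a minimal sketch modelled on the Implicit Dependencies section of the Pig documentation. The paths, relation names and MYUDF are hypothetical placeholders, not taken from Rodrigo's script, and MYUDF would still need the usual REGISTER/DEFINE lines.

```
-- Hypothetical names and paths, for illustration only.
A = LOAD 'data1';
STORE A INTO 'out1';

-- Flush: run every statement above before continuing with the rest.
EXEC;

-- MYUDF is assumed to read 'out1' as a side file. Pig cannot see that
-- dependency in the operator graph, so without EXEC the two stores
-- might be submitted in either order.
B = LOAD 'data2';
C = FOREACH B GENERATE MYUDF($0, 'out1');
STORE C INTO 'out2';
```

The trade-off is the one noted above: everything before EXEC is planned and run on its own, so Pig cannot optimize across that boundary.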

Bertrand


On Sun, Jul 20, 2014 at 3:43 PM, Rodrigo Ferreira <we...@gmail.com> wrote:

> Hi everyone,
>
> I found the answer here:
> http://pig.apache.org/docs/r0.9.1/perf.html#Implicit-Dependencies
>
> It seems that when you have implicit dependencies you have to use the EXEC
> command in order to help Pig execute your jobs in the right order.
>
> Rodrigo.

Re: Is it possible to fix MR jobs order in Pig?

Posted by Rodrigo Ferreira <we...@gmail.com>.
Hi everyone,

I found the answer here:
http://pig.apache.org/docs/r0.9.1/perf.html#Implicit-Dependencies

It seems that when you have implicit dependencies you have to use the EXEC
command in order to help Pig execute your jobs in the right order.

Rodrigo.

