Posted to user@pig.apache.org by ey-chih chow <ey...@gmail.com> on 2013/10/15 09:57:54 UTC

number of M/R jobs for a Pig Script

Hi,

I have a Pig script that has two group-by statements on the input data
set.  Does anybody know how many M-R jobs the script will generate?
Thanks.

Best regards,

Ey-Chih Chow

Re: number of M/R jobs for a Pig Script

Posted by ey-chih chow <ey...@gmail.com>.

I have another question.  I want to embed this Pig script with multiple
group-bys into a Java program, using PigServer.  Will the multiple stores
below be executed in one MR job?

pigServer.store("C", "output1");
pigServer.store("E", "output2");

If not, how can I achieve this?

Thanks.

Ey-Chih Chow


On Tue, Oct 15, 2013 at 3:40 PM, ey-chih chow <ey...@gmail.com> wrote:

> Thanks.  This is what I want.
>
> Best regards,
>
> Ey-Chih

Re: number of M/R jobs for a Pig Script

Posted by ey-chih chow <ey...@gmail.com>.

Thanks.  This is what I want.

Best regards,

Ey-Chih


On Tue, Oct 15, 2013 at 1:50 PM, Alan Gates <ga...@hortonworks.com> wrote:

> Pig handles doing multiple group bys on the same input, often in a single
> MR job.  So:
>
> A = load 'file';
> B = group A by $0;
> C = foreach B generate group, COUNT(A);
> store C into 'output1';
> D = group A by $1;
> E = foreach D generate group, COUNT(A);
> store E into 'output2';
>
> This can be done in a single MR job.  Is that what you're looking for?
>
> Alan.

Re: number of M/R jobs for a Pig Script

Posted by Alan Gates <ga...@hortonworks.com>.

Pig handles doing multiple group bys on the same input, often in a single MR job.  So:

A = load 'file';
B = group A by $0;
C = foreach B generate group, COUNT(A);
store C into 'output1';
D = group A by $1;
E = foreach D generate group, COUNT(A);
store E into 'output2';

This can be done in a single MR job.  Is that what you're looking for?

Alan.

On Oct 15, 2013, at 12:12 PM, ey-chih chow wrote:

> What I really want to know is: in Pig, how can I read an input data set only
> once, generate multiple instances with distinct keys for each data point,
> and then do a group-by?
> 
> Best regards,
> 
> Ey-Chih Chow


-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

Re: number of M/R jobs for a Pig Script

Posted by Pradeep Gollakota <pr...@gmail.com>.

Can you describe what your input data looks like and what you want your
output data to look like?

I don’t understand your question. A group by is really straightforward to
do on a dataset.

A = LOAD 'mydata' using MyStorage();
B = GROUP A BY group_key;
dump B;

Is that what you’re looking for?


On Tue, Oct 15, 2013 at 12:12 PM, ey-chih chow <ey...@gmail.com> wrote:

> What I really want to know is: in Pig, how can I read an input data set only
> once, generate multiple instances with distinct keys for each data point,
> and then do a group-by?
>
> Best regards,
>
> Ey-Chih Chow

Re: number of M/R jobs for a Pig Script

Posted by ey-chih chow <ey...@gmail.com>.

What I really want to know is: in Pig, how can I read an input data set only
once, generate multiple instances with distinct keys for each data point,
and then do a group-by?
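
One possible reading of this question is a single group-by over several keys
emitted from each input record. A minimal sketch under that assumption, using
Pig's built-in TOBAG and FLATTEN; the file names, the alias names, and the
choice of $0 and $1 as the key fields are illustrative and not taken from the
thread:

```pig
-- Read the input once.
A = load 'file';
-- Emit one row per key: each input record yields two rows,
-- one keyed by $0 and one keyed by $1.
B = foreach A generate flatten(TOBAG($0, $1)) as grp_key;
-- A single group-by over all emitted keys.
C = group B by grp_key;
D = foreach C generate group, COUNT(B);
store D into 'output';
```

In this sketch the values of both fields share one key space, so equal values
in $0 and $1 are counted together. If the counts must stay separate per field,
two group-bys on the same input are likely the better fit.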

Best regards,

Ey-Chih Chow


On Tue, Oct 15, 2013 at 10:16 AM, Pradeep Gollakota <pr...@gmail.com> wrote:

> I'm not aware of any way to do that. I think you're also missing the spirit
> of Pig. Pig is meant to be a data workflow language. Describe a workflow
> for your data using PigLatin and Pig will then compile your script to
> MapReduce jobs. The number of MapReduce jobs that it generates is the
> smallest number of jobs (based on the optimizers) that Pig thinks it needs
> to complete the workflow.
>
> Why do you want to control the number of MR jobs?

Re: number of M/R jobs for a Pig Script

Posted by Pradeep Gollakota <pr...@gmail.com>.

I'm not aware of any way to do that. I think you're also missing the spirit
of Pig. Pig is meant to be a data workflow language. Describe a workflow
for your data using PigLatin and Pig will then compile your script to
MapReduce jobs. The number of MapReduce jobs that it generates is the
smallest number of jobs (based on the optimizers) that Pig thinks it needs
to complete the workflow.

Why do you want to control the number of MR jobs?


On Tue, Oct 15, 2013 at 10:07 AM, ey-chih chow <ey...@gmail.com> wrote:

> Thanks everybody.  Is there any way we can programmatically control the
> number of M-R jobs that a Pig script will generate, similar to writing M-R
> jobs in Java?
>
> Best regards,
>
> Ey-Chih Chow

Re: number of M/R jobs for a Pig Script

Posted by ey-chih chow <ey...@gmail.com>.

Thanks everybody.  Is there any way we can programmatically control the
number of M-R jobs that a Pig script will generate, similar to writing M-R
jobs in Java?

Best regards,

Ey-Chih Chow


On Tue, Oct 15, 2013 at 6:14 AM, Shahab Yunus <sh...@gmail.com> wrote:

> Geert's comment about using an external-to-Pig approach reminds me that
> there is also Netflix's PigLipstick. It is a nice visual tool for the
> actual execution, and it stores job history as well.
>
> Regards,
> Shahab

Re: number of M/R jobs for a Pig Script

Posted by Shahab Yunus <sh...@gmail.com>.

Geert's comment about using an external-to-Pig approach reminds me that
there is also Netflix's PigLipstick. It is a nice visual tool for the
actual execution, and it stores job history as well.

Regards,
Shahab


On Tue, Oct 15, 2013 at 8:51 AM, Geert Van Landeghem <gv...@foundation.be> wrote:

> You can also use Ambrose to monitor the execution of your Pig script at
> runtime. Note: available from pig-0.11 on.
>
> It shows you the DAG of MR jobs and which ones are currently being executed.
> As long as pig-ambrose is connected to the execution of your script
> (workflow), you can replay the workflow.
>
> --
> kind regards,
>  Geert

Re: number of M/R jobs for a Pig Script

Posted by Bertrand Dechoux <de...@gmail.com>.

Or Lipstick : https://github.com/Netflix/Lipstick
It's Netflix this time instead of Twitter. ;)

http://techblog.netflix.com/2013/06/introducing-lipstick-on-apache-pig.html

But by simply running the script, the information you are looking for will
be displayed at the end of the job.

Bertrand


On Tue, Oct 15, 2013 at 2:51 PM, Geert Van Landeghem <gv...@foundation.be> wrote:

> You can also use Ambrose to monitor the execution of your Pig script at
> runtime. Note: available from pig-0.11 on.
>
> It shows you the DAG of MR jobs and which ones are currently being executed.
> As long as pig-ambrose is connected to the execution of your script
> (workflow), you can replay the workflow.
>
> --
> kind regards,
>  Geert

Re: number of M/R jobs for a Pig Script

Posted by Geert Van Landeghem <gv...@foundation.be>.

You can also use Ambrose to monitor the execution of your Pig script at runtime. Note: available from pig-0.11 on.

It shows you the DAG of MR jobs and which ones are currently being executed. As long as pig-ambrose is connected to the execution of your script (workflow), you can replay the workflow.

-- 
kind regards,
 Geert

On 15-okt.-2013, at 14:43, Shahab Yunus <sh...@gmail.com> wrote:

> Have you tried using the ILLUSTRATE and EXPLAIN commands? As far as I know,
> they don't give you the exact number, as it depends on the actual data,
> but I believe you can extrapolate it from the information these commands
> provide.
> 
> Regards,
> Shahab

Re: number of M/R jobs for a Pig Script

Posted by Shahab Yunus <sh...@gmail.com>.

Have you tried using the ILLUSTRATE and EXPLAIN commands? As far as I know,
they don't give you the exact number, as it depends on the actual data,
but I believe you can extrapolate it from the information these commands
provide.
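
As a rough sketch of the EXPLAIN route, assuming the script is saved as
myscript.pig (a placeholder name): explaining the whole script from the Grunt
shell prints the logical, physical, and MapReduce plans, and the number of
jobs can be estimated from the nodes listed in the MapReduce plan.

```pig
-- Run from the Grunt shell; myscript.pig is a placeholder for your own script.
-- -script explains the whole script rather than a single alias, so every
-- STORE statement in it is taken into account when the plan is built.
explain -script myscript.pig;
```

Explaining the full script rather than one alias matters for scripts with
several STORE statements, since the plan for a single alias does not show
whether the stores end up in the same job.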

Regards,
Shahab

On Tue, Oct 15, 2013 at 3:57 AM, ey-chih chow <ey...@gmail.com> wrote:

> Hi,
>
> I have a Pig script that has two group-by statements on the input data
> set.  Does anybody know how many M-R jobs the script will generate?
> Thanks.
>
> Best regards,
>
> Ey-Chih Chow
>