Posted to user@pig.apache.org by ey-chih chow <ey...@gmail.com> on 2013/12/04 00:03:19 UTC

Re: number of M/R jobs for a Pig Script

I have another question.  If I embed this Pig script with multiple
group-bys in a Java program using PigServer, will the multiple stores
below be executed in one MR job?

pigServer.store("C", "output1");
pigServer.store("E", "output2");

If not, how can I achieve this?

Thanks.

Ey-Chih Chow
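
A possible approach, offered as a sketch rather than a confirmed answer from
the thread: PigServer has a batch mode (setBatchOn, registerQuery,
executeBatch), and registering both store statements before calling
executeBatch lets Pig plan them together, whereas separate
pigServer.store(...) calls are, as far as I understand, each executed on
their own. Whether the result is really a single MR job still depends on
Pig's multi-query optimization. So that the example runs without Pig on the
classpath, the PigBatch interface and RecordingBatch class below are
hypothetical stand-ins that mirror those three PigServer method names; the
paths 'file', 'output1' and 'output2' are taken from the thread.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchedStores {

    // Hypothetical stand-in for the three PigServer methods used below.
    // With Pig on the classpath, a thin adapter delegating to
    // org.apache.pig.PigServer could implement it (the real methods
    // throw IOException, which the adapter would have to handle).
    interface PigBatch {
        void setBatchOn();
        void registerQuery(String query);
        void executeBatch();
    }

    // Registers the whole script, including both stores, before executing,
    // so that both outputs are known when the jobs are planned.
    static void runBothStores(PigBatch pig) {
        pig.setBatchOn();
        pig.registerQuery("A = load 'file';");
        pig.registerQuery("B = group A by $0;");
        pig.registerQuery("C = foreach B generate group, COUNT(A);");
        pig.registerQuery("store C into 'output1';");
        pig.registerQuery("D = group A by $1;");
        pig.registerQuery("E = foreach D generate group, COUNT(A);");
        pig.registerQuery("store E into 'output2';");
        pig.executeBatch();
    }

    // Records the calls instead of running Pig, so the sketch runs standalone.
    static class RecordingBatch implements PigBatch {
        final List<String> calls = new ArrayList<>();

        public void setBatchOn() { calls.add("setBatchOn"); }

        public void registerQuery(String query) { calls.add("registerQuery: " + query); }

        public void executeBatch() { calls.add("executeBatch"); }
    }

    public static void main(String[] args) {
        RecordingBatch recorder = new RecordingBatch();
        runBothStores(recorder);
        for (String call : recorder.calls) {
            System.out.println(call);
        }
    }
}
```

Running EXPLAIN on the equivalent script is one way to check how many MR
jobs Pig actually plans for it.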


On Tue, Oct 15, 2013 at 3:40 PM, ey-chih chow <ey...@gmail.com> wrote:

> Thanks.  This is what I want.
>
> Best regards,
>
> Ey-Chih
>
>
> On Tue, Oct 15, 2013 at 1:50 PM, Alan Gates <ga...@hortonworks.com> wrote:
>
>> Pig handles doing multiple group bys on the same input, often in a single
>> MR job.  So:
>>
>> A = load 'file';
>> B = group A by $0;
>> C = foreach B generate group, COUNT(A);
>> store C into 'output1';
>> D = group A by $1;
>> E = foreach D generate group, COUNT(A);
>> store E into 'output2';
>>
>> This can be done in a single MR job.  Is that what you're looking for?
>>
>> Alan.
>>
>> On Oct 15, 2013, at 12:12 PM, ey-chih chow wrote:
>>
>> > What I really want to know is: in Pig, how can I read an input data set
>> > only once, generate multiple instances with distinct keys for each data
>> > point, and do a group-by?
>> >
>> > Best regards,
>> >
>> > Ey-Chih Chow
>> >
>> >
>> > On Tue, Oct 15, 2013 at 10:16 AM, Pradeep Gollakota <
>> pradeepg26@gmail.com>wrote:
>> >
>> >> I'm not aware of any way to do that. I think you're also missing the
>> >> spirit of Pig. Pig is meant to be a data workflow language. Describe a
>> >> workflow for your data using Pig Latin and Pig will then compile your
>> >> script to MapReduce jobs. The number of MapReduce jobs that it generates
>> >> is the smallest number of jobs (based on the optimizers) that Pig thinks
>> >> it needs to complete the workflow.
>> >>
>> >> Why do you want to control the number of MR jobs?
>> >>
>> >>
>> >> On Tue, Oct 15, 2013 at 10:07 AM, ey-chih chow <ey...@gmail.com>
>> wrote:
>> >>
>> >>> Thanks everybody.  Is there any way we can programmatically control the
>> >>> number of M-R jobs that a Pig script will generate, similar to writing
>> >>> M-R jobs in Java?
>> >>>
>> >>> Best regards,
>> >>>
>> >>> Ey-Chih Chow
>> >>>
>> >>>
>> >>> On Tue, Oct 15, 2013 at 6:14 AM, Shahab Yunus <shahab.yunus@gmail.com
>> >>>> wrote:
>> >>>
>> >>>> Geert's comment about using an external-to-Pig approach reminds me
>> >>>> that there is also Netflix's PigLipstick. It is a nice visual tool for
>> >>>> actual execution, and it stores job history as well.
>> >>>>
>> >>>> Regards,
>> >>>> Shahab
>> >>>>
>> >>>>
>> >>>> On Tue, Oct 15, 2013 at 8:51 AM, Geert Van Landeghem <
>> >> gvl@foundation.be
>> >>>>> wrote:
>> >>>>
>> >>>>> You can also use Ambrose to monitor execution of your Pig script at
>> >>>>> runtime (available from pig-0.11 on).
>> >>>>>
>> >>>>> It shows you the DAG of MR jobs and which ones are currently being
>> >>>>> executed. As long as pig-ambrose is connected to the execution of
>> >>>>> your script (workflow), you can replay the workflow.
>> >>>>>
>> >>>>> --
>> >>>>> kind regards,
>> >>>>> Geert
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> On 15-okt.-2013, at 14:43, Shahab Yunus <sh...@gmail.com>
>> >>> wrote:
>> >>>>>
>> >>>>>> Have you tried using the ILLUSTRATE and EXPLAIN commands? As far as I
>> >>>>>> know, they don't give you the exact number, as it depends on the
>> >>>>>> actual data, but I believe you can infer it from the information
>> >>>>>> provided by these commands.
>> >>>>>>
>> >>>>>> Regards,
>> >>>>>> Shahab
>> >>>>>>
>> >>>>>>
>> >>>>>> On Tue, Oct 15, 2013 at 3:57 AM, ey-chih chow <ey...@gmail.com>
>> >>>> wrote:
>> >>>>>>
>> >>>>>>> Hi,
>> >>>>>>>
>> >>>>>>> I have a Pig script that has two group-by statements on the
>> >>>>>>> input data set.  Does anybody know how many M-R jobs the script
>> >>>>>>> will generate? Thanks.
>> >>>>>>>
>> >>>>>>> Best regards,
>> >>>>>>>
>> >>>>>>> Ey-Chih Chow
>> >>>>>>>
>> >>>>>
>> >>>>>
>> >>>>
>> >>>
>> >>
>>
>>
>
>