Posted to user@pig.apache.org by Yang <te...@gmail.com> on 2012/06/11 04:06:32 UTC

different mapred.min.split.size within one pig script?

I need to set mapred.min.split.size for one part of my Pig script,
because the map job for the first part of the script takes much longer
per input record than the other parts of the script.

So I have to set the split size very small to accommodate that part of
the script, but then the later parts of the script also use this value
and end up with too many splits.

Is it possible to set mapred.min.split.size to different values within
the same script?

Thanks
Yang

Re: different mapred.min.split.size within one pig script?

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
Correct; I don't think there is a good way to do that except perhaps
by inserting "exec" statements to separate parts of the script that
you need to execute with the different settings.

D
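
A rough sketch of the "exec" idea above, not a tested recipe. It assumes
that a bare "exec;" runs the statements queued so far as their own batch,
and that a later "set" then applies only to the jobs that follow. The
paths, aliases, and split sizes are hypothetical, and UPPER stands in for
whatever the expensive per-record step really is.

```pig
-- Sketch only: paths, aliases, and split sizes are hypothetical.

-- Part 1: the expensive per-record work, with the small split setting.
set mapred.min.split.size 1048576;
raw  = LOAD 'input/events' AS (line:chararray);
slow = FOREACH raw GENERATE UPPER(line) AS line;  -- stand-in for the costly step
STORE slow INTO 'tmp/part1';

-- Run everything queued so far before the setting is changed.
exec;

-- Part 2: cheaper work, with a larger split setting.
set mapred.min.split.size 268435456;
part1   = LOAD 'tmp/part1' AS (line:chararray);
grouped = GROUP part1 BY line;
counts  = FOREACH grouped GENERATE group, COUNT(part1);
STORE counts INTO 'output/counts';
```

Storing the intermediate result and loading it again keeps the second
part independent of aliases defined before the "exec". Comparing the
number of map tasks launched for each job would show whether the two
settings were actually applied separately.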

On Wed, Jun 13, 2012 at 11:08 PM, Yang <te...@gmail.com> wrote:
> thanks,
>
> I tried, but it does not seem to work,  even after I put the second set
> split.size= at the very end of the script,
> it is the second SET that takes effect for both places i used the SET.
>
> Yang

Re: different mapred.min.split.size within one pig script?

Posted by Yang <te...@gmail.com>.
Thanks,

I tried that, but it does not seem to work. Even when I put the second
SET of the split size at the very end of the script, it is the second
SET that takes effect in both places where I used SET.

Yang

On Tue, Jun 12, 2012 at 3:56 PM, Alex Rovner <al...@gmail.com> wrote:

> Yes. Use the "set" keyword right before the operation that needs this
> setting. Since pig will optimize certain statements and collapse them into
> a single job, you would have to move your statement up a couple
> instructions in order for it to take effect.

Re: different mapred.min.split.size within one pig script?

Posted by Alex Rovner <al...@gmail.com>.
Yes. Use the "set" keyword right before the operation that needs this setting. Since Pig optimizes certain statements and collapses them into a single job, you may have to move the "set" statement up a few statements for it to take effect.
