Posted to user@pig.apache.org by Corbin Hoenes <co...@tynt.com> on 2010/07/27 23:09:08 UTC

mapred.min.split.size

Is there a way to set the mapred.min.split.size property in Pig? I set it, but it doesn't seem to have changed the mappers' HDFS_BYTES_READ counter.  My mappers are finishing in ~10 secs, and I have ~20,000 of them.

Re: mapred.min.split.size

Posted by Corbin Hoenes <co...@tynt.com>.

I've upgraded to Pig 0.8 and am still not able to set the input split
size correctly.  It still defaults to the DFS block size.

Here are the params I set via the command line:
-Dmapred.min.split.size=512MB -Dpig.maxCombinedSplitSize=512MB
-Dpig.splitCombination=false

I'm starting to wonder if the ChukwaLoader isn't respecting the splits.
Has anyone actually got this working?
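
One thing that may be worth checking with the flags above: as far as I know, Hadoop reads mapred.min.split.size as a plain number of bytes, and the Pig documentation describes pig.maxCombinedSplitSize in bytes as well, so a value with a unit suffix such as 512MB may not be parsed as intended. A minimal sketch, assuming plain byte counts are expected (the helper name split_size_flags is hypothetical, not part of Pig or Hadoop), that builds the same three flags with the sizes in bytes:

```python
# Hypothetical helper: build the -D flags from the post with sizes in bytes.
# Assumption: both size properties expect a plain integer byte count.

MB = 1024 * 1024


def split_size_flags(size_mb, combine=False):
    """Return the three -D flags, with the size converted from MB to bytes."""
    size_bytes = size_mb * MB
    return [
        f"-Dmapred.min.split.size={size_bytes}",
        f"-Dpig.maxCombinedSplitSize={size_bytes}",
        f"-Dpig.splitCombination={'true' if combine else 'false'}",
    ]


flags = split_size_flags(512)
print(" ".join(flags))
```

Note that pig.splitCombination=false turns split combination off, so pig.maxCombinedSplitSize would presumably have no effect alongside it.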

On Thu, Aug 5, 2010 at 9:34 PM, Corbin Hoenes <co...@tynt.com> wrote:

> Thanks guys this is the issue.  Need to move to pig 0.7 and while I'm at it
> upgrade to the latest chukwa.
>
> On Aug 5, 2010, at 6:38 PM, Richard Ding wrote:
>
> > Pig 0.6 implements its own splits (called slice) with size equal to the
> block size. So this explains why the setting doesn't work.
> >
> > Thanks,
> > -Richard
> >
> > -----Original Message-----
> > From: Bill Graham [mailto:billgraham@gmail.com]
> > Sent: Thursday, August 05, 2010 5:06 PM
> > To: pig-user@hadoop.apache.org
> > Subject: Re: mapred.min.split.size
> >
> > FYI, Chukwa support for Pig 0.7.0 was just committed last week:
> >
> > https://issues.apache.org/jira/browse/CHUKWA-495
> >
> > The patch was built on Chukwa 0.4.0, but you could try applying the patch
> > against Chukwa 0.3.0. I don't think the relevant code changed much
> between
> > 3-4.
> >
> >
> > On Thu, Aug 5, 2010 at 4:40 PM, Richard Ding <rd...@yahoo-inc.com>
> wrote:
> >
> >> What version of Pig you are on? ChukwaStorage loader for Pig 0.7 uses
> >> Hadoop FileInputFormat to generate splits so the mapred.min.split.size
> >> property should work.
> >>
> >> But from the release date, Chukwa 0.3 seems not on Pig 0.7.
> >>
> >> Thanks,
> >> -Richard
> >>
> >> -----Original Message-----
> >> From: Corbin Hoenes [mailto:corbin@tynt.com]
> >> Sent: Thursday, August 05, 2010 3:50 PM
> >> To: pig-user@hadoop.apache.org
> >> Subject: Re: mapred.min.split.size
> >>
> >> I am using the ChukwaStorage loader from chukwa 0.3.  Is it the loader's
> >> responsibility to deal with input splits?
> >>
> >> On Aug 5, 2010, at 4:14 PM, Richard Ding wrote:
> >>
> >>> I misunderstood your earlier question. If you have one large file, set
> >> mapred.min.split.size property will help to increase the file split
> size.
> >> Pig will pass system properties to Hadoop. What loader are you using?
> >>>
> >>> Thanks,
> >>> -Richard
> >>>
> >>> -----Original Message-----
> >>> From: Corbin Hoenes [mailto:corbin@tynt.com]
> >>> Sent: Thursday, August 05, 2010 1:22 PM
> >>> To: pig-user@hadoop.apache.org
> >>> Subject: Re: mapred.min.split.size
> >>>
> >>> So what does pig do when I have a 5 gig file?  Does it simply hardcode
> >> the split size to block size?   Is there no way to tell it to just
> operate
> >> on a larger split size?
> >>>
> >>>
> >>> On Jul 27, 2010, at 3:41 PM, Richard Ding wrote:
> >>>
> >>>> For Pig loaders, each split can have at most one file, doesn't matter
> >> what split size is.
> >>>>
> >>>> You can concatenate the input files before loading them.
> >>>>
> >>>> Thanks,
> >>>> -Richard
> >>>> -----Original Message-----
> >>>> From: Corbin Hoenes [mailto:corbin@tynt.com]
> >>>> Sent: Tuesday, July 27, 2010 2:09 PM
> >>>> To: pig-user@hadoop.apache.org
> >>>> Subject: mapred.min.split.size
> >>>>
> >>>> Is there a way to set the mapred.min.split.size property in pig? I set
> >> it but doesn't seem to have changed the mapper's HDFS_BYTES_READ
> counter.
> >> My mappers are finishing ~10 secs.  I have ~20,000 of them.
> >>>>
> >>>>
> >>>>
> >>>
> >>
> >>
>
>

Re: mapred.min.split.size

Posted by Corbin Hoenes <co...@tynt.com>.

Thanks guys, this is the issue.  I need to move to Pig 0.7 and, while I'm at it, upgrade to the latest Chukwa.

RE: mapred.min.split.size

Posted by Richard Ding <rd...@yahoo-inc.com>.

Pig 0.6 implements its own splits (called slices), each with a size equal to the block size. This explains why the setting doesn't work.

Thanks,
-Richard
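
To put rough numbers on this, here is a minimal sketch of how many slices (and so mappers) a single file would produce if every slice is one block. The 64 MB block size is only an assumption for illustration, the 5 GB file is the one asked about earlier in the thread, and slice_count is a hypothetical helper, not a Pig API:

```python
import math

MB = 1024 * 1024
GB = 1024 * MB


def slice_count(file_bytes, block_bytes):
    """Number of block-sized slices needed to cover a file of file_bytes."""
    return math.ceil(file_bytes / block_bytes)


# Assumed block size of 64 MB; the real value depends on the cluster config.
slices = slice_count(5 * GB, 64 * MB)
print(slices)
```

Under these assumptions the count depends only on the file size and the block size, which matches the observation that changing mapred.min.split.size made no difference on Pig 0.6.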

Re: mapred.min.split.size

Posted by Bill Graham <bi...@gmail.com>.

FYI, Chukwa support for Pig 0.7.0 was just committed last week:

https://issues.apache.org/jira/browse/CHUKWA-495

The patch was built on Chukwa 0.4.0, but you could try applying the patch
against Chukwa 0.3.0. I don't think the relevant code changed much between
0.3 and 0.4.


RE: mapred.min.split.size

Posted by Richard Ding <rd...@yahoo-inc.com>.

What version of Pig are you on? The ChukwaStorage loader for Pig 0.7 uses Hadoop's FileInputFormat to generate splits, so the mapred.min.split.size property should work.

But judging from its release date, Chukwa 0.3 does not seem to be on Pig 0.7.

Thanks,
-Richard
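
For reference, a minimal sketch of why the property should take effect once FileInputFormat generates the splits. As I understand it, Hadoop's new-API FileInputFormat picks the split size as max(minSize, min(maxSize, blockSize)); the Python function below only mirrors that rule, and the 64 MB block size is an assumption for illustration:

```python
MB = 1024 * 1024
LONG_MAX = 2**63 - 1  # stand-in for Java's Long.MAX_VALUE


def compute_split_size(block_size, min_size=1, max_size=LONG_MAX):
    """Mirror of the rule: max(minSize, min(maxSize, blockSize))."""
    return max(min_size, min(max_size, block_size))


# With no minimum set, the split size falls back to the block size.
default_split = compute_split_size(64 * MB)

# Raising the minimum above the block size raises the split size.
larger_split = compute_split_size(64 * MB, min_size=512 * MB)

print(default_split // MB, larger_split // MB)
```

So a minimum split size below the block size changes nothing, while one above it becomes the split size for a single large file.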

Re: mapred.min.split.size

Posted by Corbin Hoenes <co...@tynt.com>.

I am using the ChukwaStorage loader from Chukwa 0.3.  Is it the loader's responsibility to deal with input splits?

RE: mapred.min.split.size

Posted by Richard Ding <rd...@yahoo-inc.com>.

I misunderstood your earlier question. If you have one large file, setting the mapred.min.split.size property will help increase the file split size. Pig passes system properties to Hadoop. What loader are you using?

Thanks,
-Richard

Re: mapred.min.split.size

Posted by Corbin Hoenes <co...@tynt.com>.

So what does Pig do when I have a 5 gig file?  Does it simply hardcode the split size to the block size?  Is there no way to tell it to just operate on a larger split size?


RE: mapred.min.split.size

Posted by Richard Ding <rd...@yahoo-inc.com>.

For Pig loaders, each split can contain at most one file, no matter what the split size is.

You can concatenate the input files before loading them.

Thanks,
-Richard
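
A minimal sketch of what this means for the mapper count, assuming each file yields at least one split and is otherwise cut into pieces of the split size. The 1 MB file size is an assumption for illustration (only the figure of roughly 20,000 mappers comes from the thread), total_splits is a hypothetical helper, and the estimate ignores any rounding slack the real input format may apply to the last split:

```python
import math

MB = 1024 * 1024


def total_splits(file_sizes, split_size):
    """Sum of per-file split counts; a split never spans two files."""
    return sum(max(1, math.ceil(size / split_size)) for size in file_sizes)


# 20,000 small files: a larger split size cannot reduce the count.
many_small = total_splits([1 * MB] * 20000, 512 * MB)

# The same bytes concatenated into one file before loading.
one_large = total_splits([20000 * MB], 512 * MB)

print(many_small, one_large)
```

Under these assumptions the small files still produce one split each, while the concatenated file produces far fewer, which is why concatenating the inputs helps where the split size setting alone does not.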