Posted to user@pig.apache.org by Corbin Hoenes <co...@tynt.com> on 2011/04/18 22:40:36 UTC

Re: mapred.min.split.size

I've upgraded to Pig 0.8 and I'm still not able to correctly set the input split
size.  It still defaults to the DFS block size.

Here are the params I set via the command line:
-Dmapred.min.split.size=512MB -Dpig.maxCombinedSplitSize=512MB
-Dpig.splitCombination=false
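
For reference, the full invocation looks roughly like this (the script name is
made up), and I'm not sure whether these properties take an "MB" suffix or want
a raw byte count like 536870912, so maybe that's part of the problem:

  pig -Dmapred.min.split.size=512MB \
      -Dpig.maxCombinedSplitSize=512MB \
      -Dpig.splitCombination=false \
      myscript.pig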

I'm starting to wonder whether the ChukwaLoader is respecting the splits.
Has anyone actually gotten this working?

On Thu, Aug 5, 2010 at 9:34 PM, Corbin Hoenes <co...@tynt.com> wrote:

> Thanks guys, this is the issue.  I need to move to Pig 0.7 and, while I'm at it,
> upgrade to the latest Chukwa.
>
> On Aug 5, 2010, at 6:38 PM, Richard Ding wrote:
>
> > Pig 0.6 implements its own splits (called slices) with a size equal to the
> > block size, so that explains why the setting doesn't work.
> >
> > Thanks,
> > -Richard
> >
> > -----Original Message-----
> > From: Bill Graham [mailto:billgraham@gmail.com]
> > Sent: Thursday, August 05, 2010 5:06 PM
> > To: pig-user@hadoop.apache.org
> > Subject: Re: mapred.min.split.size
> >
> > FYI, Chukwa support for Pig 0.7.0 was just committed last week:
> >
> > https://issues.apache.org/jira/browse/CHUKWA-495
> >
> > The patch was built on Chukwa 0.4.0, but you could try applying it against
> > Chukwa 0.3.0. I don't think the relevant code changed much between 0.3 and 0.4.
> >
> >
> > On Thu, Aug 5, 2010 at 4:40 PM, Richard Ding <rd...@yahoo-inc.com> wrote:
> >
> >> What version of Pig are you on? The ChukwaStorage loader for Pig 0.7 uses
> >> Hadoop's FileInputFormat to generate splits, so the mapred.min.split.size
> >> property should work.
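> >> (If I remember right, FileInputFormat picks the split size as roughly
> >> max(mapred.min.split.size, min(mapred.max.split.size, dfs block size)),
> >> so raising the minimum is how you get splits larger than a block.)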
> >>
> >> But judging from the release dates, Chukwa 0.3 doesn't appear to be built on Pig 0.7.
> >>
> >> Thanks,
> >> -Richard
> >>
> >> -----Original Message-----
> >> From: Corbin Hoenes [mailto:corbin@tynt.com]
> >> Sent: Thursday, August 05, 2010 3:50 PM
> >> To: pig-user@hadoop.apache.org
> >> Subject: Re: mapred.min.split.size
> >>
> >> I am using the ChukwaStorage loader from Chukwa 0.3.  Is it the loader's
> >> responsibility to deal with input splits?
> >>
> >> On Aug 5, 2010, at 4:14 PM, Richard Ding wrote:
> >>
> >>> I misunderstood your earlier question. If you have one large file, setting
> >>> the mapred.min.split.size property will help increase the file split size.
> >>> Pig will pass system properties to Hadoop. What loader are you using?
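> >>>
> >>> For example, something along these lines should do it (536870912 is just
> >>> 512MB as a byte count, and the script name is a placeholder):
> >>>
> >>>   pig -Dmapred.min.split.size=536870912 myscript.pig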
> >>>
> >>> Thanks,
> >>> -Richard
> >>>
> >>> -----Original Message-----
> >>> From: Corbin Hoenes [mailto:corbin@tynt.com]
> >>> Sent: Thursday, August 05, 2010 1:22 PM
> >>> To: pig-user@hadoop.apache.org
> >>> Subject: Re: mapred.min.split.size
> >>>
> >>> So what does Pig do when I have a 5 GB file?  Does it simply hardcode the
> >>> split size to the block size?  Is there no way to tell it to just operate
> >>> on a larger split size?
> >>>
> >>>
> >>> On Jul 27, 2010, at 3:41 PM, Richard Ding wrote:
> >>>
> >>>> For Pig loaders, each split can contain at most one file, no matter what
> >>>> the split size is.
> >>>>
> >>>> You can concatenate the input files before loading them.
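> >>>>
> >>>> For example, something like this should work (the paths are just
> >>>> placeholders):
> >>>>
> >>>>   hadoop fs -getmerge /logs/input /tmp/merged.dat
> >>>>   hadoop fs -put /tmp/merged.dat /logs/merged/merged.dat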
> >>>>
> >>>> Thanks,
> >>>> -Richard
> >>>> -----Original Message-----
> >>>> From: Corbin Hoenes [mailto:corbin@tynt.com]
> >>>> Sent: Tuesday, July 27, 2010 2:09 PM
> >>>> To: pig-user@hadoop.apache.org
> >>>> Subject: mapred.min.split.size
> >>>>
> >>>> Is there a way to set the mapred.min.split.size property in Pig? I set it,
> >>>> but it doesn't seem to have changed the mappers' HDFS_BYTES_READ counter.
> >>>> My mappers are finishing in ~10 secs, and I have ~20,000 of them.
> >>>>
> >>>>
> >>>>
> >>>
> >>
> >>
>
>