Posted to common-user@hadoop.apache.org by Mark Olimpiati <ma...@gmail.com> on 2012/10/26 22:47:46 UTC

Maps split size

Hi,

  I've found that the solution to control the split size per mapper is to
modify the following configurations:

mapred.min.split.size and mapred.max.split.size, but when I set them both
to 14MB with dfs.block.size = 64MB, the splits are still 64MB.

So, is there a relation between them that I should consider?

Thank you,
Mark

Re: Maps split size

Posted by Bertrand Dechoux <de...@gmail.com>.
Okay, then it would be because you didn't really change the block size.
You may have changed the value of the property, but the block size is
actually part of the file's definition. The file was stored as blocks of
64MB (the default), so it can only be read as blocks of 64MB. If you want
to change the block size of an existing file, you will have to recreate
it, e.g. by copying it.
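That copy can be sketched as a one-liner; the paths are hypothetical, and the command is echoed so the sketch runs without a cluster (drop the echo to actually run it). Note that dfs.block.size takes a plain byte count, and it must be a multiple of io.bytes.per.checksum (512 by default):

```shell
# Hypothetical HDFS paths. The copy writes a new file, and it is the new
# file that picks up the 14MB block size passed via -D dfs.block.size.
BLOCK_BYTES=$((14 * 1024 * 1024))   # 14680064 bytes = 14MB
echo "hadoop fs -D dfs.block.size=${BLOCK_BYTES} -cp /user/mark/input /user/mark/input-14mb"
```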

Regards

Bertrand

On Mon, Oct 29, 2012 at 5:25 AM, Mark Olimpiati <ma...@gmail.com> wrote:

> Well, when I said I found a solution, this link was one of them :). Even
> though I set:
>
> dfs.block.size = mapred.min.split.size = mapred.max.split.size = 14MB, the
> job is still running maps with 64MB splits!
>
> I don't see what else I can change :(
>
> Thanks,
> Mark



-- 
Bertrand Dechoux

Re: Maps split size

Posted by Mark Olimpiati <ma...@gmail.com>.
Well, when I said I found a solution, this link was one of them :). Even
though I set:

dfs.block.size = mapred.min.split.size = mapred.max.split.size = 14MB, the
job is still running maps with 64MB splits!

I don't see what else I can change :(
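One thing worth double-checking: these old-style properties expect plain byte counts, not suffixed values like "14MB". A mapred-site.xml sketch (property placement assumed; 14 * 1024 * 1024 = 14680064):

```xml
<!-- Sketch: split-size properties with values given in bytes. -->
<property>
  <name>mapred.min.split.size</name>
  <value>14680064</value>
</property>
<property>
  <name>mapred.max.split.size</name>
  <value>14680064</value>
</property>
```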

Thanks,
Mark

On Fri, Oct 26, 2012 at 2:23 PM, Bertrand Dechoux <de...@gmail.com> wrote:

> Hi Mark,
>
> I think http://wiki.apache.org/hadoop/HowManyMapsAndReduces might interest
> you.
> If you require more information, feel free to ask after reading it.
>
> Regards
>
> Bertrand
>
> --
> Bertrand Dechoux
>

Re: Maps split size

Posted by Bertrand Dechoux <de...@gmail.com>.
Hi Mark,

I think http://wiki.apache.org/hadoop/HowManyMapsAndReduces might interest
you.
If you require more information, feel free to ask after reading it.
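For what it's worth, the computation described on that wiki page boils down to one line in the new-API (org.apache.hadoop.mapreduce) FileInputFormat: splitSize = max(minSplitSize, min(maxSplitSize, blockSize)). The old mapred API uses the job's goal size (total input size divided by the requested map count) in place of the max, so mapred.max.split.size has no effect there. A minimal sketch with the numbers from this thread (variable names are mine; values in bytes):

```shell
# splitSize = max(minSplitSize, min(maxSplitSize, blockSize)),
# as computed by the new-API FileInputFormat.
min_split=$((14 * 1024 * 1024))   # mapred.min.split.size
max_split=$((14 * 1024 * 1024))   # mapred.max.split.size
block=$((64 * 1024 * 1024))       # dfs.block.size
inner=$(( max_split < block ? max_split : block ))
split=$(( min_split > inner ? min_split : inner ))
echo "$split"   # prints 14680064: with min = max = 14MB, splits cap at 14MB
```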

Regards

Bertrand

On Fri, Oct 26, 2012 at 10:47 PM, Mark Olimpiati <ma...@gmail.com> wrote:

> Hi,
>
>   I've found that the solution to control the split size per mapper is to
> modify the following configurations:
>
> mapred.min.split.size and mapred.max.split.size, but when I set them both
> to 14MB with dfs.block.size = 64MB, the splits are still = 64MB.
>
> So, is there a relation between them that I should consider?
>
> Thank you,
> Mark
>



-- 
Bertrand Dechoux