Posted to common-dev@hadoop.apache.org by 李钰 <ca...@gmail.com> on 2010/06/23 08:57:20 UTC

Questions about recommendation value of the "io.sort.mb" parameter

Dear all,

I have a question about the "io.sort.mb" parameter. Material from Yahoo! and
Cloudera recommends setting this value to 200 for large jobs, but I'm
confused by this. As I understand it, the tasktracker launches a child JVM
for each task, and "io.sort.mb" sets the size of the in-memory buffer inside
one map task's child JVM. The default value of 100MB should be large enough,
because the input split of one map task is usually 64MB, the same as the
block size we usually set. Why, then, is the recommended "io.sort.mb" 200MB
for large jobs (and why does it really work)? How can the job size affect
this procedure? Is there a fault in my understanding? Any comments or
suggestions will be highly valued; thanks in advance.

Best Regards,
Carp

Re: Questions about recommendation value of the "io.sort.mb" parameter

Posted by Yu Li <ca...@gmail.com>.
Hi Todd,

Thanks a lot for your further explanation! It makes this parameter much
clearer to me.

BTW, please allow me to thank everyone who helped.

Best Regards,
Carp

Re: Questions about recommendation value of the "io.sort.mb" parameter

Posted by Yu Li <ca...@gmail.com>.
Hi Todd,

Thanks a lot for your detailed explanation and recommendation; it really
helps a lot!

Best Regards,
Carp

Re: Questions about recommendation value of the "io.sort.mb" parameter

Posted by Todd Lipcon <to...@cloudera.com>.
I actually misremembered, sorry - it's 16 bytes.

In the kvindices buffer:
4 bytes for partition ID of each record
4 bytes for the key offset in data buffer
4 bytes for the value offset in data buffer

In the kvoffsets buffer:
4 bytes for an index into the kvindices buffer (this is so that the spill
sort can just move around indices instead of the entire object)
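
A rough, hypothetical sketch of what this accounting implies, using the
64MB/24-byte figures from the earlier message (the class name and numbers
are illustrative, not from the thread):

    // Back-of-the-envelope estimate of metadata overhead for tiny records.
    public class RecordOverheadEstimate {
        public static void main(String[] args) {
            long outputBytes = 64L << 20;            // 64MB of raw map output
            int recordSize = 24;                     // hypothetical tiny records
            long records = outputBytes / recordSize; // ~2.8M records
            long metadata = records * 16;            // 12B kvindices + 4B kvoffsets
            System.out.printf("metadata = %d MB, total = %d MB%n",
                    metadata >> 20, (outputBytes + metadata) >> 20);
            // Prints roughly: metadata = 42 MB, total = 106 MB -- already
            // above the 100MB default, before the early-spill threshold.
        }
    }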

For more detail, I would recommend reading the code, or looking for Chris
Douglas's slides from the HUG earlier this year, where he gave a very
informative talk on the evolution of the map-side spill.

-Todd


-- 
Todd Lipcon
Software Engineer, Cloudera

Re: Questions about recommendation value of the "io.sort.mb" parameter

Posted by Yu Li <ca...@gmail.com>.
Hi Todd,

Sorry to bother you again: could you explain further the 24 bytes of
additional overhead for each record of map output? What causes the overhead,
and what is it for? Thanks a lot.

Best Regards,
Carp

Re: Questions about recommendation value of the "io.sort.mb" parameter

Posted by Todd Lipcon <to...@cloudera.com>.
Plus, there's some overhead for each record of map output: specifically, 24
bytes. So if you output 64MB worth of data, but each of your objects is only
24 bytes long itself, you need more than 128MB worth of spill space for it.
Last, the map output buffer begins spilling when it is only partially full,
so that more records can be collected while the spill proceeds.

200MB of io.sort.mb has enough headroom for most 64MB input splits that
don't blow up the data a lot. Expanding much above 200MB doesn't buy you
much for most jobs. The good news is that it's easy to tell from the logs
how many times the map tasks are spilling. If you're only spilling once,
more io.sort.mb will not help.
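
One hedged way to check the spill count is through the job counters with
the old mapred API; this sketch assumes the Hadoop 0.20 counter group and
names, so verify them against your version. If "spilled records" equals the
map output record count, each record was spilled exactly once and a larger
io.sort.mb won't help.

    import org.apache.hadoop.mapred.Counters;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.RunningJob;

    public class SpillCheck {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(); // job setup omitted; assumed configured
            RunningJob job = JobClient.runJob(conf);
            Counters c = job.getCounters();
            // Counter group/name strings assumed from Hadoop 0.20 (Task$Counter).
            long mapOut = c.findCounter("org.apache.hadoop.mapred.Task$Counter",
                    "MAP_OUTPUT_RECORDS").getValue();
            long spilled = c.findCounter("org.apache.hadoop.mapred.Task$Counter",
                    "SPILLED_RECORDS").getValue();
            // spilled > mapOut means records were written in multiple spill
            // and merge passes, so more io.sort.mb might reduce disk I/O.
            System.out.println("map output records = " + mapOut
                    + ", spilled records = " + spilled);
        }
    }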

-Todd

-- 
Todd Lipcon
Software Engineer, Cloudera

Re: Questions about recommendation value of the "io.sort.mb" parameter

Posted by 李钰 <ca...@gmail.com>.
Hi Jeff,

Thanks for your quick reply. It seems my thinking was stuck on the style of
job I'm running. Now I'm much clearer about it.

Best Regards,
Carp

Re: Questions about recommendation value of the "io.sort.mb" parameter

Posted by Jeff Zhang <zj...@gmail.com>.
Hi 李钰

The size of the map output depends on your Mapper class, since the Mapper
class processes the input data. See the sketch below.
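
As a hedged illustration, here is a hypothetical Mapper (old mapred API,
not from this thread) whose output is considerably larger than its input:
it emits one KVP per word bigram, so the output text alone roughly doubles
the input before counting any per-record overhead.

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class BigramMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> out, Reporter reporter)
                throws IOException {
            String[] words = value.toString().split("\\s+");
            // Every adjacent word pair becomes a key, so a 64MB split can
            // easily produce well over 64MB of map output.
            for (int i = 0; i + 1 < words.length; i++) {
                out.collect(new Text(words[i] + " " + words[i + 1]), ONE);
            }
        }
    }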



-- 
Best Regards

Jeff Zhang

Re: Questions about recommendation value of the "io.sort.mb" parameter

Posted by 李钰 <ca...@gmail.com>.
Hi Sriguru,

Thanks a lot for your comments and suggestions!
I still have some questions: since the map phase mainly does data
preparation, that is, splitting the input data into KVPs, then sorting and
partitioning before the spill, would the size of the map output KVPs be much
larger than the input data size? If not, then since one map task deals with
one input split, and one input split is usually 64MB, the map output KVP
size would be approximately 64MB. Could you please give me an example where
the map output is much larger than the input split? This has confused me for
some time, thanks.

To others: I would also appreciate your help if you know about this, thanks.

Best Regards,
Carp

RE: Questions about recommendation value of the "io.sort.mb" parameter

Posted by Srigurunath Chakravarthi <sr...@yahoo-inc.com>.
Hi Carp,
Your assumption is right that this is a per-map-task setting.
However, this buffer stores map output KVPs, not input. Therefore the
optimal value depends on how much data your map task is generating.

If your output per map is greater than io.sort.mb, these rules of thumb
could work for you (see the sketch below):

1) Increase the max heap of map tasks to use RAM better, but not so much
that you hit swap.
2) Set io.sort.mb to ~70% of the heap.

Overall, causing extra "spills" (because of an insufficient io.sort.mb) is
much better than risking swapping (by setting io.sort.mb and the heap too
large), in terms of the relative performance penalty you will pay.
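
A minimal sketch of applying these two rules through the old mapred API,
assuming a hypothetical 1GB task heap (the sizes are illustrative only):

    import org.apache.hadoop.mapred.JobConf;

    public class SortBufferTuning {
        public static void main(String[] args) {
            JobConf conf = new JobConf();
            // Rule 1: raise the per-task child JVM heap, staying below the
            // point where the node would start to swap.
            conf.set("mapred.child.java.opts", "-Xmx1024m");
            // Rule 2: size io.sort.mb to roughly 70% of that heap.
            conf.setInt("io.sort.mb", 700);
        }
    }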

Cheers,
Sriguru
