Posted to common-user@hadoop.apache.org by Ajay Srivastava <Aj...@guavus.com> on 2013/01/23 07:23:07 UTC

io.sort.factor

Hi,

io.sort.factor  --  The number of streams to merge at once while sorting files. This determines the number of open file handles.


How can I use this parameter to improve the performance of a MapReduce job?
My understanding from the description above was that if there are many spilled records, increasing io.sort.mb as well as io.sort.factor would improve performance. Increasing io.sort.mb helped, but changing io.sort.factor (to values > 10) does not seem to improve or degrade the performance of the mapred job.



Regards,
Ajay Srivastava

Re: io.sort.factor

Posted by Ajay Srivastava <Aj...@guavus.com>.
Hi Bharat,

I am looking at these logs -
2013-01-22 07:35:42,923 INFO org.apache.hadoop.mapred.MapTask: Finished spill 2

The number at the end of the string does not go beyond 6, so I assume you are correct.
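
For anyone who wants to run this check without reading the log by eye, here is a minimal Python sketch that pulls the highest spill index out of a task log. The function name `max_spill_index` and the regular expression are illustrative assumptions based only on the log line quoted above; they are not part of Hadoop.

```python
import re

# Pattern modelled on the single log line quoted in this thread (an assumption).
SPILL_RE = re.compile(r"MapTask: Finished spill (\d+)")

def max_spill_index(log_lines):
    """Return the highest spill index found, or None if no line matches."""
    indices = []
    for line in log_lines:
        match = SPILL_RE.search(line)
        if match:
            indices.append(int(match.group(1)))
    return max(indices) if indices else None

sample = [
    "2013-01-22 07:35:42,923 INFO org.apache.hadoop.mapred.MapTask: Finished spill 2",
    "some unrelated log line",
]
print(max_spill_index(sample))
```

Each map task attempt writes its own log, so the check would need to be run per task. If spill numbering starts at 0 (an assumption worth verifying), a maximum index of 6 would mean 7 spills, which is still below 10.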


Regards,
Ajay Srivastava


On 23-Jan-2013, at 12:14 PM, bharath vissapragada wrote:

Hi,

From my understanding, increasing "io.sort.mb" decreases the number of disk flushes, as more data is spilled at once, and this boosts performance. This obviously decreases the number of spills.

One possibility in this case is that the number of spills has become less than "io.sort.factor" and all the spills are getting merged in a single pass (per partition). So even if you increase the "io.sort.factor" value now, it shouldn't make any difference. Check the logs to see the number of spills created and whether it is less than "io.sort.factor".

This is a wild guess; please feel free to correct me if I'm wrong.

Thanks



On Wed, Jan 23, 2013 at 11:53 AM, Ajay Srivastava <Aj...@guavus.com> wrote:
Hi,

io.sort.factor  --  The number of streams to merge at once while sorting files. This determines the number of open file handles.


How can I use this parameter to improve the performance of a MapReduce job?
My understanding from the description above was that if there are many spilled records, increasing io.sort.mb as well as io.sort.factor would improve performance. Increasing io.sort.mb helped, but changing io.sort.factor (to values > 10) does not seem to improve or degrade the performance of the mapred job.



Regards,
Ajay Srivastava



Re: io.sort.factor

Posted by bharath vissapragada <bh...@gmail.com>.
Hi,

From my understanding, increasing "io.sort.mb" decreases the number of disk
flushes, as more data is spilled at once, and this boosts performance.
This obviously decreases the number of spills.
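
As a rough illustration of why a larger buffer means fewer spills, here is a back-of-the-envelope Python sketch. It is a deliberately simplified model, not Hadoop's actual accounting: it assumes a spill is triggered whenever a fixed fraction of the sort buffer fills up, and it ignores the space reserved for record metadata. The function name and the `spill_fraction` parameter are hypothetical (the latter stands in for the spill threshold setting).

```python
import math

def estimate_spills(map_output_mb, io_sort_mb, spill_fraction):
    """Estimate spills per map task under a simplified buffer model.

    Assumes each spill flushes io_sort_mb * spill_fraction megabytes.
    """
    if io_sort_mb <= 0 or not 0 < spill_fraction <= 1:
        raise ValueError("buffer size must be positive and fraction in (0, 1]")
    usable_mb = io_sort_mb * spill_fraction
    return math.ceil(map_output_mb / usable_mb)

# Doubling the buffer roughly halves the estimated number of spills.
print(estimate_spills(700, 100, 0.8))
print(estimate_spills(700, 200, 0.8))
```

The numbers are made up for illustration; the real spill count should be read from the task logs.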

One possibility in this case is that the number of spills has become less
than "io.sort.factor" and all the spills are getting merged in a single pass
(per partition). So even if you increase the "io.sort.factor" value now, it
shouldn't make any difference. Check the logs to see the number of spills
created and whether it is less than "io.sort.factor".
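
To make the reasoning concrete, here is a small Python sketch of a generic k-way merge model. It assumes every pass merges groups of up to `factor` sorted runs into one run each and repeats until a single run remains. This is a textbook simplification, not Hadoop's actual merge code, and the function name `merge_passes` is hypothetical.

```python
def merge_passes(num_spills, factor):
    """Count merge passes needed to reduce num_spills sorted runs to one."""
    if factor < 2:
        raise ValueError("factor must be at least 2")
    runs = num_spills
    passes = 0
    while runs > 1:
        # Each pass merges up to `factor` runs into a single run.
        runs = (runs + factor - 1) // factor
        passes += 1
    return passes

# With 7 spills, any factor of 7 or more gives one pass in this model.
for factor in (3, 10, 100):
    print(factor, merge_passes(7, factor))
```

In this model, once the factor is at least the number of spills, raising it further changes nothing, which matches the behaviour described in the question.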

This is a wild guess; please feel free to correct me if I'm wrong.

Thanks



On Wed, Jan 23, 2013 at 11:53 AM, Ajay Srivastava <
Ajay.Srivastava@guavus.com> wrote:

> Hi,
>
> io.sort.factor  --  The number of streams to merge at once while sorting
> files. This determines the number of open file handles.
>
>
> How can I use this parameter to improve the performance of a MapReduce job?
> My understanding from the description above was that if there are many
> spilled records, increasing io.sort.mb as well as io.sort.factor would
> improve performance. Increasing io.sort.mb helped, but changing
> io.sort.factor (to values > 10) does not seem to improve or degrade the
> performance of the mapred job.
>
>
>
> Regards,
> Ajay Srivastava
