Posted to mapreduce-user@hadoop.apache.org by Renato Moutinho <re...@gmail.com> on 2014/10/03 23:40:16 UTC

Reduce phase of wordcount

Hi people,

    I'm doing some experiments with Hadoop 1.2.1, running the wordcount
sample on an 8-node cluster (master + 7 slaves). By tuning the task
configuration I've been able to get the map phase to run in 22 minutes..
However, the reduce phase (which consists of a single task) gets stuck at
some points, making the whole job take more than 40 minutes. Looking at the
logs, I've seen several lines stuck at the copy stage at different moments,
like this:

2014-10-03 18:26:34,717 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201408281149_0019_r_000000_0 0.3302721% reduce > copy (971 of 980
at 6.03 MB/s) >
2014-10-03 18:26:37,736 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201408281149_0019_r_000000_0 0.3302721% reduce > copy (971 of 980
at 6.03 MB/s) >
2014-10-03 18:26:40,754 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201408281149_0019_r_000000_0 0.3302721% reduce > copy (971 of 980
at 6.03 MB/s) >
2014-10-03 18:26:43,772 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201408281149_0019_r_000000_0 0.3302721% reduce > copy (971 of 980
at 6.03 MB/s) >

Eventually the job ends, but these repeated lines make me think it's having
difficulty transferring the map outputs from the map nodes. Is my
interpretation correct ? The transfer rate is way too slow compared to an
scp file transfer between the hosts (10 times slower). Any take on why ?

Regards,

Renato Moutinho

Re: Reduce phase of wordcount

Posted by Ulul <ha...@ulul.org>.
Nice !

mapred.reduce.tasks applies to the whole job (the group of tasks), so to
use every reduce slot it should be at least equal to
mapred.tasktracker.reduce.tasks.maximum * <number of nodes>.
With your setup you allow each of your 7 tasktrackers to launch 8
reducers (that would be 56 in total), but you cap the total number of
reducers at 7...

Combiners are very effective at limiting shuffle overhead, and for a job
like wordcount you can just reuse the reduce class.
Just add something like job.setCombinerClass(MyReducer.class);
to your driver and you're good.
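
For reference, a minimal complete driver along those lines might look like
this. It is a sketch based on the stock wordcount example, with the
combiner and an explicit reducer count added; the class names and the
reducer count of 7 are illustrative, not something mandated by Hadoop:

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Emits (word, 1) for every token in the input line.
  public static class TokenizerMapper
      extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();
    public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Sums the counts for a word; usable both as reducer and as combiner.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) sum += v.get();
      context.write(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "wordcount");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // reducer doubles as combiner
    job.setReducerClass(IntSumReducer.class);
    job.setNumReduceTasks(7);                  // same effect as mapred.reduce.tasks
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Reusing the reducer as the combiner is only safe because summing counts is
associative and commutative; it is not valid for every reduce function.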

Ulul

On 06/10/2014 21:18, Renato Moutinho wrote:
> Hi folks,
>
>     just as feedback: increasing 
> mapred.tasktracker.reduce.tasks.maximum had no effect (it was already 
> set to 8) and the job created only 1 reducer (my original scenario). 
> However, adding mapred.reduce.tasks and setting it to a value higher 
> than 1 (I've set it to 7) made hadoop spawn that many reduce tasks 
> (seven in my example), and the execution time went down to around 29 
> minutes (also, my servers are now frying cpu....) ! My next step (I'm 
> pushing it to the maximum) is adding a combiner..
>
> And no... I haven't set up this cluster just for running 
> wordcount. Hahaha.... I'm still getting to know hadoop. :-)
>
> Thanks a lot for your help !
>
> Regards,
>
> Renato Moutinho
>
> 2014-10-05 18:53 GMT-03:00 Ulul <hadoop@ulul.org>:
>
>     Hi
>
>     You indicate that you have just one reducer, which is the default
>     in Hadoop 1 but quite insufficient for a cluster with 7 slave nodes.
>     You should increase mapred.reduce.tasks, use combiners, and maybe
>     tune mapred.tasktracker.reduce.tasks.maximum
>
>     Hope that helps
>     Ulul
>
>     On 05/10/2014 16:53, Renato Moutinho wrote:
>>     Hi there,
>>
>>          thanks a lot for taking the time to answer me ! Actually,
>>     this "issue" happens after all the map tasks have completed (I'm
>>     looking at the web interface). I'll try to diagnose whether it's
>>     an issue with the number of threads.. I suppose I'll have to
>>     change the logging configuration to find out what's going on..
>>
>>     The only thing that's getting to me is the fact that the lines
>>     are repeated in the log..
>>
>>     Regards,
>>
>>     Renato Moutinho
>>
>>
>>
>>     On 05/10/2014, at 10:52, java8964 <java8964@hotmail.com> wrote:
>>
>>>     Don't be confused by 6.03 MB/s.
>>>
>>>     The relationship between mapper and reducer is an M-to-N
>>>     relationship, which means a mapper can send its data to all
>>>     reducers, and one reducer can receive its input from all mappers.
>>>
>>>     There could be a lot of reasons why the reduce copy phase looks
>>>     too slow. It could be that the mappers are still running and
>>>     there is no data generated for the reducer to copy yet; or there
>>>     are not enough threads on either the mapper or reducer side to
>>>     utilize the remaining cpu/memory/network bandwidth. You can
>>>     google the hadoop configurations to adjust them.
>>>
>>>     But getting 60 MB/s in scp and then complaining about only
>>>     getting 6 MB/s in the log is not fair to hadoop. Your one reducer
>>>     needs to copy data from all the mappers concurrently, which makes
>>>     it impossible to reach the same speed as a one-to-one point
>>>     network transfer.
>>>
>>>     The reduce stage is normally longer than the map stage, as data
>>>     HAS to be transferred through the network.
>>>
>>>     But in the word count example, the data that needs to be
>>>     transferred should be very small. You can ask yourself the
>>>     following questions:
>>>
>>>     1) Should I use a combiner in this case? (Yes, for word count, it
>>>     reduces the data that needs to be transferred).
>>>     2) Do I use all the reducers I can, if my cluster is
>>>     underutilized and I want my job to finish fast?
>>>     3) Can I add more threads in the task tracker to help? You need
>>>     to dig into your log to find out whether your mapper or reducer
>>>     is waiting for a thread from the thread pool.
>>>
>>>     Yong
>>>
>>>     ------------------------------------------------------------------------
>>>     Date: Fri, 3 Oct 2014 18:40:16 -0300
>>>     Subject: Reduce phase of wordcount
>>>     From: renato.moutinho@gmail.com
>>>     To: user@hadoop.apache.org
>>>
>>>     Hi people,
>>>
>>>         I'm doing some experiments with Hadoop 1.2.1, running the
>>>     wordcount sample on an 8-node cluster (master + 7 slaves). By
>>>     tuning the task configuration I've been able to get the map
>>>     phase to run in 22 minutes.. However, the reduce phase (which
>>>     consists of a single task) gets stuck at some points, making the
>>>     whole job take more than 40 minutes. Looking at the logs, I've
>>>     seen several lines stuck at the copy stage at different moments,
>>>     like this:
>>>
>>>     2014-10-03 18:26:34,717 INFO
>>>     org.apache.hadoop.mapred.TaskTracker:
>>>     attempt_201408281149_0019_r_000000_0 0.3302721% reduce > copy
>>>     (971 of 980 at 6.03 MB/s) >
>>>     2014-10-03 18:26:37,736 INFO
>>>     org.apache.hadoop.mapred.TaskTracker:
>>>     attempt_201408281149_0019_r_000000_0 0.3302721% reduce > copy
>>>     (971 of 980 at 6.03 MB/s) >
>>>     2014-10-03 18:26:40,754 INFO
>>>     org.apache.hadoop.mapred.TaskTracker:
>>>     attempt_201408281149_0019_r_000000_0 0.3302721% reduce > copy
>>>     (971 of 980 at 6.03 MB/s) >
>>>     2014-10-03 18:26:43,772 INFO
>>>     org.apache.hadoop.mapred.TaskTracker:
>>>     attempt_201408281149_0019_r_000000_0 0.3302721% reduce > copy
>>>     (971 of 980 at 6.03 MB/s) >
>>>
>>>     Eventually the job ends, but these repeated lines make me think
>>>     it's having difficulty transferring the map outputs from the map
>>>     nodes. Is my interpretation correct ? The transfer rate is way
>>>     too slow compared to an scp file transfer between the hosts (10
>>>     times slower). Any take on why ?
>>>
>>>     Regards,
>>>
>>>     Renato Moutinho
>
>


Re: Reduce phase of wordcount

Posted by Renato Moutinho <re...@gmail.com>.
Hi folks,

    just as feedback: increasing mapred.tasktracker.reduce.tasks.maximum
had no effect (it was already set to 8) and the job created only 1 reducer
(my original scenario). However, adding mapred.reduce.tasks and setting it
to a value higher than 1 (I've set it to 7) made hadoop spawn that many
reduce tasks (seven in my example), and the execution time went down to
around 29 minutes (also, my servers are now frying cpu....) ! My next step
(I'm pushing it to the maximum) is adding a combiner..
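
In case it helps anyone reproducing this: assuming the driver parses
generic options (the stock wordcount example goes through
GenericOptionsParser, so it does), the per-job reducer count can also be
passed on the command line instead of being edited into the config files.
The jar name and paths below are just placeholders:

hadoop jar hadoop-examples-1.2.1.jar wordcount -D mapred.reduce.tasks=7 in out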

And no... I haven't set up this cluster just for running
wordcount. Hahaha.... I'm still getting to know hadoop. :-)

Thanks a lot for your help !

Regards,

Renato Moutinho

2014-10-05 18:53 GMT-03:00 Ulul <ha...@ulul.org>:

>  Hi
>
> You indicate that you have just one reducer, which is the default in
> Hadoop 1 but quite insufficient for a cluster with 7 slave nodes.
> You should increase mapred.reduce.tasks, use combiners, and maybe tune
> mapred.tasktracker.reduce.tasks.maximum
>
> Hope that helps
> Ulul
>
>  On 05/10/2014 16:53, Renato Moutinho wrote:
>
> Hi there,
>
>       thanks a lot for taking the time to answer me ! Actually, this
> "issue" happens after all the map tasks have completed (I'm looking at the
> web interface). I'll try to diagnose whether it's an issue with the number
> of threads.. I suppose I'll have to change the logging configuration to
> find out what's going on..
>
>  The only thing that's getting to me is the fact that the lines are
> repeated in the log..
>
>  Regards,
>
>  Renato Moutinho
>
>
>
> On 05/10/2014, at 10:52, java8964 <ja...@hotmail.com> wrote:
>
>   Don't be confused by 6.03 MB/s.
>
>  The relationship between mapper and reducer is an M-to-N relationship,
> which means a mapper can send its data to all reducers, and one reducer
> can receive its input from all mappers.
>
>  There could be a lot of reasons why the reduce copy phase looks too
> slow. It could be that the mappers are still running and there is no data
> generated for the reducer to copy yet; or there are not enough threads on
> either the mapper or reducer side to utilize the remaining
> cpu/memory/network bandwidth. You can google the hadoop configurations to
> adjust them.
>
>  But getting 60 MB/s in scp and then complaining about only getting
> 6 MB/s in the log is not fair to hadoop. Your one reducer needs to copy
> data from all the mappers concurrently, which makes it impossible to reach
> the same speed as a one-to-one point network transfer.
>
>  The reduce stage is normally longer than the map stage, as data HAS to
> be transferred through the network.
>
>  But in the word count example, the data that needs to be transferred
> should be very small. You can ask yourself the following questions:
>
>  1) Should I use a combiner in this case? (Yes, for word count, it
> reduces the data that needs to be transferred).
> 2) Do I use all the reducers I can, if my cluster is underutilized and
> I want my job to finish fast?
> 3) Can I add more threads in the task tracker to help? You need to dig
> into your log to find out whether your mapper or reducer is waiting for a
> thread from the thread pool.
>
>  Yong
>
>  ------------------------------
> Date: Fri, 3 Oct 2014 18:40:16 -0300
> Subject: Reduce phase of wordcount
> From: renato.moutinho@gmail.com
> To: user@hadoop.apache.org
>
>   Hi people,
>
>      I'm doing some experiments with Hadoop 1.2.1, running the wordcount
> sample on an 8-node cluster (master + 7 slaves). By tuning the task
> configuration I've been able to get the map phase to run in 22 minutes..
> However, the reduce phase (which consists of a single task) gets stuck at
> some points, making the whole job take more than 40 minutes. Looking at
> the logs, I've seen several lines stuck at the copy stage at different
> moments, like this:
>
> 2014-10-03 18:26:34,717 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201408281149_0019_r_000000_0 0.3302721% reduce > copy (971 of 980
> at 6.03 MB/s) >
> 2014-10-03 18:26:37,736 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201408281149_0019_r_000000_0 0.3302721% reduce > copy (971 of 980
> at 6.03 MB/s) >
> 2014-10-03 18:26:40,754 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201408281149_0019_r_000000_0 0.3302721% reduce > copy (971 of 980
> at 6.03 MB/s) >
> 2014-10-03 18:26:43,772 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201408281149_0019_r_000000_0 0.3302721% reduce > copy (971 of 980
> at 6.03 MB/s) >
>
>  Eventually the job ends, but these repeated lines make me think it's
> having difficulty transferring the map outputs from the map nodes. Is my
> interpretation correct ? The transfer rate is way too slow compared to an
> scp file transfer between the hosts (10 times slower). Any take on why ?
>
> Regards,
>
> Renato Moutinho
>
>
>

Re: Reduce phase of wordcount

Posted by Ulul <ha...@ulul.org>.
Hi

You indicate that you have just one reducer, which is the default in
Hadoop 1 but quite insufficient for a cluster with 7 slave nodes.
You should increase mapred.reduce.tasks, use combiners, and maybe tune
mapred.tasktracker.reduce.tasks.maximum
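
Note that the slot maximum is a tasktracker-side setting: each TaskTracker
reads it from its own mapred-site.xml when the daemon starts, so it cannot
be raised per job. A sketch of the relevant entry on each slave (the value
8 mirrors the setup in this thread, and the daemon needs a restart to pick
it up):

<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>8</value>
</property>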

Hope that helps
Ulul

On 05/10/2014 16:53, Renato Moutinho wrote:
> Hi there,
>
>      thanks a lot for taking the time to answer me ! Actually, this 
> "issue" happens after all the map tasks have completed (I'm looking at 
> the web interface). I'll try to diagnose whether it's an issue with the 
> number of threads.. I suppose I'll have to change the logging 
> configuration to find out what's going on..
>
> The only thing that's getting to me is the fact that the lines are 
> repeated in the log..
>
> Regards,
>
> Renato Moutinho
>
>
>
> On 05/10/2014, at 10:52, java8964 <java8964@hotmail.com> wrote:
>
>> Don't be confused by 6.03 MB/s.
>>
>> The relationship between mapper and reducer is an M-to-N relationship, 
>> which means a mapper can send its data to all reducers, and one 
>> reducer can receive its input from all mappers.
>>
>> There could be a lot of reasons why the reduce copy phase looks too 
>> slow. It could be that the mappers are still running and there is no 
>> data generated for the reducer to copy yet; or there are not enough 
>> threads on either the mapper or reducer side to utilize the remaining 
>> cpu/memory/network bandwidth. You can google the hadoop 
>> configurations to adjust them.
>>
>> But getting 60 MB/s in scp and then complaining about only getting 
>> 6 MB/s in the log is not fair to hadoop. Your one reducer needs to 
>> copy data from all the mappers concurrently, which makes it impossible 
>> to reach the same speed as a one-to-one point network transfer.
>>
>> The reduce stage is normally longer than the map stage, as data HAS 
>> to be transferred through the network.
>>
>> But in the word count example, the data that needs to be transferred 
>> should be very small. You can ask yourself the following questions:
>>
>> 1) Should I use a combiner in this case? (Yes, for word count, it 
>> reduces the data that needs to be transferred).
>> 2) Do I use all the reducers I can, if my cluster is underutilized 
>> and I want my job to finish fast?
>> 3) Can I add more threads in the task tracker to help? You need to 
>> dig into your log to find out whether your mapper or reducer is 
>> waiting for a thread from the thread pool.
>>
>> Yong
>>
>> ------------------------------------------------------------------------
>> Date: Fri, 3 Oct 2014 18:40:16 -0300
>> Subject: Reduce phase of wordcount
>> From: renato.moutinho@gmail.com
>> To: user@hadoop.apache.org
>>
>> Hi people,
>>
>>     I'm doing some experiments with Hadoop 1.2.1, running the 
>> wordcount sample on an 8-node cluster (master + 7 slaves). By tuning 
>> the task configuration I've been able to get the map phase to run in 
>> 22 minutes.. However, the reduce phase (which consists of a single 
>> task) gets stuck at some points, making the whole job take more than 
>> 40 minutes. Looking at the logs, I've seen several lines stuck at the 
>> copy stage at different moments, like this:
>>
>> 2014-10-03 18:26:34,717 INFO org.apache.hadoop.mapred.TaskTracker: 
>> attempt_201408281149_0019_r_000000_0 0.3302721% reduce > copy (971 of 
>> 980 at 6.03 MB/s) >
>> 2014-10-03 18:26:37,736 INFO org.apache.hadoop.mapred.TaskTracker: 
>> attempt_201408281149_0019_r_000000_0 0.3302721% reduce > copy (971 of 
>> 980 at 6.03 MB/s) >
>> 2014-10-03 18:26:40,754 INFO org.apache.hadoop.mapred.TaskTracker: 
>> attempt_201408281149_0019_r_000000_0 0.3302721% reduce > copy (971 of 
>> 980 at 6.03 MB/s) >
>> 2014-10-03 18:26:43,772 INFO org.apache.hadoop.mapred.TaskTracker: 
>> attempt_201408281149_0019_r_000000_0 0.3302721% reduce > copy (971 of 
>> 980 at 6.03 MB/s) >
>>
>> Eventually the job ends, but these repeated lines make me think it's 
>> having difficulty transferring the map outputs from the map nodes. Is 
>> my interpretation correct ? The transfer rate is way too slow compared 
>> to an scp file transfer between the hosts (10 times slower). Any take 
>> on why ?
>>
>> Regards,
>>
>> Renato Moutinho


Re: Reduce phase of wordcount

Posted by Ulul <ha...@ulul.org>.
Hi

You indicate that you have just one reducer, which is the default in 
Hadoop 1 but quite insufficient for a 7 slave nodes cluster.
You should increase mapred.reduce.tasks use combiners and maybe tune 
mapred.reduce.tasktracker.reduce.tasks.maximum

Hope that helps
Ulul

Le 05/10/2014 16:53, Renato Moutinho a écrit :
> Hi there,
>
>      thanks a lot for taking the time to answer me ! Actually, this 
> "issue" happens after all the map tasks have completed (I'm looking at 
> the web interface). I'll try to diagnose if it's an issue with the 
> number of threads.. I suppose I'll have to change the logging 
> configuration to find what's going on..
>
> The only that's getting to me is the fact that the lines are repeated 
> on the log..
>
> Regards,
>
> Renato Moutinho
>
>
>
> Em 05/10/2014, às 10:52, java8964 <java8964@hotmail.com 
> <ma...@hotmail.com>> escreveu:
>
>> Don't be confused by 6.03 MB/s.
>>
>> The relationship between mapper and reducer is M to N relationship, 
>> which means the mapper could send its data to all reducers, and one 
>> reducer could receive its input from all mappers.
>>
>> There could be a lot of reasons why you think the reduce copying 
>> phase is too slow. It could be the mappers are still running, there 
>> is no data generated for reducer to copy yet; or there is no enough 
>> threads in either mapper or reducer to utilize remaining 
>> cpu/memory/network bandwidth. You can google the hadoop 
>> configurations to adjust them.
>>
>> But just because you can get 60M/s in scp, then complain only getting 
>> 6M/s in the log is not fair to hadoop. You one reducer needs to copy 
>> data from all the mappers, concurrently, makes it impossible to reach 
>> the same speed as one to one point network transfer speed.
>>
>> The reducer stage is normally longer than map stage, as data HAS to 
>> be transferred through network.
>>
>> But in word count example, the data needs to be transferred should be 
>> very small. You can ask the following question by yourself:
>>
>> 1) Should I use combiner in this case? (Yes, for word count, it 
>> reduces the data needs to be transferred).
>> 2) Do I use all the reducers I can use, if my cluster is under 
>> utilized and I want my job to finish fast?
>> 3) Can I add more threads in the task tracker to help? You need to 
>> dig into your log to find out if your mapper or reducer are waiting 
>> for the thread from thread pool.
>>
>> Yong
>>
>> ------------------------------------------------------------------------
>> Date: Fri, 3 Oct 2014 18:40:16 -0300
>> Subject: Reduce phase of wordcount
>> From: renato.moutinho@gmail.com <ma...@gmail.com>
>> To: user@hadoop.apache.org <ma...@hadoop.apache.org>
>>
>> Hi people,
>>
>>     I´m doing some experiments with hadoop 1.2.1 running the 
>> wordcount sample on an 8 nodes cluster (master + 7 slaves). Tuning 
>> the tasks configuration I´ve been able to make the map phase run on 
>> 22 minutes.. However the reduce phase (which consists of a single 
>> job) stucks at some points making the whole job take more than 40 
>> minutes. Looking at the logs, I´ve seen several lines stuck at copy 
>> on different moments, like this:
>>
>> 2014-10-03 18:26:34,717 INFO org.apache.hadoop.mapred.TaskTracker: 
>> attempt_201408281149_0019_r_000000_0 0.3302721% reduce > copy (971 of 
>> 980 at 6.03 MB/s) >
>> 2014-10-03 18:26:37,736 INFO org.apache.hadoop.mapred.TaskTracker: 
>> attempt_201408281149_0019_r_000000_0 0.3302721% reduce > copy (971 of 
>> 980 at 6.03 MB/s) >
>> 2014-10-03 18:26:40,754 INFO org.apache.hadoop.mapred.TaskTracker: 
>> attempt_201408281149_0019_r_000000_0 0.3302721% reduce > copy (971 of 
>> 980 at 6.03 MB/s) >
>> 2014-10-03 18:26:43,772 INFO org.apache.hadoop.mapred.TaskTracker: 
>> attempt_201408281149_0019_r_000000_0 0.3302721% reduce > copy (971 of 
>> 980 at 6.03 MB/s) >
>>
>> Eventually the job end, but this information, being repeated, makes 
>> me think it´s having difficulty transferring the parts from the map 
>> nodes. Is my interpretation correct on this ? The trasnfer rate is 
>> waaay too slow if compared to scp file transfer between the hosts (10 
>> times slower). Any takes on why ?
>>
>> Regards,
>>
>> Renato Moutinho


Re: Reduce phase of wordcount

Posted by Ulul <ha...@ulul.org>.
Hi

You indicate that you have just one reducer, which is the default in 
Hadoop 1 but quite insufficient for a 7 slave nodes cluster.
You should increase mapred.reduce.tasks use combiners and maybe tune 
mapred.reduce.tasktracker.reduce.tasks.maximum

Hope that helps
Ulul



Re: Reduce phase of wordcount

Posted by Renato Moutinho <re...@gmail.com>.
Hi there,

     thanks a lot for taking the time to answer me! Actually, this "issue"
happens after all the map tasks have completed (I'm looking at the web
interface). I'll try to diagnose whether it's an issue with the number of
threads. I suppose I'll have to change the logging configuration to find
out what's going on.
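
What I have in mind is a one-liner in conf/log4j.properties, something
like this (the logger name is my guess at the package the shuffle
messages come from):

log4j.logger.org.apache.hadoop.mapred=DEBUG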

The only thing that's still getting to me is the fact that the lines are
repeated in the log.

Regards,

Renato Moutinho




RE: Reduce phase of wordcount

Posted by java8964 <ja...@hotmail.com>.
Don't be confused by 6.03 MB/s.

The relationship between mappers and reducers is M-to-N: each mapper can
send its data to every reducer, and one reducer can receive its input from
every mapper.

There could be a lot of reasons why the reduce copy phase looks slow to
you. The mappers may still be running, so there is no data for the reducer
to copy yet; or there are not enough threads on the mapper or reducer side
to use the remaining CPU/memory/network bandwidth. You can look up the
relevant Hadoop configuration properties and adjust them.

But getting 60 MB/s in scp and then complaining about 6 MB/s in the log is
not fair to Hadoop. Your one reducer needs to copy data from all the
mappers concurrently, which makes it impossible to reach the speed of a
one-to-one network transfer.

The reduce stage is normally longer than the map stage, as the data HAS to
be transferred over the network.

But in the word count example, the data to be transferred should be very
small. Ask yourself the following questions:

1) Should I use a combiner in this case? (Yes; for word count it reduces
the data that needs to be transferred.)
2) Am I using all the reducers I can, if my cluster is under-utilized and
I want my job to finish fast?
3) Can I add more threads in the task tracker to help? You need to dig
into your logs to find out whether your mappers or reducers are waiting
for a thread from the thread pool.
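
To make 3) concrete, a sketch of the two knobs I would check first
(property names as in the Hadoop 1.x defaults; verify them against your
version). mapred.reduce.parallel.copies is a per-job setting, so it can go
in the job configuration, while tasktracker.http.threads is a daemon
setting for mapred-site.xml on each slave:

// in the job driver: parallel fetch threads per reducer (default 5)
conf.setInt("mapred.reduce.parallel.copies", 20);

<!-- mapred-site.xml: HTTP threads serving map output to reducers (default 40) -->
<property>
  <name>tasktracker.http.threads</name>
  <value>80</value>
</property>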
Yong

