You are viewing a plain text version of this content. The canonical link for it is here.

Posted to mapreduce-dev@hadoop.apache.org by xu cheng <xc...@gmail.com> on 2010/08/27 11:43:20 UTC

questions about the order of the map and reduce and the shuffle error

hello guys:
   I'm doing some experiences on my 3 node virtual machine cluster, one for
namenode and jobtracker while the other tow for datanode and
tasktracker.with a 0.21.0 hadoop
   and when  I 'm running a job ,I got such message


10/08/27 17:28:58 INFO mapreduce.Job:  map 0% reduce 0%
10/08/27 17:29:10 INFO mapreduce.Job:  map 50% reduce 0%
10/08/27 17:29:12 INFO mapreduce.Job:  map 83% reduce 0%
10/08/27 17:29:19 INFO mapreduce.Job:  map 83% reduce 16%
10/08/27 17:29:24 INFO mapreduce.Job:  map 100% reduce 16%


the reduce runs while the map task hasn't finished!! ( I read from the books
that reduce task runs exactlly after the maps finish!!)

is there something wrong with the cluster or my knowledge?

by the way ,the reduce job stuck while it is in the progress 16.63%, I found
that people on the internet also got this problem but I haven't found the
solution.

however, after some time , after the system report the error message , the
job began to run again! like this


10/08/27 17:29:58 INFO mapreduce.Job:  map 100% reduce 16%
10/08/27 17:30:01 INFO mapreduce.Job:  map 50% reduce 16%
10/08/27 17:30:07 INFO mapreduce.Job:  map 83% reduce 16%
10/08/27 17:30:19 INFO mapreduce.Job:  map 100% reduce 16%
10/08/27 17:30:25 INFO mapreduce.Job:  map 100% reduce 66%
10/08/27 17:30:31 INFO mapreduce.Job:  map 100% reduce 100%


and it runs pretty well!  does someone know about this?

belows are the message on the console, If the logs are needed ,let me
know.thanks

any suggestions and references are appreciated
best regards
xu








10/08/27 17:28:56 INFO security.Groups: Group mapping
impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping;
cacheTimeout=300000
10/08/27 17:28:56 WARN conf.Configuration: mapred.task.id is deprecated.
Instead, use mapreduce.task.attempt.id
10/08/27 17:28:56 WARN mapreduce.JobSubmitter: Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the same.
10/08/27 17:28:56 INFO input.FileInputFormat: Total input paths to process :
1
10/08/27 17:28:56 WARN conf.Configuration: mapred.map.tasks is deprecated.
Instead, use mapreduce.job.maps
10/08/27 17:28:56 INFO mapreduce.JobSubmitter: number of splits:2
10/08/27 17:28:57 INFO mapreduce.JobSubmitter: adding the following
namenodes' delegation tokens:null
10/08/27 17:28:57 INFO mapreduce.Job: Running job: job_201008271725_0001
10/08/27 17:28:58 INFO mapreduce.Job:  map 0% reduce 0%
10/08/27 17:29:10 INFO mapreduce.Job:  map 50% reduce 0%
10/08/27 17:29:12 INFO mapreduce.Job:  map 83% reduce 0%
10/08/27 17:29:19 INFO mapreduce.Job:  map 83% reduce 16%
10/08/27 17:29:24 INFO mapreduce.Job:  map 100% reduce 16%
10/08/27 17:29:48 INFO mapreduce.Job: Task Id :
attempt_201008271725_0001_r_000000_0, Status : FAILED
org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in
shuffle in fetcher#1
 at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:124)
 at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:362)
 at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:742)
 at org.apache.hadoop.mapred.Child.main(Child.java:211)
Caused by: java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES;
bailing-out.
 at
org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler.checkReducerHealth(ShuffleScheduler.java:253)
 at
org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler.copyFailed(ShuffleScheduler.java:187)
 at
org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:234)
 at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:149)
10/08/27 17:29:48 WARN mapreduce.Job: Error reading task outputConnection
refused
10/08/27 17:29:48 WARN mapreduce.Job: Error reading task outputConnection
refused
10/08/27 17:29:49 INFO mapreduce.Job:  map 100% reduce 0%
10/08/27 17:29:57 INFO mapreduce.Job: Task Id :
attempt_201008271725_0001_m_000000_0, Status : FAILED
Too many fetch-failures
10/08/27 17:29:57 WARN mapreduce.Job: Error reading task outputConnection
refused
10/08/27 17:29:57 WARN mapreduce.Job: Error reading task outputConnection
refused
10/08/27 17:29:58 INFO mapreduce.Job:  map 100% reduce 16%
10/08/27 17:30:01 INFO mapreduce.Job:  map 50% reduce 16%
10/08/27 17:30:07 INFO mapreduce.Job:  map 83% reduce 16%
10/08/27 17:30:19 INFO mapreduce.Job:  map 100% reduce 16%
10/08/27 17:30:25 INFO mapreduce.Job:  map 100% reduce 66%
10/08/27 17:30:31 INFO mapreduce.Job:  map 100% reduce 100%
10/08/27 17:30:33 INFO mapreduce.Job: Job complete: job_201008271725_0001
10/08/27 17:30:33 INFO mapreduce.Job: Counters: 33
 FileInputFormatCounters
  BYTES_READ=76420532
 FileSystemCounters
  FILE_BYTES_READ=155979268
  FILE_BYTES_WRITTEN=239598906
  HDFS_BYTES_READ=76424828
  HDFS_BYTES_WRITTEN=78386951
 Shuffle Errors
  BAD_ID=0
  CONNECTION=0
  IO_ERROR=2
  WRONG_LENGTH=0
  WRONG_MAP=0
  WRONG_REDUCE=0
 Job Counters
  Data-local map tasks=3
  Total time spent by all maps waiting after reserving slots (ms)=0
  Total time spent by all reduces waiting after reserving slots (ms)=0
  SLOTS_MILLIS_MAPS=47747
  SLOTS_MILLIS_REDUCES=77241
  Launched map tasks=3
  Launched reduce tasks=2
 Map-Reduce Framework
  Combine input records=999998
  Combine output records=994179
  Failed Shuffles=1
  GC time elapsed (ms)=667
  Map input records=499999
  Map output bytes=80759850
  Map output records=999998
  Merged Map outputs=2
  Reduce input groups=993814
  Reduce input records=994179
  Reduce output records=993814
  Reduce shuffle bytes=83049802
  Shuffled Maps =2
  Spilled Records=2861309
  SPLIT_RAW_BYTES=200

Re: questions about the order of the map and reduce and the shuffle error

Posted by xu cheng <xc...@gmail.com>.

by the way, there are 2 maps and 1 reduce task
best regards

2010/8/27 xu cheng <xc...@gmail.com>

> hello guys:
>    I'm doing some experiences on my 3 node virtual machine cluster, one for
> namenode and jobtracker while the other tow for datanode and
> tasktracker.with a 0.21.0 hadoop
>    and when  I 'm running a job ,I got such message
>
>
> 10/08/27 17:28:58 INFO mapreduce.Job:  map 0% reduce 0%
> 10/08/27 17:29:10 INFO mapreduce.Job:  map 50% reduce 0%
> 10/08/27 17:29:12 INFO mapreduce.Job:  map 83% reduce 0%
> 10/08/27 17:29:19 INFO mapreduce.Job:  map 83% reduce 16%
> 10/08/27 17:29:24 INFO mapreduce.Job:  map 100% reduce 16%
>
>
> the reduce runs while the map task hasn't finished!! ( I read from the
> books that reduce task runs exactlly after the maps finish!!)
>
> is there something wrong with the cluster or my knowledge?
>
> by the way ,the reduce job stuck while it is in the progress 16.63%, I found
> that people on the internet also got this problem but I haven't found the
> solution.
>
> however, after some time , after the system report the error message , the
> job began to run again! like this
>
>
> 10/08/27 17:29:58 INFO mapreduce.Job:  map 100% reduce 16%
> 10/08/27 17:30:01 INFO mapreduce.Job:  map 50% reduce 16%
> 10/08/27 17:30:07 INFO mapreduce.Job:  map 83% reduce 16%
> 10/08/27 17:30:19 INFO mapreduce.Job:  map 100% reduce 16%
> 10/08/27 17:30:25 INFO mapreduce.Job:  map 100% reduce 66%
> 10/08/27 17:30:31 INFO mapreduce.Job:  map 100% reduce 100%
>
>
> and it runs pretty well!  does someone know about this?
>
> belows are the message on the console, If the logs are needed ,let me
> know.thanks
>
> any suggestions and references are appreciated
> best regards
> xu
>
>
>
>
>
>
>
>
> 10/08/27 17:28:56 INFO security.Groups: Group mapping
> impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping;
> cacheTimeout=300000
> 10/08/27 17:28:56 WARN conf.Configuration: mapred.task.id is deprecated.
> Instead, use mapreduce.task.attempt.id
> 10/08/27 17:28:56 WARN mapreduce.JobSubmitter: Use GenericOptionsParser for
> parsing the arguments. Applications should implement Tool for the same.
> 10/08/27 17:28:56 INFO input.FileInputFormat: Total input paths to process
> : 1
> 10/08/27 17:28:56 WARN conf.Configuration: mapred.map.tasks is deprecated.
> Instead, use mapreduce.job.maps
> 10/08/27 17:28:56 INFO mapreduce.JobSubmitter: number of splits:2
> 10/08/27 17:28:57 INFO mapreduce.JobSubmitter: adding the following
> namenodes' delegation tokens:null
> 10/08/27 17:28:57 INFO mapreduce.Job: Running job: job_201008271725_0001
> 10/08/27 17:28:58 INFO mapreduce.Job:  map 0% reduce 0%
> 10/08/27 17:29:10 INFO mapreduce.Job:  map 50% reduce 0%
> 10/08/27 17:29:12 INFO mapreduce.Job:  map 83% reduce 0%
> 10/08/27 17:29:19 INFO mapreduce.Job:  map 83% reduce 16%
> 10/08/27 17:29:24 INFO mapreduce.Job:  map 100% reduce 16%
> 10/08/27 17:29:48 INFO mapreduce.Job: Task Id :
> attempt_201008271725_0001_r_000000_0, Status : FAILED
> org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in
> shuffle in fetcher#1
>  at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:124)
>  at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:362)
>  at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:396)
>  at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:742)
>  at org.apache.hadoop.mapred.Child.main(Child.java:211)
> Caused by: java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES;
> bailing-out.
>  at
> org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler.checkReducerHealth(ShuffleScheduler.java:253)
>  at
> org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler.copyFailed(ShuffleScheduler.java:187)
>  at
> org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:234)
>  at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:149)
> 10/08/27 17:29:48 WARN mapreduce.Job: Error reading task outputConnection
> refused
> 10/08/27 17:29:48 WARN mapreduce.Job: Error reading task outputConnection
> refused
> 10/08/27 17:29:49 INFO mapreduce.Job:  map 100% reduce 0%
> 10/08/27 17:29:57 INFO mapreduce.Job: Task Id :
> attempt_201008271725_0001_m_000000_0, Status : FAILED
> Too many fetch-failures
> 10/08/27 17:29:57 WARN mapreduce.Job: Error reading task outputConnection
> refused
> 10/08/27 17:29:57 WARN mapreduce.Job: Error reading task outputConnection
> refused
> 10/08/27 17:29:58 INFO mapreduce.Job:  map 100% reduce 16%
> 10/08/27 17:30:01 INFO mapreduce.Job:  map 50% reduce 16%
> 10/08/27 17:30:07 INFO mapreduce.Job:  map 83% reduce 16%
> 10/08/27 17:30:19 INFO mapreduce.Job:  map 100% reduce 16%
> 10/08/27 17:30:25 INFO mapreduce.Job:  map 100% reduce 66%
> 10/08/27 17:30:31 INFO mapreduce.Job:  map 100% reduce 100%
> 10/08/27 17:30:33 INFO mapreduce.Job: Job complete: job_201008271725_0001
> 10/08/27 17:30:33 INFO mapreduce.Job: Counters: 33
>  FileInputFormatCounters
>   BYTES_READ=76420532
>  FileSystemCounters
>   FILE_BYTES_READ=155979268
>   FILE_BYTES_WRITTEN=239598906
>   HDFS_BYTES_READ=76424828
>   HDFS_BYTES_WRITTEN=78386951
>  Shuffle Errors
>   BAD_ID=0
>   CONNECTION=0
>   IO_ERROR=2
>   WRONG_LENGTH=0
>   WRONG_MAP=0
>   WRONG_REDUCE=0
>  Job Counters
>   Data-local map tasks=3
>   Total time spent by all maps waiting after reserving slots (ms)=0
>   Total time spent by all reduces waiting after reserving slots (ms)=0
>   SLOTS_MILLIS_MAPS=47747
>   SLOTS_MILLIS_REDUCES=77241
>   Launched map tasks=3
>   Launched reduce tasks=2
>  Map-Reduce Framework
>   Combine input records=999998
>   Combine output records=994179
>   Failed Shuffles=1
>   GC time elapsed (ms)=667
>   Map input records=499999
>   Map output bytes=80759850
>   Map output records=999998
>   Merged Map outputs=2
>   Reduce input groups=993814
>   Reduce input records=994179
>   Reduce output records=993814
>   Reduce shuffle bytes=83049802
>   Shuffled Maps =2
>   Spilled Records=2861309
>   SPLIT_RAW_BYTES=200
>

Re: questions about the order of the map and reduce and the shuffle error

Posted by Harsh J <qw...@gmail.com>.

On Fri, Aug 27, 2010 at 3:13 PM, xu cheng <xc...@gmail.com> wrote:
> hello guys:
>   I'm doing some experiences on my 3 node virtual machine cluster, one for
> namenode and jobtracker while the other tow for datanode and
> tasktracker.with a 0.21.0 hadoop
>   and when  I 'm running a job ,I got such message
>
>
> 10/08/27 17:28:58 INFO mapreduce.Job:  map 0% reduce 0%
> 10/08/27 17:29:10 INFO mapreduce.Job:  map 50% reduce 0%
> 10/08/27 17:29:12 INFO mapreduce.Job:  map 83% reduce 0%
> 10/08/27 17:29:19 INFO mapreduce.Job:  map 83% reduce 16%
> 10/08/27 17:29:24 INFO mapreduce.Job:  map 100% reduce 16%
>
>
> the reduce runs while the map task hasn't finished!! ( I read from the books
> that reduce task runs exactlly after the maps finish!!)

The reducer operation contains 3 steps - copy/sort/reduce. 33%-33%-33%
roughly. What you're seeing is the copy phase progress at this point.
The 'reduce' phase indeed runs only after all maps finish, that is
true.

>
> is there something wrong with the cluster or my knowledge?
>
> by the way ,the reduce job stuck while it is in the progress 16.63%, I found
> that people on the internet also got this problem but I haven't found the
> solution.
>
> however, after some time , after the system report the error message , the
> job began to run again! like this
>
>
> 10/08/27 17:29:58 INFO mapreduce.Job:  map 100% reduce 16%
> 10/08/27 17:30:01 INFO mapreduce.Job:  map 50% reduce 16%
> 10/08/27 17:30:07 INFO mapreduce.Job:  map 83% reduce 16%
> 10/08/27 17:30:19 INFO mapreduce.Job:  map 100% reduce 16%
> 10/08/27 17:30:25 INFO mapreduce.Job:  map 100% reduce 66%
> 10/08/27 17:30:31 INFO mapreduce.Job:  map 100% reduce 100%
>
>
> and it runs pretty well!  does someone know about this?
>
> belows are the message on the console, If the logs are needed ,let me
> know.thanks
>
> any suggestions and references are appreciated
> best regards
> xu
>
>
>
Are you running a Firewall, and if you are - is it configured properly?
>
>
>
>
>
> 10/08/27 17:28:56 INFO security.Groups: Group mapping
> impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping;
> cacheTimeout=300000
> 10/08/27 17:28:56 WARN conf.Configuration: mapred.task.id is deprecated.
> Instead, use mapreduce.task.attempt.id
> 10/08/27 17:28:56 WARN mapreduce.JobSubmitter: Use GenericOptionsParser for
> parsing the arguments. Applications should implement Tool for the same.
> 10/08/27 17:28:56 INFO input.FileInputFormat: Total input paths to process :
> 1
> 10/08/27 17:28:56 WARN conf.Configuration: mapred.map.tasks is deprecated.
> Instead, use mapreduce.job.maps
> 10/08/27 17:28:56 INFO mapreduce.JobSubmitter: number of splits:2
> 10/08/27 17:28:57 INFO mapreduce.JobSubmitter: adding the following
> namenodes' delegation tokens:null
> 10/08/27 17:28:57 INFO mapreduce.Job: Running job: job_201008271725_0001
> 10/08/27 17:28:58 INFO mapreduce.Job:  map 0% reduce 0%
> 10/08/27 17:29:10 INFO mapreduce.Job:  map 50% reduce 0%
> 10/08/27 17:29:12 INFO mapreduce.Job:  map 83% reduce 0%
> 10/08/27 17:29:19 INFO mapreduce.Job:  map 83% reduce 16%
> 10/08/27 17:29:24 INFO mapreduce.Job:  map 100% reduce 16%
> 10/08/27 17:29:48 INFO mapreduce.Job: Task Id :
> attempt_201008271725_0001_r_000000_0, Status : FAILED
> org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in
> shuffle in fetcher#1
>  at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:124)
>  at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:362)
>  at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:396)
>  at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:742)
>  at org.apache.hadoop.mapred.Child.main(Child.java:211)
> Caused by: java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES;
> bailing-out.
>  at
> org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler.checkReducerHealth(ShuffleScheduler.java:253)
>  at
> org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler.copyFailed(ShuffleScheduler.java:187)
>  at
> org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:234)
>  at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:149)
> 10/08/27 17:29:48 WARN mapreduce.Job: Error reading task outputConnection
> refused
> 10/08/27 17:29:48 WARN mapreduce.Job: Error reading task outputConnection
> refused
> 10/08/27 17:29:49 INFO mapreduce.Job:  map 100% reduce 0%
> 10/08/27 17:29:57 INFO mapreduce.Job: Task Id :
> attempt_201008271725_0001_m_000000_0, Status : FAILED
> Too many fetch-failures
> 10/08/27 17:29:57 WARN mapreduce.Job: Error reading task outputConnection
> refused
> 10/08/27 17:29:57 WARN mapreduce.Job: Error reading task outputConnection
> refused
> 10/08/27 17:29:58 INFO mapreduce.Job:  map 100% reduce 16%
> 10/08/27 17:30:01 INFO mapreduce.Job:  map 50% reduce 16%
> 10/08/27 17:30:07 INFO mapreduce.Job:  map 83% reduce 16%
> 10/08/27 17:30:19 INFO mapreduce.Job:  map 100% reduce 16%
> 10/08/27 17:30:25 INFO mapreduce.Job:  map 100% reduce 66%
> 10/08/27 17:30:31 INFO mapreduce.Job:  map 100% reduce 100%
> 10/08/27 17:30:33 INFO mapreduce.Job: Job complete: job_201008271725_0001
> 10/08/27 17:30:33 INFO mapreduce.Job: Counters: 33
>  FileInputFormatCounters
>  BYTES_READ=76420532
>  FileSystemCounters
>  FILE_BYTES_READ=155979268
>  FILE_BYTES_WRITTEN=239598906
>  HDFS_BYTES_READ=76424828
>  HDFS_BYTES_WRITTEN=78386951
>  Shuffle Errors
>  BAD_ID=0
>  CONNECTION=0
>  IO_ERROR=2
>  WRONG_LENGTH=0
>  WRONG_MAP=0
>  WRONG_REDUCE=0
>  Job Counters
>  Data-local map tasks=3
>  Total time spent by all maps waiting after reserving slots (ms)=0
>  Total time spent by all reduces waiting after reserving slots (ms)=0
>  SLOTS_MILLIS_MAPS=47747
>  SLOTS_MILLIS_REDUCES=77241
>  Launched map tasks=3
>  Launched reduce tasks=2
>  Map-Reduce Framework
>  Combine input records=999998
>  Combine output records=994179
>  Failed Shuffles=1
>  GC time elapsed (ms)=667
>  Map input records=499999
>  Map output bytes=80759850
>  Map output records=999998
>  Merged Map outputs=2
>  Reduce input groups=993814
>  Reduce input records=994179
>  Reduce output records=993814
>  Reduce shuffle bytes=83049802
>  Shuffled Maps =2
>  Spilled Records=2861309
>  SPLIT_RAW_BYTES=200
>



-- 
Harsh J
www.harshj.com