Posted to mapreduce-dev@hadoop.apache.org by "Min Zhou (JIRA)" <ji...@apache.org> on 2014/10/16 02:02:59 UTC

[jira] [Created] (MAPREDUCE-6129) Job failed due to counters exceeding the limit in MRAppMaster

Min Zhou created MAPREDUCE-6129:
-----------------------------------

             Summary: Job failed due to counters exceeding the limit in MRAppMaster
                 Key: MAPREDUCE-6129
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6129
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: applicationmaster
            Reporter: Min Zhou


Many jobs on our cluster use more than 120 counters, and those jobs fail with an exception like the one below:
{noformat}
2014-10-15 22:55:43,742 WARN [Socket Reader #1 for port 45673] org.apache.hadoop.ipc.Server: Unable to read call parameters for client 10.180.216.12on connection protocol org.apache.hadoop.mapred.TaskUmbilicalProtocol for rpcKind RPC_WRITABLE
org.apache.hadoop.mapreduce.counters.LimitExceededException: Too many counters: 121 max=120
	at org.apache.hadoop.mapreduce.counters.Limits.checkCounters(Limits.java:103)
	at org.apache.hadoop.mapreduce.counters.Limits.incrCounters(Limits.java:110)
	at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.readFields(AbstractCounterGroup.java:175)
	at org.apache.hadoop.mapred.Counters$Group.readFields(Counters.java:324)
	at org.apache.hadoop.mapreduce.counters.AbstractCounters.readFields(AbstractCounters.java:314)
	at org.apache.hadoop.mapred.TaskStatus.readFields(TaskStatus.java:489)
	at org.apache.hadoop.mapred.ReduceTaskStatus.readFields(ReduceTaskStatus.java:140)
	at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:285)
	at org.apache.hadoop.ipc.WritableRpcEngine$Invocation.readFields(WritableRpcEngine.java:157)
	at org.apache.hadoop.ipc.Server$Connection.processRpcRequest(Server.java:1802)
	at org.apache.hadoop.ipc.Server$Connection.processOneRpc(Server.java:1734)
	at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1494)
	at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:732)
	at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:606)
	at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:577)

{noformat}
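
For context, a job can exceed the default limit simply by creating counters with dynamically generated names. The mapper below is only an illustrative sketch (class and group names are hypothetical, not taken from any of the failing jobs):
{noformat}
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper: one counter per distinct key prefix.
// With more than 120 distinct prefixes, the default counter limit is exceeded.
public class PrefixCountMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    String prefix = value.toString().split(",", 2)[0];
    // Each distinct prefix creates (or increments) its own counter in the "PREFIXES" group.
    context.getCounter("PREFIXES", prefix).increment(1);
  }
}
{noformat}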

The class org.apache.hadoop.mapreduce.counters.Limits loads mapred-site.xml on the NodeManager node into a JobConf if it hasn't been initialized yet.
If mapred-site.xml does not exist on the NodeManager node, or mapreduce.job.counters.max is not defined in that file, org.apache.hadoop.mapreduce.counters.Limits simply falls back to the default value of 120.
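
For reference, the limit that Limits picks up can be raised by defining mapreduce.job.counters.max in the mapred-site.xml it reads; the value 500 below is only an example:
{noformat}
<!-- mapred-site.xml: example only, 500 is an arbitrary value -->
<property>
  <name>mapreduce.job.counters.max</name>
  <value>500</value>
</property>
{noformat}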

Instead, we should read the user job's conf file, rather than the config files on the NodeManager, when checking the counter limits; see the sketch below.
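
A rough sketch of the direction (not the actual patch): initialize the limits from the job's own job.xml when the AM starts, before any task status carrying counters is deserialized. This assumes Limits exposes an init(Configuration) hook as in current trunk; the helper name is illustrative.
{noformat}
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapreduce.MRJobConfig;
import org.apache.hadoop.mapreduce.counters.Limits;

// Sketch only: call this during MRAppMaster startup.
private static void initCounterLimitsFromJobConf() {
  // job.xml shipped with the job is localized into the container's working directory.
  JobConf jobConf = new JobConf(new Path(MRJobConfig.JOB_CONF_FILE));
  // Use the job's own mapreduce.job.counters.max instead of the NM's mapred-site.xml.
  Limits.init(jobConf);
}
{noformat}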

I will submit a patch later.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)