You are viewing a plain text version of this content. The canonical link for it is here.

Posted to mapreduce-user@hadoop.apache.org by Oren Marmor <or...@infolinks.com> on 2014/01/23 17:11:41 UTC

HA Jobtracker failure

Hi.
We have two HA Jobtrackers in active/standby mode. (CDH4.2 on ubuntu server)
We had a problem during which the active node suddenly became standby and
the standby server attempted to start resulting in a java heap space
failure.
any ideas to why the active node turned to standby?

logs attached:
on (original) active node:
2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobTracker:
Initializing job_201401041634_5858
2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobInProgress:
Initializing job_201401041634_5858
*2014-01-22 06:50:27,386 INFO
org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioning to
standby*
2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker: Stopping
pluginDispatcher
2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker: Stopping
infoServer
2014-01-22 06:50:44,093 WARN org.apache.hadoop.ipc.Client: interrupted
waiting to send params to server
java.lang.InterruptedException
        at
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:979)
        at
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
        at
java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
        at java.util.concurrent.FutureTask.get(FutureTask.java:83)
        at
org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:913)
        at org.apache.hadoop.ipc.Client.call(Client.java:1198)
        at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
        at $Proxy9.getFileInfo(Unknown Source)
        at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:628)
        at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
        at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
        at $Proxy10.getFileInfo(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1532)
        at
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:803)
        at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1332)
        at
org.apache.hadoop.mapred.JobTrackerHAServiceProtocol$SystemDirectoryMonitor.run(JobTrackerHAServiceProtocol.java:96)
        at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
        at
java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
        at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
        at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
        at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
2014-01-22 06:51:55,637 INFO org.mortbay.log: Stopped
SelectChannelConnector@0.0.0.0:50031

on standby node
2014-01-22 06:50:05,010 INFO
org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioning to
active
2014-01-22 06:50:05,010 INFO
org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopping
JobTrackerHAHttpRedirector on port 50030
2014-01-22 06:50:05,098 INFO org.mortbay.log: Stopped
SelectChannelConnector@0.0.0.0:50030
2014-01-22 06:50:05,198 INFO
org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopped
2014-01-22 06:50:05,201 INFO
org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Renaming previous
system directory hdfs://***/tmp/mapred/system/seq-000000000022 to hdfs://t
aykey/tmp/mapred/system/seq-000000000023
2014-01-22 06:50:05,244 INFO
org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
Updating the current master key for generating delegation tokens
2014-01-22 06:50:05,248 INFO org.apache.hadoop.mapred.JobTracker: Scheduler
configured with (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT,
limitMaxMemForMapTasks, limitMaxMemF
orReduceTasks) (-1, -1, -1, -1)
2014-01-22 06:50:05,248 INFO org.apache.hadoop.util.HostsFileReader:
Refreshing hosts (include/exclude) list
2014-01-22 06:50:11,839 INFO
org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
Starting expired delegation token remover thread, tokenRemoverScanI
nterval=60 min(s)
...
2014-01-22 06:52:00,870 INFO org.apache.hadoop.mapred.JobTracker: Starting
RUNNING
2014-01-22 06:52:06,560 INFO
org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioned to active
2014-01-22 06:52:06,560 WARN org.apache.hadoop.ipc.Server: IPC Server
Responder, call org.apache.hadoop.ha.HAServiceProtocol.transitionToActive
from ****:32931: output error
2014-01-22 06:52:06,561 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 0 on 8023 caught an exception
java.nio.channels.ClosedChannelException
        at
sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
        at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
        at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
        at
org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
        at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
2014-01-22 06:52:13,168 WARN org.apache.hadoop.ipc.Server: IPC Server
Responder, call org.apache.hadoop.ha.HAServiceProtocol.getServiceStatus
from ****:60965: output error
2014-01-22 06:52:13,168 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 0 on 8023 caught an exception
java.nio.channels.ClosedChannelException
        at
sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
        at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
        at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
        at
org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
        at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)

thanks
Oren

Re: HA Jobtracker failure

Posted by Oren <or...@taykey.com>.

Hi Karthik.
the logs attached,
these are the logs from the active JT
2014-01-22 06:48:18,952 INFO org.apache.hadoop.mapred.JobInProgress: 
job_201401041634_5858: nMaps=1 nReduces=1 max=-1
2014-01-22 06:48:18,953 WARN mapreduce.Counters: Group 
org.apache.hadoop.mapred.Task$Counter is deprecated. Use 
org.apache.hadoop.mapreduce.TaskCounter instead
2014-01-22 06:48:18,953 INFO org.apache.hadoop.mapred.JobTracker: Adding 
task (MAP) 'attempt_201401041634_5856_m_000237_0' to tip 
task_201401041634_5856_m_000237, for trac
ker 'tracker_ip-10-20-22-122.tk.com:localhost/127.0.0.1:35082'
2014-01-22 06:48:18,953 INFO org.apache.hadoop.mapred.JobInProgress: 
Choosing data-local task task_201401041634_5856_m_000237
2014-01-22 06:48:18,953 WARN mapreduce.Counters: Group 
org.apache.hadoop.mapred.Task$Counter is deprecated. Use 
org.apache.hadoop.mapreduce.TaskCounter instead
2014-01-22 06:48:18,953 INFO org.apache.hadoop.mapred.JobTracker: Adding 
task (MAP) 'attempt_201401041634_5856_m_000239_0' to tip 
task_201401041634_5856_m_000239, for trac
ker 'tracker_ip-10-20-22-122.tk.com:localhost/127.0.0.1:35082'
2014-01-22 06:48:18,953 INFO org.apache.hadoop.mapred.JobInProgress: 
Choosing data-local task task_201401041634_5856_m_000239
2014-01-22 06:48:18,954 WARN mapreduce.Counters: Group 
org.apache.hadoop.mapred.Task$Counter is deprecated. Use 
org.apache.hadoop.mapreduce.TaskCounter instead
2014-01-22 06:48:18,954 INFO org.apache.hadoop.mapred.JobTracker: Adding 
task (MAP) 'attempt_201401041634_5856_m_000234_0' to tip 
task_201401041634_5856_m_000234, for tracker 
'tracker_ip-10-20-22-122.tk.com:localhost/127.0.0.1:35082'
2014-01-22 06:48:18,954 INFO org.apache.hadoop.mapred.JobInProgress: 
Choosing rack-local task task_201401041634_5856_m_000234
2014-01-22 06:48:18,954 WARN mapreduce.Counters: Group 
org.apache.hadoop.mapred.Task$Counter is deprecated. Use 
org.apache.hadoop.mapreduce.TaskCounter instead
2014-01-22 06:48:18,954 INFO org.apache.hadoop.mapred.JobTracker: Adding 
task (MAP) 'attempt_201401041634_5856_m_000235_0' to tip 
task_201401041634_5856_m_000235, for tracker 
'tracker_ip-10-20-22-122.tk.com:localhost/127.0.0.1:35082'
2014-01-22 06:48:18,954 INFO org.apache.hadoop.mapred.JobInProgress: 
Choosing rack-local task task_201401041634_5856_m_000235
2014-01-22 06:48:18,954 INFO org.apache.hadoop.mapred.JobInProgress: 
Task 'attempt_201401041634_5856_m_000147_0' has completed 
task_201401041634_5856_m_000147 successfully.
2014-01-22 06:48:24,520 INFO org.apache.hadoop.mapred.JobInProgress: 
Task 'attempt_201401041634_5856_m_000184_0' has completed 
task_201401041634_5856_m_000184 successfully.
2014-01-22 06:48:30,073 INFO org.apache.hadoop.mapred.JobInProgress: 
Task 'attempt_201401041634_5856_m_000191_0' has completed 
task_201401041634_5856_m_000191 successfully.
2014-01-22 06:48:30,075 INFO org.apache.hadoop.mapred.JobInProgress: 
Task 'attempt_201401041634_5856_m_000199_0' has completed 
task_201401041634_5856_m_000199 successfully.
2014-01-22 06:48:41,288 INFO org.apache.hadoop.mapred.JobTracker: Job 
job_201401041634_5858 added successfully for user 'hdfs' to queue 'default'
2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobTracker: 
Initializing job_201401041634_5858
2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobInProgress: 
Initializing job_201401041634_5858
2014-01-22 06:50:27,386 INFO 
org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioning to 
standby
2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker: 
Stopping pluginDispatcher
2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker: 
Stopping infoServer
2014-01-22 06:50:44,093 WARN org.apache.hadoop.ipc.Client: interrupted 
waiting to send params to server
java.lang.InterruptedException
         at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:979)
         at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
         at 
java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
         at java.util.concurrent.FutureTask.get(FutureTask.java:83)
         at 
org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:913)
         at org.apache.hadoop.ipc.Client.call(Client.java:1198)
         at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
         at $Proxy9.getFileInfo(Unknown Source)
         at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:628)
         at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
         at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
         at java.lang.reflect.Method.invoke(Method.java:597)
         at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
         at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
         at $Proxy10.getFileInfo(Unknown Source)
         at 
org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1532)
         at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:803)
         at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1332)
         at 
org.apache.hadoop.mapred.JobTrackerHAServiceProtocol$SystemDirectoryMonitor.run(JobTrackerHAServiceProtocol.java:96)
         at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
         at 
java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
         at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
         at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
         at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
         at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
         at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
         at java.lang.Thread.run(Thread.java:662)
2014-01-22 06:51:55,637 INFO org.mortbay.log: Stopped 
SelectChannelConnector@0.0.0.0:50031
2014-01-22 06:51:55,638 INFO org.apache.hadoop.mapred.JobTracker: 
Stopping interTrackerServer
2014-01-22 06:51:55,638 INFO org.apache.hadoop.ipc.Server: Stopping 
server on 8022
2014-01-22 06:51:55,639 INFO org.apache.hadoop.mapred.JobTracker: 
Stopping expireTrackers
2014-01-22 06:51:55,639 INFO org.apache.hadoop.ipc.Server: Stopping IPC 
Server listener on 8022
2014-01-22 06:51:55,639 INFO org.apache.hadoop.ipc.Server: Stopping IPC 
Server Responder
2014-01-22 06:51:55,640 INFO org.apache.hadoop.mapred.JobTracker: 
Stopped interTrackerServer
2014-01-22 06:52:56,306 WARN 
org.apache.hadoop.io.retry.RetryInvocationHandler: Exception while 
invoking getFileInfo of class ClientNamenodeProtocolTranslatorPB. Trying 
to fail over immediately.
2014-01-22 06:52:56,338 INFO org.apache.hadoop.mapred.JobInProgress: 
Task 'attempt_201401041634_5856_m_000180_0' has completed 
task_201401041634_5856_m_000180 successfully.
2014-01-22 06:52:56,338 WARN org.apache.hadoop.ipc.Server: IPC Server 
Responder, call 
heartbeat(org.apache.hadoop.mapred.TaskTrackerStatus@78144f29, false, 
false, true, 4436), rpc version=2, client version=32, 
methodsFingerPrint=-159967141 from 10.20.22.151:37362: output error
2014-01-22 06:53:02,785 INFO org.apache.hadoop.ipc.Server: IPC Server 
handler 5 on 8022 caught an exception
java.nio.channels.ClosedChannelException
         at 
sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
         at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
         at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
         at 
org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
         at 
org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
2014-01-22 06:53:02,786 ERROR org.apache.hadoop.mapred.JobTracker: Job 
initialization failed:
java.io.FileNotFoundException: Parent directory doesn't exist: 
/tmp/mapred/system/seq-000000000022/job_201401041634_5858
         at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.verifyParentDir(FSNamesystem.java:1710)
         at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1831)
         at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:1753)
         at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1731)
         at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:419)
         at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:205)
         at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44068)
         at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1695)
         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1691)
         at java.security.AccessController.doPrivileged(Native Method)
         at javax.security.auth.Subject.doAs(Subject.java:396)
         at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1689)
         at 
sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
         at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
         at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
         at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
         at 
org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:90)
         at 
org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:57)
         at 
org.apache.hadoop.hdfs.DFSOutputStream.<init>(DFSOutputStream.java:1301)
         at 
org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1317)
         at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1242)
         at 
org.apache.hadoop.hdfs.DistributedFileSystem.createNonRecursive(DistributedFileSystem.java:301)
         at 
org.apache.hadoop.hdfs.DistributedFileSystem.createNonRecursive(DistributedFileSystem.java:79)
         at 
org.apache.hadoop.fs.FileSystem.createNonRecursive(FileSystem.java:1055)
         at 
org.apache.hadoop.fs.FileSystem.createNonRecursive(FileSystem.java:1031)
         at 
org.apache.hadoop.mapred.JobInProgress.generateAndStoreTokens(JobInProgress.java:3505)
         at 
org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:725)
         at 
org.apache.hadoop.mapred.JobTracker.initJob(JobTracker.java:3775)
         at 
org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:90)
         at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
         at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
         at java.lang.Thread.run(Thread.java:662)
Caused by: 
org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): 
Parent directory doesn't exist: 
/tmp/mapred/system/seq-000000000022/job_201401041634_5858
         at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.verifyParentDir(FSNamesystem.java:1710)
         at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1831)
         at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:1753)
         at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1731)
         at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:419)
         at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:205)
         at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44068)
         at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1695)
         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1691)
         at java.security.AccessController.doPrivileged(Native Method)
         at javax.security.auth.Subject.doAs(Subject.java:396)
         at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1689)

         at org.apache.hadoop.ipc.Client.call(Client.java:1225)
         at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
         at $Proxy9.create(Unknown Source)
         at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:192)
         at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
         at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
         at java.lang.reflect.Method.invoke(Method.java:597)
         at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
         at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
         at $Proxy10.create(Unknown Source)
         at 
org.apache.hadoop.hdfs.DFSOutputStream.<init>(DFSOutputStream.java:1298)
         ... 13 more

2014-01-22 06:53:02,787 INFO org.apache.hadoop.mapred.JobInProgress: 
Task 'attempt_201401041634_5856_m_000187_0' has completed 
task_201401041634_5856_m_000187 successfully.
2014-01-22 06:53:02,787 INFO org.apache.hadoop.mapred.JobInProgress: 
Task 'attempt_201401041634_5856_m_000188_0' has completed 
task_201401041634_5856_m_000188 successfully.
2014-01-22 06:53:02,788 INFO org.apache.hadoop.mapred.JobInProgress: 
Task 'attempt_201401041634_5856_m_000200_0' has completed 
task_201401041634_5856_m_000200 successfully.
2014-01-22 06:53:09,255 WARN org.apache.hadoop.ipc.Server: IPC Server 
Responder, call 
heartbeat(org.apache.hadoop.mapred.TaskTrackerStatus@3f713050, false, 
false, true, 77
20), rpc version=2, client version=32, methodsFingerPrint=-159967141 
from 10.20.22.76:37774: output error
2014-01-22 06:53:09,255 INFO org.apache.hadoop.ipc.Server: IPC Server 
handler 8 on 8022 caught an exception
java.nio.channels.ClosedChannelException
         at 
sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
         at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
         at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
         at 
org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
         at 
org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
2014-01-22 06:53:09,255 INFO org.apache.hadoop.mapred.JobInProgress: 
Task 'attempt_201401041634_5856_m_000143_0' has completed 
task_201401041634_5856_m_000143 successfully.
2014-01-22 06:53:09,259 INFO org.apache.hadoop.mapred.JobInProgress: 
Task 'attempt_201401041634_5856_m_000164_0' has completed 
task_201401041634_5856_m_000164 successfully.
2014-01-22 06:53:09,259 INFO org.apache.hadoop.mapred.JobInProgress: 
Task 'attempt_201401041634_5856_m_000167_0' has completed 
task_201401041634_5856_m_000167 successfully.
2014-01-22 06:53:09,260 INFO org.apache.hadoop.mapred.JobInProgress: 
Task 'attempt_201401041634_5856_m_000176_0' has completed 
task_201401041634_5856_m_000176 successfully.
2014-01-22 06:53:09,260 WARN 
org.apache.hadoop.io.retry.RetryInvocationHandler: Exception while 
invoking getFileInfo of class ClientNamenodeProtocolTranslatorPB after 1 
fail over attempts. Trying to fail over immediately.

*******************************************************************************************


the standby server doesnt have much data before the change occurred 
since it was on standby mode.

2014-01-22 06:50:05,010 INFO 
org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioning to 
active
2014-01-22 06:50:05,010 INFO 
org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopping 
JobTrackerHAHttpRedirector on port 50030
2014-01-22 06:50:05,098 INFO org.mortbay.log: Stopped 
SelectChannelConnector@0.0.0.0:50030
2014-01-22 06:50:05,198 INFO 
org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopped
2014-01-22 06:50:05,201 INFO 
org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Renaming previous 
system directory hdfs://***/tmp/mapred/system/seq-000000000022 to 
hdfs://***/tmp/mapred/system/seq-000000000023
2014-01-22 06:50:05,244 INFO 
org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: 
Updating the current master key for generating delegation tokens
2014-01-22 06:50:05,248 INFO org.apache.hadoop.mapred.JobTracker: 
Scheduler configured with (memSizeForMapSlotOnJT, 
memSizeForReduceSlotOnJT, limitMaxMemForMapTasks, 
limitMaxMemForReduceTasks) (-1, -1, -1, -1)
2014-01-22 06:50:05,248 INFO org.apache.hadoop.util.HostsFileReader: 
Refreshing hosts (include/exclude) list
2014-01-22 06:50:11,839 INFO 
org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: 
Starting expired delegation token remover thread, 
tokenRemoverScanInterval=60 min(s)
2014-01-22 06:50:11,839 INFO 
org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: 
Updating the current master key for generating delegation tokens
2014-01-22 06:50:11,852 INFO org.apache.hadoop.mapred.JobTracker: 
Starting jobtracker with owner as mapred
2014-01-22 06:50:11,853 INFO org.apache.hadoop.ipc.Server: Starting 
Socket Reader #1 for port 8021
2014-01-22 06:50:11,857 WARN org.apache.hadoop.ipc.RPC: Interface 
interface org.apache.hadoop.mapred.TaskTrackerManager ignored because it 
does not extend VersionedProtocol
2014-01-22 06:50:11,919 INFO org.apache.hadoop.http.HttpServer: Added 
global filter 'safety' 
(class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
2014-01-22 06:50:11,920 INFO org.apache.hadoop.http.HttpServer: Added 
filter static_user_filter 
(class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) 
to context job
2014-01-22 06:50:11,920 INFO org.apache.hadoop.http.HttpServer: Added 
filter static_user_filter 
(class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) 
to context static
2014-01-22 06:50:11,920 INFO org.apache.hadoop.http.HttpServer: Added 
filter static_user_filter 
(class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) 
to context logs
2014-01-22 06:50:11,922 INFO org.apache.hadoop.http.HttpServer: Jetty 
bound to port 50030
2014-01-22 06:50:11,922 INFO org.mortbay.log: jetty-6.1.26.cloudera.2
2014-01-22 06:50:35,574 INFO org.mortbay.log: Started 
SelectChannelConnector@0.0.0.0:50030
2014-01-22 06:50:35,575 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: 
Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - 
already initialized
2014-01-22 06:50:35,576 INFO org.apache.hadoop.mapred.JobTracker: 
JobTracker up at: 8021
2014-01-22 06:50:35,576 INFO org.apache.hadoop.mapred.JobTracker: 
JobTracker webserver: 50030
2014-01-22 06:50:42,221 WARN org.apache.hadoop.mapred.JobTracker: Job 
job_201303170922_6550 does not have valid info/token file so ignoring 
for recovery
2014-01-22 06:50:48,793 WARN org.apache.hadoop.mapred.JobTracker: Job 
job_201306220511_6151 does not have valid info/token file so ignoring 
for recovery
2014-01-22 06:50:54,434 WARN org.apache.hadoop.mapred.JobTracker: Job 
job_201307090256_0004 does not have valid info/token file so ignoring 
for recovery
2014-01-22 06:51:00,062 WARN org.apache.hadoop.mapred.JobTracker: Job 
job_201308140624_6179 does not have valid info/token file so ignoring 
for recovery
2014-01-22 06:51:08,461 WARN org.apache.hadoop.mapred.JobTracker: Job 
job_201309011414_6792 does not have valid info/token file so ignoring 
for recovery
2014-01-22 06:51:11,293 WARN org.apache.hadoop.mapred.JobTracker: Job 
job_201310030748_7416 does not have valid info/token file so ignoring 
for recovery
2014-01-22 06:51:22,552 WARN org.apache.hadoop.mapred.JobTracker: Job 
job_201401041634_5858 does not have valid info/token file so ignoring 
for recovery
2014-01-22 06:51:22,574 INFO org.apache.hadoop.mapred.JobHistory: Job 
History MaxAge is 2592000000 ms (30.00 days), Cleanup Frequency is 
86400000 ms (1.00 days)
2014-01-22 06:51:22,574 WARN org.apache.hadoop.conf.Configuration: 
topology.node.switch.mapping.impl is deprecated. Instead, use 
net.topology.node.switch.mapping.impl
2014-01-22 06:51:22,575 INFO 
org.apache.hadoop.mapred.CompletedJobStatusStore: Completed job store 
activated/configured with retain-time : 3600000 , job-info-dir : 
/jobtracker/jobsInfo
2014-01-22 06:51:22,577 INFO 
org.apache.hadoop.mapred.JobTrackerHADaemon: Waiting for jobtracker 
RUNNING state
2014-01-22 06:51:35,063 INFO 
org.apache.hadoop.mapred.JobTrackerHADaemon: Waiting for jobtracker 
RUNNING state
2014-01-22 06:51:35,132 INFO 
org.apache.hadoop.mapred.FairSchedulerEventLog: Initialized fair 
scheduler event log, logging to 
/mnt/logs/hadoop-0.20-mapreduce/fairscheduler/hadoop-mapred-fairscheduler-tkamz-hdpm01.log
2014-01-22 06:51:35,156 INFO org.apache.hadoop.mapred.FairScheduler: 
Successfully configured FairScheduler
2014-01-22 06:51:35,156 INFO org.apache.hadoop.mapred.JobTracker: 
Starting the recovery process for 2 jobs ...
2014-01-22 06:51:35,156 INFO org.apache.hadoop.mapred.JobTracker: 
Submitting job job_201311290401_6418
2014-01-22 06:51:42,024 INFO 
org.apache.hadoop.mapred.JobTrackerHADaemon: Waiting for jobtracker 
RUNNING state
2014-01-22 06:51:42,025 WARN org.apache.hadoop.mapred.JobTracker: Could 
not recover job job_201311290401_6418
java.io.IOException: Exception reading 
hdfs://***/tmp/mapred/system/seq-000000000023/job_201311290401_6418/jobToken
         at 
org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:154)
         at 
org.apache.hadoop.mapred.JobTracker$RecoveryManager.recover(JobTracker.java:1533)
         at 
org.apache.hadoop.mapred.JobTracker.offerService(JobTracker.java:2179)
         at 
org.apache.hadoop.mapred.JobTrackerHADaemon$JobTrackerRunner$1.run(JobTrackerHADaemon.java:198)
         at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.EOFException
         at java.io.DataInputStream.readFully(DataInputStream.java:180)
         at java.io.DataInputStream.readFully(DataInputStream.java:152)
         at 
org.apache.hadoop.security.Credentials.readTokenStorageStream(Credentials.java:189)
         at 
org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:150)
         ... 4 more
2014-01-22 06:51:42,026 INFO org.apache.hadoop.mapred.JobTracker: 
Submitting job job_201401041634_5856
2014-01-22 06:51:48,515 INFO 
org.apache.hadoop.mapred.JobTrackerHADaemon: Waiting for jobtracker 
RUNNING state
2014-01-22 06:51:57,892 INFO 
org.apache.hadoop.mapred.JobTrackerHADaemon: Waiting for jobtracker 
RUNNING state
2014-01-22 06:51:57,906 INFO org.apache.hadoop.mapred.JobInProgress: 
job_201401041634_5856: nMaps=240 nReduces=6 max=-1
2014-01-22 06:51:57,911 INFO org.apache.hadoop.mapred.JobTracker: Job 
job_201401041634_5856 added successfully for user 'hdfs' to queue 'default'
2014-01-22 06:51:57,911 INFO org.apache.hadoop.mapred.JobTracker: 
Recovery done! Recoverd 1 of 2 jobs.
2014-01-22 06:51:57,911 INFO org.apache.hadoop.mapred.JobTracker: 
Recovery Duration (ms):22755
2014-01-22 06:51:57,911 INFO org.apache.hadoop.mapred.JobTracker: 
Refreshing hosts information
2014-01-22 06:52:00,857 INFO org.apache.hadoop.mapred.JobTracker: 
Initializing job_201401041634_5856
2014-01-22 06:52:00,857 INFO org.apache.hadoop.mapred.JobInProgress: 
Initializing job_201401041634_5856
2014-01-22 06:52:00,865 INFO org.apache.hadoop.util.HostsFileReader: 
Setting the includes file to
2014-01-22 06:52:00,865 INFO org.apache.hadoop.util.HostsFileReader: 
Setting the excludes file to
2014-01-22 06:52:00,865 INFO org.apache.hadoop.util.HostsFileReader: 
Refreshing hosts (include/exclude) list
2014-01-22 06:52:00,865 INFO org.apache.hadoop.mapred.JobTracker: 
Decommissioning 0 nodes
2014-01-22 06:52:00,865 INFO 
org.apache.hadoop.mapred.JobTrackerHADaemon: Waiting for jobtracker 
RUNNING state
2014-01-22 06:52:00,868 INFO org.apache.hadoop.ipc.Server: IPC Server 
Responder: starting
2014-01-22 06:52:00,868 INFO org.apache.hadoop.ipc.Server: IPC Server 
listener on 8021: starting
2014-01-22 06:52:00,870 INFO org.apache.hadoop.mapred.JobTracker: 
Starting RUNNING
2014-01-22 06:52:06,560 INFO 
org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioned to active
2014-01-22 06:52:06,560 WARN org.apache.hadoop.ipc.Server: IPC Server 
Responder, call 
org.apache.hadoop.ha.HAServiceProtocol.transitionToActive from 
10.20.22.109:32931: output error



On 01/27/2014 11:59 PM, Karthik Kambatla wrote:
> (Redirecting to cdh-user, moving user@hadoop to bcc).
>
> Hi Oren
>
> Can you attach slightly longer versions of the log files on both the 
> JTs? Also, if this is something recurring, it would be nice to monitor 
> the JT heap usage and GC timeouts using jstat -gcutil <jt-pid>.
>
> Thanks
> Karthik
>
>
>
>
> On Thu, Jan 23, 2014 at 8:11 AM, Oren Marmor <orenm@infolinks.com 
> <ma...@infolinks.com>> wrote:
>
>     Hi.
>     We have two HA Jobtrackers in active/standby mode. (CDH4.2 on
>     ubuntu server)
>     We had a problem during which the active node suddenly became
>     standby and the standby server attempted to start resulting in a
>     java heap space failure.
>     any ideas to why the active node turned to standby?
>
>     logs attached:
>     on (original) active node:
>     2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobTracker:
>     Initializing job_201401041634_5858
>     2014-01-22 06:48:41,289 INFO
>     org.apache.hadoop.mapred.JobInProgress: Initializing
>     job_201401041634_5858
>     *2014-01-22 06:50:27,386 INFO
>     org.apache.hadoop.mapred.JobTrackerHAServiceProtocol:
>     Transitioning to standby*
>     2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker:
>     Stopping pluginDispatcher
>     2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker:
>     Stopping infoServer
>     2014-01-22 06:50:44,093 WARN org.apache.hadoop.ipc.Client:
>     interrupted waiting to send params to server
>     java.lang.InterruptedException
>             at
>     java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:979)
>             at
>     java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
>             at
>     java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
>             at java.util.concurrent.FutureTask.get(FutureTask.java:83)
>             at
>     org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:913)
>             at org.apache.hadoop.ipc.Client.call(Client.java:1198)
>             at
>     org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>             at $Proxy9.getFileInfo(Unknown Source)
>             at
>     org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:628)
>             at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
>             at
>     sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>             at java.lang.reflect.Method.invoke(Method.java:597)
>             at
>     org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
>             at
>     org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
>             at $Proxy10.getFileInfo(Unknown Source)
>             at
>     org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1532)
>             at
>     org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:803)
>             at
>     org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1332)
>             at
>     org.apache.hadoop.mapred.JobTrackerHAServiceProtocol$SystemDirectoryMonitor.run(JobTrackerHAServiceProtocol.java:96)
>             at
>     java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>             at
>     java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>             at
>     java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>             at
>     java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>             at
>     java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>             at
>     java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>             at
>     java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>             at
>     java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>             at java.lang.Thread.run(Thread.java:662)
>     2014-01-22 06:51:55,637 INFO org.mortbay.log: Stopped
>     SelectChannelConnector@0.0.0.0:50031
>     <http://SelectChannelConnector@0.0.0.0:50031>
>
>     on standby node
>     2014-01-22 06:50:05,010 INFO
>     org.apache.hadoop.mapred.JobTrackerHAServiceProtocol:
>     Transitioning to active
>     2014-01-22 06:50:05,010 INFO
>     org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopping
>     JobTrackerHAHttpRedirector on port 50030
>     2014-01-22 06:50:05,098 INFO org.mortbay.log: Stopped
>     SelectChannelConnector@0.0.0.0:50030
>     <http://SelectChannelConnector@0.0.0.0:50030>
>     2014-01-22 06:50:05,198 INFO
>     org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopped
>     2014-01-22 06:50:05,201 INFO
>     org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Renaming
>     previous system directory
>     hdfs://***/tmp/mapred/system/seq-000000000022 to hdfs://t
>     aykey/tmp/mapred/system/seq-000000000023
>     2014-01-22 06:50:05,244 INFO
>     org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
>     Updating the current master key for generating delegation tokens
>     2014-01-22 06:50:05,248 INFO org.apache.hadoop.mapred.JobTracker:
>     Scheduler configured with (memSizeForMapSlotOnJT,
>     memSizeForReduceSlotOnJT, limitMaxMemForMapTasks, limitMaxMemF
>     orReduceTasks) (-1, -1, -1, -1)
>     2014-01-22 06:50:05,248 INFO
>     org.apache.hadoop.util.HostsFileReader: Refreshing hosts
>     (include/exclude) list
>     2014-01-22 06:50:11,839 INFO
>     org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
>     Starting expired delegation token remover thread, tokenRemoverScanI
>     nterval=60 min(s)
>     ...
>     2014-01-22 06:52:00,870 INFO org.apache.hadoop.mapred.JobTracker:
>     Starting RUNNING
>     2014-01-22 06:52:06,560 INFO
>     org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioned
>     to active
>     2014-01-22 06:52:06,560 WARN org.apache.hadoop.ipc.Server: IPC
>     Server Responder, call
>     org.apache.hadoop.ha.HAServiceProtocol.transitionToActive from
>     ****:32931: output error
>     2014-01-22 06:52:06,561 INFO org.apache.hadoop.ipc.Server: IPC
>     Server handler 0 on 8023 caught an exception
>     java.nio.channels.ClosedChannelException
>             at
>     sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
>             at
>     sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
>             at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
>             at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
>             at
>     org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
>             at
>     org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
>             at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
>     2014-01-22 06:52:13,168 WARN org.apache.hadoop.ipc.Server: IPC
>     Server Responder, call
>     org.apache.hadoop.ha.HAServiceProtocol.getServiceStatus from
>     ****:60965: output error
>     2014-01-22 06:52:13,168 INFO org.apache.hadoop.ipc.Server: IPC
>     Server handler 0 on 8023 caught an exception
>     java.nio.channels.ClosedChannelException
>             at
>     sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
>             at
>     sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
>             at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
>             at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
>             at
>     org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
>             at
>     org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
>             at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
>
>     thanks
>     Oren
>
>

Re: HA Jobtracker failure

Posted by Siddharth Tiwari <si...@live.com>.

How have you implemented the failover ? Also can you attach JTHA logs ? If you hav implemented it using. Zkfc, it would be interesting to look in zookeeper logs as well. 

Sent from my iPhone

> On Jan 27, 2014, at 3:00 PM, "Karthik Kambatla" <ka...@cloudera.com> wrote:
> 
> (Redirecting to cdh-user, moving user@hadoop to bcc).
> 
> Hi Oren
> 
> Can you attach slightly longer versions of the log files on both the JTs? Also, if this is something recurring, it would be nice to monitor the JT heap usage and GC timeouts using jstat -gcutil <jt-pid>.
> 
> Thanks
> Karthik
> 
> 
> 
> 
>> On Thu, Jan 23, 2014 at 8:11 AM, Oren Marmor <or...@infolinks.com> wrote:
>> Hi.
>> We have two HA Jobtrackers in active/standby mode. (CDH4.2 on ubuntu server)
>> We had a problem during which the active node suddenly became standby and the standby server attempted to start resulting in a java heap space failure.
>> any ideas to why the active node turned to standby?
>> 
>> logs attached:
>> on (original) active node:
>> 2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobTracker: Initializing job_201401041634_5858
>> 2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobInProgress: Initializing job_201401041634_5858
>> 2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioning to standby
>> 2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker: Stopping pluginDispatcher
>> 2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker: Stopping infoServer
>> 2014-01-22 06:50:44,093 WARN org.apache.hadoop.ipc.Client: interrupted waiting to send params to server
>> java.lang.InterruptedException
>>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:979)
>>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
>>         at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
>>         at java.util.concurrent.FutureTask.get(FutureTask.java:83)
>>         at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:913)
>>         at org.apache.hadoop.ipc.Client.call(Client.java:1198)
>>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>         at $Proxy9.getFileInfo(Unknown Source)
>>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:628)
>>         at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
>>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
>>         at $Proxy10.getFileInfo(Unknown Source)
>>         at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1532)
>>         at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:803)
>>         at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1332)
>>         at org.apache.hadoop.mapred.JobTrackerHAServiceProtocol$SystemDirectoryMonitor.run(JobTrackerHAServiceProtocol.java:96)
>>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>>         at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>         at java.lang.Thread.run(Thread.java:662)
>> 2014-01-22 06:51:55,637 INFO org.mortbay.log: Stopped SelectChannelConnector@0.0.0.0:50031
>> 
>> on standby node
>> 2014-01-22 06:50:05,010 INFO org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioning to active
>> 2014-01-22 06:50:05,010 INFO org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopping JobTrackerHAHttpRedirector on port 50030
>> 2014-01-22 06:50:05,098 INFO org.mortbay.log: Stopped SelectChannelConnector@0.0.0.0:50030
>> 2014-01-22 06:50:05,198 INFO org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopped
>> 2014-01-22 06:50:05,201 INFO org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Renaming previous system directory hdfs://***/tmp/mapred/system/seq-000000000022 to hdfs://t
>> aykey/tmp/mapred/system/seq-000000000023
>> 2014-01-22 06:50:05,244 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens
>> 2014-01-22 06:50:05,248 INFO org.apache.hadoop.mapred.JobTracker: Scheduler configured with (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT, limitMaxMemForMapTasks, limitMaxMemF
>> orReduceTasks) (-1, -1, -1, -1)
>> 2014-01-22 06:50:05,248 INFO org.apache.hadoop.util.HostsFileReader: Refreshing hosts (include/exclude) list
>> 2014-01-22 06:50:11,839 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Starting expired delegation token remover thread, tokenRemoverScanI
>> nterval=60 min(s)
>> ...
>> 2014-01-22 06:52:00,870 INFO org.apache.hadoop.mapred.JobTracker: Starting RUNNING
>> 2014-01-22 06:52:06,560 INFO org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioned to active
>> 2014-01-22 06:52:06,560 WARN org.apache.hadoop.ipc.Server: IPC Server Responder, call org.apache.hadoop.ha.HAServiceProtocol.transitionToActive from ****:32931: output error
>> 2014-01-22 06:52:06,561 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 8023 caught an exception
>> java.nio.channels.ClosedChannelException
>>         at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
>>         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
>>         at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
>>         at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
>>         at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
>>         at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
>>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
>> 2014-01-22 06:52:13,168 WARN org.apache.hadoop.ipc.Server: IPC Server Responder, call org.apache.hadoop.ha.HAServiceProtocol.getServiceStatus from ****:60965: output error
>> 2014-01-22 06:52:13,168 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 8023 caught an exception
>> java.nio.channels.ClosedChannelException
>>         at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
>>         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
>>         at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
>>         at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
>>         at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
>>         at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
>>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
>> 
>> thanks
>> Oren
>

Re: HA Jobtracker failure

Posted by Siddharth Tiwari <si...@live.com>.

How have you implemented the failover ? Also can you attach JTHA logs ? If you hav implemented it using. Zkfc, it would be interesting to look in zookeeper logs as well. 

Sent from my iPhone

> On Jan 27, 2014, at 3:00 PM, "Karthik Kambatla" <ka...@cloudera.com> wrote:
> 
> (Redirecting to cdh-user, moving user@hadoop to bcc).
> 
> Hi Oren
> 
> Can you attach slightly longer versions of the log files on both the JTs? Also, if this is something recurring, it would be nice to monitor the JT heap usage and GC timeouts using jstat -gcutil <jt-pid>.
> 
> Thanks
> Karthik
> 
> 
> 
> 
>> On Thu, Jan 23, 2014 at 8:11 AM, Oren Marmor <or...@infolinks.com> wrote:
>> Hi.
>> We have two HA Jobtrackers in active/standby mode. (CDH4.2 on ubuntu server)
>> We had a problem during which the active node suddenly became standby and the standby server attempted to start resulting in a java heap space failure.
>> any ideas to why the active node turned to standby?
>> 
>> logs attached:
>> on (original) active node:
>> 2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobTracker: Initializing job_201401041634_5858
>> 2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobInProgress: Initializing job_201401041634_5858
>> 2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioning to standby
>> 2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker: Stopping pluginDispatcher
>> 2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker: Stopping infoServer
>> 2014-01-22 06:50:44,093 WARN org.apache.hadoop.ipc.Client: interrupted waiting to send params to server
>> java.lang.InterruptedException
>>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:979)
>>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
>>         at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
>>         at java.util.concurrent.FutureTask.get(FutureTask.java:83)
>>         at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:913)
>>         at org.apache.hadoop.ipc.Client.call(Client.java:1198)
>>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>         at $Proxy9.getFileInfo(Unknown Source)
>>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:628)
>>         at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
>>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
>>         at $Proxy10.getFileInfo(Unknown Source)
>>         at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1532)
>>         at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:803)
>>         at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1332)
>>         at org.apache.hadoop.mapred.JobTrackerHAServiceProtocol$SystemDirectoryMonitor.run(JobTrackerHAServiceProtocol.java:96)
>>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>>         at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>         at java.lang.Thread.run(Thread.java:662)
>> 2014-01-22 06:51:55,637 INFO org.mortbay.log: Stopped SelectChannelConnector@0.0.0.0:50031
>> 
>> on standby node
>> 2014-01-22 06:50:05,010 INFO org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioning to active
>> 2014-01-22 06:50:05,010 INFO org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopping JobTrackerHAHttpRedirector on port 50030
>> 2014-01-22 06:50:05,098 INFO org.mortbay.log: Stopped SelectChannelConnector@0.0.0.0:50030
>> 2014-01-22 06:50:05,198 INFO org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopped
>> 2014-01-22 06:50:05,201 INFO org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Renaming previous system directory hdfs://***/tmp/mapred/system/seq-000000000022 to hdfs://t
>> aykey/tmp/mapred/system/seq-000000000023
>> 2014-01-22 06:50:05,244 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens
>> 2014-01-22 06:50:05,248 INFO org.apache.hadoop.mapred.JobTracker: Scheduler configured with (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT, limitMaxMemForMapTasks, limitMaxMemF
>> orReduceTasks) (-1, -1, -1, -1)
>> 2014-01-22 06:50:05,248 INFO org.apache.hadoop.util.HostsFileReader: Refreshing hosts (include/exclude) list
>> 2014-01-22 06:50:11,839 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Starting expired delegation token remover thread, tokenRemoverScanI
>> nterval=60 min(s)
>> ...
>> 2014-01-22 06:52:00,870 INFO org.apache.hadoop.mapred.JobTracker: Starting RUNNING
>> 2014-01-22 06:52:06,560 INFO org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioned to active
>> 2014-01-22 06:52:06,560 WARN org.apache.hadoop.ipc.Server: IPC Server Responder, call org.apache.hadoop.ha.HAServiceProtocol.transitionToActive from ****:32931: output error
>> 2014-01-22 06:52:06,561 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 8023 caught an exception
>> java.nio.channels.ClosedChannelException
>>         at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
>>         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
>>         at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
>>         at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
>>         at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
>>         at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
>>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
>> 2014-01-22 06:52:13,168 WARN org.apache.hadoop.ipc.Server: IPC Server Responder, call org.apache.hadoop.ha.HAServiceProtocol.getServiceStatus from ****:60965: output error
>> 2014-01-22 06:52:13,168 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 8023 caught an exception
>> java.nio.channels.ClosedChannelException
>>         at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
>>         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
>>         at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
>>         at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
>>         at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
>>         at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
>>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
>> 
>> thanks
>> Oren
>

Re: HA Jobtracker failure

Posted by Siddharth Tiwari <si...@live.com>.

How have you implemented the failover ? Also can you attach JTHA logs ? If you hav implemented it using. Zkfc, it would be interesting to look in zookeeper logs as well. 

Sent from my iPhone

> On Jan 27, 2014, at 3:00 PM, "Karthik Kambatla" <ka...@cloudera.com> wrote:
> 
> (Redirecting to cdh-user, moving user@hadoop to bcc).
> 
> Hi Oren
> 
> Can you attach slightly longer versions of the log files on both the JTs? Also, if this is something recurring, it would be nice to monitor the JT heap usage and GC timeouts using jstat -gcutil <jt-pid>.
> 
> Thanks
> Karthik
> 
> 
> 
> 
>> On Thu, Jan 23, 2014 at 8:11 AM, Oren Marmor <or...@infolinks.com> wrote:
>> Hi.
>> We have two HA Jobtrackers in active/standby mode. (CDH4.2 on ubuntu server)
>> We had a problem during which the active node suddenly became standby and the standby server attempted to start resulting in a java heap space failure.
>> any ideas to why the active node turned to standby?
>> 
>> logs attached:
>> on (original) active node:
>> 2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobTracker: Initializing job_201401041634_5858
>> 2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobInProgress: Initializing job_201401041634_5858
>> 2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioning to standby
>> 2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker: Stopping pluginDispatcher
>> 2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker: Stopping infoServer
>> 2014-01-22 06:50:44,093 WARN org.apache.hadoop.ipc.Client: interrupted waiting to send params to server
>> java.lang.InterruptedException
>>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:979)
>>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
>>         at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
>>         at java.util.concurrent.FutureTask.get(FutureTask.java:83)
>>         at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:913)
>>         at org.apache.hadoop.ipc.Client.call(Client.java:1198)
>>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>         at $Proxy9.getFileInfo(Unknown Source)
>>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:628)
>>         at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
>>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
>>         at $Proxy10.getFileInfo(Unknown Source)
>>         at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1532)
>>         at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:803)
>>         at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1332)
>>         at org.apache.hadoop.mapred.JobTrackerHAServiceProtocol$SystemDirectoryMonitor.run(JobTrackerHAServiceProtocol.java:96)
>>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>>         at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>         at java.lang.Thread.run(Thread.java:662)
>> 2014-01-22 06:51:55,637 INFO org.mortbay.log: Stopped SelectChannelConnector@0.0.0.0:50031
>> 
>> on standby node
>> 2014-01-22 06:50:05,010 INFO org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioning to active
>> 2014-01-22 06:50:05,010 INFO org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopping JobTrackerHAHttpRedirector on port 50030
>> 2014-01-22 06:50:05,098 INFO org.mortbay.log: Stopped SelectChannelConnector@0.0.0.0:50030
>> 2014-01-22 06:50:05,198 INFO org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopped
>> 2014-01-22 06:50:05,201 INFO org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Renaming previous system directory hdfs://***/tmp/mapred/system/seq-000000000022 to hdfs://t
>> aykey/tmp/mapred/system/seq-000000000023
>> 2014-01-22 06:50:05,244 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens
>> 2014-01-22 06:50:05,248 INFO org.apache.hadoop.mapred.JobTracker: Scheduler configured with (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT, limitMaxMemForMapTasks, limitMaxMemF
>> orReduceTasks) (-1, -1, -1, -1)
>> 2014-01-22 06:50:05,248 INFO org.apache.hadoop.util.HostsFileReader: Refreshing hosts (include/exclude) list
>> 2014-01-22 06:50:11,839 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Starting expired delegation token remover thread, tokenRemoverScanI
>> nterval=60 min(s)
>> ...
>> 2014-01-22 06:52:00,870 INFO org.apache.hadoop.mapred.JobTracker: Starting RUNNING
>> 2014-01-22 06:52:06,560 INFO org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioned to active
>> 2014-01-22 06:52:06,560 WARN org.apache.hadoop.ipc.Server: IPC Server Responder, call org.apache.hadoop.ha.HAServiceProtocol.transitionToActive from ****:32931: output error
>> 2014-01-22 06:52:06,561 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 8023 caught an exception
>> java.nio.channels.ClosedChannelException
>>         at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
>>         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
>>         at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
>>         at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
>>         at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
>>         at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
>>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
>> 2014-01-22 06:52:13,168 WARN org.apache.hadoop.ipc.Server: IPC Server Responder, call org.apache.hadoop.ha.HAServiceProtocol.getServiceStatus from ****:60965: output error
>> 2014-01-22 06:52:13,168 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 8023 caught an exception
>> java.nio.channels.ClosedChannelException
>>         at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
>>         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
>>         at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
>>         at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
>>         at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
>>         at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
>>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
>> 
>> thanks
>> Oren
>

Re: HA Jobtracker failure

Posted by Oren <or...@taykey.com>.

Hi Karthik.
the logs attached,
these are the logs from the active JT
2014-01-22 06:48:18,952 INFO org.apache.hadoop.mapred.JobInProgress: 
job_201401041634_5858: nMaps=1 nReduces=1 max=-1
2014-01-22 06:48:18,953 WARN mapreduce.Counters: Group 
org.apache.hadoop.mapred.Task$Counter is deprecated. Use 
org.apache.hadoop.mapreduce.TaskCounter instead
2014-01-22 06:48:18,953 INFO org.apache.hadoop.mapred.JobTracker: Adding 
task (MAP) 'attempt_201401041634_5856_m_000237_0' to tip 
task_201401041634_5856_m_000237, for trac
ker 'tracker_ip-10-20-22-122.tk.com:localhost/127.0.0.1:35082'
2014-01-22 06:48:18,953 INFO org.apache.hadoop.mapred.JobInProgress: 
Choosing data-local task task_201401041634_5856_m_000237
2014-01-22 06:48:18,953 WARN mapreduce.Counters: Group 
org.apache.hadoop.mapred.Task$Counter is deprecated. Use 
org.apache.hadoop.mapreduce.TaskCounter instead
2014-01-22 06:48:18,953 INFO org.apache.hadoop.mapred.JobTracker: Adding 
task (MAP) 'attempt_201401041634_5856_m_000239_0' to tip 
task_201401041634_5856_m_000239, for trac
ker 'tracker_ip-10-20-22-122.tk.com:localhost/127.0.0.1:35082'
2014-01-22 06:48:18,953 INFO org.apache.hadoop.mapred.JobInProgress: 
Choosing data-local task task_201401041634_5856_m_000239
2014-01-22 06:48:18,954 WARN mapreduce.Counters: Group 
org.apache.hadoop.mapred.Task$Counter is deprecated. Use 
org.apache.hadoop.mapreduce.TaskCounter instead
2014-01-22 06:48:18,954 INFO org.apache.hadoop.mapred.JobTracker: Adding 
task (MAP) 'attempt_201401041634_5856_m_000234_0' to tip 
task_201401041634_5856_m_000234, for tracker 
'tracker_ip-10-20-22-122.tk.com:localhost/127.0.0.1:35082'
2014-01-22 06:48:18,954 INFO org.apache.hadoop.mapred.JobInProgress: 
Choosing rack-local task task_201401041634_5856_m_000234
2014-01-22 06:48:18,954 WARN mapreduce.Counters: Group 
org.apache.hadoop.mapred.Task$Counter is deprecated. Use 
org.apache.hadoop.mapreduce.TaskCounter instead
2014-01-22 06:48:18,954 INFO org.apache.hadoop.mapred.JobTracker: Adding 
task (MAP) 'attempt_201401041634_5856_m_000235_0' to tip 
task_201401041634_5856_m_000235, for tracker 
'tracker_ip-10-20-22-122.tk.com:localhost/127.0.0.1:35082'
2014-01-22 06:48:18,954 INFO org.apache.hadoop.mapred.JobInProgress: 
Choosing rack-local task task_201401041634_5856_m_000235
2014-01-22 06:48:18,954 INFO org.apache.hadoop.mapred.JobInProgress: 
Task 'attempt_201401041634_5856_m_000147_0' has completed 
task_201401041634_5856_m_000147 successfully.
2014-01-22 06:48:24,520 INFO org.apache.hadoop.mapred.JobInProgress: 
Task 'attempt_201401041634_5856_m_000184_0' has completed 
task_201401041634_5856_m_000184 successfully.
2014-01-22 06:48:30,073 INFO org.apache.hadoop.mapred.JobInProgress: 
Task 'attempt_201401041634_5856_m_000191_0' has completed 
task_201401041634_5856_m_000191 successfully.
2014-01-22 06:48:30,075 INFO org.apache.hadoop.mapred.JobInProgress: 
Task 'attempt_201401041634_5856_m_000199_0' has completed 
task_201401041634_5856_m_000199 successfully.
2014-01-22 06:48:41,288 INFO org.apache.hadoop.mapred.JobTracker: Job 
job_201401041634_5858 added successfully for user 'hdfs' to queue 'default'
2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobTracker: 
Initializing job_201401041634_5858
2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobInProgress: 
Initializing job_201401041634_5858
2014-01-22 06:50:27,386 INFO 
org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioning to 
standby
2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker: 
Stopping pluginDispatcher
2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker: 
Stopping infoServer
2014-01-22 06:50:44,093 WARN org.apache.hadoop.ipc.Client: interrupted 
waiting to send params to server
java.lang.InterruptedException
         at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:979)
         at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
         at 
java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
         at java.util.concurrent.FutureTask.get(FutureTask.java:83)
         at 
org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:913)
         at org.apache.hadoop.ipc.Client.call(Client.java:1198)
         at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
         at $Proxy9.getFileInfo(Unknown Source)
         at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:628)
         at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
         at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
         at java.lang.reflect.Method.invoke(Method.java:597)
         at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
         at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
         at $Proxy10.getFileInfo(Unknown Source)
         at 
org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1532)
         at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:803)
         at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1332)
         at 
org.apache.hadoop.mapred.JobTrackerHAServiceProtocol$SystemDirectoryMonitor.run(JobTrackerHAServiceProtocol.java:96)
         at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
         at 
java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
         at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
         at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
         at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
         at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
         at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
         at java.lang.Thread.run(Thread.java:662)
2014-01-22 06:51:55,637 INFO org.mortbay.log: Stopped 
SelectChannelConnector@0.0.0.0:50031
2014-01-22 06:51:55,638 INFO org.apache.hadoop.mapred.JobTracker: 
Stopping interTrackerServer
2014-01-22 06:51:55,638 INFO org.apache.hadoop.ipc.Server: Stopping 
server on 8022
2014-01-22 06:51:55,639 INFO org.apache.hadoop.mapred.JobTracker: 
Stopping expireTrackers
2014-01-22 06:51:55,639 INFO org.apache.hadoop.ipc.Server: Stopping IPC 
Server listener on 8022
2014-01-22 06:51:55,639 INFO org.apache.hadoop.ipc.Server: Stopping IPC 
Server Responder
2014-01-22 06:51:55,640 INFO org.apache.hadoop.mapred.JobTracker: 
Stopped interTrackerServer
2014-01-22 06:52:56,306 WARN 
org.apache.hadoop.io.retry.RetryInvocationHandler: Exception while 
invoking getFileInfo of class ClientNamenodeProtocolTranslatorPB. Trying 
to fail over immediately.
2014-01-22 06:52:56,338 INFO org.apache.hadoop.mapred.JobInProgress: 
Task 'attempt_201401041634_5856_m_000180_0' has completed 
task_201401041634_5856_m_000180 successfully.
2014-01-22 06:52:56,338 WARN org.apache.hadoop.ipc.Server: IPC Server 
Responder, call 
heartbeat(org.apache.hadoop.mapred.TaskTrackerStatus@78144f29, false, 
false, true, 4436), rpc version=2, client version=32, 
methodsFingerPrint=-159967141 from 10.20.22.151:37362: output error
2014-01-22 06:53:02,785 INFO org.apache.hadoop.ipc.Server: IPC Server 
handler 5 on 8022 caught an exception
java.nio.channels.ClosedChannelException
         at 
sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
         at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
         at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
         at 
org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
         at 
org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
2014-01-22 06:53:02,786 ERROR org.apache.hadoop.mapred.JobTracker: Job 
initialization failed:
java.io.FileNotFoundException: Parent directory doesn't exist: 
/tmp/mapred/system/seq-000000000022/job_201401041634_5858
         at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.verifyParentDir(FSNamesystem.java:1710)
         at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1831)
         at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:1753)
         at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1731)
         at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:419)
         at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:205)
         at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44068)
         at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1695)
         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1691)
         at java.security.AccessController.doPrivileged(Native Method)
         at javax.security.auth.Subject.doAs(Subject.java:396)
         at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1689)
         at 
sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
         at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
         at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
         at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
         at 
org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:90)
         at 
org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:57)
         at 
org.apache.hadoop.hdfs.DFSOutputStream.<init>(DFSOutputStream.java:1301)
         at 
org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1317)
         at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1242)
         at 
org.apache.hadoop.hdfs.DistributedFileSystem.createNonRecursive(DistributedFileSystem.java:301)
         at 
org.apache.hadoop.hdfs.DistributedFileSystem.createNonRecursive(DistributedFileSystem.java:79)
         at 
org.apache.hadoop.fs.FileSystem.createNonRecursive(FileSystem.java:1055)
         at 
org.apache.hadoop.fs.FileSystem.createNonRecursive(FileSystem.java:1031)
         at 
org.apache.hadoop.mapred.JobInProgress.generateAndStoreTokens(JobInProgress.java:3505)
         at 
org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:725)
         at 
org.apache.hadoop.mapred.JobTracker.initJob(JobTracker.java:3775)
         at 
org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:90)
         at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
         at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
         at java.lang.Thread.run(Thread.java:662)
Caused by: 
org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): 
Parent directory doesn't exist: 
/tmp/mapred/system/seq-000000000022/job_201401041634_5858
         at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.verifyParentDir(FSNamesystem.java:1710)
         at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1831)
         at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:1753)
         at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1731)
         at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:419)
         at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:205)
         at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44068)
         at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1695)
         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1691)
         at java.security.AccessController.doPrivileged(Native Method)
         at javax.security.auth.Subject.doAs(Subject.java:396)
         at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1689)

         at org.apache.hadoop.ipc.Client.call(Client.java:1225)
         at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
         at $Proxy9.create(Unknown Source)
         at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:192)
         at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
         at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
         at java.lang.reflect.Method.invoke(Method.java:597)
         at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
         at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
         at $Proxy10.create(Unknown Source)
         at 
org.apache.hadoop.hdfs.DFSOutputStream.<init>(DFSOutputStream.java:1298)
         ... 13 more

2014-01-22 06:53:02,787 INFO org.apache.hadoop.mapred.JobInProgress: 
Task 'attempt_201401041634_5856_m_000187_0' has completed 
task_201401041634_5856_m_000187 successfully.
2014-01-22 06:53:02,787 INFO org.apache.hadoop.mapred.JobInProgress: 
Task 'attempt_201401041634_5856_m_000188_0' has completed 
task_201401041634_5856_m_000188 successfully.
2014-01-22 06:53:02,788 INFO org.apache.hadoop.mapred.JobInProgress: 
Task 'attempt_201401041634_5856_m_000200_0' has completed 
task_201401041634_5856_m_000200 successfully.
2014-01-22 06:53:09,255 WARN org.apache.hadoop.ipc.Server: IPC Server 
Responder, call 
heartbeat(org.apache.hadoop.mapred.TaskTrackerStatus@3f713050, false, 
false, true, 77
20), rpc version=2, client version=32, methodsFingerPrint=-159967141 
from 10.20.22.76:37774: output error
2014-01-22 06:53:09,255 INFO org.apache.hadoop.ipc.Server: IPC Server 
handler 8 on 8022 caught an exception
java.nio.channels.ClosedChannelException
         at 
sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
         at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
         at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
         at 
org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
         at 
org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
2014-01-22 06:53:09,255 INFO org.apache.hadoop.mapred.JobInProgress: 
Task 'attempt_201401041634_5856_m_000143_0' has completed 
task_201401041634_5856_m_000143 successfully.
2014-01-22 06:53:09,259 INFO org.apache.hadoop.mapred.JobInProgress: 
Task 'attempt_201401041634_5856_m_000164_0' has completed 
task_201401041634_5856_m_000164 successfully.
2014-01-22 06:53:09,259 INFO org.apache.hadoop.mapred.JobInProgress: 
Task 'attempt_201401041634_5856_m_000167_0' has completed 
task_201401041634_5856_m_000167 successfully.
2014-01-22 06:53:09,260 INFO org.apache.hadoop.mapred.JobInProgress: 
Task 'attempt_201401041634_5856_m_000176_0' has completed 
task_201401041634_5856_m_000176 successfully.
2014-01-22 06:53:09,260 WARN 
org.apache.hadoop.io.retry.RetryInvocationHandler: Exception while 
invoking getFileInfo of class ClientNamenodeProtocolTranslatorPB after 1 
fail over attempts. Trying to fail over immediately.

*******************************************************************************************


the standby server doesnt have much data before the change occurred 
since it was on standby mode.

2014-01-22 06:50:05,010 INFO 
org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioning to 
active
2014-01-22 06:50:05,010 INFO 
org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopping 
JobTrackerHAHttpRedirector on port 50030
2014-01-22 06:50:05,098 INFO org.mortbay.log: Stopped 
SelectChannelConnector@0.0.0.0:50030
2014-01-22 06:50:05,198 INFO 
org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopped
2014-01-22 06:50:05,201 INFO 
org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Renaming previous 
system directory hdfs://***/tmp/mapred/system/seq-000000000022 to 
hdfs://***/tmp/mapred/system/seq-000000000023
2014-01-22 06:50:05,244 INFO 
org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: 
Updating the current master key for generating delegation tokens
2014-01-22 06:50:05,248 INFO org.apache.hadoop.mapred.JobTracker: 
Scheduler configured with (memSizeForMapSlotOnJT, 
memSizeForReduceSlotOnJT, limitMaxMemForMapTasks, 
limitMaxMemForReduceTasks) (-1, -1, -1, -1)
2014-01-22 06:50:05,248 INFO org.apache.hadoop.util.HostsFileReader: 
Refreshing hosts (include/exclude) list
2014-01-22 06:50:11,839 INFO 
org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: 
Starting expired delegation token remover thread, 
tokenRemoverScanInterval=60 min(s)
2014-01-22 06:50:11,839 INFO 
org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: 
Updating the current master key for generating delegation tokens
2014-01-22 06:50:11,852 INFO org.apache.hadoop.mapred.JobTracker: 
Starting jobtracker with owner as mapred
2014-01-22 06:50:11,853 INFO org.apache.hadoop.ipc.Server: Starting 
Socket Reader #1 for port 8021
2014-01-22 06:50:11,857 WARN org.apache.hadoop.ipc.RPC: Interface 
interface org.apache.hadoop.mapred.TaskTrackerManager ignored because it 
does not extend VersionedProtocol
2014-01-22 06:50:11,919 INFO org.apache.hadoop.http.HttpServer: Added 
global filter 'safety' 
(class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
2014-01-22 06:50:11,920 INFO org.apache.hadoop.http.HttpServer: Added 
filter static_user_filter 
(class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) 
to context job
2014-01-22 06:50:11,920 INFO org.apache.hadoop.http.HttpServer: Added 
filter static_user_filter 
(class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) 
to context static
2014-01-22 06:50:11,920 INFO org.apache.hadoop.http.HttpServer: Added 
filter static_user_filter 
(class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) 
to context logs
2014-01-22 06:50:11,922 INFO org.apache.hadoop.http.HttpServer: Jetty 
bound to port 50030
2014-01-22 06:50:11,922 INFO org.mortbay.log: jetty-6.1.26.cloudera.2
2014-01-22 06:50:35,574 INFO org.mortbay.log: Started 
SelectChannelConnector@0.0.0.0:50030
2014-01-22 06:50:35,575 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: 
Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - 
already initialized
2014-01-22 06:50:35,576 INFO org.apache.hadoop.mapred.JobTracker: 
JobTracker up at: 8021
2014-01-22 06:50:35,576 INFO org.apache.hadoop.mapred.JobTracker: 
JobTracker webserver: 50030
2014-01-22 06:50:42,221 WARN org.apache.hadoop.mapred.JobTracker: Job 
job_201303170922_6550 does not have valid info/token file so ignoring 
for recovery
2014-01-22 06:50:48,793 WARN org.apache.hadoop.mapred.JobTracker: Job 
job_201306220511_6151 does not have valid info/token file so ignoring 
for recovery
2014-01-22 06:50:54,434 WARN org.apache.hadoop.mapred.JobTracker: Job 
job_201307090256_0004 does not have valid info/token file so ignoring 
for recovery
2014-01-22 06:51:00,062 WARN org.apache.hadoop.mapred.JobTracker: Job 
job_201308140624_6179 does not have valid info/token file so ignoring 
for recovery
2014-01-22 06:51:08,461 WARN org.apache.hadoop.mapred.JobTracker: Job 
job_201309011414_6792 does not have valid info/token file so ignoring 
for recovery
2014-01-22 06:51:11,293 WARN org.apache.hadoop.mapred.JobTracker: Job 
job_201310030748_7416 does not have valid info/token file so ignoring 
for recovery
2014-01-22 06:51:22,552 WARN org.apache.hadoop.mapred.JobTracker: Job 
job_201401041634_5858 does not have valid info/token file so ignoring 
for recovery
2014-01-22 06:51:22,574 INFO org.apache.hadoop.mapred.JobHistory: Job 
History MaxAge is 2592000000 ms (30.00 days), Cleanup Frequency is 
86400000 ms (1.00 days)
2014-01-22 06:51:22,574 WARN org.apache.hadoop.conf.Configuration: 
topology.node.switch.mapping.impl is deprecated. Instead, use 
net.topology.node.switch.mapping.impl
2014-01-22 06:51:22,575 INFO 
org.apache.hadoop.mapred.CompletedJobStatusStore: Completed job store 
activated/configured with retain-time : 3600000 , job-info-dir : 
/jobtracker/jobsInfo
2014-01-22 06:51:22,577 INFO 
org.apache.hadoop.mapred.JobTrackerHADaemon: Waiting for jobtracker 
RUNNING state
2014-01-22 06:51:35,063 INFO 
org.apache.hadoop.mapred.JobTrackerHADaemon: Waiting for jobtracker 
RUNNING state
2014-01-22 06:51:35,132 INFO 
org.apache.hadoop.mapred.FairSchedulerEventLog: Initialized fair 
scheduler event log, logging to 
/mnt/logs/hadoop-0.20-mapreduce/fairscheduler/hadoop-mapred-fairscheduler-tkamz-hdpm01.log
2014-01-22 06:51:35,156 INFO org.apache.hadoop.mapred.FairScheduler: 
Successfully configured FairScheduler
2014-01-22 06:51:35,156 INFO org.apache.hadoop.mapred.JobTracker: 
Starting the recovery process for 2 jobs ...
2014-01-22 06:51:35,156 INFO org.apache.hadoop.mapred.JobTracker: 
Submitting job job_201311290401_6418
2014-01-22 06:51:42,024 INFO 
org.apache.hadoop.mapred.JobTrackerHADaemon: Waiting for jobtracker 
RUNNING state
2014-01-22 06:51:42,025 WARN org.apache.hadoop.mapred.JobTracker: Could 
not recover job job_201311290401_6418
java.io.IOException: Exception reading 
hdfs://***/tmp/mapred/system/seq-000000000023/job_201311290401_6418/jobToken
         at 
org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:154)
         at 
org.apache.hadoop.mapred.JobTracker$RecoveryManager.recover(JobTracker.java:1533)
         at 
org.apache.hadoop.mapred.JobTracker.offerService(JobTracker.java:2179)
         at 
org.apache.hadoop.mapred.JobTrackerHADaemon$JobTrackerRunner$1.run(JobTrackerHADaemon.java:198)
         at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.EOFException
         at java.io.DataInputStream.readFully(DataInputStream.java:180)
         at java.io.DataInputStream.readFully(DataInputStream.java:152)
         at 
org.apache.hadoop.security.Credentials.readTokenStorageStream(Credentials.java:189)
         at 
org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:150)
         ... 4 more
2014-01-22 06:51:42,026 INFO org.apache.hadoop.mapred.JobTracker: 
Submitting job job_201401041634_5856
2014-01-22 06:51:48,515 INFO 
org.apache.hadoop.mapred.JobTrackerHADaemon: Waiting for jobtracker 
RUNNING state
2014-01-22 06:51:57,892 INFO 
org.apache.hadoop.mapred.JobTrackerHADaemon: Waiting for jobtracker 
RUNNING state
2014-01-22 06:51:57,906 INFO org.apache.hadoop.mapred.JobInProgress: 
job_201401041634_5856: nMaps=240 nReduces=6 max=-1
2014-01-22 06:51:57,911 INFO org.apache.hadoop.mapred.JobTracker: Job 
job_201401041634_5856 added successfully for user 'hdfs' to queue 'default'
2014-01-22 06:51:57,911 INFO org.apache.hadoop.mapred.JobTracker: 
Recovery done! Recoverd 1 of 2 jobs.
2014-01-22 06:51:57,911 INFO org.apache.hadoop.mapred.JobTracker: 
Recovery Duration (ms):22755
2014-01-22 06:51:57,911 INFO org.apache.hadoop.mapred.JobTracker: 
Refreshing hosts information
2014-01-22 06:52:00,857 INFO org.apache.hadoop.mapred.JobTracker: 
Initializing job_201401041634_5856
2014-01-22 06:52:00,857 INFO org.apache.hadoop.mapred.JobInProgress: 
Initializing job_201401041634_5856
2014-01-22 06:52:00,865 INFO org.apache.hadoop.util.HostsFileReader: 
Setting the includes file to
2014-01-22 06:52:00,865 INFO org.apache.hadoop.util.HostsFileReader: 
Setting the excludes file to
2014-01-22 06:52:00,865 INFO org.apache.hadoop.util.HostsFileReader: 
Refreshing hosts (include/exclude) list
2014-01-22 06:52:00,865 INFO org.apache.hadoop.mapred.JobTracker: 
Decommissioning 0 nodes
2014-01-22 06:52:00,865 INFO 
org.apache.hadoop.mapred.JobTrackerHADaemon: Waiting for jobtracker 
RUNNING state
2014-01-22 06:52:00,868 INFO org.apache.hadoop.ipc.Server: IPC Server 
Responder: starting
2014-01-22 06:52:00,868 INFO org.apache.hadoop.ipc.Server: IPC Server 
listener on 8021: starting
2014-01-22 06:52:00,870 INFO org.apache.hadoop.mapred.JobTracker: 
Starting RUNNING
2014-01-22 06:52:06,560 INFO 
org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioned to active
2014-01-22 06:52:06,560 WARN org.apache.hadoop.ipc.Server: IPC Server 
Responder, call 
org.apache.hadoop.ha.HAServiceProtocol.transitionToActive from 
10.20.22.109:32931: output error



On 01/27/2014 11:59 PM, Karthik Kambatla wrote:
> (Redirecting to cdh-user, moving user@hadoop to bcc).
>
> Hi Oren
>
> Can you attach slightly longer versions of the log files on both the 
> JTs? Also, if this is something recurring, it would be nice to monitor 
> the JT heap usage and GC timeouts using jstat -gcutil <jt-pid>.
>
> Thanks
> Karthik
>
>
>
>
> On Thu, Jan 23, 2014 at 8:11 AM, Oren Marmor <orenm@infolinks.com 
> <ma...@infolinks.com>> wrote:
>
>     Hi.
>     We have two HA Jobtrackers in active/standby mode. (CDH4.2 on
>     ubuntu server)
>     We had a problem during which the active node suddenly became
>     standby and the standby server attempted to start resulting in a
>     java heap space failure.
>     any ideas to why the active node turned to standby?
>
>     logs attached:
>     on (original) active node:
>     2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobTracker:
>     Initializing job_201401041634_5858
>     2014-01-22 06:48:41,289 INFO
>     org.apache.hadoop.mapred.JobInProgress: Initializing
>     job_201401041634_5858
>     *2014-01-22 06:50:27,386 INFO
>     org.apache.hadoop.mapred.JobTrackerHAServiceProtocol:
>     Transitioning to standby*
>     2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker:
>     Stopping pluginDispatcher
>     2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker:
>     Stopping infoServer
>     2014-01-22 06:50:44,093 WARN org.apache.hadoop.ipc.Client:
>     interrupted waiting to send params to server
>     java.lang.InterruptedException
>             at
>     java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:979)
>             at
>     java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
>             at
>     java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
>             at java.util.concurrent.FutureTask.get(FutureTask.java:83)
>             at
>     org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:913)
>             at org.apache.hadoop.ipc.Client.call(Client.java:1198)
>             at
>     org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>             at $Proxy9.getFileInfo(Unknown Source)
>             at
>     org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:628)
>             at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
>             at
>     sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>             at java.lang.reflect.Method.invoke(Method.java:597)
>             at
>     org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
>             at
>     org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
>             at $Proxy10.getFileInfo(Unknown Source)
>             at
>     org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1532)
>             at
>     org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:803)
>             at
>     org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1332)
>             at
>     org.apache.hadoop.mapred.JobTrackerHAServiceProtocol$SystemDirectoryMonitor.run(JobTrackerHAServiceProtocol.java:96)
>             at
>     java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>             at
>     java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>             at
>     java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>             at
>     java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>             at
>     java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>             at
>     java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>             at
>     java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>             at
>     java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>             at java.lang.Thread.run(Thread.java:662)
>     2014-01-22 06:51:55,637 INFO org.mortbay.log: Stopped
>     SelectChannelConnector@0.0.0.0:50031
>     <http://SelectChannelConnector@0.0.0.0:50031>
>
>     on standby node
>     2014-01-22 06:50:05,010 INFO
>     org.apache.hadoop.mapred.JobTrackerHAServiceProtocol:
>     Transitioning to active
>     2014-01-22 06:50:05,010 INFO
>     org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopping
>     JobTrackerHAHttpRedirector on port 50030
>     2014-01-22 06:50:05,098 INFO org.mortbay.log: Stopped
>     SelectChannelConnector@0.0.0.0:50030
>     <http://SelectChannelConnector@0.0.0.0:50030>
>     2014-01-22 06:50:05,198 INFO
>     org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopped
>     2014-01-22 06:50:05,201 INFO
>     org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Renaming
>     previous system directory
>     hdfs://***/tmp/mapred/system/seq-000000000022 to hdfs://t
>     aykey/tmp/mapred/system/seq-000000000023
>     2014-01-22 06:50:05,244 INFO
>     org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
>     Updating the current master key for generating delegation tokens
>     2014-01-22 06:50:05,248 INFO org.apache.hadoop.mapred.JobTracker:
>     Scheduler configured with (memSizeForMapSlotOnJT,
>     memSizeForReduceSlotOnJT, limitMaxMemForMapTasks, limitMaxMemF
>     orReduceTasks) (-1, -1, -1, -1)
>     2014-01-22 06:50:05,248 INFO
>     org.apache.hadoop.util.HostsFileReader: Refreshing hosts
>     (include/exclude) list
>     2014-01-22 06:50:11,839 INFO
>     org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
>     Starting expired delegation token remover thread, tokenRemoverScanI
>     nterval=60 min(s)
>     ...
>     2014-01-22 06:52:00,870 INFO org.apache.hadoop.mapred.JobTracker:
>     Starting RUNNING
>     2014-01-22 06:52:06,560 INFO
>     org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioned
>     to active
>     2014-01-22 06:52:06,560 WARN org.apache.hadoop.ipc.Server: IPC
>     Server Responder, call
>     org.apache.hadoop.ha.HAServiceProtocol.transitionToActive from
>     ****:32931: output error
>     2014-01-22 06:52:06,561 INFO org.apache.hadoop.ipc.Server: IPC
>     Server handler 0 on 8023 caught an exception
>     java.nio.channels.ClosedChannelException
>             at
>     sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
>             at
>     sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
>             at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
>             at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
>             at
>     org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
>             at
>     org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
>             at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
>     2014-01-22 06:52:13,168 WARN org.apache.hadoop.ipc.Server: IPC
>     Server Responder, call
>     org.apache.hadoop.ha.HAServiceProtocol.getServiceStatus from
>     ****:60965: output error
>     2014-01-22 06:52:13,168 INFO org.apache.hadoop.ipc.Server: IPC
>     Server handler 0 on 8023 caught an exception
>     java.nio.channels.ClosedChannelException
>             at
>     sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
>             at
>     sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
>             at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
>             at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
>             at
>     org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
>             at
>     org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
>             at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
>
>     thanks
>     Oren
>
>

Re: HA Jobtracker failure

Posted by Siddharth Tiwari <si...@live.com>.

How have you implemented the failover ? Also can you attach JTHA logs ? If you hav implemented it using. Zkfc, it would be interesting to look in zookeeper logs as well. 

Sent from my iPhone

> On Jan 27, 2014, at 3:00 PM, "Karthik Kambatla" <ka...@cloudera.com> wrote:
> 
> (Redirecting to cdh-user, moving user@hadoop to bcc).
> 
> Hi Oren
> 
> Can you attach slightly longer versions of the log files on both the JTs? Also, if this is something recurring, it would be nice to monitor the JT heap usage and GC timeouts using jstat -gcutil <jt-pid>.
> 
> Thanks
> Karthik
> 
> 
> 
> 
>> On Thu, Jan 23, 2014 at 8:11 AM, Oren Marmor <or...@infolinks.com> wrote:
>> Hi.
>> We have two HA Jobtrackers in active/standby mode. (CDH4.2 on ubuntu server)
>> We had a problem during which the active node suddenly became standby and the standby server attempted to start resulting in a java heap space failure.
>> any ideas to why the active node turned to standby?
>> 
>> logs attached:
>> on (original) active node:
>> 2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobTracker: Initializing job_201401041634_5858
>> 2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobInProgress: Initializing job_201401041634_5858
>> 2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioning to standby
>> 2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker: Stopping pluginDispatcher
>> 2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker: Stopping infoServer
>> 2014-01-22 06:50:44,093 WARN org.apache.hadoop.ipc.Client: interrupted waiting to send params to server
>> java.lang.InterruptedException
>>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:979)
>>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
>>         at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
>>         at java.util.concurrent.FutureTask.get(FutureTask.java:83)
>>         at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:913)
>>         at org.apache.hadoop.ipc.Client.call(Client.java:1198)
>>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>         at $Proxy9.getFileInfo(Unknown Source)
>>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:628)
>>         at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
>>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
>>         at $Proxy10.getFileInfo(Unknown Source)
>>         at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1532)
>>         at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:803)
>>         at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1332)
>>         at org.apache.hadoop.mapred.JobTrackerHAServiceProtocol$SystemDirectoryMonitor.run(JobTrackerHAServiceProtocol.java:96)
>>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>>         at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>         at java.lang.Thread.run(Thread.java:662)
>> 2014-01-22 06:51:55,637 INFO org.mortbay.log: Stopped SelectChannelConnector@0.0.0.0:50031
>> 
>> on standby node
>> 2014-01-22 06:50:05,010 INFO org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioning to active
>> 2014-01-22 06:50:05,010 INFO org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopping JobTrackerHAHttpRedirector on port 50030
>> 2014-01-22 06:50:05,098 INFO org.mortbay.log: Stopped SelectChannelConnector@0.0.0.0:50030
>> 2014-01-22 06:50:05,198 INFO org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopped
>> 2014-01-22 06:50:05,201 INFO org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Renaming previous system directory hdfs://***/tmp/mapred/system/seq-000000000022 to hdfs://t
>> aykey/tmp/mapred/system/seq-000000000023
>> 2014-01-22 06:50:05,244 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens
>> 2014-01-22 06:50:05,248 INFO org.apache.hadoop.mapred.JobTracker: Scheduler configured with (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT, limitMaxMemForMapTasks, limitMaxMemF
>> orReduceTasks) (-1, -1, -1, -1)
>> 2014-01-22 06:50:05,248 INFO org.apache.hadoop.util.HostsFileReader: Refreshing hosts (include/exclude) list
>> 2014-01-22 06:50:11,839 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Starting expired delegation token remover thread, tokenRemoverScanI
>> nterval=60 min(s)
>> ...
>> 2014-01-22 06:52:00,870 INFO org.apache.hadoop.mapred.JobTracker: Starting RUNNING
>> 2014-01-22 06:52:06,560 INFO org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioned to active
>> 2014-01-22 06:52:06,560 WARN org.apache.hadoop.ipc.Server: IPC Server Responder, call org.apache.hadoop.ha.HAServiceProtocol.transitionToActive from ****:32931: output error
>> 2014-01-22 06:52:06,561 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 8023 caught an exception
>> java.nio.channels.ClosedChannelException
>>         at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
>>         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
>>         at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
>>         at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
>>         at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
>>         at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
>>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
>> 2014-01-22 06:52:13,168 WARN org.apache.hadoop.ipc.Server: IPC Server Responder, call org.apache.hadoop.ha.HAServiceProtocol.getServiceStatus from ****:60965: output error
>> 2014-01-22 06:52:13,168 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 8023 caught an exception
>> java.nio.channels.ClosedChannelException
>>         at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
>>         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
>>         at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
>>         at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
>>         at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
>>         at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
>>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
>> 
>> thanks
>> Oren
>

Re: HA Jobtracker failure

Posted by Oren <or...@taykey.com>.

Hi Karthik.
the logs attached,
these are the logs from the active JT
2014-01-22 06:48:18,952 INFO org.apache.hadoop.mapred.JobInProgress: 
job_201401041634_5858: nMaps=1 nReduces=1 max=-1
2014-01-22 06:48:18,953 WARN mapreduce.Counters: Group 
org.apache.hadoop.mapred.Task$Counter is deprecated. Use 
org.apache.hadoop.mapreduce.TaskCounter instead
2014-01-22 06:48:18,953 INFO org.apache.hadoop.mapred.JobTracker: Adding 
task (MAP) 'attempt_201401041634_5856_m_000237_0' to tip 
task_201401041634_5856_m_000237, for trac
ker 'tracker_ip-10-20-22-122.tk.com:localhost/127.0.0.1:35082'
2014-01-22 06:48:18,953 INFO org.apache.hadoop.mapred.JobInProgress: 
Choosing data-local task task_201401041634_5856_m_000237
2014-01-22 06:48:18,953 WARN mapreduce.Counters: Group 
org.apache.hadoop.mapred.Task$Counter is deprecated. Use 
org.apache.hadoop.mapreduce.TaskCounter instead
2014-01-22 06:48:18,953 INFO org.apache.hadoop.mapred.JobTracker: Adding 
task (MAP) 'attempt_201401041634_5856_m_000239_0' to tip 
task_201401041634_5856_m_000239, for trac
ker 'tracker_ip-10-20-22-122.tk.com:localhost/127.0.0.1:35082'
2014-01-22 06:48:18,953 INFO org.apache.hadoop.mapred.JobInProgress: 
Choosing data-local task task_201401041634_5856_m_000239
2014-01-22 06:48:18,954 WARN mapreduce.Counters: Group 
org.apache.hadoop.mapred.Task$Counter is deprecated. Use 
org.apache.hadoop.mapreduce.TaskCounter instead
2014-01-22 06:48:18,954 INFO org.apache.hadoop.mapred.JobTracker: Adding 
task (MAP) 'attempt_201401041634_5856_m_000234_0' to tip 
task_201401041634_5856_m_000234, for tracker 
'tracker_ip-10-20-22-122.tk.com:localhost/127.0.0.1:35082'
2014-01-22 06:48:18,954 INFO org.apache.hadoop.mapred.JobInProgress: 
Choosing rack-local task task_201401041634_5856_m_000234
2014-01-22 06:48:18,954 WARN mapreduce.Counters: Group 
org.apache.hadoop.mapred.Task$Counter is deprecated. Use 
org.apache.hadoop.mapreduce.TaskCounter instead
2014-01-22 06:48:18,954 INFO org.apache.hadoop.mapred.JobTracker: Adding 
task (MAP) 'attempt_201401041634_5856_m_000235_0' to tip 
task_201401041634_5856_m_000235, for tracker 
'tracker_ip-10-20-22-122.tk.com:localhost/127.0.0.1:35082'
2014-01-22 06:48:18,954 INFO org.apache.hadoop.mapred.JobInProgress: 
Choosing rack-local task task_201401041634_5856_m_000235
2014-01-22 06:48:18,954 INFO org.apache.hadoop.mapred.JobInProgress: 
Task 'attempt_201401041634_5856_m_000147_0' has completed 
task_201401041634_5856_m_000147 successfully.
2014-01-22 06:48:24,520 INFO org.apache.hadoop.mapred.JobInProgress: 
Task 'attempt_201401041634_5856_m_000184_0' has completed 
task_201401041634_5856_m_000184 successfully.
2014-01-22 06:48:30,073 INFO org.apache.hadoop.mapred.JobInProgress: 
Task 'attempt_201401041634_5856_m_000191_0' has completed 
task_201401041634_5856_m_000191 successfully.
2014-01-22 06:48:30,075 INFO org.apache.hadoop.mapred.JobInProgress: 
Task 'attempt_201401041634_5856_m_000199_0' has completed 
task_201401041634_5856_m_000199 successfully.
2014-01-22 06:48:41,288 INFO org.apache.hadoop.mapred.JobTracker: Job 
job_201401041634_5858 added successfully for user 'hdfs' to queue 'default'
2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobTracker: 
Initializing job_201401041634_5858
2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobInProgress: 
Initializing job_201401041634_5858
2014-01-22 06:50:27,386 INFO 
org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioning to 
standby
2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker: 
Stopping pluginDispatcher
2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker: 
Stopping infoServer
2014-01-22 06:50:44,093 WARN org.apache.hadoop.ipc.Client: interrupted 
waiting to send params to server
java.lang.InterruptedException
         at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:979)
         at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
         at 
java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
         at java.util.concurrent.FutureTask.get(FutureTask.java:83)
         at 
org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:913)
         at org.apache.hadoop.ipc.Client.call(Client.java:1198)
         at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
         at $Proxy9.getFileInfo(Unknown Source)
         at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:628)
         at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
         at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
         at java.lang.reflect.Method.invoke(Method.java:597)
         at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
         at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
         at $Proxy10.getFileInfo(Unknown Source)
         at 
org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1532)
         at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:803)
         at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1332)
         at 
org.apache.hadoop.mapred.JobTrackerHAServiceProtocol$SystemDirectoryMonitor.run(JobTrackerHAServiceProtocol.java:96)
         at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
         at 
java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
         at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
         at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
         at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
         at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
         at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
         at java.lang.Thread.run(Thread.java:662)
2014-01-22 06:51:55,637 INFO org.mortbay.log: Stopped 
SelectChannelConnector@0.0.0.0:50031
2014-01-22 06:51:55,638 INFO org.apache.hadoop.mapred.JobTracker: 
Stopping interTrackerServer
2014-01-22 06:51:55,638 INFO org.apache.hadoop.ipc.Server: Stopping 
server on 8022
2014-01-22 06:51:55,639 INFO org.apache.hadoop.mapred.JobTracker: 
Stopping expireTrackers
2014-01-22 06:51:55,639 INFO org.apache.hadoop.ipc.Server: Stopping IPC 
Server listener on 8022
2014-01-22 06:51:55,639 INFO org.apache.hadoop.ipc.Server: Stopping IPC 
Server Responder
2014-01-22 06:51:55,640 INFO org.apache.hadoop.mapred.JobTracker: 
Stopped interTrackerServer
2014-01-22 06:52:56,306 WARN 
org.apache.hadoop.io.retry.RetryInvocationHandler: Exception while 
invoking getFileInfo of class ClientNamenodeProtocolTranslatorPB. Trying 
to fail over immediately.
2014-01-22 06:52:56,338 INFO org.apache.hadoop.mapred.JobInProgress: 
Task 'attempt_201401041634_5856_m_000180_0' has completed 
task_201401041634_5856_m_000180 successfully.
2014-01-22 06:52:56,338 WARN org.apache.hadoop.ipc.Server: IPC Server 
Responder, call 
heartbeat(org.apache.hadoop.mapred.TaskTrackerStatus@78144f29, false, 
false, true, 4436), rpc version=2, client version=32, 
methodsFingerPrint=-159967141 from 10.20.22.151:37362: output error
2014-01-22 06:53:02,785 INFO org.apache.hadoop.ipc.Server: IPC Server 
handler 5 on 8022 caught an exception
java.nio.channels.ClosedChannelException
         at 
sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
         at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
         at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
         at 
org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
         at 
org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
2014-01-22 06:53:02,786 ERROR org.apache.hadoop.mapred.JobTracker: Job 
initialization failed:
java.io.FileNotFoundException: Parent directory doesn't exist: 
/tmp/mapred/system/seq-000000000022/job_201401041634_5858
         at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.verifyParentDir(FSNamesystem.java:1710)
         at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1831)
         at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:1753)
         at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1731)
         at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:419)
         at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:205)
         at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44068)
         at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1695)
         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1691)
         at java.security.AccessController.doPrivileged(Native Method)
         at javax.security.auth.Subject.doAs(Subject.java:396)
         at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1689)
         at 
sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
         at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
         at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
         at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
         at 
org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:90)
         at 
org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:57)
         at 
org.apache.hadoop.hdfs.DFSOutputStream.<init>(DFSOutputStream.java:1301)
         at 
org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1317)
         at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1242)
         at 
org.apache.hadoop.hdfs.DistributedFileSystem.createNonRecursive(DistributedFileSystem.java:301)
         at 
org.apache.hadoop.hdfs.DistributedFileSystem.createNonRecursive(DistributedFileSystem.java:79)
         at 
org.apache.hadoop.fs.FileSystem.createNonRecursive(FileSystem.java:1055)
         at 
org.apache.hadoop.fs.FileSystem.createNonRecursive(FileSystem.java:1031)
         at 
org.apache.hadoop.mapred.JobInProgress.generateAndStoreTokens(JobInProgress.java:3505)
         at 
org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:725)
         at 
org.apache.hadoop.mapred.JobTracker.initJob(JobTracker.java:3775)
         at 
org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:90)
         at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
         at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
         at java.lang.Thread.run(Thread.java:662)
Caused by: 
org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): 
Parent directory doesn't exist: 
/tmp/mapred/system/seq-000000000022/job_201401041634_5858
         at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.verifyParentDir(FSNamesystem.java:1710)
         at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1831)
         at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:1753)
         at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1731)
         at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:419)
         at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:205)
         at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44068)
         at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1695)
         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1691)
         at java.security.AccessController.doPrivileged(Native Method)
         at javax.security.auth.Subject.doAs(Subject.java:396)
         at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1689)

         at org.apache.hadoop.ipc.Client.call(Client.java:1225)
         at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
         at $Proxy9.create(Unknown Source)
         at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:192)
         at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
         at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
         at java.lang.reflect.Method.invoke(Method.java:597)
         at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
         at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
         at $Proxy10.create(Unknown Source)
         at 
org.apache.hadoop.hdfs.DFSOutputStream.<init>(DFSOutputStream.java:1298)
         ... 13 more

2014-01-22 06:53:02,787 INFO org.apache.hadoop.mapred.JobInProgress: 
Task 'attempt_201401041634_5856_m_000187_0' has completed 
task_201401041634_5856_m_000187 successfully.
2014-01-22 06:53:02,787 INFO org.apache.hadoop.mapred.JobInProgress: 
Task 'attempt_201401041634_5856_m_000188_0' has completed 
task_201401041634_5856_m_000188 successfully.
2014-01-22 06:53:02,788 INFO org.apache.hadoop.mapred.JobInProgress: 
Task 'attempt_201401041634_5856_m_000200_0' has completed 
task_201401041634_5856_m_000200 successfully.
2014-01-22 06:53:09,255 WARN org.apache.hadoop.ipc.Server: IPC Server 
Responder, call 
heartbeat(org.apache.hadoop.mapred.TaskTrackerStatus@3f713050, false, 
false, true, 77
20), rpc version=2, client version=32, methodsFingerPrint=-159967141 
from 10.20.22.76:37774: output error
2014-01-22 06:53:09,255 INFO org.apache.hadoop.ipc.Server: IPC Server 
handler 8 on 8022 caught an exception
java.nio.channels.ClosedChannelException
         at 
sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
         at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
         at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
         at 
org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
         at 
org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
2014-01-22 06:53:09,255 INFO org.apache.hadoop.mapred.JobInProgress: 
Task 'attempt_201401041634_5856_m_000143_0' has completed 
task_201401041634_5856_m_000143 successfully.
2014-01-22 06:53:09,259 INFO org.apache.hadoop.mapred.JobInProgress: 
Task 'attempt_201401041634_5856_m_000164_0' has completed 
task_201401041634_5856_m_000164 successfully.
2014-01-22 06:53:09,259 INFO org.apache.hadoop.mapred.JobInProgress: 
Task 'attempt_201401041634_5856_m_000167_0' has completed 
task_201401041634_5856_m_000167 successfully.
2014-01-22 06:53:09,260 INFO org.apache.hadoop.mapred.JobInProgress: 
Task 'attempt_201401041634_5856_m_000176_0' has completed 
task_201401041634_5856_m_000176 successfully.
2014-01-22 06:53:09,260 WARN 
org.apache.hadoop.io.retry.RetryInvocationHandler: Exception while 
invoking getFileInfo of class ClientNamenodeProtocolTranslatorPB after 1 
fail over attempts. Trying to fail over immediately.

*******************************************************************************************


the standby server doesnt have much data before the change occurred 
since it was on standby mode.

2014-01-22 06:50:05,010 INFO 
org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioning to 
active
2014-01-22 06:50:05,010 INFO 
org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopping 
JobTrackerHAHttpRedirector on port 50030
2014-01-22 06:50:05,098 INFO org.mortbay.log: Stopped 
SelectChannelConnector@0.0.0.0:50030
2014-01-22 06:50:05,198 INFO 
org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopped
2014-01-22 06:50:05,201 INFO 
org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Renaming previous 
system directory hdfs://***/tmp/mapred/system/seq-000000000022 to 
hdfs://***/tmp/mapred/system/seq-000000000023
2014-01-22 06:50:05,244 INFO 
org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: 
Updating the current master key for generating delegation tokens
2014-01-22 06:50:05,248 INFO org.apache.hadoop.mapred.JobTracker: 
Scheduler configured with (memSizeForMapSlotOnJT, 
memSizeForReduceSlotOnJT, limitMaxMemForMapTasks, 
limitMaxMemForReduceTasks) (-1, -1, -1, -1)
2014-01-22 06:50:05,248 INFO org.apache.hadoop.util.HostsFileReader: 
Refreshing hosts (include/exclude) list
2014-01-22 06:50:11,839 INFO 
org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: 
Starting expired delegation token remover thread, 
tokenRemoverScanInterval=60 min(s)
2014-01-22 06:50:11,839 INFO 
org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: 
Updating the current master key for generating delegation tokens
2014-01-22 06:50:11,852 INFO org.apache.hadoop.mapred.JobTracker: 
Starting jobtracker with owner as mapred
2014-01-22 06:50:11,853 INFO org.apache.hadoop.ipc.Server: Starting 
Socket Reader #1 for port 8021
2014-01-22 06:50:11,857 WARN org.apache.hadoop.ipc.RPC: Interface 
interface org.apache.hadoop.mapred.TaskTrackerManager ignored because it 
does not extend VersionedProtocol
2014-01-22 06:50:11,919 INFO org.apache.hadoop.http.HttpServer: Added 
global filter 'safety' 
(class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
2014-01-22 06:50:11,920 INFO org.apache.hadoop.http.HttpServer: Added 
filter static_user_filter 
(class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) 
to context job
2014-01-22 06:50:11,920 INFO org.apache.hadoop.http.HttpServer: Added 
filter static_user_filter 
(class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) 
to context static
2014-01-22 06:50:11,920 INFO org.apache.hadoop.http.HttpServer: Added 
filter static_user_filter 
(class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) 
to context logs
2014-01-22 06:50:11,922 INFO org.apache.hadoop.http.HttpServer: Jetty 
bound to port 50030
2014-01-22 06:50:11,922 INFO org.mortbay.log: jetty-6.1.26.cloudera.2
2014-01-22 06:50:35,574 INFO org.mortbay.log: Started 
SelectChannelConnector@0.0.0.0:50030
2014-01-22 06:50:35,575 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: 
Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - 
already initialized
2014-01-22 06:50:35,576 INFO org.apache.hadoop.mapred.JobTracker: 
JobTracker up at: 8021
2014-01-22 06:50:35,576 INFO org.apache.hadoop.mapred.JobTracker: 
JobTracker webserver: 50030
2014-01-22 06:50:42,221 WARN org.apache.hadoop.mapred.JobTracker: Job 
job_201303170922_6550 does not have valid info/token file so ignoring 
for recovery
2014-01-22 06:50:48,793 WARN org.apache.hadoop.mapred.JobTracker: Job 
job_201306220511_6151 does not have valid info/token file so ignoring 
for recovery
2014-01-22 06:50:54,434 WARN org.apache.hadoop.mapred.JobTracker: Job 
job_201307090256_0004 does not have valid info/token file so ignoring 
for recovery
2014-01-22 06:51:00,062 WARN org.apache.hadoop.mapred.JobTracker: Job 
job_201308140624_6179 does not have valid info/token file so ignoring 
for recovery
2014-01-22 06:51:08,461 WARN org.apache.hadoop.mapred.JobTracker: Job 
job_201309011414_6792 does not have valid info/token file so ignoring 
for recovery
2014-01-22 06:51:11,293 WARN org.apache.hadoop.mapred.JobTracker: Job 
job_201310030748_7416 does not have valid info/token file so ignoring 
for recovery
2014-01-22 06:51:22,552 WARN org.apache.hadoop.mapred.JobTracker: Job 
job_201401041634_5858 does not have valid info/token file so ignoring 
for recovery
2014-01-22 06:51:22,574 INFO org.apache.hadoop.mapred.JobHistory: Job 
History MaxAge is 2592000000 ms (30.00 days), Cleanup Frequency is 
86400000 ms (1.00 days)
2014-01-22 06:51:22,574 WARN org.apache.hadoop.conf.Configuration: 
topology.node.switch.mapping.impl is deprecated. Instead, use 
net.topology.node.switch.mapping.impl
2014-01-22 06:51:22,575 INFO 
org.apache.hadoop.mapred.CompletedJobStatusStore: Completed job store 
activated/configured with retain-time : 3600000 , job-info-dir : 
/jobtracker/jobsInfo
2014-01-22 06:51:22,577 INFO 
org.apache.hadoop.mapred.JobTrackerHADaemon: Waiting for jobtracker 
RUNNING state
2014-01-22 06:51:35,063 INFO 
org.apache.hadoop.mapred.JobTrackerHADaemon: Waiting for jobtracker 
RUNNING state
2014-01-22 06:51:35,132 INFO 
org.apache.hadoop.mapred.FairSchedulerEventLog: Initialized fair 
scheduler event log, logging to 
/mnt/logs/hadoop-0.20-mapreduce/fairscheduler/hadoop-mapred-fairscheduler-tkamz-hdpm01.log
2014-01-22 06:51:35,156 INFO org.apache.hadoop.mapred.FairScheduler: 
Successfully configured FairScheduler
2014-01-22 06:51:35,156 INFO org.apache.hadoop.mapred.JobTracker: 
Starting the recovery process for 2 jobs ...
2014-01-22 06:51:35,156 INFO org.apache.hadoop.mapred.JobTracker: 
Submitting job job_201311290401_6418
2014-01-22 06:51:42,024 INFO 
org.apache.hadoop.mapred.JobTrackerHADaemon: Waiting for jobtracker 
RUNNING state
2014-01-22 06:51:42,025 WARN org.apache.hadoop.mapred.JobTracker: Could 
not recover job job_201311290401_6418
java.io.IOException: Exception reading 
hdfs://***/tmp/mapred/system/seq-000000000023/job_201311290401_6418/jobToken
         at 
org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:154)
         at 
org.apache.hadoop.mapred.JobTracker$RecoveryManager.recover(JobTracker.java:1533)
         at 
org.apache.hadoop.mapred.JobTracker.offerService(JobTracker.java:2179)
         at 
org.apache.hadoop.mapred.JobTrackerHADaemon$JobTrackerRunner$1.run(JobTrackerHADaemon.java:198)
         at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.EOFException
         at java.io.DataInputStream.readFully(DataInputStream.java:180)
         at java.io.DataInputStream.readFully(DataInputStream.java:152)
         at 
org.apache.hadoop.security.Credentials.readTokenStorageStream(Credentials.java:189)
         at 
org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:150)
         ... 4 more
2014-01-22 06:51:42,026 INFO org.apache.hadoop.mapred.JobTracker: 
Submitting job job_201401041634_5856
2014-01-22 06:51:48,515 INFO 
org.apache.hadoop.mapred.JobTrackerHADaemon: Waiting for jobtracker 
RUNNING state
2014-01-22 06:51:57,892 INFO 
org.apache.hadoop.mapred.JobTrackerHADaemon: Waiting for jobtracker 
RUNNING state
2014-01-22 06:51:57,906 INFO org.apache.hadoop.mapred.JobInProgress: 
job_201401041634_5856: nMaps=240 nReduces=6 max=-1
2014-01-22 06:51:57,911 INFO org.apache.hadoop.mapred.JobTracker: Job 
job_201401041634_5856 added successfully for user 'hdfs' to queue 'default'
2014-01-22 06:51:57,911 INFO org.apache.hadoop.mapred.JobTracker: 
Recovery done! Recoverd 1 of 2 jobs.
2014-01-22 06:51:57,911 INFO org.apache.hadoop.mapred.JobTracker: 
Recovery Duration (ms):22755
2014-01-22 06:51:57,911 INFO org.apache.hadoop.mapred.JobTracker: 
Refreshing hosts information
2014-01-22 06:52:00,857 INFO org.apache.hadoop.mapred.JobTracker: 
Initializing job_201401041634_5856
2014-01-22 06:52:00,857 INFO org.apache.hadoop.mapred.JobInProgress: 
Initializing job_201401041634_5856
2014-01-22 06:52:00,865 INFO org.apache.hadoop.util.HostsFileReader: 
Setting the includes file to
2014-01-22 06:52:00,865 INFO org.apache.hadoop.util.HostsFileReader: 
Setting the excludes file to
2014-01-22 06:52:00,865 INFO org.apache.hadoop.util.HostsFileReader: 
Refreshing hosts (include/exclude) list
2014-01-22 06:52:00,865 INFO org.apache.hadoop.mapred.JobTracker: 
Decommissioning 0 nodes
2014-01-22 06:52:00,865 INFO 
org.apache.hadoop.mapred.JobTrackerHADaemon: Waiting for jobtracker 
RUNNING state
2014-01-22 06:52:00,868 INFO org.apache.hadoop.ipc.Server: IPC Server 
Responder: starting
2014-01-22 06:52:00,868 INFO org.apache.hadoop.ipc.Server: IPC Server 
listener on 8021: starting
2014-01-22 06:52:00,870 INFO org.apache.hadoop.mapred.JobTracker: 
Starting RUNNING
2014-01-22 06:52:06,560 INFO 
org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioned to active
2014-01-22 06:52:06,560 WARN org.apache.hadoop.ipc.Server: IPC Server 
Responder, call 
org.apache.hadoop.ha.HAServiceProtocol.transitionToActive from 
10.20.22.109:32931: output error



On 01/27/2014 11:59 PM, Karthik Kambatla wrote:
> (Redirecting to cdh-user, moving user@hadoop to bcc).
>
> Hi Oren
>
> Can you attach slightly longer versions of the log files on both the 
> JTs? Also, if this is something recurring, it would be nice to monitor 
> the JT heap usage and GC timeouts using jstat -gcutil <jt-pid>.
>
> Thanks
> Karthik
>
>
>
>
> On Thu, Jan 23, 2014 at 8:11 AM, Oren Marmor <orenm@infolinks.com 
> <ma...@infolinks.com>> wrote:
>
>     Hi.
>     We have two HA Jobtrackers in active/standby mode. (CDH4.2 on
>     ubuntu server)
>     We had a problem during which the active node suddenly became
>     standby and the standby server attempted to start resulting in a
>     java heap space failure.
>     any ideas to why the active node turned to standby?
>
>     logs attached:
>     on (original) active node:
>     2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobTracker:
>     Initializing job_201401041634_5858
>     2014-01-22 06:48:41,289 INFO
>     org.apache.hadoop.mapred.JobInProgress: Initializing
>     job_201401041634_5858
>     *2014-01-22 06:50:27,386 INFO
>     org.apache.hadoop.mapred.JobTrackerHAServiceProtocol:
>     Transitioning to standby*
>     2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker:
>     Stopping pluginDispatcher
>     2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker:
>     Stopping infoServer
>     2014-01-22 06:50:44,093 WARN org.apache.hadoop.ipc.Client:
>     interrupted waiting to send params to server
>     java.lang.InterruptedException
>             at
>     java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:979)
>             at
>     java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
>             at
>     java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
>             at java.util.concurrent.FutureTask.get(FutureTask.java:83)
>             at
>     org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:913)
>             at org.apache.hadoop.ipc.Client.call(Client.java:1198)
>             at
>     org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>             at $Proxy9.getFileInfo(Unknown Source)
>             at
>     org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:628)
>             at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
>             at
>     sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>             at java.lang.reflect.Method.invoke(Method.java:597)
>             at
>     org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
>             at
>     org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
>             at $Proxy10.getFileInfo(Unknown Source)
>             at
>     org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1532)
>             at
>     org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:803)
>             at
>     org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1332)
>             at
>     org.apache.hadoop.mapred.JobTrackerHAServiceProtocol$SystemDirectoryMonitor.run(JobTrackerHAServiceProtocol.java:96)
>             at
>     java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>             at
>     java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>             at
>     java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>             at
>     java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>             at
>     java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>             at
>     java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>             at
>     java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>             at
>     java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>             at java.lang.Thread.run(Thread.java:662)
>     2014-01-22 06:51:55,637 INFO org.mortbay.log: Stopped
>     SelectChannelConnector@0.0.0.0:50031
>     <http://SelectChannelConnector@0.0.0.0:50031>
>
>     on standby node
>     2014-01-22 06:50:05,010 INFO
>     org.apache.hadoop.mapred.JobTrackerHAServiceProtocol:
>     Transitioning to active
>     2014-01-22 06:50:05,010 INFO
>     org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopping
>     JobTrackerHAHttpRedirector on port 50030
>     2014-01-22 06:50:05,098 INFO org.mortbay.log: Stopped
>     SelectChannelConnector@0.0.0.0:50030
>     <http://SelectChannelConnector@0.0.0.0:50030>
>     2014-01-22 06:50:05,198 INFO
>     org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopped
>     2014-01-22 06:50:05,201 INFO
>     org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Renaming
>     previous system directory
>     hdfs://***/tmp/mapred/system/seq-000000000022 to hdfs://t
>     aykey/tmp/mapred/system/seq-000000000023
>     2014-01-22 06:50:05,244 INFO
>     org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
>     Updating the current master key for generating delegation tokens
>     2014-01-22 06:50:05,248 INFO org.apache.hadoop.mapred.JobTracker:
>     Scheduler configured with (memSizeForMapSlotOnJT,
>     memSizeForReduceSlotOnJT, limitMaxMemForMapTasks, limitMaxMemF
>     orReduceTasks) (-1, -1, -1, -1)
>     2014-01-22 06:50:05,248 INFO
>     org.apache.hadoop.util.HostsFileReader: Refreshing hosts
>     (include/exclude) list
>     2014-01-22 06:50:11,839 INFO
>     org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
>     Starting expired delegation token remover thread, tokenRemoverScanI
>     nterval=60 min(s)
>     ...
>     2014-01-22 06:52:00,870 INFO org.apache.hadoop.mapred.JobTracker:
>     Starting RUNNING
>     2014-01-22 06:52:06,560 INFO
>     org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioned
>     to active
>     2014-01-22 06:52:06,560 WARN org.apache.hadoop.ipc.Server: IPC
>     Server Responder, call
>     org.apache.hadoop.ha.HAServiceProtocol.transitionToActive from
>     ****:32931: output error
>     2014-01-22 06:52:06,561 INFO org.apache.hadoop.ipc.Server: IPC
>     Server handler 0 on 8023 caught an exception
>     java.nio.channels.ClosedChannelException
>             at
>     sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
>             at
>     sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
>             at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
>             at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
>             at
>     org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
>             at
>     org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
>             at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
>     2014-01-22 06:52:13,168 WARN org.apache.hadoop.ipc.Server: IPC
>     Server Responder, call
>     org.apache.hadoop.ha.HAServiceProtocol.getServiceStatus from
>     ****:60965: output error
>     2014-01-22 06:52:13,168 INFO org.apache.hadoop.ipc.Server: IPC
>     Server handler 0 on 8023 caught an exception
>     java.nio.channels.ClosedChannelException
>             at
>     sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
>             at
>     sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
>             at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
>             at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
>             at
>     org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
>             at
>     org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
>             at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
>
>     thanks
>     Oren
>
>

Re: HA Jobtracker failure

Posted by Oren <or...@taykey.com>.

Hi Karthik.
the logs attached,
these are the logs from the active JT
2014-01-22 06:48:18,952 INFO org.apache.hadoop.mapred.JobInProgress: 
job_201401041634_5858: nMaps=1 nReduces=1 max=-1
2014-01-22 06:48:18,953 WARN mapreduce.Counters: Group 
org.apache.hadoop.mapred.Task$Counter is deprecated. Use 
org.apache.hadoop.mapreduce.TaskCounter instead
2014-01-22 06:48:18,953 INFO org.apache.hadoop.mapred.JobTracker: Adding 
task (MAP) 'attempt_201401041634_5856_m_000237_0' to tip 
task_201401041634_5856_m_000237, for trac
ker 'tracker_ip-10-20-22-122.tk.com:localhost/127.0.0.1:35082'
2014-01-22 06:48:18,953 INFO org.apache.hadoop.mapred.JobInProgress: 
Choosing data-local task task_201401041634_5856_m_000237
2014-01-22 06:48:18,953 WARN mapreduce.Counters: Group 
org.apache.hadoop.mapred.Task$Counter is deprecated. Use 
org.apache.hadoop.mapreduce.TaskCounter instead
2014-01-22 06:48:18,953 INFO org.apache.hadoop.mapred.JobTracker: Adding 
task (MAP) 'attempt_201401041634_5856_m_000239_0' to tip 
task_201401041634_5856_m_000239, for trac
ker 'tracker_ip-10-20-22-122.tk.com:localhost/127.0.0.1:35082'
2014-01-22 06:48:18,953 INFO org.apache.hadoop.mapred.JobInProgress: 
Choosing data-local task task_201401041634_5856_m_000239
2014-01-22 06:48:18,954 WARN mapreduce.Counters: Group 
org.apache.hadoop.mapred.Task$Counter is deprecated. Use 
org.apache.hadoop.mapreduce.TaskCounter instead
2014-01-22 06:48:18,954 INFO org.apache.hadoop.mapred.JobTracker: Adding 
task (MAP) 'attempt_201401041634_5856_m_000234_0' to tip 
task_201401041634_5856_m_000234, for tracker 
'tracker_ip-10-20-22-122.tk.com:localhost/127.0.0.1:35082'
2014-01-22 06:48:18,954 INFO org.apache.hadoop.mapred.JobInProgress: 
Choosing rack-local task task_201401041634_5856_m_000234
2014-01-22 06:48:18,954 WARN mapreduce.Counters: Group 
org.apache.hadoop.mapred.Task$Counter is deprecated. Use 
org.apache.hadoop.mapreduce.TaskCounter instead
2014-01-22 06:48:18,954 INFO org.apache.hadoop.mapred.JobTracker: Adding 
task (MAP) 'attempt_201401041634_5856_m_000235_0' to tip 
task_201401041634_5856_m_000235, for tracker 
'tracker_ip-10-20-22-122.tk.com:localhost/127.0.0.1:35082'
2014-01-22 06:48:18,954 INFO org.apache.hadoop.mapred.JobInProgress: 
Choosing rack-local task task_201401041634_5856_m_000235
2014-01-22 06:48:18,954 INFO org.apache.hadoop.mapred.JobInProgress: 
Task 'attempt_201401041634_5856_m_000147_0' has completed 
task_201401041634_5856_m_000147 successfully.
2014-01-22 06:48:24,520 INFO org.apache.hadoop.mapred.JobInProgress: 
Task 'attempt_201401041634_5856_m_000184_0' has completed 
task_201401041634_5856_m_000184 successfully.
2014-01-22 06:48:30,073 INFO org.apache.hadoop.mapred.JobInProgress: 
Task 'attempt_201401041634_5856_m_000191_0' has completed 
task_201401041634_5856_m_000191 successfully.
2014-01-22 06:48:30,075 INFO org.apache.hadoop.mapred.JobInProgress: 
Task 'attempt_201401041634_5856_m_000199_0' has completed 
task_201401041634_5856_m_000199 successfully.
2014-01-22 06:48:41,288 INFO org.apache.hadoop.mapred.JobTracker: Job 
job_201401041634_5858 added successfully for user 'hdfs' to queue 'default'
2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobTracker: 
Initializing job_201401041634_5858
2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobInProgress: 
Initializing job_201401041634_5858
2014-01-22 06:50:27,386 INFO 
org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioning to 
standby
2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker: 
Stopping pluginDispatcher
2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker: 
Stopping infoServer
2014-01-22 06:50:44,093 WARN org.apache.hadoop.ipc.Client: interrupted 
waiting to send params to server
java.lang.InterruptedException
         at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:979)
         at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
         at 
java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
         at java.util.concurrent.FutureTask.get(FutureTask.java:83)
         at 
org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:913)
         at org.apache.hadoop.ipc.Client.call(Client.java:1198)
         at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
         at $Proxy9.getFileInfo(Unknown Source)
         at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:628)
         at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
         at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
         at java.lang.reflect.Method.invoke(Method.java:597)
         at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
         at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
         at $Proxy10.getFileInfo(Unknown Source)
         at 
org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1532)
         at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:803)
         at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1332)
         at 
org.apache.hadoop.mapred.JobTrackerHAServiceProtocol$SystemDirectoryMonitor.run(JobTrackerHAServiceProtocol.java:96)
         at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
         at 
java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
         at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
         at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
         at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
         at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
         at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
         at java.lang.Thread.run(Thread.java:662)
2014-01-22 06:51:55,637 INFO org.mortbay.log: Stopped 
SelectChannelConnector@0.0.0.0:50031
2014-01-22 06:51:55,638 INFO org.apache.hadoop.mapred.JobTracker: 
Stopping interTrackerServer
2014-01-22 06:51:55,638 INFO org.apache.hadoop.ipc.Server: Stopping 
server on 8022
2014-01-22 06:51:55,639 INFO org.apache.hadoop.mapred.JobTracker: 
Stopping expireTrackers
2014-01-22 06:51:55,639 INFO org.apache.hadoop.ipc.Server: Stopping IPC 
Server listener on 8022
2014-01-22 06:51:55,639 INFO org.apache.hadoop.ipc.Server: Stopping IPC 
Server Responder
2014-01-22 06:51:55,640 INFO org.apache.hadoop.mapred.JobTracker: 
Stopped interTrackerServer
2014-01-22 06:52:56,306 WARN 
org.apache.hadoop.io.retry.RetryInvocationHandler: Exception while 
invoking getFileInfo of class ClientNamenodeProtocolTranslatorPB. Trying 
to fail over immediately.
2014-01-22 06:52:56,338 INFO org.apache.hadoop.mapred.JobInProgress: 
Task 'attempt_201401041634_5856_m_000180_0' has completed 
task_201401041634_5856_m_000180 successfully.
2014-01-22 06:52:56,338 WARN org.apache.hadoop.ipc.Server: IPC Server 
Responder, call 
heartbeat(org.apache.hadoop.mapred.TaskTrackerStatus@78144f29, false, 
false, true, 4436), rpc version=2, client version=32, 
methodsFingerPrint=-159967141 from 10.20.22.151:37362: output error
2014-01-22 06:53:02,785 INFO org.apache.hadoop.ipc.Server: IPC Server 
handler 5 on 8022 caught an exception
java.nio.channels.ClosedChannelException
         at 
sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
         at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
         at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
         at 
org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
         at 
org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
2014-01-22 06:53:02,786 ERROR org.apache.hadoop.mapred.JobTracker: Job 
initialization failed:
java.io.FileNotFoundException: Parent directory doesn't exist: 
/tmp/mapred/system/seq-000000000022/job_201401041634_5858
         at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.verifyParentDir(FSNamesystem.java:1710)
         at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1831)
         at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:1753)
         at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1731)
         at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:419)
         at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:205)
         at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44068)
         at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1695)
         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1691)
         at java.security.AccessController.doPrivileged(Native Method)
         at javax.security.auth.Subject.doAs(Subject.java:396)
         at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1689)
         at 
sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
         at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
         at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
         at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
         at 
org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:90)
         at 
org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:57)
         at 
org.apache.hadoop.hdfs.DFSOutputStream.<init>(DFSOutputStream.java:1301)
         at 
org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1317)
         at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1242)
         at 
org.apache.hadoop.hdfs.DistributedFileSystem.createNonRecursive(DistributedFileSystem.java:301)
         at 
org.apache.hadoop.hdfs.DistributedFileSystem.createNonRecursive(DistributedFileSystem.java:79)
         at 
org.apache.hadoop.fs.FileSystem.createNonRecursive(FileSystem.java:1055)
         at 
org.apache.hadoop.fs.FileSystem.createNonRecursive(FileSystem.java:1031)
         at 
org.apache.hadoop.mapred.JobInProgress.generateAndStoreTokens(JobInProgress.java:3505)
         at 
org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:725)
         at 
org.apache.hadoop.mapred.JobTracker.initJob(JobTracker.java:3775)
         at 
org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:90)
         at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
         at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
         at java.lang.Thread.run(Thread.java:662)
Caused by: 
org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): 
Parent directory doesn't exist: 
/tmp/mapred/system/seq-000000000022/job_201401041634_5858
         at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.verifyParentDir(FSNamesystem.java:1710)
         at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1831)
         at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:1753)
         at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1731)
         at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:419)
         at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:205)
         at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44068)
         at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1695)
         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1691)
         at java.security.AccessController.doPrivileged(Native Method)
         at javax.security.auth.Subject.doAs(Subject.java:396)
         at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1689)

         at org.apache.hadoop.ipc.Client.call(Client.java:1225)
         at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
         at $Proxy9.create(Unknown Source)
         at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:192)
         at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
         at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
         at java.lang.reflect.Method.invoke(Method.java:597)
         at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
         at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
         at $Proxy10.create(Unknown Source)
         at 
org.apache.hadoop.hdfs.DFSOutputStream.<init>(DFSOutputStream.java:1298)
         ... 13 more

2014-01-22 06:53:02,787 INFO org.apache.hadoop.mapred.JobInProgress: 
Task 'attempt_201401041634_5856_m_000187_0' has completed 
task_201401041634_5856_m_000187 successfully.
2014-01-22 06:53:02,787 INFO org.apache.hadoop.mapred.JobInProgress: 
Task 'attempt_201401041634_5856_m_000188_0' has completed 
task_201401041634_5856_m_000188 successfully.
2014-01-22 06:53:02,788 INFO org.apache.hadoop.mapred.JobInProgress: 
Task 'attempt_201401041634_5856_m_000200_0' has completed 
task_201401041634_5856_m_000200 successfully.
2014-01-22 06:53:09,255 WARN org.apache.hadoop.ipc.Server: IPC Server 
Responder, call 
heartbeat(org.apache.hadoop.mapred.TaskTrackerStatus@3f713050, false, 
false, true, 77
20), rpc version=2, client version=32, methodsFingerPrint=-159967141 
from 10.20.22.76:37774: output error
2014-01-22 06:53:09,255 INFO org.apache.hadoop.ipc.Server: IPC Server 
handler 8 on 8022 caught an exception
java.nio.channels.ClosedChannelException
         at 
sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
         at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
         at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
         at 
org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
         at 
org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
2014-01-22 06:53:09,255 INFO org.apache.hadoop.mapred.JobInProgress: 
Task 'attempt_201401041634_5856_m_000143_0' has completed 
task_201401041634_5856_m_000143 successfully.
2014-01-22 06:53:09,259 INFO org.apache.hadoop.mapred.JobInProgress: 
Task 'attempt_201401041634_5856_m_000164_0' has completed 
task_201401041634_5856_m_000164 successfully.
2014-01-22 06:53:09,259 INFO org.apache.hadoop.mapred.JobInProgress: 
Task 'attempt_201401041634_5856_m_000167_0' has completed 
task_201401041634_5856_m_000167 successfully.
2014-01-22 06:53:09,260 INFO org.apache.hadoop.mapred.JobInProgress: 
Task 'attempt_201401041634_5856_m_000176_0' has completed 
task_201401041634_5856_m_000176 successfully.
2014-01-22 06:53:09,260 WARN 
org.apache.hadoop.io.retry.RetryInvocationHandler: Exception while 
invoking getFileInfo of class ClientNamenodeProtocolTranslatorPB after 1 
fail over attempts. Trying to fail over immediately.

*******************************************************************************************


the standby server doesnt have much data before the change occurred 
since it was on standby mode.

2014-01-22 06:50:05,010 INFO 
org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioning to 
active
2014-01-22 06:50:05,010 INFO 
org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopping 
JobTrackerHAHttpRedirector on port 50030
2014-01-22 06:50:05,098 INFO org.mortbay.log: Stopped 
SelectChannelConnector@0.0.0.0:50030
2014-01-22 06:50:05,198 INFO 
org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopped
2014-01-22 06:50:05,201 INFO 
org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Renaming previous 
system directory hdfs://***/tmp/mapred/system/seq-000000000022 to 
hdfs://***/tmp/mapred/system/seq-000000000023
2014-01-22 06:50:05,244 INFO 
org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: 
Updating the current master key for generating delegation tokens
2014-01-22 06:50:05,248 INFO org.apache.hadoop.mapred.JobTracker: 
Scheduler configured with (memSizeForMapSlotOnJT, 
memSizeForReduceSlotOnJT, limitMaxMemForMapTasks, 
limitMaxMemForReduceTasks) (-1, -1, -1, -1)
2014-01-22 06:50:05,248 INFO org.apache.hadoop.util.HostsFileReader: 
Refreshing hosts (include/exclude) list
2014-01-22 06:50:11,839 INFO 
org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: 
Starting expired delegation token remover thread, 
tokenRemoverScanInterval=60 min(s)
2014-01-22 06:50:11,839 INFO 
org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: 
Updating the current master key for generating delegation tokens
2014-01-22 06:50:11,852 INFO org.apache.hadoop.mapred.JobTracker: 
Starting jobtracker with owner as mapred
2014-01-22 06:50:11,853 INFO org.apache.hadoop.ipc.Server: Starting 
Socket Reader #1 for port 8021
2014-01-22 06:50:11,857 WARN org.apache.hadoop.ipc.RPC: Interface 
interface org.apache.hadoop.mapred.TaskTrackerManager ignored because it 
does not extend VersionedProtocol
2014-01-22 06:50:11,919 INFO org.apache.hadoop.http.HttpServer: Added 
global filter 'safety' 
(class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
2014-01-22 06:50:11,920 INFO org.apache.hadoop.http.HttpServer: Added 
filter static_user_filter 
(class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) 
to context job
2014-01-22 06:50:11,920 INFO org.apache.hadoop.http.HttpServer: Added 
filter static_user_filter 
(class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) 
to context static
2014-01-22 06:50:11,920 INFO org.apache.hadoop.http.HttpServer: Added 
filter static_user_filter 
(class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) 
to context logs
2014-01-22 06:50:11,922 INFO org.apache.hadoop.http.HttpServer: Jetty 
bound to port 50030
2014-01-22 06:50:11,922 INFO org.mortbay.log: jetty-6.1.26.cloudera.2
2014-01-22 06:50:35,574 INFO org.mortbay.log: Started 
SelectChannelConnector@0.0.0.0:50030
2014-01-22 06:50:35,575 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: 
Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - 
already initialized
2014-01-22 06:50:35,576 INFO org.apache.hadoop.mapred.JobTracker: 
JobTracker up at: 8021
2014-01-22 06:50:35,576 INFO org.apache.hadoop.mapred.JobTracker: 
JobTracker webserver: 50030
2014-01-22 06:50:42,221 WARN org.apache.hadoop.mapred.JobTracker: Job 
job_201303170922_6550 does not have valid info/token file so ignoring 
for recovery
2014-01-22 06:50:48,793 WARN org.apache.hadoop.mapred.JobTracker: Job 
job_201306220511_6151 does not have valid info/token file so ignoring 
for recovery
2014-01-22 06:50:54,434 WARN org.apache.hadoop.mapred.JobTracker: Job 
job_201307090256_0004 does not have valid info/token file so ignoring 
for recovery
2014-01-22 06:51:00,062 WARN org.apache.hadoop.mapred.JobTracker: Job 
job_201308140624_6179 does not have valid info/token file so ignoring 
for recovery
2014-01-22 06:51:08,461 WARN org.apache.hadoop.mapred.JobTracker: Job 
job_201309011414_6792 does not have valid info/token file so ignoring 
for recovery
2014-01-22 06:51:11,293 WARN org.apache.hadoop.mapred.JobTracker: Job 
job_201310030748_7416 does not have valid info/token file so ignoring 
for recovery
2014-01-22 06:51:22,552 WARN org.apache.hadoop.mapred.JobTracker: Job 
job_201401041634_5858 does not have valid info/token file so ignoring 
for recovery
2014-01-22 06:51:22,574 INFO org.apache.hadoop.mapred.JobHistory: Job 
History MaxAge is 2592000000 ms (30.00 days), Cleanup Frequency is 
86400000 ms (1.00 days)
2014-01-22 06:51:22,574 WARN org.apache.hadoop.conf.Configuration: 
topology.node.switch.mapping.impl is deprecated. Instead, use 
net.topology.node.switch.mapping.impl
2014-01-22 06:51:22,575 INFO 
org.apache.hadoop.mapred.CompletedJobStatusStore: Completed job store 
activated/configured with retain-time : 3600000 , job-info-dir : 
/jobtracker/jobsInfo
2014-01-22 06:51:22,577 INFO 
org.apache.hadoop.mapred.JobTrackerHADaemon: Waiting for jobtracker 
RUNNING state
2014-01-22 06:51:35,063 INFO 
org.apache.hadoop.mapred.JobTrackerHADaemon: Waiting for jobtracker 
RUNNING state
2014-01-22 06:51:35,132 INFO 
org.apache.hadoop.mapred.FairSchedulerEventLog: Initialized fair 
scheduler event log, logging to 
/mnt/logs/hadoop-0.20-mapreduce/fairscheduler/hadoop-mapred-fairscheduler-tkamz-hdpm01.log
2014-01-22 06:51:35,156 INFO org.apache.hadoop.mapred.FairScheduler: 
Successfully configured FairScheduler
2014-01-22 06:51:35,156 INFO org.apache.hadoop.mapred.JobTracker: 
Starting the recovery process for 2 jobs ...
2014-01-22 06:51:35,156 INFO org.apache.hadoop.mapred.JobTracker: 
Submitting job job_201311290401_6418
2014-01-22 06:51:42,024 INFO 
org.apache.hadoop.mapred.JobTrackerHADaemon: Waiting for jobtracker 
RUNNING state
2014-01-22 06:51:42,025 WARN org.apache.hadoop.mapred.JobTracker: Could 
not recover job job_201311290401_6418
java.io.IOException: Exception reading 
hdfs://***/tmp/mapred/system/seq-000000000023/job_201311290401_6418/jobToken
         at 
org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:154)
         at 
org.apache.hadoop.mapred.JobTracker$RecoveryManager.recover(JobTracker.java:1533)
         at 
org.apache.hadoop.mapred.JobTracker.offerService(JobTracker.java:2179)
         at 
org.apache.hadoop.mapred.JobTrackerHADaemon$JobTrackerRunner$1.run(JobTrackerHADaemon.java:198)
         at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.EOFException
         at java.io.DataInputStream.readFully(DataInputStream.java:180)
         at java.io.DataInputStream.readFully(DataInputStream.java:152)
         at 
org.apache.hadoop.security.Credentials.readTokenStorageStream(Credentials.java:189)
         at 
org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:150)
         ... 4 more
2014-01-22 06:51:42,026 INFO org.apache.hadoop.mapred.JobTracker: 
Submitting job job_201401041634_5856
2014-01-22 06:51:48,515 INFO 
org.apache.hadoop.mapred.JobTrackerHADaemon: Waiting for jobtracker 
RUNNING state
2014-01-22 06:51:57,892 INFO 
org.apache.hadoop.mapred.JobTrackerHADaemon: Waiting for jobtracker 
RUNNING state
2014-01-22 06:51:57,906 INFO org.apache.hadoop.mapred.JobInProgress: 
job_201401041634_5856: nMaps=240 nReduces=6 max=-1
2014-01-22 06:51:57,911 INFO org.apache.hadoop.mapred.JobTracker: Job 
job_201401041634_5856 added successfully for user 'hdfs' to queue 'default'
2014-01-22 06:51:57,911 INFO org.apache.hadoop.mapred.JobTracker: 
Recovery done! Recoverd 1 of 2 jobs.
2014-01-22 06:51:57,911 INFO org.apache.hadoop.mapred.JobTracker: 
Recovery Duration (ms):22755
2014-01-22 06:51:57,911 INFO org.apache.hadoop.mapred.JobTracker: 
Refreshing hosts information
2014-01-22 06:52:00,857 INFO org.apache.hadoop.mapred.JobTracker: 
Initializing job_201401041634_5856
2014-01-22 06:52:00,857 INFO org.apache.hadoop.mapred.JobInProgress: 
Initializing job_201401041634_5856
2014-01-22 06:52:00,865 INFO org.apache.hadoop.util.HostsFileReader: 
Setting the includes file to
2014-01-22 06:52:00,865 INFO org.apache.hadoop.util.HostsFileReader: 
Setting the excludes file to
2014-01-22 06:52:00,865 INFO org.apache.hadoop.util.HostsFileReader: 
Refreshing hosts (include/exclude) list
2014-01-22 06:52:00,865 INFO org.apache.hadoop.mapred.JobTracker: 
Decommissioning 0 nodes
2014-01-22 06:52:00,865 INFO 
org.apache.hadoop.mapred.JobTrackerHADaemon: Waiting for jobtracker 
RUNNING state
2014-01-22 06:52:00,868 INFO org.apache.hadoop.ipc.Server: IPC Server 
Responder: starting
2014-01-22 06:52:00,868 INFO org.apache.hadoop.ipc.Server: IPC Server 
listener on 8021: starting
2014-01-22 06:52:00,870 INFO org.apache.hadoop.mapred.JobTracker: 
Starting RUNNING
2014-01-22 06:52:06,560 INFO 
org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioned to active
2014-01-22 06:52:06,560 WARN org.apache.hadoop.ipc.Server: IPC Server 
Responder, call 
org.apache.hadoop.ha.HAServiceProtocol.transitionToActive from 
10.20.22.109:32931: output error



On 01/27/2014 11:59 PM, Karthik Kambatla wrote:
> (Redirecting to cdh-user, moving user@hadoop to bcc).
>
> Hi Oren
>
> Can you attach slightly longer versions of the log files on both the 
> JTs? Also, if this is something recurring, it would be nice to monitor 
> the JT heap usage and GC timeouts using jstat -gcutil <jt-pid>.
>
> Thanks
> Karthik
>
>
>
>
> On Thu, Jan 23, 2014 at 8:11 AM, Oren Marmor <orenm@infolinks.com 
> <ma...@infolinks.com>> wrote:
>
>     Hi.
>     We have two HA Jobtrackers in active/standby mode. (CDH4.2 on
>     ubuntu server)
>     We had a problem during which the active node suddenly became
>     standby and the standby server attempted to start resulting in a
>     java heap space failure.
>     any ideas to why the active node turned to standby?
>
>     logs attached:
>     on (original) active node:
>     2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobTracker:
>     Initializing job_201401041634_5858
>     2014-01-22 06:48:41,289 INFO
>     org.apache.hadoop.mapred.JobInProgress: Initializing
>     job_201401041634_5858
>     *2014-01-22 06:50:27,386 INFO
>     org.apache.hadoop.mapred.JobTrackerHAServiceProtocol:
>     Transitioning to standby*
>     2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker:
>     Stopping pluginDispatcher
>     2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker:
>     Stopping infoServer
>     2014-01-22 06:50:44,093 WARN org.apache.hadoop.ipc.Client:
>     interrupted waiting to send params to server
>     java.lang.InterruptedException
>             at
>     java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:979)
>             at
>     java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
>             at
>     java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
>             at java.util.concurrent.FutureTask.get(FutureTask.java:83)
>             at
>     org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:913)
>             at org.apache.hadoop.ipc.Client.call(Client.java:1198)
>             at
>     org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>             at $Proxy9.getFileInfo(Unknown Source)
>             at
>     org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:628)
>             at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
>             at
>     sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>             at java.lang.reflect.Method.invoke(Method.java:597)
>             at
>     org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
>             at
>     org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
>             at $Proxy10.getFileInfo(Unknown Source)
>             at
>     org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1532)
>             at
>     org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:803)
>             at
>     org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1332)
>             at
>     org.apache.hadoop.mapred.JobTrackerHAServiceProtocol$SystemDirectoryMonitor.run(JobTrackerHAServiceProtocol.java:96)
>             at
>     java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>             at
>     java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>             at
>     java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>             at
>     java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>             at
>     java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>             at
>     java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>             at
>     java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>             at
>     java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>             at java.lang.Thread.run(Thread.java:662)
>     2014-01-22 06:51:55,637 INFO org.mortbay.log: Stopped
>     SelectChannelConnector@0.0.0.0:50031
>     <http://SelectChannelConnector@0.0.0.0:50031>
>
>     on standby node
>     2014-01-22 06:50:05,010 INFO
>     org.apache.hadoop.mapred.JobTrackerHAServiceProtocol:
>     Transitioning to active
>     2014-01-22 06:50:05,010 INFO
>     org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopping
>     JobTrackerHAHttpRedirector on port 50030
>     2014-01-22 06:50:05,098 INFO org.mortbay.log: Stopped
>     SelectChannelConnector@0.0.0.0:50030
>     <http://SelectChannelConnector@0.0.0.0:50030>
>     2014-01-22 06:50:05,198 INFO
>     org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopped
>     2014-01-22 06:50:05,201 INFO
>     org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Renaming
>     previous system directory
>     hdfs://***/tmp/mapred/system/seq-000000000022 to hdfs://t
>     aykey/tmp/mapred/system/seq-000000000023
>     2014-01-22 06:50:05,244 INFO
>     org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
>     Updating the current master key for generating delegation tokens
>     2014-01-22 06:50:05,248 INFO org.apache.hadoop.mapred.JobTracker:
>     Scheduler configured with (memSizeForMapSlotOnJT,
>     memSizeForReduceSlotOnJT, limitMaxMemForMapTasks, limitMaxMemF
>     orReduceTasks) (-1, -1, -1, -1)
>     2014-01-22 06:50:05,248 INFO
>     org.apache.hadoop.util.HostsFileReader: Refreshing hosts
>     (include/exclude) list
>     2014-01-22 06:50:11,839 INFO
>     org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
>     Starting expired delegation token remover thread, tokenRemoverScanI
>     nterval=60 min(s)
>     ...
>     2014-01-22 06:52:00,870 INFO org.apache.hadoop.mapred.JobTracker:
>     Starting RUNNING
>     2014-01-22 06:52:06,560 INFO
>     org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioned
>     to active
>     2014-01-22 06:52:06,560 WARN org.apache.hadoop.ipc.Server: IPC
>     Server Responder, call
>     org.apache.hadoop.ha.HAServiceProtocol.transitionToActive from
>     ****:32931: output error
>     2014-01-22 06:52:06,561 INFO org.apache.hadoop.ipc.Server: IPC
>     Server handler 0 on 8023 caught an exception
>     java.nio.channels.ClosedChannelException
>             at
>     sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
>             at
>     sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
>             at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
>             at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
>             at
>     org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
>             at
>     org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
>             at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
>     2014-01-22 06:52:13,168 WARN org.apache.hadoop.ipc.Server: IPC
>     Server Responder, call
>     org.apache.hadoop.ha.HAServiceProtocol.getServiceStatus from
>     ****:60965: output error
>     2014-01-22 06:52:13,168 INFO org.apache.hadoop.ipc.Server: IPC
>     Server handler 0 on 8023 caught an exception
>     java.nio.channels.ClosedChannelException
>             at
>     sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
>             at
>     sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
>             at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
>             at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
>             at
>     org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
>             at
>     org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
>             at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
>
>     thanks
>     Oren
>
>

Re: HA Jobtracker failure

Posted by Karthik Kambatla <ka...@cloudera.com>.

(Redirecting to cdh-user, moving user@hadoop to bcc).

Hi Oren

Can you attach slightly longer versions of the log files on both the JTs?
Also, if this is something recurring, it would be nice to monitor the JT
heap usage and GC timeouts using jstat -gcutil <jt-pid>.

Thanks
Karthik




On Thu, Jan 23, 2014 at 8:11 AM, Oren Marmor <or...@infolinks.com> wrote:

> Hi.
> We have two HA Jobtrackers in active/standby mode. (CDH4.2 on ubuntu
> server)
> We had a problem during which the active node suddenly became standby and
> the standby server attempted to start resulting in a java heap space
> failure.
> any ideas to why the active node turned to standby?
>
> logs attached:
> on (original) active node:
> 2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobTracker:
> Initializing job_201401041634_5858
> 2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobInProgress:
> Initializing job_201401041634_5858
> *2014-01-22 06:50:27,386 INFO
> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioning to
> standby*
> 2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker: Stopping
> pluginDispatcher
> 2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker: Stopping
> infoServer
> 2014-01-22 06:50:44,093 WARN org.apache.hadoop.ipc.Client: interrupted
> waiting to send params to server
> java.lang.InterruptedException
>         at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:979)
>         at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
>         at
> java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
>         at java.util.concurrent.FutureTask.get(FutureTask.java:83)
>         at
> org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:913)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1198)
>         at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>         at $Proxy9.getFileInfo(Unknown Source)
>         at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:628)
>         at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
>         at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
>         at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
>         at $Proxy10.getFileInfo(Unknown Source)
>         at
> org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1532)
>         at
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:803)
>         at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1332)
>         at
> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol$SystemDirectoryMonitor.run(JobTrackerHAServiceProtocol.java:96)
>         at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>         at
> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>         at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>         at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>         at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> 2014-01-22 06:51:55,637 INFO org.mortbay.log: Stopped
> SelectChannelConnector@0.0.0.0:50031
>
> on standby node
> 2014-01-22 06:50:05,010 INFO
> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioning to
> active
> 2014-01-22 06:50:05,010 INFO
> org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopping
> JobTrackerHAHttpRedirector on port 50030
> 2014-01-22 06:50:05,098 INFO org.mortbay.log: Stopped
> SelectChannelConnector@0.0.0.0:50030
> 2014-01-22 06:50:05,198 INFO
> org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopped
> 2014-01-22 06:50:05,201 INFO
> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Renaming previous
> system directory hdfs://***/tmp/mapred/system/seq-000000000022 to hdfs://t
> aykey/tmp/mapred/system/seq-000000000023
> 2014-01-22 06:50:05,244 INFO
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
> Updating the current master key for generating delegation tokens
> 2014-01-22 06:50:05,248 INFO org.apache.hadoop.mapred.JobTracker:
> Scheduler configured with (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT,
> limitMaxMemForMapTasks, limitMaxMemF
> orReduceTasks) (-1, -1, -1, -1)
> 2014-01-22 06:50:05,248 INFO org.apache.hadoop.util.HostsFileReader:
> Refreshing hosts (include/exclude) list
> 2014-01-22 06:50:11,839 INFO
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
> Starting expired delegation token remover thread, tokenRemoverScanI
> nterval=60 min(s)
> ...
> 2014-01-22 06:52:00,870 INFO org.apache.hadoop.mapred.JobTracker: Starting
> RUNNING
> 2014-01-22 06:52:06,560 INFO
> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioned to active
> 2014-01-22 06:52:06,560 WARN org.apache.hadoop.ipc.Server: IPC Server
> Responder, call org.apache.hadoop.ha.HAServiceProtocol.transitionToActive
> from ****:32931: output error
> 2014-01-22 06:52:06,561 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 0 on 8023 caught an exception
> java.nio.channels.ClosedChannelException
>         at
> sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
>         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
>         at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
>         at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
>         at
> org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
>         at
> org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
> 2014-01-22 06:52:13,168 WARN org.apache.hadoop.ipc.Server: IPC Server
> Responder, call org.apache.hadoop.ha.HAServiceProtocol.getServiceStatus
> from ****:60965: output error
> 2014-01-22 06:52:13,168 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 0 on 8023 caught an exception
> java.nio.channels.ClosedChannelException
>         at
> sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
>         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
>         at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
>         at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
>         at
> org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
>         at
> org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
>
> thanks
> Oren
>

Re: HA Jobtracker failure

Posted by Karthik Kambatla <ka...@cloudera.com>.

(Redirecting to cdh-user, moving user@hadoop to bcc).

Hi Oren

Can you attach slightly longer versions of the log files on both the JTs?
Also, if this is something recurring, it would be nice to monitor the JT
heap usage and GC timeouts using jstat -gcutil <jt-pid>.

Thanks
Karthik




On Thu, Jan 23, 2014 at 8:11 AM, Oren Marmor <or...@infolinks.com> wrote:

> Hi.
> We have two HA Jobtrackers in active/standby mode. (CDH4.2 on ubuntu
> server)
> We had a problem during which the active node suddenly became standby and
> the standby server attempted to start resulting in a java heap space
> failure.
> any ideas to why the active node turned to standby?
>
> logs attached:
> on (original) active node:
> 2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobTracker:
> Initializing job_201401041634_5858
> 2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobInProgress:
> Initializing job_201401041634_5858
> *2014-01-22 06:50:27,386 INFO
> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioning to
> standby*
> 2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker: Stopping
> pluginDispatcher
> 2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker: Stopping
> infoServer
> 2014-01-22 06:50:44,093 WARN org.apache.hadoop.ipc.Client: interrupted
> waiting to send params to server
> java.lang.InterruptedException
>         at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:979)
>         at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
>         at
> java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
>         at java.util.concurrent.FutureTask.get(FutureTask.java:83)
>         at
> org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:913)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1198)
>         at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>         at $Proxy9.getFileInfo(Unknown Source)
>         at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:628)
>         at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
>         at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
>         at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
>         at $Proxy10.getFileInfo(Unknown Source)
>         at
> org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1532)
>         at
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:803)
>         at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1332)
>         at
> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol$SystemDirectoryMonitor.run(JobTrackerHAServiceProtocol.java:96)
>         at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>         at
> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>         at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>         at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>         at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> 2014-01-22 06:51:55,637 INFO org.mortbay.log: Stopped
> SelectChannelConnector@0.0.0.0:50031
>
> on standby node
> 2014-01-22 06:50:05,010 INFO
> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioning to
> active
> 2014-01-22 06:50:05,010 INFO
> org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopping
> JobTrackerHAHttpRedirector on port 50030
> 2014-01-22 06:50:05,098 INFO org.mortbay.log: Stopped
> SelectChannelConnector@0.0.0.0:50030
> 2014-01-22 06:50:05,198 INFO
> org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopped
> 2014-01-22 06:50:05,201 INFO
> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Renaming previous
> system directory hdfs://***/tmp/mapred/system/seq-000000000022 to hdfs://t
> aykey/tmp/mapred/system/seq-000000000023
> 2014-01-22 06:50:05,244 INFO
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
> Updating the current master key for generating delegation tokens
> 2014-01-22 06:50:05,248 INFO org.apache.hadoop.mapred.JobTracker:
> Scheduler configured with (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT,
> limitMaxMemForMapTasks, limitMaxMemF
> orReduceTasks) (-1, -1, -1, -1)
> 2014-01-22 06:50:05,248 INFO org.apache.hadoop.util.HostsFileReader:
> Refreshing hosts (include/exclude) list
> 2014-01-22 06:50:11,839 INFO
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
> Starting expired delegation token remover thread, tokenRemoverScanI
> nterval=60 min(s)
> ...
> 2014-01-22 06:52:00,870 INFO org.apache.hadoop.mapred.JobTracker: Starting
> RUNNING
> 2014-01-22 06:52:06,560 INFO
> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioned to active
> 2014-01-22 06:52:06,560 WARN org.apache.hadoop.ipc.Server: IPC Server
> Responder, call org.apache.hadoop.ha.HAServiceProtocol.transitionToActive
> from ****:32931: output error
> 2014-01-22 06:52:06,561 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 0 on 8023 caught an exception
> java.nio.channels.ClosedChannelException
>         at
> sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
>         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
>         at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
>         at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
>         at
> org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
>         at
> org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
> 2014-01-22 06:52:13,168 WARN org.apache.hadoop.ipc.Server: IPC Server
> Responder, call org.apache.hadoop.ha.HAServiceProtocol.getServiceStatus
> from ****:60965: output error
> 2014-01-22 06:52:13,168 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 0 on 8023 caught an exception
> java.nio.channels.ClosedChannelException
>         at
> sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
>         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
>         at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
>         at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
>         at
> org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
>         at
> org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
>
> thanks
> Oren
>

Re: HA Jobtracker failure

Posted by Karthik Kambatla <ka...@cloudera.com>.

(Redirecting to cdh-user, moving user@hadoop to bcc).

Hi Oren

Can you attach slightly longer versions of the log files on both the JTs?
Also, if this is something recurring, it would be nice to monitor the JT
heap usage and GC timeouts using jstat -gcutil <jt-pid>.

Thanks
Karthik




On Thu, Jan 23, 2014 at 8:11 AM, Oren Marmor <or...@infolinks.com> wrote:

> Hi.
> We have two HA Jobtrackers in active/standby mode. (CDH4.2 on ubuntu
> server)
> We had a problem during which the active node suddenly became standby and
> the standby server attempted to start resulting in a java heap space
> failure.
> any ideas to why the active node turned to standby?
>
> logs attached:
> on (original) active node:
> 2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobTracker:
> Initializing job_201401041634_5858
> 2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobInProgress:
> Initializing job_201401041634_5858
> *2014-01-22 06:50:27,386 INFO
> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioning to
> standby*
> 2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker: Stopping
> pluginDispatcher
> 2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker: Stopping
> infoServer
> 2014-01-22 06:50:44,093 WARN org.apache.hadoop.ipc.Client: interrupted
> waiting to send params to server
> java.lang.InterruptedException
>         at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:979)
>         at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
>         at
> java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
>         at java.util.concurrent.FutureTask.get(FutureTask.java:83)
>         at
> org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:913)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1198)
>         at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>         at $Proxy9.getFileInfo(Unknown Source)
>         at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:628)
>         at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
>         at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
>         at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
>         at $Proxy10.getFileInfo(Unknown Source)
>         at
> org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1532)
>         at
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:803)
>         at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1332)
>         at
> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol$SystemDirectoryMonitor.run(JobTrackerHAServiceProtocol.java:96)
>         at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>         at
> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>         at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>         at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>         at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> 2014-01-22 06:51:55,637 INFO org.mortbay.log: Stopped
> SelectChannelConnector@0.0.0.0:50031
>
> on standby node
> 2014-01-22 06:50:05,010 INFO
> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioning to
> active
> 2014-01-22 06:50:05,010 INFO
> org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopping
> JobTrackerHAHttpRedirector on port 50030
> 2014-01-22 06:50:05,098 INFO org.mortbay.log: Stopped
> SelectChannelConnector@0.0.0.0:50030
> 2014-01-22 06:50:05,198 INFO
> org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopped
> 2014-01-22 06:50:05,201 INFO
> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Renaming previous
> system directory hdfs://***/tmp/mapred/system/seq-000000000022 to hdfs://t
> aykey/tmp/mapred/system/seq-000000000023
> 2014-01-22 06:50:05,244 INFO
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
> Updating the current master key for generating delegation tokens
> 2014-01-22 06:50:05,248 INFO org.apache.hadoop.mapred.JobTracker:
> Scheduler configured with (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT,
> limitMaxMemForMapTasks, limitMaxMemF
> orReduceTasks) (-1, -1, -1, -1)
> 2014-01-22 06:50:05,248 INFO org.apache.hadoop.util.HostsFileReader:
> Refreshing hosts (include/exclude) list
> 2014-01-22 06:50:11,839 INFO
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
> Starting expired delegation token remover thread, tokenRemoverScanI
> nterval=60 min(s)
> ...
> 2014-01-22 06:52:00,870 INFO org.apache.hadoop.mapred.JobTracker: Starting
> RUNNING
> 2014-01-22 06:52:06,560 INFO
> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioned to active
> 2014-01-22 06:52:06,560 WARN org.apache.hadoop.ipc.Server: IPC Server
> Responder, call org.apache.hadoop.ha.HAServiceProtocol.transitionToActive
> from ****:32931: output error
> 2014-01-22 06:52:06,561 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 0 on 8023 caught an exception
> java.nio.channels.ClosedChannelException
>         at
> sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
>         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
>         at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
>         at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
>         at
> org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
>         at
> org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
> 2014-01-22 06:52:13,168 WARN org.apache.hadoop.ipc.Server: IPC Server
> Responder, call org.apache.hadoop.ha.HAServiceProtocol.getServiceStatus
> from ****:60965: output error
> 2014-01-22 06:52:13,168 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 0 on 8023 caught an exception
> java.nio.channels.ClosedChannelException
>         at
> sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
>         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
>         at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
>         at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
>         at
> org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
>         at
> org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
>
> thanks
> Oren
>

Re: HA Jobtracker failure

Posted by Karthik Kambatla <ka...@cloudera.com>.

(Redirecting to cdh-user, moving user@hadoop to bcc).

Hi Oren

Can you attach slightly longer versions of the log files on both the JTs?
Also, if this is something recurring, it would be nice to monitor the JT
heap usage and GC timeouts using jstat -gcutil <jt-pid>.

Thanks
Karthik




On Thu, Jan 23, 2014 at 8:11 AM, Oren Marmor <or...@infolinks.com> wrote:

> Hi.
> We have two HA Jobtrackers in active/standby mode. (CDH4.2 on ubuntu
> server)
> We had a problem during which the active node suddenly became standby and
> the standby server attempted to start resulting in a java heap space
> failure.
> any ideas to why the active node turned to standby?
>
> logs attached:
> on (original) active node:
> 2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobTracker:
> Initializing job_201401041634_5858
> 2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobInProgress:
> Initializing job_201401041634_5858
> *2014-01-22 06:50:27,386 INFO
> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioning to
> standby*
> 2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker: Stopping
> pluginDispatcher
> 2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker: Stopping
> infoServer
> 2014-01-22 06:50:44,093 WARN org.apache.hadoop.ipc.Client: interrupted
> waiting to send params to server
> java.lang.InterruptedException
>         at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:979)
>         at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
>         at
> java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
>         at java.util.concurrent.FutureTask.get(FutureTask.java:83)
>         at
> org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:913)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1198)
>         at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>         at $Proxy9.getFileInfo(Unknown Source)
>         at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:628)
>         at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
>         at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
>         at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
>         at $Proxy10.getFileInfo(Unknown Source)
>         at
> org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1532)
>         at
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:803)
>         at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1332)
>         at
> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol$SystemDirectoryMonitor.run(JobTrackerHAServiceProtocol.java:96)
>         at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>         at
> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>         at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>         at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>         at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> 2014-01-22 06:51:55,637 INFO org.mortbay.log: Stopped
> SelectChannelConnector@0.0.0.0:50031
>
> on standby node
> 2014-01-22 06:50:05,010 INFO
> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioning to
> active
> 2014-01-22 06:50:05,010 INFO
> org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopping
> JobTrackerHAHttpRedirector on port 50030
> 2014-01-22 06:50:05,098 INFO org.mortbay.log: Stopped
> SelectChannelConnector@0.0.0.0:50030
> 2014-01-22 06:50:05,198 INFO
> org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopped
> 2014-01-22 06:50:05,201 INFO
> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Renaming previous
> system directory hdfs://***/tmp/mapred/system/seq-000000000022 to hdfs://t
> aykey/tmp/mapred/system/seq-000000000023
> 2014-01-22 06:50:05,244 INFO
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
> Updating the current master key for generating delegation tokens
> 2014-01-22 06:50:05,248 INFO org.apache.hadoop.mapred.JobTracker:
> Scheduler configured with (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT,
> limitMaxMemForMapTasks, limitMaxMemF
> orReduceTasks) (-1, -1, -1, -1)
> 2014-01-22 06:50:05,248 INFO org.apache.hadoop.util.HostsFileReader:
> Refreshing hosts (include/exclude) list
> 2014-01-22 06:50:11,839 INFO
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
> Starting expired delegation token remover thread, tokenRemoverScanI
> nterval=60 min(s)
> ...
> 2014-01-22 06:52:00,870 INFO org.apache.hadoop.mapred.JobTracker: Starting
> RUNNING
> 2014-01-22 06:52:06,560 INFO
> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioned to active
> 2014-01-22 06:52:06,560 WARN org.apache.hadoop.ipc.Server: IPC Server
> Responder, call org.apache.hadoop.ha.HAServiceProtocol.transitionToActive
> from ****:32931: output error
> 2014-01-22 06:52:06,561 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 0 on 8023 caught an exception
> java.nio.channels.ClosedChannelException
>         at
> sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
>         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
>         at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
>         at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
>         at
> org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
>         at
> org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
> 2014-01-22 06:52:13,168 WARN org.apache.hadoop.ipc.Server: IPC Server
> Responder, call org.apache.hadoop.ha.HAServiceProtocol.getServiceStatus
> from ****:60965: output error
> 2014-01-22 06:52:13,168 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 0 on 8023 caught an exception
> java.nio.channels.ClosedChannelException
>         at
> sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
>         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
>         at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
>         at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
>         at
> org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
>         at
> org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
>
> thanks
> Oren
>