You are viewing a plain text version of this content. The canonical link for it is here.
Posted to mapreduce-user@hadoop.apache.org by Oren Marmor <or...@infolinks.com> on 2014/01/23 17:11:41 UTC
HA Jobtracker failure
Hi.
We have two HA Jobtrackers in active/standby mode. (CDH4.2 on ubuntu server)
We had a problem during which the active node suddenly became standby and
the standby server attempted to start resulting in a java heap space
failure.
any ideas to why the active node turned to standby?
logs attached:
on (original) active node:
2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobTracker:
Initializing job_201401041634_5858
2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobInProgress:
Initializing job_201401041634_5858
*2014-01-22 06:50:27,386 INFO
org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioning to
standby*
2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker: Stopping
pluginDispatcher
2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker: Stopping
infoServer
2014-01-22 06:50:44,093 WARN org.apache.hadoop.ipc.Client: interrupted
waiting to send params to server
java.lang.InterruptedException
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:979)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
at
java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
at java.util.concurrent.FutureTask.get(FutureTask.java:83)
at
org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:913)
at org.apache.hadoop.ipc.Client.call(Client.java:1198)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
at $Proxy9.getFileInfo(Unknown Source)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:628)
at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
at $Proxy10.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1532)
at
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:803)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1332)
at
org.apache.hadoop.mapred.JobTrackerHAServiceProtocol$SystemDirectoryMonitor.run(JobTrackerHAServiceProtocol.java:96)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
at
java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
2014-01-22 06:51:55,637 INFO org.mortbay.log: Stopped
SelectChannelConnector@0.0.0.0:50031
on standby node
2014-01-22 06:50:05,010 INFO
org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioning to
active
2014-01-22 06:50:05,010 INFO
org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopping
JobTrackerHAHttpRedirector on port 50030
2014-01-22 06:50:05,098 INFO org.mortbay.log: Stopped
SelectChannelConnector@0.0.0.0:50030
2014-01-22 06:50:05,198 INFO
org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopped
2014-01-22 06:50:05,201 INFO
org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Renaming previous
system directory hdfs://***/tmp/mapred/system/seq-000000000022 to hdfs://t
aykey/tmp/mapred/system/seq-000000000023
2014-01-22 06:50:05,244 INFO
org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
Updating the current master key for generating delegation tokens
2014-01-22 06:50:05,248 INFO org.apache.hadoop.mapred.JobTracker: Scheduler
configured with (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT,
limitMaxMemForMapTasks, limitMaxMemF
orReduceTasks) (-1, -1, -1, -1)
2014-01-22 06:50:05,248 INFO org.apache.hadoop.util.HostsFileReader:
Refreshing hosts (include/exclude) list
2014-01-22 06:50:11,839 INFO
org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
Starting expired delegation token remover thread, tokenRemoverScanI
nterval=60 min(s)
...
2014-01-22 06:52:00,870 INFO org.apache.hadoop.mapred.JobTracker: Starting
RUNNING
2014-01-22 06:52:06,560 INFO
org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioned to active
2014-01-22 06:52:06,560 WARN org.apache.hadoop.ipc.Server: IPC Server
Responder, call org.apache.hadoop.ha.HAServiceProtocol.transitionToActive
from ****:32931: output error
2014-01-22 06:52:06,561 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 0 on 8023 caught an exception
java.nio.channels.ClosedChannelException
at
sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
at
org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
2014-01-22 06:52:13,168 WARN org.apache.hadoop.ipc.Server: IPC Server
Responder, call org.apache.hadoop.ha.HAServiceProtocol.getServiceStatus
from ****:60965: output error
2014-01-22 06:52:13,168 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 0 on 8023 caught an exception
java.nio.channels.ClosedChannelException
at
sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
at
org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
thanks
Oren
Re: HA Jobtracker failure
Posted by Oren <or...@taykey.com>.
Hi Karthik.
the logs attached,
these are the logs from the active JT
2014-01-22 06:48:18,952 INFO org.apache.hadoop.mapred.JobInProgress:
job_201401041634_5858: nMaps=1 nReduces=1 max=-1
2014-01-22 06:48:18,953 WARN mapreduce.Counters: Group
org.apache.hadoop.mapred.Task$Counter is deprecated. Use
org.apache.hadoop.mapreduce.TaskCounter instead
2014-01-22 06:48:18,953 INFO org.apache.hadoop.mapred.JobTracker: Adding
task (MAP) 'attempt_201401041634_5856_m_000237_0' to tip
task_201401041634_5856_m_000237, for trac
ker 'tracker_ip-10-20-22-122.tk.com:localhost/127.0.0.1:35082'
2014-01-22 06:48:18,953 INFO org.apache.hadoop.mapred.JobInProgress:
Choosing data-local task task_201401041634_5856_m_000237
2014-01-22 06:48:18,953 WARN mapreduce.Counters: Group
org.apache.hadoop.mapred.Task$Counter is deprecated. Use
org.apache.hadoop.mapreduce.TaskCounter instead
2014-01-22 06:48:18,953 INFO org.apache.hadoop.mapred.JobTracker: Adding
task (MAP) 'attempt_201401041634_5856_m_000239_0' to tip
task_201401041634_5856_m_000239, for trac
ker 'tracker_ip-10-20-22-122.tk.com:localhost/127.0.0.1:35082'
2014-01-22 06:48:18,953 INFO org.apache.hadoop.mapred.JobInProgress:
Choosing data-local task task_201401041634_5856_m_000239
2014-01-22 06:48:18,954 WARN mapreduce.Counters: Group
org.apache.hadoop.mapred.Task$Counter is deprecated. Use
org.apache.hadoop.mapreduce.TaskCounter instead
2014-01-22 06:48:18,954 INFO org.apache.hadoop.mapred.JobTracker: Adding
task (MAP) 'attempt_201401041634_5856_m_000234_0' to tip
task_201401041634_5856_m_000234, for tracker
'tracker_ip-10-20-22-122.tk.com:localhost/127.0.0.1:35082'
2014-01-22 06:48:18,954 INFO org.apache.hadoop.mapred.JobInProgress:
Choosing rack-local task task_201401041634_5856_m_000234
2014-01-22 06:48:18,954 WARN mapreduce.Counters: Group
org.apache.hadoop.mapred.Task$Counter is deprecated. Use
org.apache.hadoop.mapreduce.TaskCounter instead
2014-01-22 06:48:18,954 INFO org.apache.hadoop.mapred.JobTracker: Adding
task (MAP) 'attempt_201401041634_5856_m_000235_0' to tip
task_201401041634_5856_m_000235, for tracker
'tracker_ip-10-20-22-122.tk.com:localhost/127.0.0.1:35082'
2014-01-22 06:48:18,954 INFO org.apache.hadoop.mapred.JobInProgress:
Choosing rack-local task task_201401041634_5856_m_000235
2014-01-22 06:48:18,954 INFO org.apache.hadoop.mapred.JobInProgress:
Task 'attempt_201401041634_5856_m_000147_0' has completed
task_201401041634_5856_m_000147 successfully.
2014-01-22 06:48:24,520 INFO org.apache.hadoop.mapred.JobInProgress:
Task 'attempt_201401041634_5856_m_000184_0' has completed
task_201401041634_5856_m_000184 successfully.
2014-01-22 06:48:30,073 INFO org.apache.hadoop.mapred.JobInProgress:
Task 'attempt_201401041634_5856_m_000191_0' has completed
task_201401041634_5856_m_000191 successfully.
2014-01-22 06:48:30,075 INFO org.apache.hadoop.mapred.JobInProgress:
Task 'attempt_201401041634_5856_m_000199_0' has completed
task_201401041634_5856_m_000199 successfully.
2014-01-22 06:48:41,288 INFO org.apache.hadoop.mapred.JobTracker: Job
job_201401041634_5858 added successfully for user 'hdfs' to queue 'default'
2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobTracker:
Initializing job_201401041634_5858
2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobInProgress:
Initializing job_201401041634_5858
2014-01-22 06:50:27,386 INFO
org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioning to
standby
2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker:
Stopping pluginDispatcher
2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker:
Stopping infoServer
2014-01-22 06:50:44,093 WARN org.apache.hadoop.ipc.Client: interrupted
waiting to send params to server
java.lang.InterruptedException
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:979)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
at
java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
at java.util.concurrent.FutureTask.get(FutureTask.java:83)
at
org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:913)
at org.apache.hadoop.ipc.Client.call(Client.java:1198)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
at $Proxy9.getFileInfo(Unknown Source)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:628)
at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
at $Proxy10.getFileInfo(Unknown Source)
at
org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1532)
at
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:803)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1332)
at
org.apache.hadoop.mapred.JobTrackerHAServiceProtocol$SystemDirectoryMonitor.run(JobTrackerHAServiceProtocol.java:96)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
at
java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
2014-01-22 06:51:55,637 INFO org.mortbay.log: Stopped
SelectChannelConnector@0.0.0.0:50031
2014-01-22 06:51:55,638 INFO org.apache.hadoop.mapred.JobTracker:
Stopping interTrackerServer
2014-01-22 06:51:55,638 INFO org.apache.hadoop.ipc.Server: Stopping
server on 8022
2014-01-22 06:51:55,639 INFO org.apache.hadoop.mapred.JobTracker:
Stopping expireTrackers
2014-01-22 06:51:55,639 INFO org.apache.hadoop.ipc.Server: Stopping IPC
Server listener on 8022
2014-01-22 06:51:55,639 INFO org.apache.hadoop.ipc.Server: Stopping IPC
Server Responder
2014-01-22 06:51:55,640 INFO org.apache.hadoop.mapred.JobTracker:
Stopped interTrackerServer
2014-01-22 06:52:56,306 WARN
org.apache.hadoop.io.retry.RetryInvocationHandler: Exception while
invoking getFileInfo of class ClientNamenodeProtocolTranslatorPB. Trying
to fail over immediately.
2014-01-22 06:52:56,338 INFO org.apache.hadoop.mapred.JobInProgress:
Task 'attempt_201401041634_5856_m_000180_0' has completed
task_201401041634_5856_m_000180 successfully.
2014-01-22 06:52:56,338 WARN org.apache.hadoop.ipc.Server: IPC Server
Responder, call
heartbeat(org.apache.hadoop.mapred.TaskTrackerStatus@78144f29, false,
false, true, 4436), rpc version=2, client version=32,
methodsFingerPrint=-159967141 from 10.20.22.151:37362: output error
2014-01-22 06:53:02,785 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 5 on 8022 caught an exception
java.nio.channels.ClosedChannelException
at
sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
at
org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
at
org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
2014-01-22 06:53:02,786 ERROR org.apache.hadoop.mapred.JobTracker: Job
initialization failed:
java.io.FileNotFoundException: Parent directory doesn't exist:
/tmp/mapred/system/seq-000000000022/job_201401041634_5858
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.verifyParentDir(FSNamesystem.java:1710)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1831)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:1753)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1731)
at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:419)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:205)
at
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44068)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1695)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1691)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1689)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at
org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:90)
at
org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:57)
at
org.apache.hadoop.hdfs.DFSOutputStream.<init>(DFSOutputStream.java:1301)
at
org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1317)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1242)
at
org.apache.hadoop.hdfs.DistributedFileSystem.createNonRecursive(DistributedFileSystem.java:301)
at
org.apache.hadoop.hdfs.DistributedFileSystem.createNonRecursive(DistributedFileSystem.java:79)
at
org.apache.hadoop.fs.FileSystem.createNonRecursive(FileSystem.java:1055)
at
org.apache.hadoop.fs.FileSystem.createNonRecursive(FileSystem.java:1031)
at
org.apache.hadoop.mapred.JobInProgress.generateAndStoreTokens(JobInProgress.java:3505)
at
org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:725)
at
org.apache.hadoop.mapred.JobTracker.initJob(JobTracker.java:3775)
at
org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:90)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by:
org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException):
Parent directory doesn't exist:
/tmp/mapred/system/seq-000000000022/job_201401041634_5858
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.verifyParentDir(FSNamesystem.java:1710)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1831)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:1753)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1731)
at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:419)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:205)
at
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44068)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1695)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1691)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1689)
at org.apache.hadoop.ipc.Client.call(Client.java:1225)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
at $Proxy9.create(Unknown Source)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:192)
at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
at $Proxy10.create(Unknown Source)
at
org.apache.hadoop.hdfs.DFSOutputStream.<init>(DFSOutputStream.java:1298)
... 13 more
2014-01-22 06:53:02,787 INFO org.apache.hadoop.mapred.JobInProgress:
Task 'attempt_201401041634_5856_m_000187_0' has completed
task_201401041634_5856_m_000187 successfully.
2014-01-22 06:53:02,787 INFO org.apache.hadoop.mapred.JobInProgress:
Task 'attempt_201401041634_5856_m_000188_0' has completed
task_201401041634_5856_m_000188 successfully.
2014-01-22 06:53:02,788 INFO org.apache.hadoop.mapred.JobInProgress:
Task 'attempt_201401041634_5856_m_000200_0' has completed
task_201401041634_5856_m_000200 successfully.
2014-01-22 06:53:09,255 WARN org.apache.hadoop.ipc.Server: IPC Server
Responder, call
heartbeat(org.apache.hadoop.mapred.TaskTrackerStatus@3f713050, false,
false, true, 77
20), rpc version=2, client version=32, methodsFingerPrint=-159967141
from 10.20.22.76:37774: output error
2014-01-22 06:53:09,255 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 8 on 8022 caught an exception
java.nio.channels.ClosedChannelException
at
sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
at
org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
at
org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
2014-01-22 06:53:09,255 INFO org.apache.hadoop.mapred.JobInProgress:
Task 'attempt_201401041634_5856_m_000143_0' has completed
task_201401041634_5856_m_000143 successfully.
2014-01-22 06:53:09,259 INFO org.apache.hadoop.mapred.JobInProgress:
Task 'attempt_201401041634_5856_m_000164_0' has completed
task_201401041634_5856_m_000164 successfully.
2014-01-22 06:53:09,259 INFO org.apache.hadoop.mapred.JobInProgress:
Task 'attempt_201401041634_5856_m_000167_0' has completed
task_201401041634_5856_m_000167 successfully.
2014-01-22 06:53:09,260 INFO org.apache.hadoop.mapred.JobInProgress:
Task 'attempt_201401041634_5856_m_000176_0' has completed
task_201401041634_5856_m_000176 successfully.
2014-01-22 06:53:09,260 WARN
org.apache.hadoop.io.retry.RetryInvocationHandler: Exception while
invoking getFileInfo of class ClientNamenodeProtocolTranslatorPB after 1
fail over attempts. Trying to fail over immediately.
*******************************************************************************************
the standby server doesnt have much data before the change occurred
since it was on standby mode.
2014-01-22 06:50:05,010 INFO
org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioning to
active
2014-01-22 06:50:05,010 INFO
org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopping
JobTrackerHAHttpRedirector on port 50030
2014-01-22 06:50:05,098 INFO org.mortbay.log: Stopped
SelectChannelConnector@0.0.0.0:50030
2014-01-22 06:50:05,198 INFO
org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopped
2014-01-22 06:50:05,201 INFO
org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Renaming previous
system directory hdfs://***/tmp/mapred/system/seq-000000000022 to
hdfs://***/tmp/mapred/system/seq-000000000023
2014-01-22 06:50:05,244 INFO
org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
Updating the current master key for generating delegation tokens
2014-01-22 06:50:05,248 INFO org.apache.hadoop.mapred.JobTracker:
Scheduler configured with (memSizeForMapSlotOnJT,
memSizeForReduceSlotOnJT, limitMaxMemForMapTasks,
limitMaxMemForReduceTasks) (-1, -1, -1, -1)
2014-01-22 06:50:05,248 INFO org.apache.hadoop.util.HostsFileReader:
Refreshing hosts (include/exclude) list
2014-01-22 06:50:11,839 INFO
org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
Starting expired delegation token remover thread,
tokenRemoverScanInterval=60 min(s)
2014-01-22 06:50:11,839 INFO
org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
Updating the current master key for generating delegation tokens
2014-01-22 06:50:11,852 INFO org.apache.hadoop.mapred.JobTracker:
Starting jobtracker with owner as mapred
2014-01-22 06:50:11,853 INFO org.apache.hadoop.ipc.Server: Starting
Socket Reader #1 for port 8021
2014-01-22 06:50:11,857 WARN org.apache.hadoop.ipc.RPC: Interface
interface org.apache.hadoop.mapred.TaskTrackerManager ignored because it
does not extend VersionedProtocol
2014-01-22 06:50:11,919 INFO org.apache.hadoop.http.HttpServer: Added
global filter 'safety'
(class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
2014-01-22 06:50:11,920 INFO org.apache.hadoop.http.HttpServer: Added
filter static_user_filter
(class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter)
to context job
2014-01-22 06:50:11,920 INFO org.apache.hadoop.http.HttpServer: Added
filter static_user_filter
(class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter)
to context static
2014-01-22 06:50:11,920 INFO org.apache.hadoop.http.HttpServer: Added
filter static_user_filter
(class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter)
to context logs
2014-01-22 06:50:11,922 INFO org.apache.hadoop.http.HttpServer: Jetty
bound to port 50030
2014-01-22 06:50:11,922 INFO org.mortbay.log: jetty-6.1.26.cloudera.2
2014-01-22 06:50:35,574 INFO org.mortbay.log: Started
SelectChannelConnector@0.0.0.0:50030
2014-01-22 06:50:35,575 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
Cannot initialize JVM Metrics with processName=JobTracker, sessionId= -
already initialized
2014-01-22 06:50:35,576 INFO org.apache.hadoop.mapred.JobTracker:
JobTracker up at: 8021
2014-01-22 06:50:35,576 INFO org.apache.hadoop.mapred.JobTracker:
JobTracker webserver: 50030
2014-01-22 06:50:42,221 WARN org.apache.hadoop.mapred.JobTracker: Job
job_201303170922_6550 does not have valid info/token file so ignoring
for recovery
2014-01-22 06:50:48,793 WARN org.apache.hadoop.mapred.JobTracker: Job
job_201306220511_6151 does not have valid info/token file so ignoring
for recovery
2014-01-22 06:50:54,434 WARN org.apache.hadoop.mapred.JobTracker: Job
job_201307090256_0004 does not have valid info/token file so ignoring
for recovery
2014-01-22 06:51:00,062 WARN org.apache.hadoop.mapred.JobTracker: Job
job_201308140624_6179 does not have valid info/token file so ignoring
for recovery
2014-01-22 06:51:08,461 WARN org.apache.hadoop.mapred.JobTracker: Job
job_201309011414_6792 does not have valid info/token file so ignoring
for recovery
2014-01-22 06:51:11,293 WARN org.apache.hadoop.mapred.JobTracker: Job
job_201310030748_7416 does not have valid info/token file so ignoring
for recovery
2014-01-22 06:51:22,552 WARN org.apache.hadoop.mapred.JobTracker: Job
job_201401041634_5858 does not have valid info/token file so ignoring
for recovery
2014-01-22 06:51:22,574 INFO org.apache.hadoop.mapred.JobHistory: Job
History MaxAge is 2592000000 ms (30.00 days), Cleanup Frequency is
86400000 ms (1.00 days)
2014-01-22 06:51:22,574 WARN org.apache.hadoop.conf.Configuration:
topology.node.switch.mapping.impl is deprecated. Instead, use
net.topology.node.switch.mapping.impl
2014-01-22 06:51:22,575 INFO
org.apache.hadoop.mapred.CompletedJobStatusStore: Completed job store
activated/configured with retain-time : 3600000 , job-info-dir :
/jobtracker/jobsInfo
2014-01-22 06:51:22,577 INFO
org.apache.hadoop.mapred.JobTrackerHADaemon: Waiting for jobtracker
RUNNING state
2014-01-22 06:51:35,063 INFO
org.apache.hadoop.mapred.JobTrackerHADaemon: Waiting for jobtracker
RUNNING state
2014-01-22 06:51:35,132 INFO
org.apache.hadoop.mapred.FairSchedulerEventLog: Initialized fair
scheduler event log, logging to
/mnt/logs/hadoop-0.20-mapreduce/fairscheduler/hadoop-mapred-fairscheduler-tkamz-hdpm01.log
2014-01-22 06:51:35,156 INFO org.apache.hadoop.mapred.FairScheduler:
Successfully configured FairScheduler
2014-01-22 06:51:35,156 INFO org.apache.hadoop.mapred.JobTracker:
Starting the recovery process for 2 jobs ...
2014-01-22 06:51:35,156 INFO org.apache.hadoop.mapred.JobTracker:
Submitting job job_201311290401_6418
2014-01-22 06:51:42,024 INFO
org.apache.hadoop.mapred.JobTrackerHADaemon: Waiting for jobtracker
RUNNING state
2014-01-22 06:51:42,025 WARN org.apache.hadoop.mapred.JobTracker: Could
not recover job job_201311290401_6418
java.io.IOException: Exception reading
hdfs://***/tmp/mapred/system/seq-000000000023/job_201311290401_6418/jobToken
at
org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:154)
at
org.apache.hadoop.mapred.JobTracker$RecoveryManager.recover(JobTracker.java:1533)
at
org.apache.hadoop.mapred.JobTracker.offerService(JobTracker.java:2179)
at
org.apache.hadoop.mapred.JobTrackerHADaemon$JobTrackerRunner$1.run(JobTrackerHADaemon.java:198)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:180)
at java.io.DataInputStream.readFully(DataInputStream.java:152)
at
org.apache.hadoop.security.Credentials.readTokenStorageStream(Credentials.java:189)
at
org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:150)
... 4 more
2014-01-22 06:51:42,026 INFO org.apache.hadoop.mapred.JobTracker:
Submitting job job_201401041634_5856
2014-01-22 06:51:48,515 INFO
org.apache.hadoop.mapred.JobTrackerHADaemon: Waiting for jobtracker
RUNNING state
2014-01-22 06:51:57,892 INFO
org.apache.hadoop.mapred.JobTrackerHADaemon: Waiting for jobtracker
RUNNING state
2014-01-22 06:51:57,906 INFO org.apache.hadoop.mapred.JobInProgress:
job_201401041634_5856: nMaps=240 nReduces=6 max=-1
2014-01-22 06:51:57,911 INFO org.apache.hadoop.mapred.JobTracker: Job
job_201401041634_5856 added successfully for user 'hdfs' to queue 'default'
2014-01-22 06:51:57,911 INFO org.apache.hadoop.mapred.JobTracker:
Recovery done! Recoverd 1 of 2 jobs.
2014-01-22 06:51:57,911 INFO org.apache.hadoop.mapred.JobTracker:
Recovery Duration (ms):22755
2014-01-22 06:51:57,911 INFO org.apache.hadoop.mapred.JobTracker:
Refreshing hosts information
2014-01-22 06:52:00,857 INFO org.apache.hadoop.mapred.JobTracker:
Initializing job_201401041634_5856
2014-01-22 06:52:00,857 INFO org.apache.hadoop.mapred.JobInProgress:
Initializing job_201401041634_5856
2014-01-22 06:52:00,865 INFO org.apache.hadoop.util.HostsFileReader:
Setting the includes file to
2014-01-22 06:52:00,865 INFO org.apache.hadoop.util.HostsFileReader:
Setting the excludes file to
2014-01-22 06:52:00,865 INFO org.apache.hadoop.util.HostsFileReader:
Refreshing hosts (include/exclude) list
2014-01-22 06:52:00,865 INFO org.apache.hadoop.mapred.JobTracker:
Decommissioning 0 nodes
2014-01-22 06:52:00,865 INFO
org.apache.hadoop.mapred.JobTrackerHADaemon: Waiting for jobtracker
RUNNING state
2014-01-22 06:52:00,868 INFO org.apache.hadoop.ipc.Server: IPC Server
Responder: starting
2014-01-22 06:52:00,868 INFO org.apache.hadoop.ipc.Server: IPC Server
listener on 8021: starting
2014-01-22 06:52:00,870 INFO org.apache.hadoop.mapred.JobTracker:
Starting RUNNING
2014-01-22 06:52:06,560 INFO
org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioned to active
2014-01-22 06:52:06,560 WARN org.apache.hadoop.ipc.Server: IPC Server
Responder, call
org.apache.hadoop.ha.HAServiceProtocol.transitionToActive from
10.20.22.109:32931: output error
On 01/27/2014 11:59 PM, Karthik Kambatla wrote:
> (Redirecting to cdh-user, moving user@hadoop to bcc).
>
> Hi Oren
>
> Can you attach slightly longer versions of the log files on both the
> JTs? Also, if this is something recurring, it would be nice to monitor
> the JT heap usage and GC timeouts using jstat -gcutil <jt-pid>.
>
> Thanks
> Karthik
>
>
>
>
> On Thu, Jan 23, 2014 at 8:11 AM, Oren Marmor <orenm@infolinks.com
> <ma...@infolinks.com>> wrote:
>
> Hi.
> We have two HA Jobtrackers in active/standby mode. (CDH4.2 on
> ubuntu server)
> We had a problem during which the active node suddenly became
> standby and the standby server attempted to start resulting in a
> java heap space failure.
> any ideas to why the active node turned to standby?
>
> logs attached:
> on (original) active node:
> 2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobTracker:
> Initializing job_201401041634_5858
> 2014-01-22 06:48:41,289 INFO
> org.apache.hadoop.mapred.JobInProgress: Initializing
> job_201401041634_5858
> *2014-01-22 06:50:27,386 INFO
> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol:
> Transitioning to standby*
> 2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker:
> Stopping pluginDispatcher
> 2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker:
> Stopping infoServer
> 2014-01-22 06:50:44,093 WARN org.apache.hadoop.ipc.Client:
> interrupted waiting to send params to server
> java.lang.InterruptedException
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:979)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
> at
> java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
> at java.util.concurrent.FutureTask.get(FutureTask.java:83)
> at
> org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:913)
> at org.apache.hadoop.ipc.Client.call(Client.java:1198)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
> at $Proxy9.getFileInfo(Unknown Source)
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:628)
> at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
> at $Proxy10.getFileInfo(Unknown Source)
> at
> org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1532)
> at
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:803)
> at
> org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1332)
> at
> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol$SystemDirectoryMonitor.run(JobTrackerHAServiceProtocol.java:96)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
> at
> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
> at
> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> 2014-01-22 06:51:55,637 INFO org.mortbay.log: Stopped
> SelectChannelConnector@0.0.0.0:50031
> <http://SelectChannelConnector@0.0.0.0:50031>
>
> on standby node
> 2014-01-22 06:50:05,010 INFO
> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol:
> Transitioning to active
> 2014-01-22 06:50:05,010 INFO
> org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopping
> JobTrackerHAHttpRedirector on port 50030
> 2014-01-22 06:50:05,098 INFO org.mortbay.log: Stopped
> SelectChannelConnector@0.0.0.0:50030
> <http://SelectChannelConnector@0.0.0.0:50030>
> 2014-01-22 06:50:05,198 INFO
> org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopped
> 2014-01-22 06:50:05,201 INFO
> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Renaming
> previous system directory
> hdfs://***/tmp/mapred/system/seq-000000000022 to hdfs://t
> aykey/tmp/mapred/system/seq-000000000023
> 2014-01-22 06:50:05,244 INFO
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
> Updating the current master key for generating delegation tokens
> 2014-01-22 06:50:05,248 INFO org.apache.hadoop.mapred.JobTracker:
> Scheduler configured with (memSizeForMapSlotOnJT,
> memSizeForReduceSlotOnJT, limitMaxMemForMapTasks, limitMaxMemF
> orReduceTasks) (-1, -1, -1, -1)
> 2014-01-22 06:50:05,248 INFO
> org.apache.hadoop.util.HostsFileReader: Refreshing hosts
> (include/exclude) list
> 2014-01-22 06:50:11,839 INFO
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
> Starting expired delegation token remover thread, tokenRemoverScanI
> nterval=60 min(s)
> ...
> 2014-01-22 06:52:00,870 INFO org.apache.hadoop.mapred.JobTracker:
> Starting RUNNING
> 2014-01-22 06:52:06,560 INFO
> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioned
> to active
> 2014-01-22 06:52:06,560 WARN org.apache.hadoop.ipc.Server: IPC
> Server Responder, call
> org.apache.hadoop.ha.HAServiceProtocol.transitionToActive from
> ****:32931: output error
> 2014-01-22 06:52:06,561 INFO org.apache.hadoop.ipc.Server: IPC
> Server handler 0 on 8023 caught an exception
> java.nio.channels.ClosedChannelException
> at
> sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
> at
> sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
> at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
> at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
> at
> org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
> at
> org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
> 2014-01-22 06:52:13,168 WARN org.apache.hadoop.ipc.Server: IPC
> Server Responder, call
> org.apache.hadoop.ha.HAServiceProtocol.getServiceStatus from
> ****:60965: output error
> 2014-01-22 06:52:13,168 INFO org.apache.hadoop.ipc.Server: IPC
> Server handler 0 on 8023 caught an exception
> java.nio.channels.ClosedChannelException
> at
> sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
> at
> sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
> at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
> at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
> at
> org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
> at
> org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
>
> thanks
> Oren
>
>
Re: HA Jobtracker failure
Posted by Siddharth Tiwari <si...@live.com>.
How have you implemented the failover ? Also can you attach JTHA logs ? If you hav implemented it using. Zkfc, it would be interesting to look in zookeeper logs as well.
Sent from my iPhone
> On Jan 27, 2014, at 3:00 PM, "Karthik Kambatla" <ka...@cloudera.com> wrote:
>
> (Redirecting to cdh-user, moving user@hadoop to bcc).
>
> Hi Oren
>
> Can you attach slightly longer versions of the log files on both the JTs? Also, if this is something recurring, it would be nice to monitor the JT heap usage and GC timeouts using jstat -gcutil <jt-pid>.
>
> Thanks
> Karthik
>
>
>
>
>> On Thu, Jan 23, 2014 at 8:11 AM, Oren Marmor <or...@infolinks.com> wrote:
>> Hi.
>> We have two HA Jobtrackers in active/standby mode. (CDH4.2 on ubuntu server)
>> We had a problem during which the active node suddenly became standby and the standby server attempted to start resulting in a java heap space failure.
>> any ideas to why the active node turned to standby?
>>
>> logs attached:
>> on (original) active node:
>> 2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobTracker: Initializing job_201401041634_5858
>> 2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobInProgress: Initializing job_201401041634_5858
>> 2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioning to standby
>> 2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker: Stopping pluginDispatcher
>> 2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker: Stopping infoServer
>> 2014-01-22 06:50:44,093 WARN org.apache.hadoop.ipc.Client: interrupted waiting to send params to server
>> java.lang.InterruptedException
>> at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:979)
>> at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
>> at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
>> at java.util.concurrent.FutureTask.get(FutureTask.java:83)
>> at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:913)
>> at org.apache.hadoop.ipc.Client.call(Client.java:1198)
>> at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>> at $Proxy9.getFileInfo(Unknown Source)
>> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:628)
>> at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> at java.lang.reflect.Method.invoke(Method.java:597)
>> at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
>> at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
>> at $Proxy10.getFileInfo(Unknown Source)
>> at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1532)
>> at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:803)
>> at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1332)
>> at org.apache.hadoop.mapred.JobTrackerHAServiceProtocol$SystemDirectoryMonitor.run(JobTrackerHAServiceProtocol.java:96)
>> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>> at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>> at java.lang.Thread.run(Thread.java:662)
>> 2014-01-22 06:51:55,637 INFO org.mortbay.log: Stopped SelectChannelConnector@0.0.0.0:50031
>>
>> on standby node
>> 2014-01-22 06:50:05,010 INFO org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioning to active
>> 2014-01-22 06:50:05,010 INFO org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopping JobTrackerHAHttpRedirector on port 50030
>> 2014-01-22 06:50:05,098 INFO org.mortbay.log: Stopped SelectChannelConnector@0.0.0.0:50030
>> 2014-01-22 06:50:05,198 INFO org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopped
>> 2014-01-22 06:50:05,201 INFO org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Renaming previous system directory hdfs://***/tmp/mapred/system/seq-000000000022 to hdfs://t
>> aykey/tmp/mapred/system/seq-000000000023
>> 2014-01-22 06:50:05,244 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens
>> 2014-01-22 06:50:05,248 INFO org.apache.hadoop.mapred.JobTracker: Scheduler configured with (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT, limitMaxMemForMapTasks, limitMaxMemF
>> orReduceTasks) (-1, -1, -1, -1)
>> 2014-01-22 06:50:05,248 INFO org.apache.hadoop.util.HostsFileReader: Refreshing hosts (include/exclude) list
>> 2014-01-22 06:50:11,839 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Starting expired delegation token remover thread, tokenRemoverScanI
>> nterval=60 min(s)
>> ...
>> 2014-01-22 06:52:00,870 INFO org.apache.hadoop.mapred.JobTracker: Starting RUNNING
>> 2014-01-22 06:52:06,560 INFO org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioned to active
>> 2014-01-22 06:52:06,560 WARN org.apache.hadoop.ipc.Server: IPC Server Responder, call org.apache.hadoop.ha.HAServiceProtocol.transitionToActive from ****:32931: output error
>> 2014-01-22 06:52:06,561 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 8023 caught an exception
>> java.nio.channels.ClosedChannelException
>> at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
>> at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
>> at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
>> at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
>> at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
>> at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
>> 2014-01-22 06:52:13,168 WARN org.apache.hadoop.ipc.Server: IPC Server Responder, call org.apache.hadoop.ha.HAServiceProtocol.getServiceStatus from ****:60965: output error
>> 2014-01-22 06:52:13,168 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 8023 caught an exception
>> java.nio.channels.ClosedChannelException
>> at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
>> at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
>> at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
>> at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
>> at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
>> at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
>>
>> thanks
>> Oren
>
Re: HA Jobtracker failure
Posted by Siddharth Tiwari <si...@live.com>.
How have you implemented the failover ? Also can you attach JTHA logs ? If you hav implemented it using. Zkfc, it would be interesting to look in zookeeper logs as well.
Sent from my iPhone
> On Jan 27, 2014, at 3:00 PM, "Karthik Kambatla" <ka...@cloudera.com> wrote:
>
> (Redirecting to cdh-user, moving user@hadoop to bcc).
>
> Hi Oren
>
> Can you attach slightly longer versions of the log files on both the JTs? Also, if this is something recurring, it would be nice to monitor the JT heap usage and GC timeouts using jstat -gcutil <jt-pid>.
>
> Thanks
> Karthik
>
>
>
>
>> On Thu, Jan 23, 2014 at 8:11 AM, Oren Marmor <or...@infolinks.com> wrote:
>> Hi.
>> We have two HA Jobtrackers in active/standby mode. (CDH4.2 on ubuntu server)
>> We had a problem during which the active node suddenly became standby and the standby server attempted to start resulting in a java heap space failure.
>> any ideas to why the active node turned to standby?
>>
>> logs attached:
>> on (original) active node:
>> 2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobTracker: Initializing job_201401041634_5858
>> 2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobInProgress: Initializing job_201401041634_5858
>> 2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioning to standby
>> 2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker: Stopping pluginDispatcher
>> 2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker: Stopping infoServer
>> 2014-01-22 06:50:44,093 WARN org.apache.hadoop.ipc.Client: interrupted waiting to send params to server
>> java.lang.InterruptedException
>> at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:979)
>> at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
>> at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
>> at java.util.concurrent.FutureTask.get(FutureTask.java:83)
>> at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:913)
>> at org.apache.hadoop.ipc.Client.call(Client.java:1198)
>> at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>> at $Proxy9.getFileInfo(Unknown Source)
>> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:628)
>> at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> at java.lang.reflect.Method.invoke(Method.java:597)
>> at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
>> at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
>> at $Proxy10.getFileInfo(Unknown Source)
>> at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1532)
>> at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:803)
>> at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1332)
>> at org.apache.hadoop.mapred.JobTrackerHAServiceProtocol$SystemDirectoryMonitor.run(JobTrackerHAServiceProtocol.java:96)
>> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>> at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>> at java.lang.Thread.run(Thread.java:662)
>> 2014-01-22 06:51:55,637 INFO org.mortbay.log: Stopped SelectChannelConnector@0.0.0.0:50031
>>
>> on standby node
>> 2014-01-22 06:50:05,010 INFO org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioning to active
>> 2014-01-22 06:50:05,010 INFO org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopping JobTrackerHAHttpRedirector on port 50030
>> 2014-01-22 06:50:05,098 INFO org.mortbay.log: Stopped SelectChannelConnector@0.0.0.0:50030
>> 2014-01-22 06:50:05,198 INFO org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopped
>> 2014-01-22 06:50:05,201 INFO org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Renaming previous system directory hdfs://***/tmp/mapred/system/seq-000000000022 to hdfs://t
>> aykey/tmp/mapred/system/seq-000000000023
>> 2014-01-22 06:50:05,244 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens
>> 2014-01-22 06:50:05,248 INFO org.apache.hadoop.mapred.JobTracker: Scheduler configured with (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT, limitMaxMemForMapTasks, limitMaxMemF
>> orReduceTasks) (-1, -1, -1, -1)
>> 2014-01-22 06:50:05,248 INFO org.apache.hadoop.util.HostsFileReader: Refreshing hosts (include/exclude) list
>> 2014-01-22 06:50:11,839 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Starting expired delegation token remover thread, tokenRemoverScanI
>> nterval=60 min(s)
>> ...
>> 2014-01-22 06:52:00,870 INFO org.apache.hadoop.mapred.JobTracker: Starting RUNNING
>> 2014-01-22 06:52:06,560 INFO org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioned to active
>> 2014-01-22 06:52:06,560 WARN org.apache.hadoop.ipc.Server: IPC Server Responder, call org.apache.hadoop.ha.HAServiceProtocol.transitionToActive from ****:32931: output error
>> 2014-01-22 06:52:06,561 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 8023 caught an exception
>> java.nio.channels.ClosedChannelException
>> at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
>> at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
>> at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
>> at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
>> at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
>> at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
>> 2014-01-22 06:52:13,168 WARN org.apache.hadoop.ipc.Server: IPC Server Responder, call org.apache.hadoop.ha.HAServiceProtocol.getServiceStatus from ****:60965: output error
>> 2014-01-22 06:52:13,168 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 8023 caught an exception
>> java.nio.channels.ClosedChannelException
>> at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
>> at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
>> at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
>> at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
>> at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
>> at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
>>
>> thanks
>> Oren
>
Re: HA Jobtracker failure
Posted by Siddharth Tiwari <si...@live.com>.
How have you implemented the failover ? Also can you attach JTHA logs ? If you hav implemented it using. Zkfc, it would be interesting to look in zookeeper logs as well.
Sent from my iPhone
> On Jan 27, 2014, at 3:00 PM, "Karthik Kambatla" <ka...@cloudera.com> wrote:
>
> (Redirecting to cdh-user, moving user@hadoop to bcc).
>
> Hi Oren
>
> Can you attach slightly longer versions of the log files on both the JTs? Also, if this is something recurring, it would be nice to monitor the JT heap usage and GC timeouts using jstat -gcutil <jt-pid>.
>
> Thanks
> Karthik
>
>
>
>
>> On Thu, Jan 23, 2014 at 8:11 AM, Oren Marmor <or...@infolinks.com> wrote:
>> Hi.
>> We have two HA Jobtrackers in active/standby mode. (CDH4.2 on ubuntu server)
>> We had a problem during which the active node suddenly became standby and the standby server attempted to start resulting in a java heap space failure.
>> any ideas to why the active node turned to standby?
>>
>> logs attached:
>> on (original) active node:
>> 2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobTracker: Initializing job_201401041634_5858
>> 2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobInProgress: Initializing job_201401041634_5858
>> 2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioning to standby
>> 2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker: Stopping pluginDispatcher
>> 2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker: Stopping infoServer
>> 2014-01-22 06:50:44,093 WARN org.apache.hadoop.ipc.Client: interrupted waiting to send params to server
>> java.lang.InterruptedException
>> at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:979)
>> at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
>> at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
>> at java.util.concurrent.FutureTask.get(FutureTask.java:83)
>> at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:913)
>> at org.apache.hadoop.ipc.Client.call(Client.java:1198)
>> at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>> at $Proxy9.getFileInfo(Unknown Source)
>> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:628)
>> at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> at java.lang.reflect.Method.invoke(Method.java:597)
>> at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
>> at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
>> at $Proxy10.getFileInfo(Unknown Source)
>> at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1532)
>> at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:803)
>> at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1332)
>> at org.apache.hadoop.mapred.JobTrackerHAServiceProtocol$SystemDirectoryMonitor.run(JobTrackerHAServiceProtocol.java:96)
>> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>> at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>> at java.lang.Thread.run(Thread.java:662)
>> 2014-01-22 06:51:55,637 INFO org.mortbay.log: Stopped SelectChannelConnector@0.0.0.0:50031
>>
>> on standby node
>> 2014-01-22 06:50:05,010 INFO org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioning to active
>> 2014-01-22 06:50:05,010 INFO org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopping JobTrackerHAHttpRedirector on port 50030
>> 2014-01-22 06:50:05,098 INFO org.mortbay.log: Stopped SelectChannelConnector@0.0.0.0:50030
>> 2014-01-22 06:50:05,198 INFO org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopped
>> 2014-01-22 06:50:05,201 INFO org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Renaming previous system directory hdfs://***/tmp/mapred/system/seq-000000000022 to hdfs://t
>> aykey/tmp/mapred/system/seq-000000000023
>> 2014-01-22 06:50:05,244 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens
>> 2014-01-22 06:50:05,248 INFO org.apache.hadoop.mapred.JobTracker: Scheduler configured with (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT, limitMaxMemForMapTasks, limitMaxMemF
>> orReduceTasks) (-1, -1, -1, -1)
>> 2014-01-22 06:50:05,248 INFO org.apache.hadoop.util.HostsFileReader: Refreshing hosts (include/exclude) list
>> 2014-01-22 06:50:11,839 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Starting expired delegation token remover thread, tokenRemoverScanI
>> nterval=60 min(s)
>> ...
>> 2014-01-22 06:52:00,870 INFO org.apache.hadoop.mapred.JobTracker: Starting RUNNING
>> 2014-01-22 06:52:06,560 INFO org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioned to active
>> 2014-01-22 06:52:06,560 WARN org.apache.hadoop.ipc.Server: IPC Server Responder, call org.apache.hadoop.ha.HAServiceProtocol.transitionToActive from ****:32931: output error
>> 2014-01-22 06:52:06,561 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 8023 caught an exception
>> java.nio.channels.ClosedChannelException
>> at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
>> at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
>> at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
>> at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
>> at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
>> at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
>> 2014-01-22 06:52:13,168 WARN org.apache.hadoop.ipc.Server: IPC Server Responder, call org.apache.hadoop.ha.HAServiceProtocol.getServiceStatus from ****:60965: output error
>> 2014-01-22 06:52:13,168 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 8023 caught an exception
>> java.nio.channels.ClosedChannelException
>> at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
>> at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
>> at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
>> at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
>> at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
>> at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
>>
>> thanks
>> Oren
>
Re: HA Jobtracker failure
Posted by Oren <or...@taykey.com>.
Hi Karthik.
the logs attached,
these are the logs from the active JT
2014-01-22 06:48:18,952 INFO org.apache.hadoop.mapred.JobInProgress:
job_201401041634_5858: nMaps=1 nReduces=1 max=-1
2014-01-22 06:48:18,953 WARN mapreduce.Counters: Group
org.apache.hadoop.mapred.Task$Counter is deprecated. Use
org.apache.hadoop.mapreduce.TaskCounter instead
2014-01-22 06:48:18,953 INFO org.apache.hadoop.mapred.JobTracker: Adding
task (MAP) 'attempt_201401041634_5856_m_000237_0' to tip
task_201401041634_5856_m_000237, for trac
ker 'tracker_ip-10-20-22-122.tk.com:localhost/127.0.0.1:35082'
2014-01-22 06:48:18,953 INFO org.apache.hadoop.mapred.JobInProgress:
Choosing data-local task task_201401041634_5856_m_000237
2014-01-22 06:48:18,953 WARN mapreduce.Counters: Group
org.apache.hadoop.mapred.Task$Counter is deprecated. Use
org.apache.hadoop.mapreduce.TaskCounter instead
2014-01-22 06:48:18,953 INFO org.apache.hadoop.mapred.JobTracker: Adding
task (MAP) 'attempt_201401041634_5856_m_000239_0' to tip
task_201401041634_5856_m_000239, for trac
ker 'tracker_ip-10-20-22-122.tk.com:localhost/127.0.0.1:35082'
2014-01-22 06:48:18,953 INFO org.apache.hadoop.mapred.JobInProgress:
Choosing data-local task task_201401041634_5856_m_000239
2014-01-22 06:48:18,954 WARN mapreduce.Counters: Group
org.apache.hadoop.mapred.Task$Counter is deprecated. Use
org.apache.hadoop.mapreduce.TaskCounter instead
2014-01-22 06:48:18,954 INFO org.apache.hadoop.mapred.JobTracker: Adding
task (MAP) 'attempt_201401041634_5856_m_000234_0' to tip
task_201401041634_5856_m_000234, for tracker
'tracker_ip-10-20-22-122.tk.com:localhost/127.0.0.1:35082'
2014-01-22 06:48:18,954 INFO org.apache.hadoop.mapred.JobInProgress:
Choosing rack-local task task_201401041634_5856_m_000234
2014-01-22 06:48:18,954 WARN mapreduce.Counters: Group
org.apache.hadoop.mapred.Task$Counter is deprecated. Use
org.apache.hadoop.mapreduce.TaskCounter instead
2014-01-22 06:48:18,954 INFO org.apache.hadoop.mapred.JobTracker: Adding
task (MAP) 'attempt_201401041634_5856_m_000235_0' to tip
task_201401041634_5856_m_000235, for tracker
'tracker_ip-10-20-22-122.tk.com:localhost/127.0.0.1:35082'
2014-01-22 06:48:18,954 INFO org.apache.hadoop.mapred.JobInProgress:
Choosing rack-local task task_201401041634_5856_m_000235
2014-01-22 06:48:18,954 INFO org.apache.hadoop.mapred.JobInProgress:
Task 'attempt_201401041634_5856_m_000147_0' has completed
task_201401041634_5856_m_000147 successfully.
2014-01-22 06:48:24,520 INFO org.apache.hadoop.mapred.JobInProgress:
Task 'attempt_201401041634_5856_m_000184_0' has completed
task_201401041634_5856_m_000184 successfully.
2014-01-22 06:48:30,073 INFO org.apache.hadoop.mapred.JobInProgress:
Task 'attempt_201401041634_5856_m_000191_0' has completed
task_201401041634_5856_m_000191 successfully.
2014-01-22 06:48:30,075 INFO org.apache.hadoop.mapred.JobInProgress:
Task 'attempt_201401041634_5856_m_000199_0' has completed
task_201401041634_5856_m_000199 successfully.
2014-01-22 06:48:41,288 INFO org.apache.hadoop.mapred.JobTracker: Job
job_201401041634_5858 added successfully for user 'hdfs' to queue 'default'
2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobTracker:
Initializing job_201401041634_5858
2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobInProgress:
Initializing job_201401041634_5858
2014-01-22 06:50:27,386 INFO
org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioning to
standby
2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker:
Stopping pluginDispatcher
2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker:
Stopping infoServer
2014-01-22 06:50:44,093 WARN org.apache.hadoop.ipc.Client: interrupted
waiting to send params to server
java.lang.InterruptedException
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:979)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
at
java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
at java.util.concurrent.FutureTask.get(FutureTask.java:83)
at
org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:913)
at org.apache.hadoop.ipc.Client.call(Client.java:1198)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
at $Proxy9.getFileInfo(Unknown Source)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:628)
at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
at $Proxy10.getFileInfo(Unknown Source)
at
org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1532)
at
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:803)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1332)
at
org.apache.hadoop.mapred.JobTrackerHAServiceProtocol$SystemDirectoryMonitor.run(JobTrackerHAServiceProtocol.java:96)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
at
java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
2014-01-22 06:51:55,637 INFO org.mortbay.log: Stopped
SelectChannelConnector@0.0.0.0:50031
2014-01-22 06:51:55,638 INFO org.apache.hadoop.mapred.JobTracker:
Stopping interTrackerServer
2014-01-22 06:51:55,638 INFO org.apache.hadoop.ipc.Server: Stopping
server on 8022
2014-01-22 06:51:55,639 INFO org.apache.hadoop.mapred.JobTracker:
Stopping expireTrackers
2014-01-22 06:51:55,639 INFO org.apache.hadoop.ipc.Server: Stopping IPC
Server listener on 8022
2014-01-22 06:51:55,639 INFO org.apache.hadoop.ipc.Server: Stopping IPC
Server Responder
2014-01-22 06:51:55,640 INFO org.apache.hadoop.mapred.JobTracker:
Stopped interTrackerServer
2014-01-22 06:52:56,306 WARN
org.apache.hadoop.io.retry.RetryInvocationHandler: Exception while
invoking getFileInfo of class ClientNamenodeProtocolTranslatorPB. Trying
to fail over immediately.
2014-01-22 06:52:56,338 INFO org.apache.hadoop.mapred.JobInProgress:
Task 'attempt_201401041634_5856_m_000180_0' has completed
task_201401041634_5856_m_000180 successfully.
2014-01-22 06:52:56,338 WARN org.apache.hadoop.ipc.Server: IPC Server
Responder, call
heartbeat(org.apache.hadoop.mapred.TaskTrackerStatus@78144f29, false,
false, true, 4436), rpc version=2, client version=32,
methodsFingerPrint=-159967141 from 10.20.22.151:37362: output error
2014-01-22 06:53:02,785 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 5 on 8022 caught an exception
java.nio.channels.ClosedChannelException
at
sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
at
org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
at
org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
2014-01-22 06:53:02,786 ERROR org.apache.hadoop.mapred.JobTracker: Job
initialization failed:
java.io.FileNotFoundException: Parent directory doesn't exist:
/tmp/mapred/system/seq-000000000022/job_201401041634_5858
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.verifyParentDir(FSNamesystem.java:1710)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1831)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:1753)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1731)
at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:419)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:205)
at
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44068)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1695)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1691)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1689)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at
org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:90)
at
org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:57)
at
org.apache.hadoop.hdfs.DFSOutputStream.<init>(DFSOutputStream.java:1301)
at
org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1317)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1242)
at
org.apache.hadoop.hdfs.DistributedFileSystem.createNonRecursive(DistributedFileSystem.java:301)
at
org.apache.hadoop.hdfs.DistributedFileSystem.createNonRecursive(DistributedFileSystem.java:79)
at
org.apache.hadoop.fs.FileSystem.createNonRecursive(FileSystem.java:1055)
at
org.apache.hadoop.fs.FileSystem.createNonRecursive(FileSystem.java:1031)
at
org.apache.hadoop.mapred.JobInProgress.generateAndStoreTokens(JobInProgress.java:3505)
at
org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:725)
at
org.apache.hadoop.mapred.JobTracker.initJob(JobTracker.java:3775)
at
org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:90)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by:
org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException):
Parent directory doesn't exist:
/tmp/mapred/system/seq-000000000022/job_201401041634_5858
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.verifyParentDir(FSNamesystem.java:1710)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1831)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:1753)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1731)
at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:419)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:205)
at
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44068)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1695)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1691)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1689)
at org.apache.hadoop.ipc.Client.call(Client.java:1225)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
at $Proxy9.create(Unknown Source)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:192)
at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
at $Proxy10.create(Unknown Source)
at
org.apache.hadoop.hdfs.DFSOutputStream.<init>(DFSOutputStream.java:1298)
... 13 more
2014-01-22 06:53:02,787 INFO org.apache.hadoop.mapred.JobInProgress:
Task 'attempt_201401041634_5856_m_000187_0' has completed
task_201401041634_5856_m_000187 successfully.
2014-01-22 06:53:02,787 INFO org.apache.hadoop.mapred.JobInProgress:
Task 'attempt_201401041634_5856_m_000188_0' has completed
task_201401041634_5856_m_000188 successfully.
2014-01-22 06:53:02,788 INFO org.apache.hadoop.mapred.JobInProgress:
Task 'attempt_201401041634_5856_m_000200_0' has completed
task_201401041634_5856_m_000200 successfully.
2014-01-22 06:53:09,255 WARN org.apache.hadoop.ipc.Server: IPC Server
Responder, call
heartbeat(org.apache.hadoop.mapred.TaskTrackerStatus@3f713050, false,
false, true, 77
20), rpc version=2, client version=32, methodsFingerPrint=-159967141
from 10.20.22.76:37774: output error
2014-01-22 06:53:09,255 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 8 on 8022 caught an exception
java.nio.channels.ClosedChannelException
at
sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
at
org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
at
org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
2014-01-22 06:53:09,255 INFO org.apache.hadoop.mapred.JobInProgress:
Task 'attempt_201401041634_5856_m_000143_0' has completed
task_201401041634_5856_m_000143 successfully.
2014-01-22 06:53:09,259 INFO org.apache.hadoop.mapred.JobInProgress:
Task 'attempt_201401041634_5856_m_000164_0' has completed
task_201401041634_5856_m_000164 successfully.
2014-01-22 06:53:09,259 INFO org.apache.hadoop.mapred.JobInProgress:
Task 'attempt_201401041634_5856_m_000167_0' has completed
task_201401041634_5856_m_000167 successfully.
2014-01-22 06:53:09,260 INFO org.apache.hadoop.mapred.JobInProgress:
Task 'attempt_201401041634_5856_m_000176_0' has completed
task_201401041634_5856_m_000176 successfully.
2014-01-22 06:53:09,260 WARN
org.apache.hadoop.io.retry.RetryInvocationHandler: Exception while
invoking getFileInfo of class ClientNamenodeProtocolTranslatorPB after 1
fail over attempts. Trying to fail over immediately.
*******************************************************************************************
the standby server doesnt have much data before the change occurred
since it was on standby mode.
2014-01-22 06:50:05,010 INFO
org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioning to
active
2014-01-22 06:50:05,010 INFO
org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopping
JobTrackerHAHttpRedirector on port 50030
2014-01-22 06:50:05,098 INFO org.mortbay.log: Stopped
SelectChannelConnector@0.0.0.0:50030
2014-01-22 06:50:05,198 INFO
org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopped
2014-01-22 06:50:05,201 INFO
org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Renaming previous
system directory hdfs://***/tmp/mapred/system/seq-000000000022 to
hdfs://***/tmp/mapred/system/seq-000000000023
2014-01-22 06:50:05,244 INFO
org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
Updating the current master key for generating delegation tokens
2014-01-22 06:50:05,248 INFO org.apache.hadoop.mapred.JobTracker:
Scheduler configured with (memSizeForMapSlotOnJT,
memSizeForReduceSlotOnJT, limitMaxMemForMapTasks,
limitMaxMemForReduceTasks) (-1, -1, -1, -1)
2014-01-22 06:50:05,248 INFO org.apache.hadoop.util.HostsFileReader:
Refreshing hosts (include/exclude) list
2014-01-22 06:50:11,839 INFO
org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
Starting expired delegation token remover thread,
tokenRemoverScanInterval=60 min(s)
2014-01-22 06:50:11,839 INFO
org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
Updating the current master key for generating delegation tokens
2014-01-22 06:50:11,852 INFO org.apache.hadoop.mapred.JobTracker:
Starting jobtracker with owner as mapred
2014-01-22 06:50:11,853 INFO org.apache.hadoop.ipc.Server: Starting
Socket Reader #1 for port 8021
2014-01-22 06:50:11,857 WARN org.apache.hadoop.ipc.RPC: Interface
interface org.apache.hadoop.mapred.TaskTrackerManager ignored because it
does not extend VersionedProtocol
2014-01-22 06:50:11,919 INFO org.apache.hadoop.http.HttpServer: Added
global filter 'safety'
(class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
2014-01-22 06:50:11,920 INFO org.apache.hadoop.http.HttpServer: Added
filter static_user_filter
(class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter)
to context job
2014-01-22 06:50:11,920 INFO org.apache.hadoop.http.HttpServer: Added
filter static_user_filter
(class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter)
to context static
2014-01-22 06:50:11,920 INFO org.apache.hadoop.http.HttpServer: Added
filter static_user_filter
(class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter)
to context logs
2014-01-22 06:50:11,922 INFO org.apache.hadoop.http.HttpServer: Jetty
bound to port 50030
2014-01-22 06:50:11,922 INFO org.mortbay.log: jetty-6.1.26.cloudera.2
2014-01-22 06:50:35,574 INFO org.mortbay.log: Started
SelectChannelConnector@0.0.0.0:50030
2014-01-22 06:50:35,575 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
Cannot initialize JVM Metrics with processName=JobTracker, sessionId= -
already initialized
2014-01-22 06:50:35,576 INFO org.apache.hadoop.mapred.JobTracker:
JobTracker up at: 8021
2014-01-22 06:50:35,576 INFO org.apache.hadoop.mapred.JobTracker:
JobTracker webserver: 50030
2014-01-22 06:50:42,221 WARN org.apache.hadoop.mapred.JobTracker: Job
job_201303170922_6550 does not have valid info/token file so ignoring
for recovery
2014-01-22 06:50:48,793 WARN org.apache.hadoop.mapred.JobTracker: Job
job_201306220511_6151 does not have valid info/token file so ignoring
for recovery
2014-01-22 06:50:54,434 WARN org.apache.hadoop.mapred.JobTracker: Job
job_201307090256_0004 does not have valid info/token file so ignoring
for recovery
2014-01-22 06:51:00,062 WARN org.apache.hadoop.mapred.JobTracker: Job
job_201308140624_6179 does not have valid info/token file so ignoring
for recovery
2014-01-22 06:51:08,461 WARN org.apache.hadoop.mapred.JobTracker: Job
job_201309011414_6792 does not have valid info/token file so ignoring
for recovery
2014-01-22 06:51:11,293 WARN org.apache.hadoop.mapred.JobTracker: Job
job_201310030748_7416 does not have valid info/token file so ignoring
for recovery
2014-01-22 06:51:22,552 WARN org.apache.hadoop.mapred.JobTracker: Job
job_201401041634_5858 does not have valid info/token file so ignoring
for recovery
2014-01-22 06:51:22,574 INFO org.apache.hadoop.mapred.JobHistory: Job
History MaxAge is 2592000000 ms (30.00 days), Cleanup Frequency is
86400000 ms (1.00 days)
2014-01-22 06:51:22,574 WARN org.apache.hadoop.conf.Configuration:
topology.node.switch.mapping.impl is deprecated. Instead, use
net.topology.node.switch.mapping.impl
2014-01-22 06:51:22,575 INFO
org.apache.hadoop.mapred.CompletedJobStatusStore: Completed job store
activated/configured with retain-time : 3600000 , job-info-dir :
/jobtracker/jobsInfo
2014-01-22 06:51:22,577 INFO
org.apache.hadoop.mapred.JobTrackerHADaemon: Waiting for jobtracker
RUNNING state
2014-01-22 06:51:35,063 INFO
org.apache.hadoop.mapred.JobTrackerHADaemon: Waiting for jobtracker
RUNNING state
2014-01-22 06:51:35,132 INFO
org.apache.hadoop.mapred.FairSchedulerEventLog: Initialized fair
scheduler event log, logging to
/mnt/logs/hadoop-0.20-mapreduce/fairscheduler/hadoop-mapred-fairscheduler-tkamz-hdpm01.log
2014-01-22 06:51:35,156 INFO org.apache.hadoop.mapred.FairScheduler:
Successfully configured FairScheduler
2014-01-22 06:51:35,156 INFO org.apache.hadoop.mapred.JobTracker:
Starting the recovery process for 2 jobs ...
2014-01-22 06:51:35,156 INFO org.apache.hadoop.mapred.JobTracker:
Submitting job job_201311290401_6418
2014-01-22 06:51:42,024 INFO
org.apache.hadoop.mapred.JobTrackerHADaemon: Waiting for jobtracker
RUNNING state
2014-01-22 06:51:42,025 WARN org.apache.hadoop.mapred.JobTracker: Could
not recover job job_201311290401_6418
java.io.IOException: Exception reading
hdfs://***/tmp/mapred/system/seq-000000000023/job_201311290401_6418/jobToken
at
org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:154)
at
org.apache.hadoop.mapred.JobTracker$RecoveryManager.recover(JobTracker.java:1533)
at
org.apache.hadoop.mapred.JobTracker.offerService(JobTracker.java:2179)
at
org.apache.hadoop.mapred.JobTrackerHADaemon$JobTrackerRunner$1.run(JobTrackerHADaemon.java:198)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:180)
at java.io.DataInputStream.readFully(DataInputStream.java:152)
at
org.apache.hadoop.security.Credentials.readTokenStorageStream(Credentials.java:189)
at
org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:150)
... 4 more
2014-01-22 06:51:42,026 INFO org.apache.hadoop.mapred.JobTracker:
Submitting job job_201401041634_5856
2014-01-22 06:51:48,515 INFO
org.apache.hadoop.mapred.JobTrackerHADaemon: Waiting for jobtracker
RUNNING state
2014-01-22 06:51:57,892 INFO
org.apache.hadoop.mapred.JobTrackerHADaemon: Waiting for jobtracker
RUNNING state
2014-01-22 06:51:57,906 INFO org.apache.hadoop.mapred.JobInProgress:
job_201401041634_5856: nMaps=240 nReduces=6 max=-1
2014-01-22 06:51:57,911 INFO org.apache.hadoop.mapred.JobTracker: Job
job_201401041634_5856 added successfully for user 'hdfs' to queue 'default'
2014-01-22 06:51:57,911 INFO org.apache.hadoop.mapred.JobTracker:
Recovery done! Recoverd 1 of 2 jobs.
2014-01-22 06:51:57,911 INFO org.apache.hadoop.mapred.JobTracker:
Recovery Duration (ms):22755
2014-01-22 06:51:57,911 INFO org.apache.hadoop.mapred.JobTracker:
Refreshing hosts information
2014-01-22 06:52:00,857 INFO org.apache.hadoop.mapred.JobTracker:
Initializing job_201401041634_5856
2014-01-22 06:52:00,857 INFO org.apache.hadoop.mapred.JobInProgress:
Initializing job_201401041634_5856
2014-01-22 06:52:00,865 INFO org.apache.hadoop.util.HostsFileReader:
Setting the includes file to
2014-01-22 06:52:00,865 INFO org.apache.hadoop.util.HostsFileReader:
Setting the excludes file to
2014-01-22 06:52:00,865 INFO org.apache.hadoop.util.HostsFileReader:
Refreshing hosts (include/exclude) list
2014-01-22 06:52:00,865 INFO org.apache.hadoop.mapred.JobTracker:
Decommissioning 0 nodes
2014-01-22 06:52:00,865 INFO
org.apache.hadoop.mapred.JobTrackerHADaemon: Waiting for jobtracker
RUNNING state
2014-01-22 06:52:00,868 INFO org.apache.hadoop.ipc.Server: IPC Server
Responder: starting
2014-01-22 06:52:00,868 INFO org.apache.hadoop.ipc.Server: IPC Server
listener on 8021: starting
2014-01-22 06:52:00,870 INFO org.apache.hadoop.mapred.JobTracker:
Starting RUNNING
2014-01-22 06:52:06,560 INFO
org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioned to active
2014-01-22 06:52:06,560 WARN org.apache.hadoop.ipc.Server: IPC Server
Responder, call
org.apache.hadoop.ha.HAServiceProtocol.transitionToActive from
10.20.22.109:32931: output error
On 01/27/2014 11:59 PM, Karthik Kambatla wrote:
> (Redirecting to cdh-user, moving user@hadoop to bcc).
>
> Hi Oren
>
> Can you attach slightly longer versions of the log files on both the
> JTs? Also, if this is something recurring, it would be nice to monitor
> the JT heap usage and GC timeouts using jstat -gcutil <jt-pid>.
>
> Thanks
> Karthik
>
>
>
>
> On Thu, Jan 23, 2014 at 8:11 AM, Oren Marmor <orenm@infolinks.com
> <ma...@infolinks.com>> wrote:
>
> Hi.
> We have two HA Jobtrackers in active/standby mode. (CDH4.2 on
> ubuntu server)
> We had a problem during which the active node suddenly became
> standby and the standby server attempted to start resulting in a
> java heap space failure.
> any ideas to why the active node turned to standby?
>
> logs attached:
> on (original) active node:
> 2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobTracker:
> Initializing job_201401041634_5858
> 2014-01-22 06:48:41,289 INFO
> org.apache.hadoop.mapred.JobInProgress: Initializing
> job_201401041634_5858
> *2014-01-22 06:50:27,386 INFO
> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol:
> Transitioning to standby*
> 2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker:
> Stopping pluginDispatcher
> 2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker:
> Stopping infoServer
> 2014-01-22 06:50:44,093 WARN org.apache.hadoop.ipc.Client:
> interrupted waiting to send params to server
> java.lang.InterruptedException
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:979)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
> at
> java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
> at java.util.concurrent.FutureTask.get(FutureTask.java:83)
> at
> org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:913)
> at org.apache.hadoop.ipc.Client.call(Client.java:1198)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
> at $Proxy9.getFileInfo(Unknown Source)
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:628)
> at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
> at $Proxy10.getFileInfo(Unknown Source)
> at
> org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1532)
> at
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:803)
> at
> org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1332)
> at
> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol$SystemDirectoryMonitor.run(JobTrackerHAServiceProtocol.java:96)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
> at
> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
> at
> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> 2014-01-22 06:51:55,637 INFO org.mortbay.log: Stopped
> SelectChannelConnector@0.0.0.0:50031
> <http://SelectChannelConnector@0.0.0.0:50031>
>
> on standby node
> 2014-01-22 06:50:05,010 INFO
> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol:
> Transitioning to active
> 2014-01-22 06:50:05,010 INFO
> org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopping
> JobTrackerHAHttpRedirector on port 50030
> 2014-01-22 06:50:05,098 INFO org.mortbay.log: Stopped
> SelectChannelConnector@0.0.0.0:50030
> <http://SelectChannelConnector@0.0.0.0:50030>
> 2014-01-22 06:50:05,198 INFO
> org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopped
> 2014-01-22 06:50:05,201 INFO
> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Renaming
> previous system directory
> hdfs://***/tmp/mapred/system/seq-000000000022 to hdfs://t
> aykey/tmp/mapred/system/seq-000000000023
> 2014-01-22 06:50:05,244 INFO
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
> Updating the current master key for generating delegation tokens
> 2014-01-22 06:50:05,248 INFO org.apache.hadoop.mapred.JobTracker:
> Scheduler configured with (memSizeForMapSlotOnJT,
> memSizeForReduceSlotOnJT, limitMaxMemForMapTasks, limitMaxMemF
> orReduceTasks) (-1, -1, -1, -1)
> 2014-01-22 06:50:05,248 INFO
> org.apache.hadoop.util.HostsFileReader: Refreshing hosts
> (include/exclude) list
> 2014-01-22 06:50:11,839 INFO
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
> Starting expired delegation token remover thread, tokenRemoverScanI
> nterval=60 min(s)
> ...
> 2014-01-22 06:52:00,870 INFO org.apache.hadoop.mapred.JobTracker:
> Starting RUNNING
> 2014-01-22 06:52:06,560 INFO
> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioned
> to active
> 2014-01-22 06:52:06,560 WARN org.apache.hadoop.ipc.Server: IPC
> Server Responder, call
> org.apache.hadoop.ha.HAServiceProtocol.transitionToActive from
> ****:32931: output error
> 2014-01-22 06:52:06,561 INFO org.apache.hadoop.ipc.Server: IPC
> Server handler 0 on 8023 caught an exception
> java.nio.channels.ClosedChannelException
> at
> sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
> at
> sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
> at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
> at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
> at
> org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
> at
> org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
> 2014-01-22 06:52:13,168 WARN org.apache.hadoop.ipc.Server: IPC
> Server Responder, call
> org.apache.hadoop.ha.HAServiceProtocol.getServiceStatus from
> ****:60965: output error
> 2014-01-22 06:52:13,168 INFO org.apache.hadoop.ipc.Server: IPC
> Server handler 0 on 8023 caught an exception
> java.nio.channels.ClosedChannelException
> at
> sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
> at
> sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
> at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
> at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
> at
> org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
> at
> org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
>
> thanks
> Oren
>
>
Re: HA Jobtracker failure
Posted by Siddharth Tiwari <si...@live.com>.
How have you implemented the failover ? Also can you attach JTHA logs ? If you hav implemented it using. Zkfc, it would be interesting to look in zookeeper logs as well.
Sent from my iPhone
> On Jan 27, 2014, at 3:00 PM, "Karthik Kambatla" <ka...@cloudera.com> wrote:
>
> (Redirecting to cdh-user, moving user@hadoop to bcc).
>
> Hi Oren
>
> Can you attach slightly longer versions of the log files on both the JTs? Also, if this is something recurring, it would be nice to monitor the JT heap usage and GC timeouts using jstat -gcutil <jt-pid>.
>
> Thanks
> Karthik
>
>
>
>
>> On Thu, Jan 23, 2014 at 8:11 AM, Oren Marmor <or...@infolinks.com> wrote:
>> Hi.
>> We have two HA Jobtrackers in active/standby mode. (CDH4.2 on ubuntu server)
>> We had a problem during which the active node suddenly became standby and the standby server attempted to start resulting in a java heap space failure.
>> any ideas to why the active node turned to standby?
>>
>> logs attached:
>> on (original) active node:
>> 2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobTracker: Initializing job_201401041634_5858
>> 2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobInProgress: Initializing job_201401041634_5858
>> 2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioning to standby
>> 2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker: Stopping pluginDispatcher
>> 2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker: Stopping infoServer
>> 2014-01-22 06:50:44,093 WARN org.apache.hadoop.ipc.Client: interrupted waiting to send params to server
>> java.lang.InterruptedException
>> at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:979)
>> at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
>> at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
>> at java.util.concurrent.FutureTask.get(FutureTask.java:83)
>> at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:913)
>> at org.apache.hadoop.ipc.Client.call(Client.java:1198)
>> at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>> at $Proxy9.getFileInfo(Unknown Source)
>> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:628)
>> at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> at java.lang.reflect.Method.invoke(Method.java:597)
>> at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
>> at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
>> at $Proxy10.getFileInfo(Unknown Source)
>> at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1532)
>> at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:803)
>> at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1332)
>> at org.apache.hadoop.mapred.JobTrackerHAServiceProtocol$SystemDirectoryMonitor.run(JobTrackerHAServiceProtocol.java:96)
>> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>> at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>> at java.lang.Thread.run(Thread.java:662)
>> 2014-01-22 06:51:55,637 INFO org.mortbay.log: Stopped SelectChannelConnector@0.0.0.0:50031
>>
>> on standby node
>> 2014-01-22 06:50:05,010 INFO org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioning to active
>> 2014-01-22 06:50:05,010 INFO org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopping JobTrackerHAHttpRedirector on port 50030
>> 2014-01-22 06:50:05,098 INFO org.mortbay.log: Stopped SelectChannelConnector@0.0.0.0:50030
>> 2014-01-22 06:50:05,198 INFO org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopped
>> 2014-01-22 06:50:05,201 INFO org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Renaming previous system directory hdfs://***/tmp/mapred/system/seq-000000000022 to hdfs://t
>> aykey/tmp/mapred/system/seq-000000000023
>> 2014-01-22 06:50:05,244 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens
>> 2014-01-22 06:50:05,248 INFO org.apache.hadoop.mapred.JobTracker: Scheduler configured with (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT, limitMaxMemForMapTasks, limitMaxMemF
>> orReduceTasks) (-1, -1, -1, -1)
>> 2014-01-22 06:50:05,248 INFO org.apache.hadoop.util.HostsFileReader: Refreshing hosts (include/exclude) list
>> 2014-01-22 06:50:11,839 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Starting expired delegation token remover thread, tokenRemoverScanI
>> nterval=60 min(s)
>> ...
>> 2014-01-22 06:52:00,870 INFO org.apache.hadoop.mapred.JobTracker: Starting RUNNING
>> 2014-01-22 06:52:06,560 INFO org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioned to active
>> 2014-01-22 06:52:06,560 WARN org.apache.hadoop.ipc.Server: IPC Server Responder, call org.apache.hadoop.ha.HAServiceProtocol.transitionToActive from ****:32931: output error
>> 2014-01-22 06:52:06,561 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 8023 caught an exception
>> java.nio.channels.ClosedChannelException
>> at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
>> at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
>> at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
>> at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
>> at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
>> at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
>> 2014-01-22 06:52:13,168 WARN org.apache.hadoop.ipc.Server: IPC Server Responder, call org.apache.hadoop.ha.HAServiceProtocol.getServiceStatus from ****:60965: output error
>> 2014-01-22 06:52:13,168 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 8023 caught an exception
>> java.nio.channels.ClosedChannelException
>> at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
>> at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
>> at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
>> at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
>> at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
>> at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
>>
>> thanks
>> Oren
>
Re: HA Jobtracker failure
Posted by Oren <or...@taykey.com>.
Hi Karthik.
the logs attached,
these are the logs from the active JT
2014-01-22 06:48:18,952 INFO org.apache.hadoop.mapred.JobInProgress:
job_201401041634_5858: nMaps=1 nReduces=1 max=-1
2014-01-22 06:48:18,953 WARN mapreduce.Counters: Group
org.apache.hadoop.mapred.Task$Counter is deprecated. Use
org.apache.hadoop.mapreduce.TaskCounter instead
2014-01-22 06:48:18,953 INFO org.apache.hadoop.mapred.JobTracker: Adding
task (MAP) 'attempt_201401041634_5856_m_000237_0' to tip
task_201401041634_5856_m_000237, for trac
ker 'tracker_ip-10-20-22-122.tk.com:localhost/127.0.0.1:35082'
2014-01-22 06:48:18,953 INFO org.apache.hadoop.mapred.JobInProgress:
Choosing data-local task task_201401041634_5856_m_000237
2014-01-22 06:48:18,953 WARN mapreduce.Counters: Group
org.apache.hadoop.mapred.Task$Counter is deprecated. Use
org.apache.hadoop.mapreduce.TaskCounter instead
2014-01-22 06:48:18,953 INFO org.apache.hadoop.mapred.JobTracker: Adding
task (MAP) 'attempt_201401041634_5856_m_000239_0' to tip
task_201401041634_5856_m_000239, for trac
ker 'tracker_ip-10-20-22-122.tk.com:localhost/127.0.0.1:35082'
2014-01-22 06:48:18,953 INFO org.apache.hadoop.mapred.JobInProgress:
Choosing data-local task task_201401041634_5856_m_000239
2014-01-22 06:48:18,954 WARN mapreduce.Counters: Group
org.apache.hadoop.mapred.Task$Counter is deprecated. Use
org.apache.hadoop.mapreduce.TaskCounter instead
2014-01-22 06:48:18,954 INFO org.apache.hadoop.mapred.JobTracker: Adding
task (MAP) 'attempt_201401041634_5856_m_000234_0' to tip
task_201401041634_5856_m_000234, for tracker
'tracker_ip-10-20-22-122.tk.com:localhost/127.0.0.1:35082'
2014-01-22 06:48:18,954 INFO org.apache.hadoop.mapred.JobInProgress:
Choosing rack-local task task_201401041634_5856_m_000234
2014-01-22 06:48:18,954 WARN mapreduce.Counters: Group
org.apache.hadoop.mapred.Task$Counter is deprecated. Use
org.apache.hadoop.mapreduce.TaskCounter instead
2014-01-22 06:48:18,954 INFO org.apache.hadoop.mapred.JobTracker: Adding
task (MAP) 'attempt_201401041634_5856_m_000235_0' to tip
task_201401041634_5856_m_000235, for tracker
'tracker_ip-10-20-22-122.tk.com:localhost/127.0.0.1:35082'
2014-01-22 06:48:18,954 INFO org.apache.hadoop.mapred.JobInProgress:
Choosing rack-local task task_201401041634_5856_m_000235
2014-01-22 06:48:18,954 INFO org.apache.hadoop.mapred.JobInProgress:
Task 'attempt_201401041634_5856_m_000147_0' has completed
task_201401041634_5856_m_000147 successfully.
2014-01-22 06:48:24,520 INFO org.apache.hadoop.mapred.JobInProgress:
Task 'attempt_201401041634_5856_m_000184_0' has completed
task_201401041634_5856_m_000184 successfully.
2014-01-22 06:48:30,073 INFO org.apache.hadoop.mapred.JobInProgress:
Task 'attempt_201401041634_5856_m_000191_0' has completed
task_201401041634_5856_m_000191 successfully.
2014-01-22 06:48:30,075 INFO org.apache.hadoop.mapred.JobInProgress:
Task 'attempt_201401041634_5856_m_000199_0' has completed
task_201401041634_5856_m_000199 successfully.
2014-01-22 06:48:41,288 INFO org.apache.hadoop.mapred.JobTracker: Job
job_201401041634_5858 added successfully for user 'hdfs' to queue 'default'
2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobTracker:
Initializing job_201401041634_5858
2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobInProgress:
Initializing job_201401041634_5858
2014-01-22 06:50:27,386 INFO
org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioning to
standby
2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker:
Stopping pluginDispatcher
2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker:
Stopping infoServer
2014-01-22 06:50:44,093 WARN org.apache.hadoop.ipc.Client: interrupted
waiting to send params to server
java.lang.InterruptedException
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:979)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
at
java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
at java.util.concurrent.FutureTask.get(FutureTask.java:83)
at
org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:913)
at org.apache.hadoop.ipc.Client.call(Client.java:1198)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
at $Proxy9.getFileInfo(Unknown Source)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:628)
at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
at $Proxy10.getFileInfo(Unknown Source)
at
org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1532)
at
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:803)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1332)
at
org.apache.hadoop.mapred.JobTrackerHAServiceProtocol$SystemDirectoryMonitor.run(JobTrackerHAServiceProtocol.java:96)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
at
java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
2014-01-22 06:51:55,637 INFO org.mortbay.log: Stopped
SelectChannelConnector@0.0.0.0:50031
2014-01-22 06:51:55,638 INFO org.apache.hadoop.mapred.JobTracker:
Stopping interTrackerServer
2014-01-22 06:51:55,638 INFO org.apache.hadoop.ipc.Server: Stopping
server on 8022
2014-01-22 06:51:55,639 INFO org.apache.hadoop.mapred.JobTracker:
Stopping expireTrackers
2014-01-22 06:51:55,639 INFO org.apache.hadoop.ipc.Server: Stopping IPC
Server listener on 8022
2014-01-22 06:51:55,639 INFO org.apache.hadoop.ipc.Server: Stopping IPC
Server Responder
2014-01-22 06:51:55,640 INFO org.apache.hadoop.mapred.JobTracker:
Stopped interTrackerServer
2014-01-22 06:52:56,306 WARN
org.apache.hadoop.io.retry.RetryInvocationHandler: Exception while
invoking getFileInfo of class ClientNamenodeProtocolTranslatorPB. Trying
to fail over immediately.
2014-01-22 06:52:56,338 INFO org.apache.hadoop.mapred.JobInProgress:
Task 'attempt_201401041634_5856_m_000180_0' has completed
task_201401041634_5856_m_000180 successfully.
2014-01-22 06:52:56,338 WARN org.apache.hadoop.ipc.Server: IPC Server
Responder, call
heartbeat(org.apache.hadoop.mapred.TaskTrackerStatus@78144f29, false,
false, true, 4436), rpc version=2, client version=32,
methodsFingerPrint=-159967141 from 10.20.22.151:37362: output error
2014-01-22 06:53:02,785 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 5 on 8022 caught an exception
java.nio.channels.ClosedChannelException
at
sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
at
org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
at
org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
2014-01-22 06:53:02,786 ERROR org.apache.hadoop.mapred.JobTracker: Job
initialization failed:
java.io.FileNotFoundException: Parent directory doesn't exist:
/tmp/mapred/system/seq-000000000022/job_201401041634_5858
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.verifyParentDir(FSNamesystem.java:1710)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1831)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:1753)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1731)
at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:419)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:205)
at
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44068)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1695)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1691)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1689)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at
org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:90)
at
org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:57)
at
org.apache.hadoop.hdfs.DFSOutputStream.<init>(DFSOutputStream.java:1301)
at
org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1317)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1242)
at
org.apache.hadoop.hdfs.DistributedFileSystem.createNonRecursive(DistributedFileSystem.java:301)
at
org.apache.hadoop.hdfs.DistributedFileSystem.createNonRecursive(DistributedFileSystem.java:79)
at
org.apache.hadoop.fs.FileSystem.createNonRecursive(FileSystem.java:1055)
at
org.apache.hadoop.fs.FileSystem.createNonRecursive(FileSystem.java:1031)
at
org.apache.hadoop.mapred.JobInProgress.generateAndStoreTokens(JobInProgress.java:3505)
at
org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:725)
at
org.apache.hadoop.mapred.JobTracker.initJob(JobTracker.java:3775)
at
org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:90)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by:
org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException):
Parent directory doesn't exist:
/tmp/mapred/system/seq-000000000022/job_201401041634_5858
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.verifyParentDir(FSNamesystem.java:1710)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1831)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:1753)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1731)
at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:419)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:205)
at
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44068)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1695)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1691)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1689)
at org.apache.hadoop.ipc.Client.call(Client.java:1225)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
at $Proxy9.create(Unknown Source)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:192)
at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
at $Proxy10.create(Unknown Source)
at
org.apache.hadoop.hdfs.DFSOutputStream.<init>(DFSOutputStream.java:1298)
... 13 more
2014-01-22 06:53:02,787 INFO org.apache.hadoop.mapred.JobInProgress:
Task 'attempt_201401041634_5856_m_000187_0' has completed
task_201401041634_5856_m_000187 successfully.
2014-01-22 06:53:02,787 INFO org.apache.hadoop.mapred.JobInProgress:
Task 'attempt_201401041634_5856_m_000188_0' has completed
task_201401041634_5856_m_000188 successfully.
2014-01-22 06:53:02,788 INFO org.apache.hadoop.mapred.JobInProgress:
Task 'attempt_201401041634_5856_m_000200_0' has completed
task_201401041634_5856_m_000200 successfully.
2014-01-22 06:53:09,255 WARN org.apache.hadoop.ipc.Server: IPC Server
Responder, call
heartbeat(org.apache.hadoop.mapred.TaskTrackerStatus@3f713050, false,
false, true, 77
20), rpc version=2, client version=32, methodsFingerPrint=-159967141
from 10.20.22.76:37774: output error
2014-01-22 06:53:09,255 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 8 on 8022 caught an exception
java.nio.channels.ClosedChannelException
at
sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
at
org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
at
org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
2014-01-22 06:53:09,255 INFO org.apache.hadoop.mapred.JobInProgress:
Task 'attempt_201401041634_5856_m_000143_0' has completed
task_201401041634_5856_m_000143 successfully.
2014-01-22 06:53:09,259 INFO org.apache.hadoop.mapred.JobInProgress:
Task 'attempt_201401041634_5856_m_000164_0' has completed
task_201401041634_5856_m_000164 successfully.
2014-01-22 06:53:09,259 INFO org.apache.hadoop.mapred.JobInProgress:
Task 'attempt_201401041634_5856_m_000167_0' has completed
task_201401041634_5856_m_000167 successfully.
2014-01-22 06:53:09,260 INFO org.apache.hadoop.mapred.JobInProgress:
Task 'attempt_201401041634_5856_m_000176_0' has completed
task_201401041634_5856_m_000176 successfully.
2014-01-22 06:53:09,260 WARN
org.apache.hadoop.io.retry.RetryInvocationHandler: Exception while
invoking getFileInfo of class ClientNamenodeProtocolTranslatorPB after 1
fail over attempts. Trying to fail over immediately.
*******************************************************************************************
the standby server doesnt have much data before the change occurred
since it was on standby mode.
2014-01-22 06:50:05,010 INFO
org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioning to
active
2014-01-22 06:50:05,010 INFO
org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopping
JobTrackerHAHttpRedirector on port 50030
2014-01-22 06:50:05,098 INFO org.mortbay.log: Stopped
SelectChannelConnector@0.0.0.0:50030
2014-01-22 06:50:05,198 INFO
org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopped
2014-01-22 06:50:05,201 INFO
org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Renaming previous
system directory hdfs://***/tmp/mapred/system/seq-000000000022 to
hdfs://***/tmp/mapred/system/seq-000000000023
2014-01-22 06:50:05,244 INFO
org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
Updating the current master key for generating delegation tokens
2014-01-22 06:50:05,248 INFO org.apache.hadoop.mapred.JobTracker:
Scheduler configured with (memSizeForMapSlotOnJT,
memSizeForReduceSlotOnJT, limitMaxMemForMapTasks,
limitMaxMemForReduceTasks) (-1, -1, -1, -1)
2014-01-22 06:50:05,248 INFO org.apache.hadoop.util.HostsFileReader:
Refreshing hosts (include/exclude) list
2014-01-22 06:50:11,839 INFO
org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
Starting expired delegation token remover thread,
tokenRemoverScanInterval=60 min(s)
2014-01-22 06:50:11,839 INFO
org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
Updating the current master key for generating delegation tokens
2014-01-22 06:50:11,852 INFO org.apache.hadoop.mapred.JobTracker:
Starting jobtracker with owner as mapred
2014-01-22 06:50:11,853 INFO org.apache.hadoop.ipc.Server: Starting
Socket Reader #1 for port 8021
2014-01-22 06:50:11,857 WARN org.apache.hadoop.ipc.RPC: Interface
interface org.apache.hadoop.mapred.TaskTrackerManager ignored because it
does not extend VersionedProtocol
2014-01-22 06:50:11,919 INFO org.apache.hadoop.http.HttpServer: Added
global filter 'safety'
(class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
2014-01-22 06:50:11,920 INFO org.apache.hadoop.http.HttpServer: Added
filter static_user_filter
(class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter)
to context job
2014-01-22 06:50:11,920 INFO org.apache.hadoop.http.HttpServer: Added
filter static_user_filter
(class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter)
to context static
2014-01-22 06:50:11,920 INFO org.apache.hadoop.http.HttpServer: Added
filter static_user_filter
(class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter)
to context logs
2014-01-22 06:50:11,922 INFO org.apache.hadoop.http.HttpServer: Jetty
bound to port 50030
2014-01-22 06:50:11,922 INFO org.mortbay.log: jetty-6.1.26.cloudera.2
2014-01-22 06:50:35,574 INFO org.mortbay.log: Started
SelectChannelConnector@0.0.0.0:50030
2014-01-22 06:50:35,575 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
Cannot initialize JVM Metrics with processName=JobTracker, sessionId= -
already initialized
2014-01-22 06:50:35,576 INFO org.apache.hadoop.mapred.JobTracker:
JobTracker up at: 8021
2014-01-22 06:50:35,576 INFO org.apache.hadoop.mapred.JobTracker:
JobTracker webserver: 50030
2014-01-22 06:50:42,221 WARN org.apache.hadoop.mapred.JobTracker: Job
job_201303170922_6550 does not have valid info/token file so ignoring
for recovery
2014-01-22 06:50:48,793 WARN org.apache.hadoop.mapred.JobTracker: Job
job_201306220511_6151 does not have valid info/token file so ignoring
for recovery
2014-01-22 06:50:54,434 WARN org.apache.hadoop.mapred.JobTracker: Job
job_201307090256_0004 does not have valid info/token file so ignoring
for recovery
2014-01-22 06:51:00,062 WARN org.apache.hadoop.mapred.JobTracker: Job
job_201308140624_6179 does not have valid info/token file so ignoring
for recovery
2014-01-22 06:51:08,461 WARN org.apache.hadoop.mapred.JobTracker: Job
job_201309011414_6792 does not have valid info/token file so ignoring
for recovery
2014-01-22 06:51:11,293 WARN org.apache.hadoop.mapred.JobTracker: Job
job_201310030748_7416 does not have valid info/token file so ignoring
for recovery
2014-01-22 06:51:22,552 WARN org.apache.hadoop.mapred.JobTracker: Job
job_201401041634_5858 does not have valid info/token file so ignoring
for recovery
2014-01-22 06:51:22,574 INFO org.apache.hadoop.mapred.JobHistory: Job
History MaxAge is 2592000000 ms (30.00 days), Cleanup Frequency is
86400000 ms (1.00 days)
2014-01-22 06:51:22,574 WARN org.apache.hadoop.conf.Configuration:
topology.node.switch.mapping.impl is deprecated. Instead, use
net.topology.node.switch.mapping.impl
2014-01-22 06:51:22,575 INFO
org.apache.hadoop.mapred.CompletedJobStatusStore: Completed job store
activated/configured with retain-time : 3600000 , job-info-dir :
/jobtracker/jobsInfo
2014-01-22 06:51:22,577 INFO
org.apache.hadoop.mapred.JobTrackerHADaemon: Waiting for jobtracker
RUNNING state
2014-01-22 06:51:35,063 INFO
org.apache.hadoop.mapred.JobTrackerHADaemon: Waiting for jobtracker
RUNNING state
2014-01-22 06:51:35,132 INFO
org.apache.hadoop.mapred.FairSchedulerEventLog: Initialized fair
scheduler event log, logging to
/mnt/logs/hadoop-0.20-mapreduce/fairscheduler/hadoop-mapred-fairscheduler-tkamz-hdpm01.log
2014-01-22 06:51:35,156 INFO org.apache.hadoop.mapred.FairScheduler:
Successfully configured FairScheduler
2014-01-22 06:51:35,156 INFO org.apache.hadoop.mapred.JobTracker:
Starting the recovery process for 2 jobs ...
2014-01-22 06:51:35,156 INFO org.apache.hadoop.mapred.JobTracker:
Submitting job job_201311290401_6418
2014-01-22 06:51:42,024 INFO
org.apache.hadoop.mapred.JobTrackerHADaemon: Waiting for jobtracker
RUNNING state
2014-01-22 06:51:42,025 WARN org.apache.hadoop.mapred.JobTracker: Could
not recover job job_201311290401_6418
java.io.IOException: Exception reading
hdfs://***/tmp/mapred/system/seq-000000000023/job_201311290401_6418/jobToken
at
org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:154)
at
org.apache.hadoop.mapred.JobTracker$RecoveryManager.recover(JobTracker.java:1533)
at
org.apache.hadoop.mapred.JobTracker.offerService(JobTracker.java:2179)
at
org.apache.hadoop.mapred.JobTrackerHADaemon$JobTrackerRunner$1.run(JobTrackerHADaemon.java:198)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:180)
at java.io.DataInputStream.readFully(DataInputStream.java:152)
at
org.apache.hadoop.security.Credentials.readTokenStorageStream(Credentials.java:189)
at
org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:150)
... 4 more
2014-01-22 06:51:42,026 INFO org.apache.hadoop.mapred.JobTracker:
Submitting job job_201401041634_5856
2014-01-22 06:51:48,515 INFO
org.apache.hadoop.mapred.JobTrackerHADaemon: Waiting for jobtracker
RUNNING state
2014-01-22 06:51:57,892 INFO
org.apache.hadoop.mapred.JobTrackerHADaemon: Waiting for jobtracker
RUNNING state
2014-01-22 06:51:57,906 INFO org.apache.hadoop.mapred.JobInProgress:
job_201401041634_5856: nMaps=240 nReduces=6 max=-1
2014-01-22 06:51:57,911 INFO org.apache.hadoop.mapred.JobTracker: Job
job_201401041634_5856 added successfully for user 'hdfs' to queue 'default'
2014-01-22 06:51:57,911 INFO org.apache.hadoop.mapred.JobTracker:
Recovery done! Recoverd 1 of 2 jobs.
2014-01-22 06:51:57,911 INFO org.apache.hadoop.mapred.JobTracker:
Recovery Duration (ms):22755
2014-01-22 06:51:57,911 INFO org.apache.hadoop.mapred.JobTracker:
Refreshing hosts information
2014-01-22 06:52:00,857 INFO org.apache.hadoop.mapred.JobTracker:
Initializing job_201401041634_5856
2014-01-22 06:52:00,857 INFO org.apache.hadoop.mapred.JobInProgress:
Initializing job_201401041634_5856
2014-01-22 06:52:00,865 INFO org.apache.hadoop.util.HostsFileReader:
Setting the includes file to
2014-01-22 06:52:00,865 INFO org.apache.hadoop.util.HostsFileReader:
Setting the excludes file to
2014-01-22 06:52:00,865 INFO org.apache.hadoop.util.HostsFileReader:
Refreshing hosts (include/exclude) list
2014-01-22 06:52:00,865 INFO org.apache.hadoop.mapred.JobTracker:
Decommissioning 0 nodes
2014-01-22 06:52:00,865 INFO
org.apache.hadoop.mapred.JobTrackerHADaemon: Waiting for jobtracker
RUNNING state
2014-01-22 06:52:00,868 INFO org.apache.hadoop.ipc.Server: IPC Server
Responder: starting
2014-01-22 06:52:00,868 INFO org.apache.hadoop.ipc.Server: IPC Server
listener on 8021: starting
2014-01-22 06:52:00,870 INFO org.apache.hadoop.mapred.JobTracker:
Starting RUNNING
2014-01-22 06:52:06,560 INFO
org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioned to active
2014-01-22 06:52:06,560 WARN org.apache.hadoop.ipc.Server: IPC Server
Responder, call
org.apache.hadoop.ha.HAServiceProtocol.transitionToActive from
10.20.22.109:32931: output error
On 01/27/2014 11:59 PM, Karthik Kambatla wrote:
> (Redirecting to cdh-user, moving user@hadoop to bcc).
>
> Hi Oren
>
> Can you attach slightly longer versions of the log files on both the
> JTs? Also, if this is something recurring, it would be nice to monitor
> the JT heap usage and GC timeouts using jstat -gcutil <jt-pid>.
>
> Thanks
> Karthik
>
>
>
>
> On Thu, Jan 23, 2014 at 8:11 AM, Oren Marmor <orenm@infolinks.com
> <ma...@infolinks.com>> wrote:
>
> Hi.
> We have two HA Jobtrackers in active/standby mode. (CDH4.2 on
> ubuntu server)
> We had a problem during which the active node suddenly became
> standby and the standby server attempted to start resulting in a
> java heap space failure.
> any ideas to why the active node turned to standby?
>
> logs attached:
> on (original) active node:
> 2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobTracker:
> Initializing job_201401041634_5858
> 2014-01-22 06:48:41,289 INFO
> org.apache.hadoop.mapred.JobInProgress: Initializing
> job_201401041634_5858
> *2014-01-22 06:50:27,386 INFO
> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol:
> Transitioning to standby*
> 2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker:
> Stopping pluginDispatcher
> 2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker:
> Stopping infoServer
> 2014-01-22 06:50:44,093 WARN org.apache.hadoop.ipc.Client:
> interrupted waiting to send params to server
> java.lang.InterruptedException
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:979)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
> at
> java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
> at java.util.concurrent.FutureTask.get(FutureTask.java:83)
> at
> org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:913)
> at org.apache.hadoop.ipc.Client.call(Client.java:1198)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
> at $Proxy9.getFileInfo(Unknown Source)
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:628)
> at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
> at $Proxy10.getFileInfo(Unknown Source)
> at
> org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1532)
> at
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:803)
> at
> org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1332)
> at
> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol$SystemDirectoryMonitor.run(JobTrackerHAServiceProtocol.java:96)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
> at
> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
> at
> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> 2014-01-22 06:51:55,637 INFO org.mortbay.log: Stopped
> SelectChannelConnector@0.0.0.0:50031
> <http://SelectChannelConnector@0.0.0.0:50031>
>
> on standby node
> 2014-01-22 06:50:05,010 INFO
> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol:
> Transitioning to active
> 2014-01-22 06:50:05,010 INFO
> org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopping
> JobTrackerHAHttpRedirector on port 50030
> 2014-01-22 06:50:05,098 INFO org.mortbay.log: Stopped
> SelectChannelConnector@0.0.0.0:50030
> <http://SelectChannelConnector@0.0.0.0:50030>
> 2014-01-22 06:50:05,198 INFO
> org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopped
> 2014-01-22 06:50:05,201 INFO
> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Renaming
> previous system directory
> hdfs://***/tmp/mapred/system/seq-000000000022 to hdfs://t
> aykey/tmp/mapred/system/seq-000000000023
> 2014-01-22 06:50:05,244 INFO
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
> Updating the current master key for generating delegation tokens
> 2014-01-22 06:50:05,248 INFO org.apache.hadoop.mapred.JobTracker:
> Scheduler configured with (memSizeForMapSlotOnJT,
> memSizeForReduceSlotOnJT, limitMaxMemForMapTasks, limitMaxMemF
> orReduceTasks) (-1, -1, -1, -1)
> 2014-01-22 06:50:05,248 INFO
> org.apache.hadoop.util.HostsFileReader: Refreshing hosts
> (include/exclude) list
> 2014-01-22 06:50:11,839 INFO
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
> Starting expired delegation token remover thread, tokenRemoverScanI
> nterval=60 min(s)
> ...
> 2014-01-22 06:52:00,870 INFO org.apache.hadoop.mapred.JobTracker:
> Starting RUNNING
> 2014-01-22 06:52:06,560 INFO
> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioned
> to active
> 2014-01-22 06:52:06,560 WARN org.apache.hadoop.ipc.Server: IPC
> Server Responder, call
> org.apache.hadoop.ha.HAServiceProtocol.transitionToActive from
> ****:32931: output error
> 2014-01-22 06:52:06,561 INFO org.apache.hadoop.ipc.Server: IPC
> Server handler 0 on 8023 caught an exception
> java.nio.channels.ClosedChannelException
> at
> sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
> at
> sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
> at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
> at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
> at
> org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
> at
> org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
> 2014-01-22 06:52:13,168 WARN org.apache.hadoop.ipc.Server: IPC
> Server Responder, call
> org.apache.hadoop.ha.HAServiceProtocol.getServiceStatus from
> ****:60965: output error
> 2014-01-22 06:52:13,168 INFO org.apache.hadoop.ipc.Server: IPC
> Server handler 0 on 8023 caught an exception
> java.nio.channels.ClosedChannelException
> at
> sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
> at
> sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
> at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
> at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
> at
> org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
> at
> org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
>
> thanks
> Oren
>
>
Re: HA Jobtracker failure
Posted by Oren <or...@taykey.com>.
Hi Karthik.
the logs attached,
these are the logs from the active JT
2014-01-22 06:48:18,952 INFO org.apache.hadoop.mapred.JobInProgress:
job_201401041634_5858: nMaps=1 nReduces=1 max=-1
2014-01-22 06:48:18,953 WARN mapreduce.Counters: Group
org.apache.hadoop.mapred.Task$Counter is deprecated. Use
org.apache.hadoop.mapreduce.TaskCounter instead
2014-01-22 06:48:18,953 INFO org.apache.hadoop.mapred.JobTracker: Adding
task (MAP) 'attempt_201401041634_5856_m_000237_0' to tip
task_201401041634_5856_m_000237, for trac
ker 'tracker_ip-10-20-22-122.tk.com:localhost/127.0.0.1:35082'
2014-01-22 06:48:18,953 INFO org.apache.hadoop.mapred.JobInProgress:
Choosing data-local task task_201401041634_5856_m_000237
2014-01-22 06:48:18,953 WARN mapreduce.Counters: Group
org.apache.hadoop.mapred.Task$Counter is deprecated. Use
org.apache.hadoop.mapreduce.TaskCounter instead
2014-01-22 06:48:18,953 INFO org.apache.hadoop.mapred.JobTracker: Adding
task (MAP) 'attempt_201401041634_5856_m_000239_0' to tip
task_201401041634_5856_m_000239, for trac
ker 'tracker_ip-10-20-22-122.tk.com:localhost/127.0.0.1:35082'
2014-01-22 06:48:18,953 INFO org.apache.hadoop.mapred.JobInProgress:
Choosing data-local task task_201401041634_5856_m_000239
2014-01-22 06:48:18,954 WARN mapreduce.Counters: Group
org.apache.hadoop.mapred.Task$Counter is deprecated. Use
org.apache.hadoop.mapreduce.TaskCounter instead
2014-01-22 06:48:18,954 INFO org.apache.hadoop.mapred.JobTracker: Adding
task (MAP) 'attempt_201401041634_5856_m_000234_0' to tip
task_201401041634_5856_m_000234, for tracker
'tracker_ip-10-20-22-122.tk.com:localhost/127.0.0.1:35082'
2014-01-22 06:48:18,954 INFO org.apache.hadoop.mapred.JobInProgress:
Choosing rack-local task task_201401041634_5856_m_000234
2014-01-22 06:48:18,954 WARN mapreduce.Counters: Group
org.apache.hadoop.mapred.Task$Counter is deprecated. Use
org.apache.hadoop.mapreduce.TaskCounter instead
2014-01-22 06:48:18,954 INFO org.apache.hadoop.mapred.JobTracker: Adding
task (MAP) 'attempt_201401041634_5856_m_000235_0' to tip
task_201401041634_5856_m_000235, for tracker
'tracker_ip-10-20-22-122.tk.com:localhost/127.0.0.1:35082'
2014-01-22 06:48:18,954 INFO org.apache.hadoop.mapred.JobInProgress:
Choosing rack-local task task_201401041634_5856_m_000235
2014-01-22 06:48:18,954 INFO org.apache.hadoop.mapred.JobInProgress:
Task 'attempt_201401041634_5856_m_000147_0' has completed
task_201401041634_5856_m_000147 successfully.
2014-01-22 06:48:24,520 INFO org.apache.hadoop.mapred.JobInProgress:
Task 'attempt_201401041634_5856_m_000184_0' has completed
task_201401041634_5856_m_000184 successfully.
2014-01-22 06:48:30,073 INFO org.apache.hadoop.mapred.JobInProgress:
Task 'attempt_201401041634_5856_m_000191_0' has completed
task_201401041634_5856_m_000191 successfully.
2014-01-22 06:48:30,075 INFO org.apache.hadoop.mapred.JobInProgress:
Task 'attempt_201401041634_5856_m_000199_0' has completed
task_201401041634_5856_m_000199 successfully.
2014-01-22 06:48:41,288 INFO org.apache.hadoop.mapred.JobTracker: Job
job_201401041634_5858 added successfully for user 'hdfs' to queue 'default'
2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobTracker:
Initializing job_201401041634_5858
2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobInProgress:
Initializing job_201401041634_5858
2014-01-22 06:50:27,386 INFO
org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioning to
standby
2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker:
Stopping pluginDispatcher
2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker:
Stopping infoServer
2014-01-22 06:50:44,093 WARN org.apache.hadoop.ipc.Client: interrupted
waiting to send params to server
java.lang.InterruptedException
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:979)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
at
java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
at java.util.concurrent.FutureTask.get(FutureTask.java:83)
at
org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:913)
at org.apache.hadoop.ipc.Client.call(Client.java:1198)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
at $Proxy9.getFileInfo(Unknown Source)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:628)
at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
at $Proxy10.getFileInfo(Unknown Source)
at
org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1532)
at
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:803)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1332)
at
org.apache.hadoop.mapred.JobTrackerHAServiceProtocol$SystemDirectoryMonitor.run(JobTrackerHAServiceProtocol.java:96)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
at
java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
2014-01-22 06:51:55,637 INFO org.mortbay.log: Stopped
SelectChannelConnector@0.0.0.0:50031
2014-01-22 06:51:55,638 INFO org.apache.hadoop.mapred.JobTracker:
Stopping interTrackerServer
2014-01-22 06:51:55,638 INFO org.apache.hadoop.ipc.Server: Stopping
server on 8022
2014-01-22 06:51:55,639 INFO org.apache.hadoop.mapred.JobTracker:
Stopping expireTrackers
2014-01-22 06:51:55,639 INFO org.apache.hadoop.ipc.Server: Stopping IPC
Server listener on 8022
2014-01-22 06:51:55,639 INFO org.apache.hadoop.ipc.Server: Stopping IPC
Server Responder
2014-01-22 06:51:55,640 INFO org.apache.hadoop.mapred.JobTracker:
Stopped interTrackerServer
2014-01-22 06:52:56,306 WARN
org.apache.hadoop.io.retry.RetryInvocationHandler: Exception while
invoking getFileInfo of class ClientNamenodeProtocolTranslatorPB. Trying
to fail over immediately.
2014-01-22 06:52:56,338 INFO org.apache.hadoop.mapred.JobInProgress:
Task 'attempt_201401041634_5856_m_000180_0' has completed
task_201401041634_5856_m_000180 successfully.
2014-01-22 06:52:56,338 WARN org.apache.hadoop.ipc.Server: IPC Server
Responder, call
heartbeat(org.apache.hadoop.mapred.TaskTrackerStatus@78144f29, false,
false, true, 4436), rpc version=2, client version=32,
methodsFingerPrint=-159967141 from 10.20.22.151:37362: output error
2014-01-22 06:53:02,785 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 5 on 8022 caught an exception
java.nio.channels.ClosedChannelException
at
sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
at
org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
at
org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
2014-01-22 06:53:02,786 ERROR org.apache.hadoop.mapred.JobTracker: Job
initialization failed:
java.io.FileNotFoundException: Parent directory doesn't exist:
/tmp/mapred/system/seq-000000000022/job_201401041634_5858
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.verifyParentDir(FSNamesystem.java:1710)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1831)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:1753)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1731)
at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:419)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:205)
at
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44068)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1695)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1691)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1689)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at
org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:90)
at
org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:57)
at
org.apache.hadoop.hdfs.DFSOutputStream.<init>(DFSOutputStream.java:1301)
at
org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1317)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1242)
at
org.apache.hadoop.hdfs.DistributedFileSystem.createNonRecursive(DistributedFileSystem.java:301)
at
org.apache.hadoop.hdfs.DistributedFileSystem.createNonRecursive(DistributedFileSystem.java:79)
at
org.apache.hadoop.fs.FileSystem.createNonRecursive(FileSystem.java:1055)
at
org.apache.hadoop.fs.FileSystem.createNonRecursive(FileSystem.java:1031)
at
org.apache.hadoop.mapred.JobInProgress.generateAndStoreTokens(JobInProgress.java:3505)
at
org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:725)
at
org.apache.hadoop.mapred.JobTracker.initJob(JobTracker.java:3775)
at
org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:90)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by:
org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException):
Parent directory doesn't exist:
/tmp/mapred/system/seq-000000000022/job_201401041634_5858
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.verifyParentDir(FSNamesystem.java:1710)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1831)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:1753)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1731)
at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:419)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:205)
at
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44068)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1695)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1691)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1689)
at org.apache.hadoop.ipc.Client.call(Client.java:1225)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
at $Proxy9.create(Unknown Source)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:192)
at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
at $Proxy10.create(Unknown Source)
at
org.apache.hadoop.hdfs.DFSOutputStream.<init>(DFSOutputStream.java:1298)
... 13 more
2014-01-22 06:53:02,787 INFO org.apache.hadoop.mapred.JobInProgress:
Task 'attempt_201401041634_5856_m_000187_0' has completed
task_201401041634_5856_m_000187 successfully.
2014-01-22 06:53:02,787 INFO org.apache.hadoop.mapred.JobInProgress:
Task 'attempt_201401041634_5856_m_000188_0' has completed
task_201401041634_5856_m_000188 successfully.
2014-01-22 06:53:02,788 INFO org.apache.hadoop.mapred.JobInProgress:
Task 'attempt_201401041634_5856_m_000200_0' has completed
task_201401041634_5856_m_000200 successfully.
2014-01-22 06:53:09,255 WARN org.apache.hadoop.ipc.Server: IPC Server
Responder, call
heartbeat(org.apache.hadoop.mapred.TaskTrackerStatus@3f713050, false,
false, true, 77
20), rpc version=2, client version=32, methodsFingerPrint=-159967141
from 10.20.22.76:37774: output error
2014-01-22 06:53:09,255 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 8 on 8022 caught an exception
java.nio.channels.ClosedChannelException
at
sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
at
org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
at
org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
2014-01-22 06:53:09,255 INFO org.apache.hadoop.mapred.JobInProgress:
Task 'attempt_201401041634_5856_m_000143_0' has completed
task_201401041634_5856_m_000143 successfully.
2014-01-22 06:53:09,259 INFO org.apache.hadoop.mapred.JobInProgress:
Task 'attempt_201401041634_5856_m_000164_0' has completed
task_201401041634_5856_m_000164 successfully.
2014-01-22 06:53:09,259 INFO org.apache.hadoop.mapred.JobInProgress:
Task 'attempt_201401041634_5856_m_000167_0' has completed
task_201401041634_5856_m_000167 successfully.
2014-01-22 06:53:09,260 INFO org.apache.hadoop.mapred.JobInProgress:
Task 'attempt_201401041634_5856_m_000176_0' has completed
task_201401041634_5856_m_000176 successfully.
2014-01-22 06:53:09,260 WARN
org.apache.hadoop.io.retry.RetryInvocationHandler: Exception while
invoking getFileInfo of class ClientNamenodeProtocolTranslatorPB after 1
fail over attempts. Trying to fail over immediately.
*******************************************************************************************
the standby server doesnt have much data before the change occurred
since it was on standby mode.
2014-01-22 06:50:05,010 INFO
org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioning to
active
2014-01-22 06:50:05,010 INFO
org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopping
JobTrackerHAHttpRedirector on port 50030
2014-01-22 06:50:05,098 INFO org.mortbay.log: Stopped
SelectChannelConnector@0.0.0.0:50030
2014-01-22 06:50:05,198 INFO
org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopped
2014-01-22 06:50:05,201 INFO
org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Renaming previous
system directory hdfs://***/tmp/mapred/system/seq-000000000022 to
hdfs://***/tmp/mapred/system/seq-000000000023
2014-01-22 06:50:05,244 INFO
org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
Updating the current master key for generating delegation tokens
2014-01-22 06:50:05,248 INFO org.apache.hadoop.mapred.JobTracker:
Scheduler configured with (memSizeForMapSlotOnJT,
memSizeForReduceSlotOnJT, limitMaxMemForMapTasks,
limitMaxMemForReduceTasks) (-1, -1, -1, -1)
2014-01-22 06:50:05,248 INFO org.apache.hadoop.util.HostsFileReader:
Refreshing hosts (include/exclude) list
2014-01-22 06:50:11,839 INFO
org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
Starting expired delegation token remover thread,
tokenRemoverScanInterval=60 min(s)
2014-01-22 06:50:11,839 INFO
org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
Updating the current master key for generating delegation tokens
2014-01-22 06:50:11,852 INFO org.apache.hadoop.mapred.JobTracker:
Starting jobtracker with owner as mapred
2014-01-22 06:50:11,853 INFO org.apache.hadoop.ipc.Server: Starting
Socket Reader #1 for port 8021
2014-01-22 06:50:11,857 WARN org.apache.hadoop.ipc.RPC: Interface
interface org.apache.hadoop.mapred.TaskTrackerManager ignored because it
does not extend VersionedProtocol
2014-01-22 06:50:11,919 INFO org.apache.hadoop.http.HttpServer: Added
global filter 'safety'
(class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
2014-01-22 06:50:11,920 INFO org.apache.hadoop.http.HttpServer: Added
filter static_user_filter
(class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter)
to context job
2014-01-22 06:50:11,920 INFO org.apache.hadoop.http.HttpServer: Added
filter static_user_filter
(class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter)
to context static
2014-01-22 06:50:11,920 INFO org.apache.hadoop.http.HttpServer: Added
filter static_user_filter
(class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter)
to context logs
2014-01-22 06:50:11,922 INFO org.apache.hadoop.http.HttpServer: Jetty
bound to port 50030
2014-01-22 06:50:11,922 INFO org.mortbay.log: jetty-6.1.26.cloudera.2
2014-01-22 06:50:35,574 INFO org.mortbay.log: Started
SelectChannelConnector@0.0.0.0:50030
2014-01-22 06:50:35,575 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
Cannot initialize JVM Metrics with processName=JobTracker, sessionId= -
already initialized
2014-01-22 06:50:35,576 INFO org.apache.hadoop.mapred.JobTracker:
JobTracker up at: 8021
2014-01-22 06:50:35,576 INFO org.apache.hadoop.mapred.JobTracker:
JobTracker webserver: 50030
2014-01-22 06:50:42,221 WARN org.apache.hadoop.mapred.JobTracker: Job
job_201303170922_6550 does not have valid info/token file so ignoring
for recovery
2014-01-22 06:50:48,793 WARN org.apache.hadoop.mapred.JobTracker: Job
job_201306220511_6151 does not have valid info/token file so ignoring
for recovery
2014-01-22 06:50:54,434 WARN org.apache.hadoop.mapred.JobTracker: Job
job_201307090256_0004 does not have valid info/token file so ignoring
for recovery
2014-01-22 06:51:00,062 WARN org.apache.hadoop.mapred.JobTracker: Job
job_201308140624_6179 does not have valid info/token file so ignoring
for recovery
2014-01-22 06:51:08,461 WARN org.apache.hadoop.mapred.JobTracker: Job
job_201309011414_6792 does not have valid info/token file so ignoring
for recovery
2014-01-22 06:51:11,293 WARN org.apache.hadoop.mapred.JobTracker: Job
job_201310030748_7416 does not have valid info/token file so ignoring
for recovery
2014-01-22 06:51:22,552 WARN org.apache.hadoop.mapred.JobTracker: Job
job_201401041634_5858 does not have valid info/token file so ignoring
for recovery
2014-01-22 06:51:22,574 INFO org.apache.hadoop.mapred.JobHistory: Job
History MaxAge is 2592000000 ms (30.00 days), Cleanup Frequency is
86400000 ms (1.00 days)
2014-01-22 06:51:22,574 WARN org.apache.hadoop.conf.Configuration:
topology.node.switch.mapping.impl is deprecated. Instead, use
net.topology.node.switch.mapping.impl
2014-01-22 06:51:22,575 INFO
org.apache.hadoop.mapred.CompletedJobStatusStore: Completed job store
activated/configured with retain-time : 3600000 , job-info-dir :
/jobtracker/jobsInfo
2014-01-22 06:51:22,577 INFO
org.apache.hadoop.mapred.JobTrackerHADaemon: Waiting for jobtracker
RUNNING state
2014-01-22 06:51:35,063 INFO
org.apache.hadoop.mapred.JobTrackerHADaemon: Waiting for jobtracker
RUNNING state
2014-01-22 06:51:35,132 INFO
org.apache.hadoop.mapred.FairSchedulerEventLog: Initialized fair
scheduler event log, logging to
/mnt/logs/hadoop-0.20-mapreduce/fairscheduler/hadoop-mapred-fairscheduler-tkamz-hdpm01.log
2014-01-22 06:51:35,156 INFO org.apache.hadoop.mapred.FairScheduler:
Successfully configured FairScheduler
2014-01-22 06:51:35,156 INFO org.apache.hadoop.mapred.JobTracker:
Starting the recovery process for 2 jobs ...
2014-01-22 06:51:35,156 INFO org.apache.hadoop.mapred.JobTracker:
Submitting job job_201311290401_6418
2014-01-22 06:51:42,024 INFO
org.apache.hadoop.mapred.JobTrackerHADaemon: Waiting for jobtracker
RUNNING state
2014-01-22 06:51:42,025 WARN org.apache.hadoop.mapred.JobTracker: Could
not recover job job_201311290401_6418
java.io.IOException: Exception reading
hdfs://***/tmp/mapred/system/seq-000000000023/job_201311290401_6418/jobToken
at
org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:154)
at
org.apache.hadoop.mapred.JobTracker$RecoveryManager.recover(JobTracker.java:1533)
at
org.apache.hadoop.mapred.JobTracker.offerService(JobTracker.java:2179)
at
org.apache.hadoop.mapred.JobTrackerHADaemon$JobTrackerRunner$1.run(JobTrackerHADaemon.java:198)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:180)
at java.io.DataInputStream.readFully(DataInputStream.java:152)
at
org.apache.hadoop.security.Credentials.readTokenStorageStream(Credentials.java:189)
at
org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:150)
... 4 more
2014-01-22 06:51:42,026 INFO org.apache.hadoop.mapred.JobTracker:
Submitting job job_201401041634_5856
2014-01-22 06:51:48,515 INFO
org.apache.hadoop.mapred.JobTrackerHADaemon: Waiting for jobtracker
RUNNING state
2014-01-22 06:51:57,892 INFO
org.apache.hadoop.mapred.JobTrackerHADaemon: Waiting for jobtracker
RUNNING state
2014-01-22 06:51:57,906 INFO org.apache.hadoop.mapred.JobInProgress:
job_201401041634_5856: nMaps=240 nReduces=6 max=-1
2014-01-22 06:51:57,911 INFO org.apache.hadoop.mapred.JobTracker: Job
job_201401041634_5856 added successfully for user 'hdfs' to queue 'default'
2014-01-22 06:51:57,911 INFO org.apache.hadoop.mapred.JobTracker:
Recovery done! Recoverd 1 of 2 jobs.
2014-01-22 06:51:57,911 INFO org.apache.hadoop.mapred.JobTracker:
Recovery Duration (ms):22755
2014-01-22 06:51:57,911 INFO org.apache.hadoop.mapred.JobTracker:
Refreshing hosts information
2014-01-22 06:52:00,857 INFO org.apache.hadoop.mapred.JobTracker:
Initializing job_201401041634_5856
2014-01-22 06:52:00,857 INFO org.apache.hadoop.mapred.JobInProgress:
Initializing job_201401041634_5856
2014-01-22 06:52:00,865 INFO org.apache.hadoop.util.HostsFileReader:
Setting the includes file to
2014-01-22 06:52:00,865 INFO org.apache.hadoop.util.HostsFileReader:
Setting the excludes file to
2014-01-22 06:52:00,865 INFO org.apache.hadoop.util.HostsFileReader:
Refreshing hosts (include/exclude) list
2014-01-22 06:52:00,865 INFO org.apache.hadoop.mapred.JobTracker:
Decommissioning 0 nodes
2014-01-22 06:52:00,865 INFO
org.apache.hadoop.mapred.JobTrackerHADaemon: Waiting for jobtracker
RUNNING state
2014-01-22 06:52:00,868 INFO org.apache.hadoop.ipc.Server: IPC Server
Responder: starting
2014-01-22 06:52:00,868 INFO org.apache.hadoop.ipc.Server: IPC Server
listener on 8021: starting
2014-01-22 06:52:00,870 INFO org.apache.hadoop.mapred.JobTracker:
Starting RUNNING
2014-01-22 06:52:06,560 INFO
org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioned to active
2014-01-22 06:52:06,560 WARN org.apache.hadoop.ipc.Server: IPC Server
Responder, call
org.apache.hadoop.ha.HAServiceProtocol.transitionToActive from
10.20.22.109:32931: output error
On 01/27/2014 11:59 PM, Karthik Kambatla wrote:
> (Redirecting to cdh-user, moving user@hadoop to bcc).
>
> Hi Oren
>
> Can you attach slightly longer versions of the log files on both the
> JTs? Also, if this is something recurring, it would be nice to monitor
> the JT heap usage and GC timeouts using jstat -gcutil <jt-pid>.
>
> Thanks
> Karthik
>
>
>
>
> On Thu, Jan 23, 2014 at 8:11 AM, Oren Marmor <orenm@infolinks.com
> <ma...@infolinks.com>> wrote:
>
> Hi.
> We have two HA Jobtrackers in active/standby mode. (CDH4.2 on
> ubuntu server)
> We had a problem during which the active node suddenly became
> standby and the standby server attempted to start resulting in a
> java heap space failure.
> any ideas to why the active node turned to standby?
>
> logs attached:
> on (original) active node:
> 2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobTracker:
> Initializing job_201401041634_5858
> 2014-01-22 06:48:41,289 INFO
> org.apache.hadoop.mapred.JobInProgress: Initializing
> job_201401041634_5858
> *2014-01-22 06:50:27,386 INFO
> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol:
> Transitioning to standby*
> 2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker:
> Stopping pluginDispatcher
> 2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker:
> Stopping infoServer
> 2014-01-22 06:50:44,093 WARN org.apache.hadoop.ipc.Client:
> interrupted waiting to send params to server
> java.lang.InterruptedException
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:979)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
> at
> java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
> at java.util.concurrent.FutureTask.get(FutureTask.java:83)
> at
> org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:913)
> at org.apache.hadoop.ipc.Client.call(Client.java:1198)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
> at $Proxy9.getFileInfo(Unknown Source)
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:628)
> at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
> at $Proxy10.getFileInfo(Unknown Source)
> at
> org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1532)
> at
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:803)
> at
> org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1332)
> at
> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol$SystemDirectoryMonitor.run(JobTrackerHAServiceProtocol.java:96)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
> at
> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
> at
> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> 2014-01-22 06:51:55,637 INFO org.mortbay.log: Stopped
> SelectChannelConnector@0.0.0.0:50031
> <http://SelectChannelConnector@0.0.0.0:50031>
>
> on standby node
> 2014-01-22 06:50:05,010 INFO
> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol:
> Transitioning to active
> 2014-01-22 06:50:05,010 INFO
> org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopping
> JobTrackerHAHttpRedirector on port 50030
> 2014-01-22 06:50:05,098 INFO org.mortbay.log: Stopped
> SelectChannelConnector@0.0.0.0:50030
> <http://SelectChannelConnector@0.0.0.0:50030>
> 2014-01-22 06:50:05,198 INFO
> org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopped
> 2014-01-22 06:50:05,201 INFO
> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Renaming
> previous system directory
> hdfs://***/tmp/mapred/system/seq-000000000022 to hdfs://t
> aykey/tmp/mapred/system/seq-000000000023
> 2014-01-22 06:50:05,244 INFO
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
> Updating the current master key for generating delegation tokens
> 2014-01-22 06:50:05,248 INFO org.apache.hadoop.mapred.JobTracker:
> Scheduler configured with (memSizeForMapSlotOnJT,
> memSizeForReduceSlotOnJT, limitMaxMemForMapTasks, limitMaxMemF
> orReduceTasks) (-1, -1, -1, -1)
> 2014-01-22 06:50:05,248 INFO
> org.apache.hadoop.util.HostsFileReader: Refreshing hosts
> (include/exclude) list
> 2014-01-22 06:50:11,839 INFO
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
> Starting expired delegation token remover thread, tokenRemoverScanI
> nterval=60 min(s)
> ...
> 2014-01-22 06:52:00,870 INFO org.apache.hadoop.mapred.JobTracker:
> Starting RUNNING
> 2014-01-22 06:52:06,560 INFO
> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioned
> to active
> 2014-01-22 06:52:06,560 WARN org.apache.hadoop.ipc.Server: IPC
> Server Responder, call
> org.apache.hadoop.ha.HAServiceProtocol.transitionToActive from
> ****:32931: output error
> 2014-01-22 06:52:06,561 INFO org.apache.hadoop.ipc.Server: IPC
> Server handler 0 on 8023 caught an exception
> java.nio.channels.ClosedChannelException
> at
> sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
> at
> sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
> at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
> at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
> at
> org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
> at
> org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
> 2014-01-22 06:52:13,168 WARN org.apache.hadoop.ipc.Server: IPC
> Server Responder, call
> org.apache.hadoop.ha.HAServiceProtocol.getServiceStatus from
> ****:60965: output error
> 2014-01-22 06:52:13,168 INFO org.apache.hadoop.ipc.Server: IPC
> Server handler 0 on 8023 caught an exception
> java.nio.channels.ClosedChannelException
> at
> sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
> at
> sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
> at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
> at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
> at
> org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
> at
> org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
>
> thanks
> Oren
>
>
Re: HA Jobtracker failure
Posted by Karthik Kambatla <ka...@cloudera.com>.
(Redirecting to cdh-user, moving user@hadoop to bcc).
Hi Oren
Can you attach slightly longer versions of the log files on both the JTs?
Also, if this is something recurring, it would be nice to monitor the JT
heap usage and GC timeouts using jstat -gcutil <jt-pid>.
Thanks
Karthik
On Thu, Jan 23, 2014 at 8:11 AM, Oren Marmor <or...@infolinks.com> wrote:
> Hi.
> We have two HA Jobtrackers in active/standby mode. (CDH4.2 on ubuntu
> server)
> We had a problem during which the active node suddenly became standby and
> the standby server attempted to start resulting in a java heap space
> failure.
> any ideas to why the active node turned to standby?
>
> logs attached:
> on (original) active node:
> 2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobTracker:
> Initializing job_201401041634_5858
> 2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobInProgress:
> Initializing job_201401041634_5858
> *2014-01-22 06:50:27,386 INFO
> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioning to
> standby*
> 2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker: Stopping
> pluginDispatcher
> 2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker: Stopping
> infoServer
> 2014-01-22 06:50:44,093 WARN org.apache.hadoop.ipc.Client: interrupted
> waiting to send params to server
> java.lang.InterruptedException
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:979)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
> at
> java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
> at java.util.concurrent.FutureTask.get(FutureTask.java:83)
> at
> org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:913)
> at org.apache.hadoop.ipc.Client.call(Client.java:1198)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
> at $Proxy9.getFileInfo(Unknown Source)
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:628)
> at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
> at $Proxy10.getFileInfo(Unknown Source)
> at
> org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1532)
> at
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:803)
> at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1332)
> at
> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol$SystemDirectoryMonitor.run(JobTrackerHAServiceProtocol.java:96)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
> at
> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> 2014-01-22 06:51:55,637 INFO org.mortbay.log: Stopped
> SelectChannelConnector@0.0.0.0:50031
>
> on standby node
> 2014-01-22 06:50:05,010 INFO
> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioning to
> active
> 2014-01-22 06:50:05,010 INFO
> org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopping
> JobTrackerHAHttpRedirector on port 50030
> 2014-01-22 06:50:05,098 INFO org.mortbay.log: Stopped
> SelectChannelConnector@0.0.0.0:50030
> 2014-01-22 06:50:05,198 INFO
> org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopped
> 2014-01-22 06:50:05,201 INFO
> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Renaming previous
> system directory hdfs://***/tmp/mapred/system/seq-000000000022 to hdfs://t
> aykey/tmp/mapred/system/seq-000000000023
> 2014-01-22 06:50:05,244 INFO
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
> Updating the current master key for generating delegation tokens
> 2014-01-22 06:50:05,248 INFO org.apache.hadoop.mapred.JobTracker:
> Scheduler configured with (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT,
> limitMaxMemForMapTasks, limitMaxMemF
> orReduceTasks) (-1, -1, -1, -1)
> 2014-01-22 06:50:05,248 INFO org.apache.hadoop.util.HostsFileReader:
> Refreshing hosts (include/exclude) list
> 2014-01-22 06:50:11,839 INFO
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
> Starting expired delegation token remover thread, tokenRemoverScanI
> nterval=60 min(s)
> ...
> 2014-01-22 06:52:00,870 INFO org.apache.hadoop.mapred.JobTracker: Starting
> RUNNING
> 2014-01-22 06:52:06,560 INFO
> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioned to active
> 2014-01-22 06:52:06,560 WARN org.apache.hadoop.ipc.Server: IPC Server
> Responder, call org.apache.hadoop.ha.HAServiceProtocol.transitionToActive
> from ****:32931: output error
> 2014-01-22 06:52:06,561 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 0 on 8023 caught an exception
> java.nio.channels.ClosedChannelException
> at
> sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
> at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
> at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
> at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
> at
> org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
> at
> org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
> 2014-01-22 06:52:13,168 WARN org.apache.hadoop.ipc.Server: IPC Server
> Responder, call org.apache.hadoop.ha.HAServiceProtocol.getServiceStatus
> from ****:60965: output error
> 2014-01-22 06:52:13,168 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 0 on 8023 caught an exception
> java.nio.channels.ClosedChannelException
> at
> sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
> at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
> at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
> at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
> at
> org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
> at
> org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
>
> thanks
> Oren
>
Re: HA Jobtracker failure
Posted by Karthik Kambatla <ka...@cloudera.com>.
(Redirecting to cdh-user, moving user@hadoop to bcc).
Hi Oren
Can you attach slightly longer versions of the log files on both the JTs?
Also, if this is something recurring, it would be nice to monitor the JT
heap usage and GC timeouts using jstat -gcutil <jt-pid>.
Thanks
Karthik
On Thu, Jan 23, 2014 at 8:11 AM, Oren Marmor <or...@infolinks.com> wrote:
> Hi.
> We have two HA Jobtrackers in active/standby mode. (CDH4.2 on ubuntu
> server)
> We had a problem during which the active node suddenly became standby and
> the standby server attempted to start resulting in a java heap space
> failure.
> any ideas to why the active node turned to standby?
>
> logs attached:
> on (original) active node:
> 2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobTracker:
> Initializing job_201401041634_5858
> 2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobInProgress:
> Initializing job_201401041634_5858
> *2014-01-22 06:50:27,386 INFO
> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioning to
> standby*
> 2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker: Stopping
> pluginDispatcher
> 2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker: Stopping
> infoServer
> 2014-01-22 06:50:44,093 WARN org.apache.hadoop.ipc.Client: interrupted
> waiting to send params to server
> java.lang.InterruptedException
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:979)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
> at
> java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
> at java.util.concurrent.FutureTask.get(FutureTask.java:83)
> at
> org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:913)
> at org.apache.hadoop.ipc.Client.call(Client.java:1198)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
> at $Proxy9.getFileInfo(Unknown Source)
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:628)
> at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
> at $Proxy10.getFileInfo(Unknown Source)
> at
> org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1532)
> at
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:803)
> at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1332)
> at
> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol$SystemDirectoryMonitor.run(JobTrackerHAServiceProtocol.java:96)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
> at
> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> 2014-01-22 06:51:55,637 INFO org.mortbay.log: Stopped
> SelectChannelConnector@0.0.0.0:50031
>
> on standby node
> 2014-01-22 06:50:05,010 INFO
> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioning to
> active
> 2014-01-22 06:50:05,010 INFO
> org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopping
> JobTrackerHAHttpRedirector on port 50030
> 2014-01-22 06:50:05,098 INFO org.mortbay.log: Stopped
> SelectChannelConnector@0.0.0.0:50030
> 2014-01-22 06:50:05,198 INFO
> org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopped
> 2014-01-22 06:50:05,201 INFO
> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Renaming previous
> system directory hdfs://***/tmp/mapred/system/seq-000000000022 to hdfs://t
> aykey/tmp/mapred/system/seq-000000000023
> 2014-01-22 06:50:05,244 INFO
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
> Updating the current master key for generating delegation tokens
> 2014-01-22 06:50:05,248 INFO org.apache.hadoop.mapred.JobTracker:
> Scheduler configured with (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT,
> limitMaxMemForMapTasks, limitMaxMemF
> orReduceTasks) (-1, -1, -1, -1)
> 2014-01-22 06:50:05,248 INFO org.apache.hadoop.util.HostsFileReader:
> Refreshing hosts (include/exclude) list
> 2014-01-22 06:50:11,839 INFO
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
> Starting expired delegation token remover thread, tokenRemoverScanI
> nterval=60 min(s)
> ...
> 2014-01-22 06:52:00,870 INFO org.apache.hadoop.mapred.JobTracker: Starting
> RUNNING
> 2014-01-22 06:52:06,560 INFO
> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioned to active
> 2014-01-22 06:52:06,560 WARN org.apache.hadoop.ipc.Server: IPC Server
> Responder, call org.apache.hadoop.ha.HAServiceProtocol.transitionToActive
> from ****:32931: output error
> 2014-01-22 06:52:06,561 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 0 on 8023 caught an exception
> java.nio.channels.ClosedChannelException
> at
> sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
> at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
> at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
> at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
> at
> org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
> at
> org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
> 2014-01-22 06:52:13,168 WARN org.apache.hadoop.ipc.Server: IPC Server
> Responder, call org.apache.hadoop.ha.HAServiceProtocol.getServiceStatus
> from ****:60965: output error
> 2014-01-22 06:52:13,168 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 0 on 8023 caught an exception
> java.nio.channels.ClosedChannelException
> at
> sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
> at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
> at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
> at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
> at
> org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
> at
> org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
>
> thanks
> Oren
>
Re: HA Jobtracker failure
Posted by Karthik Kambatla <ka...@cloudera.com>.
(Redirecting to cdh-user, moving user@hadoop to bcc).
Hi Oren
Can you attach slightly longer versions of the log files on both the JTs?
Also, if this is something recurring, it would be nice to monitor the JT
heap usage and GC timeouts using jstat -gcutil <jt-pid>.
Thanks
Karthik
On Thu, Jan 23, 2014 at 8:11 AM, Oren Marmor <or...@infolinks.com> wrote:
> Hi.
> We have two HA Jobtrackers in active/standby mode. (CDH4.2 on ubuntu
> server)
> We had a problem during which the active node suddenly became standby and
> the standby server attempted to start resulting in a java heap space
> failure.
> any ideas to why the active node turned to standby?
>
> logs attached:
> on (original) active node:
> 2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobTracker:
> Initializing job_201401041634_5858
> 2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobInProgress:
> Initializing job_201401041634_5858
> *2014-01-22 06:50:27,386 INFO
> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioning to
> standby*
> 2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker: Stopping
> pluginDispatcher
> 2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker: Stopping
> infoServer
> 2014-01-22 06:50:44,093 WARN org.apache.hadoop.ipc.Client: interrupted
> waiting to send params to server
> java.lang.InterruptedException
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:979)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
> at
> java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
> at java.util.concurrent.FutureTask.get(FutureTask.java:83)
> at
> org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:913)
> at org.apache.hadoop.ipc.Client.call(Client.java:1198)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
> at $Proxy9.getFileInfo(Unknown Source)
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:628)
> at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
> at $Proxy10.getFileInfo(Unknown Source)
> at
> org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1532)
> at
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:803)
> at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1332)
> at
> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol$SystemDirectoryMonitor.run(JobTrackerHAServiceProtocol.java:96)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
> at
> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> 2014-01-22 06:51:55,637 INFO org.mortbay.log: Stopped
> SelectChannelConnector@0.0.0.0:50031
>
> on standby node
> 2014-01-22 06:50:05,010 INFO
> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioning to
> active
> 2014-01-22 06:50:05,010 INFO
> org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopping
> JobTrackerHAHttpRedirector on port 50030
> 2014-01-22 06:50:05,098 INFO org.mortbay.log: Stopped
> SelectChannelConnector@0.0.0.0:50030
> 2014-01-22 06:50:05,198 INFO
> org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopped
> 2014-01-22 06:50:05,201 INFO
> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Renaming previous
> system directory hdfs://***/tmp/mapred/system/seq-000000000022 to hdfs://t
> aykey/tmp/mapred/system/seq-000000000023
> 2014-01-22 06:50:05,244 INFO
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
> Updating the current master key for generating delegation tokens
> 2014-01-22 06:50:05,248 INFO org.apache.hadoop.mapred.JobTracker:
> Scheduler configured with (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT,
> limitMaxMemForMapTasks, limitMaxMemF
> orReduceTasks) (-1, -1, -1, -1)
> 2014-01-22 06:50:05,248 INFO org.apache.hadoop.util.HostsFileReader:
> Refreshing hosts (include/exclude) list
> 2014-01-22 06:50:11,839 INFO
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
> Starting expired delegation token remover thread, tokenRemoverScanI
> nterval=60 min(s)
> ...
> 2014-01-22 06:52:00,870 INFO org.apache.hadoop.mapred.JobTracker: Starting
> RUNNING
> 2014-01-22 06:52:06,560 INFO
> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioned to active
> 2014-01-22 06:52:06,560 WARN org.apache.hadoop.ipc.Server: IPC Server
> Responder, call org.apache.hadoop.ha.HAServiceProtocol.transitionToActive
> from ****:32931: output error
> 2014-01-22 06:52:06,561 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 0 on 8023 caught an exception
> java.nio.channels.ClosedChannelException
> at
> sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
> at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
> at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
> at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
> at
> org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
> at
> org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
> 2014-01-22 06:52:13,168 WARN org.apache.hadoop.ipc.Server: IPC Server
> Responder, call org.apache.hadoop.ha.HAServiceProtocol.getServiceStatus
> from ****:60965: output error
> 2014-01-22 06:52:13,168 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 0 on 8023 caught an exception
> java.nio.channels.ClosedChannelException
> at
> sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
> at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
> at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
> at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
> at
> org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
> at
> org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
>
> thanks
> Oren
>
Re: HA Jobtracker failure
Posted by Karthik Kambatla <ka...@cloudera.com>.
(Redirecting to cdh-user, moving user@hadoop to bcc).
Hi Oren
Can you attach slightly longer versions of the log files on both the JTs?
Also, if this is something recurring, it would be nice to monitor the JT
heap usage and GC timeouts using jstat -gcutil <jt-pid>.
Thanks
Karthik
On Thu, Jan 23, 2014 at 8:11 AM, Oren Marmor <or...@infolinks.com> wrote:
> Hi.
> We have two HA Jobtrackers in active/standby mode. (CDH4.2 on ubuntu
> server)
> We had a problem during which the active node suddenly became standby and
> the standby server attempted to start resulting in a java heap space
> failure.
> any ideas to why the active node turned to standby?
>
> logs attached:
> on (original) active node:
> 2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobTracker:
> Initializing job_201401041634_5858
> 2014-01-22 06:48:41,289 INFO org.apache.hadoop.mapred.JobInProgress:
> Initializing job_201401041634_5858
> *2014-01-22 06:50:27,386 INFO
> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioning to
> standby*
> 2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker: Stopping
> pluginDispatcher
> 2014-01-22 06:50:27,386 INFO org.apache.hadoop.mapred.JobTracker: Stopping
> infoServer
> 2014-01-22 06:50:44,093 WARN org.apache.hadoop.ipc.Client: interrupted
> waiting to send params to server
> java.lang.InterruptedException
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:979)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
> at
> java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
> at java.util.concurrent.FutureTask.get(FutureTask.java:83)
> at
> org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:913)
> at org.apache.hadoop.ipc.Client.call(Client.java:1198)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
> at $Proxy9.getFileInfo(Unknown Source)
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:628)
> at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
> at $Proxy10.getFileInfo(Unknown Source)
> at
> org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1532)
> at
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:803)
> at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1332)
> at
> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol$SystemDirectoryMonitor.run(JobTrackerHAServiceProtocol.java:96)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
> at
> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> 2014-01-22 06:51:55,637 INFO org.mortbay.log: Stopped
> SelectChannelConnector@0.0.0.0:50031
>
> on standby node
> 2014-01-22 06:50:05,010 INFO
> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioning to
> active
> 2014-01-22 06:50:05,010 INFO
> org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopping
> JobTrackerHAHttpRedirector on port 50030
> 2014-01-22 06:50:05,098 INFO org.mortbay.log: Stopped
> SelectChannelConnector@0.0.0.0:50030
> 2014-01-22 06:50:05,198 INFO
> org.apache.hadoop.mapred.JobTrackerHAHttpRedirector: Stopped
> 2014-01-22 06:50:05,201 INFO
> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Renaming previous
> system directory hdfs://***/tmp/mapred/system/seq-000000000022 to hdfs://t
> aykey/tmp/mapred/system/seq-000000000023
> 2014-01-22 06:50:05,244 INFO
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
> Updating the current master key for generating delegation tokens
> 2014-01-22 06:50:05,248 INFO org.apache.hadoop.mapred.JobTracker:
> Scheduler configured with (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT,
> limitMaxMemForMapTasks, limitMaxMemF
> orReduceTasks) (-1, -1, -1, -1)
> 2014-01-22 06:50:05,248 INFO org.apache.hadoop.util.HostsFileReader:
> Refreshing hosts (include/exclude) list
> 2014-01-22 06:50:11,839 INFO
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
> Starting expired delegation token remover thread, tokenRemoverScanI
> nterval=60 min(s)
> ...
> 2014-01-22 06:52:00,870 INFO org.apache.hadoop.mapred.JobTracker: Starting
> RUNNING
> 2014-01-22 06:52:06,560 INFO
> org.apache.hadoop.mapred.JobTrackerHAServiceProtocol: Transitioned to active
> 2014-01-22 06:52:06,560 WARN org.apache.hadoop.ipc.Server: IPC Server
> Responder, call org.apache.hadoop.ha.HAServiceProtocol.transitionToActive
> from ****:32931: output error
> 2014-01-22 06:52:06,561 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 0 on 8023 caught an exception
> java.nio.channels.ClosedChannelException
> at
> sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
> at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
> at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
> at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
> at
> org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
> at
> org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
> 2014-01-22 06:52:13,168 WARN org.apache.hadoop.ipc.Server: IPC Server
> Responder, call org.apache.hadoop.ha.HAServiceProtocol.getServiceStatus
> from ****:60965: output error
> 2014-01-22 06:52:13,168 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 0 on 8023 caught an exception
> java.nio.channels.ClosedChannelException
> at
> sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135)
> at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326)
> at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2134)
> at org.apache.hadoop.ipc.Server.access$2000(Server.java:108)
> at
> org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:931)
> at
> org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:997)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1741)
>
> thanks
> Oren
>