You are viewing a plain text version of this content. The canonical link for it is here.
Posted to mapreduce-issues@hadoop.apache.org by "Alejandro Abdelnur (JIRA)" <ji...@apache.org> on 2014/01/16 00:30:27 UTC

[jira] [Updated] (MAPREDUCE-5724) JobHistoryServer does not start if HDFS is not running

     [ https://issues.apache.org/jira/browse/MAPREDUCE-5724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alejandro Abdelnur updated MAPREDUCE-5724:
------------------------------------------

    Attachment: MAPREDUCE-5724.patch

trying to do something like YARN-24 for JHS is a bit more complicated.

Instead, I've taken a different approach:

On startup the JHS will try creating the history directories, if it cannot because the the FS is not available or in safemode will retry for up to 2mins, if it times out, it will then shutdown.

So, instead failing immediately, the JHS will wait for the FS to become avail for a while.

I've hardcoded the 2mins timeout as I don't think we need to introduce a config value for this. If others feel otherwise, I can update the patch with a config prop for it.

> JobHistoryServer does not start if HDFS is not running
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-5724
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5724
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobhistoryserver
>    Affects Versions: 3.0.0, 2.4.0
>            Reporter: Alejandro Abdelnur
>            Assignee: Alejandro Abdelnur
>            Priority: Critical
>         Attachments: MAPREDUCE-5724.patch
>
>
> Starting JHS without HDFS running fails with the following error:
> {code}
> STARTUP_MSG:   build = git://git.apache.org/hadoop-common.git -r ad74e8850b99e03b0b6435b04f5b3e9995bc3956; compiled by 'tucu' on 2014-01-14T22:40Z
> STARTUP_MSG:   java = 1.7.0_45
> ************************************************************/
> 2014-01-14 16:47:40,264 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer: registered UNIX signal handlers for [TERM, HUP, INT]
> 2014-01-14 16:47:40,883 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 2014-01-14 16:47:41,101 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistory: JobHistory Init
> 2014-01-14 16:47:41,710 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager failed in state INITED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Error creating done directory: [hdfs://localhost:8020/tmp/hadoop-yarn/staging/history/done]
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Error creating done directory: [hdfs://localhost:8020/tmp/hadoop-yarn/staging/history/done]
> 	at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.serviceInit(HistoryFileManager.java:505)
> 	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> 	at org.apache.hadoop.mapreduce.v2.hs.JobHistory.serviceInit(JobHistory.java:94)
> 	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> 	at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
> 	at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.serviceInit(JobHistoryServer.java:143)
> 	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> 	at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.launchJobHistoryServer(JobHistoryServer.java:207)
> 	at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.main(JobHistoryServer.java:217)
> Caused by: java.net.ConnectException: Call From dontknow.local/172.20.10.4 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
> 	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> 	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
> 	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
> 	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1410)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1359)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
> 	at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:606)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:185)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101)
> 	at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:671)
> 	at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1722)
> 	at org.apache.hadoop.fs.Hdfs.getFileStatus(Hdfs.java:124)
> 	at org.apache.hadoop.fs.FileContext$14.next(FileContext.java:1106)
> 	at org.apache.hadoop.fs.FileContext$14.next(FileContext.java:1102)
> 	at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
> 	at org.apache.hadoop.fs.FileContext.getFileStatus(FileContext.java:1102)
> 	at org.apache.hadoop.fs.FileContext$Util.exists(FileContext.java:1514)
> 	at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.mkdir(HistoryFileManager.java:561)
> 	at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.serviceInit(HistoryFileManager.java:502)
> 	... 8 more
> Caused by: java.net.ConnectException: Connection refused
> 	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> 	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:735)
> 	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
> 	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
> 	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
> 	at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:601)
> 	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:696)
> 	at org.apache.hadoop.ipc.Client$Connection.access$2700(Client.java:367)
> 	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1458)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1377)
> 	... 28 more
> 2014-01-14 16:47:41,713 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.mapreduce.v2.hs.JobHistory failed in state INITED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Error creating done directory: [hdfs://localhost:8020/tmp/hadoop-yarn/staging/history/done]
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Error creating done directory: [hdfs://localhost:8020/tmp/hadoop-yarn/staging/history/done]
> 	at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.serviceInit(HistoryFileManager.java:505)
> 	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> 	at org.apache.hadoop.mapreduce.v2.hs.JobHistory.serviceInit(JobHistory.java:94)
> 	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> 	at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
> 	at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.serviceInit(JobHistoryServer.java:143)
> 	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> 	at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.launchJobHistoryServer(JobHistoryServer.java:207)
> 	at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.main(JobHistoryServer.java:217)
> Caused by: java.net.ConnectException: Call From dontknow.local/172.20.10.4 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
> 	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> 	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
> 	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
> 	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1410)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1359)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
> 	at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:606)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:185)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101)
> 	at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:671)
> 	at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1722)
> 	at org.apache.hadoop.fs.Hdfs.getFileStatus(Hdfs.java:124)
> 	at org.apache.hadoop.fs.FileContext$14.next(FileContext.java:1106)
> 	at org.apache.hadoop.fs.FileContext$14.next(FileContext.java:1102)
> 	at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
> 	at org.apache.hadoop.fs.FileContext.getFileStatus(FileContext.java:1102)
> 	at org.apache.hadoop.fs.FileContext$Util.exists(FileContext.java:1514)
> 	at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.mkdir(HistoryFileManager.java:561)
> 	at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.serviceInit(HistoryFileManager.java:502)
> 	... 8 more
> Caused by: java.net.ConnectException: Connection refused
> 	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> 	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:735)
> 	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
> 	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
> 	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
> 	at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:601)
> 	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:696)
> 	at org.apache.hadoop.ipc.Client$Connection.access$2700(Client.java:367)
> 	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1458)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1377)
> 	... 28 more
> 2014-01-14 16:47:41,714 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistory: Stopping JobHistory
> 2014-01-14 16:47:41,714 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer failed in state INITED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Error creating done directory: [hdfs://localhost:8020/tmp/hadoop-yarn/staging/history/done]
> {code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)