Posted to issues@spark.apache.org by "Elkhan Dadashov (JIRA)" <ji...@apache.org> on 2016/11/08 09:01:58 UTC

[jira] [Resolved] (SPARK-18288) SparkLauncher 2.0.1 version working inconsistently in yarn-client mode

     [ https://issues.apache.org/jira/browse/SPARK-18288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Elkhan Dadashov resolved SPARK-18288.
-------------------------------------
    Resolution: Not A Problem

> SparkLauncher 2.0.1 version working inconsistently in yarn-client mode
> ----------------------------------------------------------------------
>
>                 Key: SPARK-18288
>                 URL: https://issues.apache.org/jira/browse/SPARK-18288
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Submit
>    Affects Versions: 2.0.1
>         Environment: Spark 2.0.1 with Spark Launcher 2.0.1 on a YARN cluster. Deploy mode is yarn-client.
>            Reporter: Elkhan Dadashov
>
> I'm running Spark 2.0.1 with Spark Launcher 2.0.1 on a YARN cluster. I launch a map task that spawns a Spark job via SparkLauncher#startApplication().
> The deploy mode is yarn-client.
> I'm running on a Mac laptop.
> I have this snippet of code:
> {code:|borderStyle=solid}
> SparkAppHandle appHandle = sparkLauncher.startApplication();
> while (appHandle.getState() == null || !appHandle.getState().isFinal()) {
>     if (appHandle.getState() != null) {
>         // If the line below is commented out, then appState and appId cannot be retrieved.
>         log.info("while: Spark job state is : " + appHandle.getState());
>         if (appHandle.getAppId() != null) {
>             log.info("\t App id: " + appHandle.getAppId() + "\tState: " + appHandle.getState());
>         }
>     }
> }
> {code}
> The above snippet works fine: both the Spark job and the map task that spawns it complete successfully.
> But if I comment out the highlighted line (the first log.info call, directly below the comment in the snippet), the Spark job still launches and finishes successfully, but the map task hangs for a while (in Running state) and then fails with the exception below.
> I run exactly the same code in exactly the same environment, except with that one line commented out.
> When the highlighted line is commented out, I do not see the second log line in stderr either. It seems the appHandle never returns anything (neither app id nor app state), even though the Spark application starts, runs, and finishes successfully. In the same stderr, I can see the Spark job's logs, its printed results, and the application report indicating status.
> You can see the exception below (this is from the stderr of the mapper container that launches the Spark job):
> ---
> INFO: Communication exception: java.net.ConnectException: Call From <my-hostname>/10.3.8.118 to <my-hostname>:53567 failed on connection exception: java.net.ConnectException: Connection refused;
> Caused by: java.net.ConnectException: Connection refused
>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>         at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>         at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
>         at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
>         at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
>         at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
>         at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
>         at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1451)
>         ... 5 more
> ---
> Nov 05, 2016 2:41:54 AM org.apache.hadoop.ipc.Client handleConnectionFailure
> INFO: Retrying connect to server: <my-hostname>/10.3.8.118:53567. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> Nov 05, 2016 2:41:54 AM org.apache.hadoop.mapred.Task run
> INFO: Communication exception: java.net.ConnectException: Call From <my-hostname>/10.3.8.118 to <my-hostname>:53567 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>         at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
>         at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1479)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1412)
>         at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:242)
>         at com.sun.proxy.$Proxy9.ping(Unknown Source)
>         at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:767)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.net.ConnectException: Connection refused
>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>         at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>         at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
>         at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
>         at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
>         at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
>         at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
>         at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1451)
>         ... 5 more
> ---
> Nov 05, 2016 2:41:54 AM org.apache.hadoop.mapred.Task logThreadInfo
> INFO: Process Thread Dump: Communication exception
> 10 active threads
> Thread 24 (org.apache.hadoop.hdfs.PeerCache@4763c727):
>   State: TIMED_WAITING
>   Blocked count: 0
>   Waited count: 79
>   Stack:
>     java.lang.Thread.sleep(Native Method)
>     org.apache.hadoop.hdfs.PeerCache.run(PeerCache.java:255)
>     org.apache.hadoop.hdfs.PeerCache.access$000(PeerCache.java:46)
>     org.apache.hadoop.hdfs.PeerCache$1.run(PeerCache.java:124)
>     java.lang.Thread.run(Thread.java:745)
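
The loop quoted above spins on getState() without ever blocking. One alternative pattern is to wait on a latch that a state-change callback releases. The sketch below illustrates that pattern only: FakeHandle, State, Listener, and awaitFinalState are hypothetical stand-ins written so the example runs without Spark on the classpath, and they are not the launcher's real classes. In the real API, SparkLauncher#startApplication accepts SparkAppHandle.Listener instances, which could play the role of Listener here. Whether this would change the behaviour reported in this issue is an assumption that the issue itself does not confirm.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class ListenerWaitSketch {

    // Hypothetical stand-in for an application state with a "final" flag.
    enum State {
        UNKNOWN(false), RUNNING(false), FINISHED(true);

        private final boolean terminal;

        State(boolean terminal) { this.terminal = terminal; }

        boolean isFinal() { return terminal; }
    }

    // Hypothetical callback invoked on every state change.
    interface Listener {
        void stateChanged(State newState);
    }

    // Hypothetical handle; the state field is volatile so that updates made
    // by one thread are visible to readers on other threads.
    static final class FakeHandle {
        private volatile State state = State.UNKNOWN;
        private final List<Listener> listeners = new CopyOnWriteArrayList<>();

        void addListener(Listener listener) { listeners.add(listener); }

        State getState() { return state; }

        void setState(State newState) {
            state = newState;
            for (Listener listener : listeners) {
                listener.stateChanged(newState);
            }
        }
    }

    // Blocks until the handle reports a final state or the timeout expires.
    // Returns true if a final state was observed, false on timeout.
    static boolean awaitFinalState(FakeHandle handle, long timeout, TimeUnit unit)
            throws InterruptedException {
        CountDownLatch done = new CountDownLatch(1);
        handle.addListener(newState -> {
            if (newState.isFinal()) {
                done.countDown();
            }
        });
        // Cover the case where the handle was already final before the
        // listener was registered; an extra countDown() is harmless.
        if (handle.getState().isFinal()) {
            done.countDown();
        }
        return done.await(timeout, unit);
    }

    public static void main(String[] args) throws Exception {
        FakeHandle handle = new FakeHandle();

        // Simulates the launched application reporting its progress.
        Thread app = new Thread(() -> {
            try {
                handle.setState(State.RUNNING);
                Thread.sleep(200);
                handle.setState(State.FINISHED);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        app.start();

        boolean finished = awaitFinalState(handle, 5, TimeUnit.SECONDS);
        app.join();
        System.out.println("finished=" + finished + ", state=" + handle.getState());
    }
}
```

The waiting thread sleeps inside CountDownLatch#await instead of polling, so it does not occupy a CPU core while the application runs, and the timeout keeps the caller from waiting indefinitely if no final state is ever reported.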



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)