Posted to issues@spark.apache.org by "Marcelo Vanzin (JIRA)" <ji...@apache.org> on 2015/10/08 00:46:26 UTC

[jira] [Commented] (SPARK-10987) yarn-cluster mode misbehaving with netty-based RPC backend

    [ https://issues.apache.org/jira/browse/SPARK-10987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947738#comment-14947738 ] 

Marcelo Vanzin commented on SPARK-10987:
----------------------------------------

It may not be cluster mode per se. I ran the tests locally, and the cluster mode test does not seem to be running at all: a container from the previous test is still running, stuck here:

{noformat}
"main" prio=10 tid=0x00007f1554017800 nid=0x3d2d waiting on condition [0x00007f155d326000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000f9893fd0> (a scala.concurrent.impl.Promise$CompletionLatch)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
        at scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:208)
        at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218)
        at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
        at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
        at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
        at scala.concurrent.Await$.result(package.scala:107)
        at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcEnv.scala:242)
        at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:99)
        at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:162)
        at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:69)
        at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:68)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
        at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:68)
        at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:149)
        at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:250)
        at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
{noformat}
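
For readers less used to thread dumps: the trace shows the executor's main thread parked in a timed wait inside RpcEnv.setupEndpointRefByURI, i.e. blocked on a reply that has not arrived. The snippet below is a rough, plain-Java sketch of that state and is not Spark code; the class name, method names, and the never-completing future are all hypothetical stand-ins chosen for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BlockedLookupSketch {

    // Hypothetical stand-in for a remote endpoint lookup whose reply never arrives.
    static CompletableFuture<String> lookupThatNeverCompletes() {
        return new CompletableFuture<>();
    }

    // Starts a thread that blocks on the lookup with a timeout, waits until that
    // thread has parked (or half the timeout has elapsed), and returns its state.
    public static Thread.State observeWaiterState(long timeoutMillis)
            throws InterruptedException {
        Thread waiter = new Thread(() -> {
            try {
                // Timed blocking wait, analogous in spirit to Await.result with a timeout.
                lookupThatNeverCompletes().get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                // Expected in this sketch: the reply never comes.
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }, "waiter");
        waiter.setDaemon(true); // do not keep the JVM alive for the sketch
        waiter.start();

        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis / 2);
        while (waiter.getState() != Thread.State.TIMED_WAITING
                && System.nanoTime() < deadline) {
            Thread.sleep(10);
        }
        return waiter.getState();
    }

    public static void main(String[] args) throws Exception {
        // Print the state the blocked thread was observed in.
        System.out.println(observeWaiterState(5000));
    }
}
```

Running jstack against a process in this state should show the waiting thread with a LockSupport.parkNanos frame, which is the same kind of frame that appears near the top of the dump above.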


> yarn-cluster mode misbehaving with netty-based RPC backend
> ----------------------------------------------------------
>
>                 Key: SPARK-10987
>                 URL: https://issues.apache.org/jira/browse/SPARK-10987
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, YARN
>    Affects Versions: 1.6.0
>            Reporter: Marcelo Vanzin
>            Priority: Blocker
>
> Spark on YARN in cluster deploy mode seems to be having issues with the new netty-based RPC backend: in unit test runs, tests that run in cluster mode take several minutes instead of the more usual 20-30 seconds.
> For example, https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43349/consoleFull:
> {noformat}
> [info] YarnClusterSuite:
> [info] - run Spark in yarn-client mode (13 seconds, 953 milliseconds)
> [info] - run Spark in yarn-cluster mode (6 minutes, 50 seconds)
> [info] - run Spark in yarn-cluster mode unsuccessfully (1 minute, 53 seconds)
> [info] - run Python application in yarn-client mode (21 seconds, 842 milliseconds)
> [info] - run Python application in yarn-cluster mode (7 minutes, 0 seconds)
> [info] - user class path first in client mode (1 minute, 58 seconds)
> [info] - user class path first in cluster mode (4 minutes, 49 seconds)
> {noformat}


