Posted to dev@tinkerpop.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/02/02 12:47:55 UTC

[jira] [Commented] (TINKERPOP-1271) SparkContext should be restarted if Killed and using Persistent Context

    [ https://issues.apache.org/jira/browse/TINKERPOP-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15849869#comment-15849869 ] 

ASF GitHub Bot commented on TINKERPOP-1271:
-------------------------------------------

GitHub user artem-aliev opened a pull request:

    https://github.com/apache/tinkerpop/pull/555

    TINKERPOP-1271: Refactor SparkContext creation and handling of sc.stop()

        org.apache.tinkerpop.gremlin.spark.structure.Spark is a SparkContext holder for SparkGraphComputer.
        It was refactored to detect external stop calls and to recreate the SparkContext in that case.
        The context creation process was reordered so that all configuration options take effect.
        The Spark.create() methods now return the created context.
        Detecting the external stop also requires the SPARK-18751 fix, which was integrated into Spark 2.1.
        The refactoring and the configuration-loading changes, however, benefit all Spark versions.
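
A minimal sketch of the stop-detection pattern described above, assuming a
holder class of our own naming (illustrative only, not the actual
org.apache.tinkerpop.gremlin.spark.structure.Spark source):

{code}
import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;

public final class ContextHolder {
    private static SparkContext context;

    // Mirrors the change that create() now returns the created context.
    public static synchronized SparkContext create(final SparkConf conf) {
        // isStopped() reveals an external sc.stop(), e.g. a kill from
        // the Spark master UI, so a fresh context can be built instead
        // of reusing the dead one.
        if (context == null || context.isStopped()) {
            context = SparkContext.getOrCreate(conf);
        }
        return context;
    }
}
{code}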

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/artem-aliev/tinkerpop TINKERPOP-1271

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/tinkerpop/pull/555.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #555
    
----
commit 6bea8a562a7d2d2a940a5cb7db3f2a4ce09f3dac
Author: artemaliev <artem.aliev@gmail.com>
Date:   2017-02-02T12:15:04Z

    TINKERPOP-1271: Refactor SparkContext creation and handling of external sc.stop()
        org.apache.tinkerpop.gremlin.spark.structure.Spark is a SparkContext holder for SparkGraphComputer.
        It was refactored to detect external stop calls and to recreate the SparkContext in that case.
        The context creation process was reordered so that all configuration options take effect.
        The Spark.create() methods now return the created context.
        Detecting the external stop also requires the SPARK-18751 fix, which was integrated into Spark 2.1.
        The refactoring and the configuration-loading changes, however, benefit all Spark versions.

----


> SparkContext should be restarted if Killed and using Persistent Context
> -----------------------------------------------------------------------
>
>                 Key: TINKERPOP-1271
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1271
>             Project: TinkerPop
>          Issue Type: Bug
>          Components: hadoop
>    Affects Versions: 3.2.0-incubating, 3.1.2-incubating
>            Reporter: Russell Spitzer
>
> If the persisted SparkContext is killed by the user via the Spark UI, or is terminated by some other error, the Gremlin Console/Server is left with a stopped SparkContext. This could be caught and the SparkContext recreated, as sketched below. Oddly enough, if you simply wait, the context will "reset" itself, or possibly get GC'd out of the system, and everything works again.
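> A minimal sketch of the catch-and-recreate idea (illustrative only; the helper below is hypothetical, not TinkerPop code):
> {code}
> import org.apache.spark.SparkConf;
> import org.apache.spark.SparkContext;
> import java.util.function.Supplier;
>
> // Hypothetical helper: run a Spark-backed job and, if the persistent
> // context turns out to have been stopped externally, recreate it once
> // and retry instead of surfacing "Cannot call methods on a stopped
> // SparkContext" to the console. Catching IllegalStateException this
> // broadly is only for illustration.
> final class StoppedContextGuard {
>     static <T> T runWithRecovery(final SparkConf conf, final Supplier<T> job) {
>         try {
>             return job.get();
>         } catch (final IllegalStateException stopped) {
>             SparkContext.getOrCreate(conf); // fresh context, old one is stopped
>             return job.get();               // single retry against the new context
>         }
>     }
> }
> {code}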
> ## Repro
> {code}
> gremlin> g.V().count()
> WARN  org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer  - HADOOP_GREMLIN_LIBS is not set -- proceeding regardless
> ==>6
> gremlin> ERROR org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend  - Application has been killed. Reason: Master removed our application: KILLED
> ERROR org.apache.spark.scheduler.TaskSchedulerImpl  - Lost executor 0 on 10.150.0.180: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
> // Driver has been killed here via the Master UI
> gremlin> graph = GraphFactory.open('conf/hadoop/hadoop-gryo.properties')
> ==>hadoopgraph[gryoinputformat->gryooutputformat]
> gremlin> g.V().count()
> WARN  org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer  - HADOOP_GREMLIN_LIBS is not set -- proceeding regardless
> java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext.
> This stopped SparkContext was created at:
> org.apache.spark.SparkContext.getOrCreate(SparkContext.scala)
> org.apache.tinkerpop.gremlin.spark.structure.Spark.create(Spark.java:53)
> org.apache.tinkerpop.gremlin.spark.structure.io.SparkContextStorage.open(SparkContextStorage.java:60)
> org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer.lambda$submitWithExecutor$1(SparkGraphComputer.java:122)
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> java.lang.Thread.run(Thread.java:745)
> The currently active SparkContext was created at:
> org.apache.spark.SparkContext.getOrCreate(SparkContext.scala)
> org.apache.tinkerpop.gremlin.spark.structure.Spark.create(Spark.java:53)
> org.apache.tinkerpop.gremlin.spark.structure.io.SparkContextStorage.open(SparkContextStorage.java:60)
> org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer.lambda$submitWithExecutor$1(SparkGraphComputer.java:122)
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> java.lang.Thread.run(Thread.java:745)
> {code}
> Full trace from TP
> {code}
> 	at org.apache.spark.SparkContext.org$apache$spark$SparkContext$$assertNotStopped(SparkContext.scala:106)
> 	at org.apache.spark.SparkContext$$anonfun$newAPIHadoopRDD$1.apply(SparkContext.scala:1130)
> 	at org.apache.spark.SparkContext$$anonfun$newAPIHadoopRDD$1.apply(SparkContext.scala:1129)
> 	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
> 	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
> 	at org.apache.spark.SparkContext.withScope(SparkContext.scala:714)
> 	at org.apache.spark.SparkContext.newAPIHadoopRDD(SparkContext.scala:1129)
> 	at org.apache.spark.api.java.JavaSparkContext.newAPIHadoopRDD(JavaSparkContext.scala:507)
> 	at org.apache.tinkerpop.gremlin.spark.structure.io.InputFormatRDD.readGraphRDD(InputFormatRDD.java:42)
> 	at org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer.lambda$submitWithExecutor$1(SparkGraphComputer.java:195)
> 	at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> {code}
> If we wait a certain amount of time, for some reason everything starts working again:
> {code}
> ERROR org.apache.spark.rpc.netty.Inbox  - Ignoring error
> org.apache.spark.SparkException: Exiting due to error from cluster scheduler: Master removed our application: KILLED
> 	at org.apache.spark.scheduler.TaskSchedulerImpl.error(TaskSchedulerImpl.scala:438)
> 	at org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend.dead(SparkDeploySchedulerBackend.scala:124)
> 	at org.apache.spark.deploy.client.AppClient$ClientEndpoint.markDead(AppClient.scala:264)
> 	at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anonfun$receive$1.applyOrElse(AppClient.scala:172)
> 	at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116)
> 	at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
> 	at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
> 	at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
> WARN  org.apache.spark.rpc.netty.NettyRpcEnv  - Ignored message: true
> WARN  org.apache.spark.deploy.client.AppClient$ClientEndpoint  - Connection to rspitzer-rmbp15.local:7077 failed; waiting for master to reconnect...
> WARN  org.apache.spark.deploy.client.AppClient$ClientEndpoint  - Connection to rspitzer-rmbp15.local:7077 failed; waiting for master to reconnect...
> gremlin> g.V().count()
> WARN  org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer  - HADOOP_GREMLIN_LIBS is not set -- proceeding regardless
> ==>6
> {code}


