Posted to reviews@spark.apache.org by ivanwick <gi...@git.apache.org> on 2014/04/03 08:14:30 UTC

[GitHub] spark pull request: Set spark.executor.uri from environment variab...

GitHub user ivanwick opened a pull request:

    https://github.com/apache/spark/pull/311

    Set spark.executor.uri from environment variable (needed by Mesos)

    The Mesos backend uses this property when setting up a slave process.  It is similarly set in the Scala repl (org.apache.spark.repl.SparkILoop), but I couldn't find anything analogous for pyspark.
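
    For context, a minimal sketch of the kind of change involved, mirroring what the Scala repl does with the SPARK_EXECUTOR_URI environment variable (illustrative only; the exact hook point in the pyspark shell startup code may differ):

    ```
    import os
    from pyspark.context import SparkContext

    # If the user exported SPARK_EXECUTOR_URI (pointing at a Spark dist
    # tarball that the slaves can fetch), propagate it to the
    # spark.executor.uri system property before any SparkContext is
    # created, so the Mesos backend knows where to download the executor.
    if os.environ.get("SPARK_EXECUTOR_URI"):
        SparkContext.setSystemProperty("spark.executor.uri",
                                       os.environ["SPARK_EXECUTOR_URI"])
    ```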

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ivanwick/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/311.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #311
    
----
commit da0c3e4b26c3505ac48f92fac741e14578cee454
Author: Ivan Wick <iv...@gmail.com>
Date:   2014-04-03T03:41:10Z

    Set spark.executor.uri from environment variable (needed by Mesos)

----


[GitHub] spark pull request: Set spark.executor.uri from environment variab...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/311#issuecomment-39625955
  
    Merged build finished. 


[GitHub] spark pull request: Set spark.executor.uri from environment variab...

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/311#issuecomment-40160948
  
    Thanks Ivan, I've merged this in.


[GitHub] spark pull request: Set spark.executor.uri from environment variab...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/311#issuecomment-39625956
  
    
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13785/


[GitHub] spark pull request: Set spark.executor.uri from environment variab...

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/311#issuecomment-39623139
  
    Jenkins, test this please


[GitHub] spark pull request: Set spark.executor.uri from environment variab...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/311#issuecomment-39623311
  
    Merged build triggered.


[GitHub] spark pull request: Set spark.executor.uri from environment variab...

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/311#issuecomment-39594034
  
    Jenkins, test this please.
    
    Good catch!


[GitHub] spark pull request: Set spark.executor.uri from environment variab...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/311


[GitHub] spark pull request: Set spark.executor.uri from environment variab...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/311#issuecomment-39623318
  
    Merged build started. 


[GitHub] spark pull request: Set spark.executor.uri from environment variab...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/311#issuecomment-39416318
  
    Can one of the admins verify this patch?


[GitHub] spark pull request: Set spark.executor.uri from environment variab...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/311#issuecomment-39645445
  
    @ivanwick what is the symptom when this is not set correctly? If there is an exception or stack trace, it would be helpful to include it here, so that other people who run into this problem can figure out that this is the fix for it.


[GitHub] spark pull request: Set spark.executor.uri from environment variab...

Posted by ivanwick <gi...@git.apache.org>.
Github user ivanwick commented on the pull request:

    https://github.com/apache/spark/pull/311#issuecomment-39755988
  
    This patch fixes a bug with the PySpark shell running on Mesos.
    
    Without the spark.executor.uri property, PySpark reports lost tasks because the slave looks for the spark-executor in the wrong path and can never start it.  It logs several "Lost TID" and "Executor lost" messages while the scheduler re-queues the lost tasks, which fail again for the same reason, finally ending with:
    ```
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/opt/spark/spark-0.9.0-incubating-bin-cdh4/python/pyspark/rdd.py", line 539, in sum
        return self.mapPartitions(lambda x: [sum(x)]).reduce(operator.add)
      File "/opt/spark/spark-0.9.0-incubating-bin-cdh4/python/pyspark/rdd.py", line 505, in reduce
        vals = self.mapPartitions(func).collect()
      File "/opt/spark/spark-0.9.0-incubating-bin-cdh4/python/pyspark/rdd.py", line 469, in collect
        bytesInJava = self._jrdd.collect().iterator()
      File "/opt/spark/spark-0.9.0-incubating-bin-cdh4/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py", line 537, in __call__
      File "/opt/spark/spark-0.9.0-incubating-bin-cdh4/python/lib/py4j-0.8.1-src.zip/py4j/protocol.py", line 300, in get_return_value
    py4j.protocol.Py4JJavaError14/04/05 14:10:48 INFO TaskSetManager: Re-queueing tasks for 201404020012-1174907072-5050-22936-8 from TaskSet 0.0
    14/04/05 14:10:48 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
    : An error occurred while calling o21.collect.
    : org.apache.spark.SparkException: Job aborted: Task 0.0:1 failed 4 times (most recent failure: unknown)
    	at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1020)
    	at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1018)
    	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1018)
    	at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
    	at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
    	at scala.Option.foreach(Option.scala:236)
    	at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:604)
    	at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:190)
    	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
    	at akka.actor.ActorCell.invoke(ActorCell.scala:456)
    	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
    	at akka.dispatch.Mailbox.run(Mailbox.scala:219)
    	at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
    	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
    
    ```
    
    The stderr of each slave in the Mesos framework reports:
    ```
    sh: 1: /opt/spark/spark-0.9.0-incubating-bin-cdh4/sbin/spark-executor: not found
    ```
    because this path doesn't exist on the slave nodes (this happens to be the path where it's installed on the head node).
    
    When spark.executor.uri is set, as it is with the Scala repl, Mesos is able to download the Spark dist package and run it from the framework temp directory on the slave.
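
    For anyone hitting this before picking up the fix, the property can also be set explicitly when building a context in an application.  A sketch (the master URL and tarball path below are placeholders for your own setup):

    ```
    from pyspark import SparkConf, SparkContext

    # Point spark.executor.uri at a Spark dist tarball that every slave can
    # reach (e.g. on HDFS); Mesos downloads and unpacks it into the framework
    # temp directory instead of assuming the head node's install path.
    conf = (SparkConf()
            .setMaster("mesos://master-host:5050")  # placeholder master URL
            .setAppName("example")
            .set("spark.executor.uri",
                 "hdfs://namenode/dist/spark-0.9.0-incubating-bin-cdh4.tgz"))  # placeholder path
    sc = SparkContext(conf=conf)
    ```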

