Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2015/09/08 18:45:48 UTC

[jira] [Commented] (SPARK-10488) No longer possible to create SparkConf in pyspark application

    [ https://issues.apache.org/jira/browse/SPARK-10488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735131#comment-14735131 ] 

Sean Owen commented on SPARK-10488:
-----------------------------------

FWIW, I have been using IPython + pyspark successfully as described at http://spark.apache.org/docs/latest/programming-guide.html and each notebook runs a separate pyspark process.

I'm not an expert on this aspect, but if you are trying to run shell-like processes, you should not create your own contexts. I think you want separate processes and contexts, which you would get by running different shell processes.
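A rough sketch of what I mean (not from the docs page; the master URL is a placeholder, the resource values just mirror the report below, and it assumes pyspark is importable in the notebook kernel): each notebook's Python process sets its own submit args before anything from pyspark is used, so the gateway it launches carries that notebook's resources.

{code}
# Per-notebook configuration: must run before pyspark starts its JVM gateway
# in this kernel, because launch_gateway() reads PYSPARK_SUBMIT_ARGS from the
# environment of this Python process.
import os

os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--master spark://<master-host>:7077 "   # placeholder master URL
    "--conf spark.executor.memory=5g "
    "--conf spark.cores.max=15 "
    "pyspark-shell"                          # trailing app resource expected in 1.4+
)

from pyspark import SparkContext

sc = SparkContext(appName="test")            # this kernel gets its own context
{code}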

> No longer possible to create SparkConf in pyspark application
> -------------------------------------------------------------
>
>                 Key: SPARK-10488
>                 URL: https://issues.apache.org/jira/browse/SPARK-10488
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.4.0, 1.4.1
>         Environment: pyspark on ec2 deployed cluster
>            Reporter: Brad Willard
>
> I used to be able to make SparkContext connections directly in IPython notebooks so that each notebook could have different resources on the cluster. This worked perfectly until Spark 1.4.x.
> The following code worked on all previous versions of Spark and no longer works:
> {code}
> from pyspark import SparkConf, SparkContext
> from pyspark.sql import SQLContext
> cpus = 15
> ram = 5
> conf = SparkConf().set('spark.executor.memory', '%sg' % ram).set('spark.cores.max', str(cpus))
> cluster_url = 'spark://%s:7077' % master
> job_name = 'test'
> sc = SparkContext(cluster_url, job_name, conf=conf)
> {code}
> It errors on the SparkConf() line because you can't even create that object in Python now without the SparkContext machinery being initialized first, which makes no sense to me.
> {code}
> ---------------------------------------------------------------------------
> Exception                                 Traceback (most recent call last)
> <ipython-input-4-453520c03f2b> in <module>()
>       5 ram = 5
>       6 
> ----> 7 conf = SparkConf().set('spark.executor.memory', '%sg' % ram).set('spark.cores.max', str(cpus))
>       8 
>       9 cluster_url = 'spark://%s:7077' % master
> /root/spark/python/pyspark/conf.py in __init__(self, loadDefaults, _jvm, _jconf)
>     102         else:
>     103             from pyspark.context import SparkContext
> --> 104             SparkContext._ensure_initialized()
>     105             _jvm = _jvm or SparkContext._jvm
>     106             self._jconf = _jvm.SparkConf(loadDefaults)
> /root/spark/python/pyspark/context.py in _ensure_initialized(cls, instance, gateway)
>     227         with SparkContext._lock:
>     228             if not SparkContext._gateway:
> --> 229                 SparkContext._gateway = gateway or launch_gateway()
>     230                 SparkContext._jvm = SparkContext._gateway.jvm
>     231 
> /root/spark/python/pyspark/java_gateway.py in launch_gateway()
>      87                 callback_socket.close()
>      88         if gateway_port is None:
> ---> 89             raise Exception("Java gateway process exited before sending the driver its port number")
>      90 
>      91         # In Windows, ensure the Java child processes do not linger after Python has exited.
> Exception: Java gateway process exited before sending the driver its port number
> {code}
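> From the traceback it looks like SparkConf() in 1.4 eagerly launches the JVM gateway via SparkContext._ensure_initialized(), i.e. it runs $SPARK_HOME/bin/spark-submit under the hood. A sketch of what the constructor now seems to depend on (the path comes from the traceback above):
> {code}
> import os
>
> # SparkConf() calls SparkContext._ensure_initialized(), which starts the JVM
> # gateway with $SPARK_HOME/bin/spark-submit, so SPARK_HOME has to point at a
> # working install before the constructor runs. Path taken from the traceback.
> os.environ.setdefault("SPARK_HOME", "/root/spark")
>
> from pyspark import SparkConf
>
> conf = SparkConf()  # triggers launch_gateway() before any SparkContext exists
> {code}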
> I am able to work around this by setting all of the pyspark environment variables for the IPython notebook, but then each notebook is forced to have the same resources, which isn't great if you run lots of different types of jobs ad hoc.
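> Roughly the kind of setup I mean (a sketch; the startup-file path, master URL, and exact mechanism are illustrative): the submit arguments are fixed once in a startup file shared by every notebook, which is exactly why they all end up with identical resources.
> {code}
> # e.g. a shared IPython startup file such as
> # ~/.ipython/profile_default/startup/00-pyspark-setup.py (illustrative path)
> import os
>
> # Every kernel runs this same file, so every notebook inherits the same fixed
> # resources instead of choosing its own.
> os.environ["SPARK_HOME"] = "/root/spark"
> os.environ["PYSPARK_SUBMIT_ARGS"] = (
>     "--master spark://<master-host>:7077 "
>     "--conf spark.executor.memory=5g --conf spark.cores.max=15 pyspark-shell"
> )
> {code}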



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org