Posted to user@spark.apache.org by jegordon <jg...@gmail.com> on 2015/07/09 23:23:17 UTC

Pyspark not working on yarn-cluster mode

Hi all,

Is there any way to run PySpark scripts in yarn-cluster mode without using
the spark-submit script? I need it this way because I will integrate this
code into a Django web app.

When I try to run any script in yarn-cluster mode I get the following
error:

org.apache.spark.SparkException: Detected yarn-cluster mode, but isn't
running on a cluster. Deployment to YARN is not supported directly by
SparkContext. Please use spark-submit.


I'm creating the SparkContext in the following way:

        from pyspark import SparkConf, SparkContext

        conf = (SparkConf()
            .setMaster("yarn-cluster")
            .setAppName("DataFrameTest"))

        sc = SparkContext(conf=conf)

        # DataFrame code ...

Thanks



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Pyspark-not-working-on-yarn-cluster-mode-tp23755.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.



Re: Pyspark not working on yarn-cluster mode

Posted by ofer <of...@gmail.com>.
I advise you to use Livy for this purpose.
Livy works well with YARN, and it will decouple Spark from your web app.
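
For example, from your Django code you can submit the PySpark script as a
Livy batch over its REST API. A minimal sketch, assuming a Livy server is
already running in front of YARN (the host, port, and paths below are
placeholders):

import json
import requests  # assumes the requests package is available

LIVY_URL = "http://livy-host:8998/batches"  # placeholder Livy endpoint

payload = {
    "file": "hdfs:///path/to/dataframe_test.py",  # placeholder path to the PySpark script
    "name": "DataFrameTest",
}

resp = requests.post(LIVY_URL, data=json.dumps(payload),
                     headers={"Content-Type": "application/json"})
print(resp.json())  # returns the batch id and state; poll GET /batches/<id> for progress

Livy then takes care of submitting the job to YARN, so the web app never has
to create a SparkContext of its own.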





Re: Pyspark not working on yarn-cluster mode

Posted by Elkhan Dadashov <el...@gmail.com>.
Yes, you can launch PySpark scripts from Java code in yarn-cluster mode
without using the spark-submit script.

Check the SparkLauncher code here:
<https://github.com/apache/spark/tree/master/launcher/src/main/java/org/apache/spark/launcher>
SparkLauncher does not depend on the Spark core jars, so it is very easy to
integrate into your project.

Code example for launching a Spark job without the spark-submit script:

import org.apache.spark.launcher.SparkLauncher;

// Launches the PySpark script on YARN in cluster mode; launch() returns a
// plain java.lang.Process, so wait for it and consume its output yourself.
Process spark = new SparkLauncher()
    .setSparkHome("path_to_spark")
    .setAppName(pythonScriptName)
    .setMaster("yarn-cluster")
    .setAppResource(sparkScriptPath.toString())
    .addAppArgs(params)
    .addPyFile(otherPythonScriptPath.toString())
    .launch();

But in order to correctly handle adding 3rd-party packages to the Python
path, which Marcelo implemented in SPARK-5479
<https://issues.apache.org/jira/browse/SPARK-5479>, download the latest
Spark source code and build it yourself with Maven.

The pre-built Spark releases do not include that patch yet.
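
The exact profiles to use are described in the Spark build docs; as a rough
example (the Hadoop profile here is only an assumption, pick the one matching
your cluster), something like "build/mvn -Pyarn -Phadoop-2.6 -DskipTests
clean package" from the source checkout should produce a build that includes
the patch.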





-- 

Best regards,
Elkhan Dadashov

Re: Pyspark not working on yarn-cluster mode

Posted by Sandy Ryza <sa...@cloudera.com>.
To add to this, conceptually, it makes no sense to launch something in
yarn-cluster mode by creating a SparkContext on the client - the whole
point of yarn-cluster mode is that the SparkContext runs on the cluster,
not on the client.


Re: Pyspark not working on yarn-cluster mode

Posted by Marcelo Vanzin <va...@cloudera.com>.
You cannot run Spark in cluster mode by instantiating a SparkContext like
that.

You have to launch it with the "spark-submit" command line script.
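
If the Django app has to trigger the job programmatically, one option is
simply to shell out to spark-submit instead of building a SparkContext in the
web process. A minimal sketch, assuming Spark 1.4-style options and
placeholder paths:

import subprocess

# Placeholder locations; adjust the spark-submit path and the script path
# for your installation.
cmd = [
    "/path/to/spark/bin/spark-submit",
    "--master", "yarn-cluster",
    "/path/to/dataframe_test.py",
]

# Blocks until spark-submit exits; raises CalledProcessError on a non-zero
# exit code.
subprocess.check_call(cmd)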



-- 
Marcelo