Posted to dev@spark.apache.org by Russell Spitzer <ru...@gmail.com> on 2016/10/10 16:33:14 UTC

Official Stance on Not Using Spark Submit

I've seen a variety of users attempting to work around using Spark Submit
with at best middling levels of success. I think it would be helpful if the
project had a clear statement that submitting an application without using
Spark Submit is truly for experts only or is unsupported entirely.

I know this is a pretty strong stance and other people have had different
experiences than me so please let me know what you think :)

Re: Official Stance on Not Using Spark Submit

Posted by Ofir Manor <of...@equalum.io>.
Funny, someone from my team talked to me about that idea yesterday.
We use SparkLauncher, but it just calls spark-submit, which calls other
scripts that start a new Java program that tries to submit (in our case in
cluster mode - the driver is started in the Spark cluster) and then exits.
That makes it a challenge to troubleshoot cases where submit fails,
especially when users try our app in their own Spark environment. He
hoped to get a more decent / specific exception if submit failed, or to be
able to debug it in an IDE (the actual call to the master, its response,
etc.).

Ofir Manor

Co-Founder & CTO | Equalum

Mobile: +972-54-7801286 | Email: ofir.manor@equalum.io
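
For reference, a minimal sketch of what the Launcher API usage described above
can look like with a state listener attached. The Spark home, master URL, jar
path and main class below are placeholder values, not taken from this thread;
it still forks spark-submit underneath, but the SparkAppHandle at least
surfaces submission state programmatically:

import org.apache.spark.launcher.{SparkAppHandle, SparkLauncher}

object LaunchViaApi {
  def main(args: Array[String]): Unit = {
    // All paths, URLs and class names here are illustrative placeholders.
    val handle: SparkAppHandle = new SparkLauncher()
      .setSparkHome("/opt/spark")               // assumed local Spark installation
      .setAppResource("/path/to/our-app.jar")   // assumed pre-built application jar
      .setMainClass("com.example.OurApp")       // hypothetical main class
      .setMaster("spark://master:7077")
      .setDeployMode("cluster")
      .startApplication(new SparkAppHandle.Listener {
        // State transitions (CONNECTED, SUBMITTED, RUNNING, FAILED, ...) give more
        // visibility than the exit code of a bare spark-submit invocation.
        override def stateChanged(h: SparkAppHandle): Unit =
          println(s"Spark app state: ${h.getState}")
        override def infoChanged(h: SparkAppHandle): Unit = ()
      })

    // Block until the application reaches a terminal state (FINISHED, FAILED, KILLED).
    while (!handle.getState.isFinal) Thread.sleep(1000)
    println(s"Final state: ${handle.getState}, app id: ${handle.getAppId}")
  }
}

This does not take spark-submit out of the picture, so a failed submit still
shows up as a FAILED state rather than a rich exception, but it is at least
something the calling program can react to.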

On Mon, Oct 10, 2016 at 9:13 PM, Russell Spitzer <ru...@gmail.com>
wrote:

> Just folks who don't want to use spark-submit, no real use-cases I've seen
> yet.
>
> I didn't know about SparkLauncher myself and I don't think there are any
> official docs on that or launching spark as an embedded library for tests.
>
> On Mon, Oct 10, 2016 at 11:09 AM Matei Zaharia <ma...@gmail.com>
> wrote:
>
>> What are the main use cases you've seen for this? Maybe we can add a page
>> to the docs about how to launch Spark as an embedded library.
>>
>> Matei
>>
>> On Oct 10, 2016, at 10:21 AM, Russell Spitzer <ru...@gmail.com>
>> wrote:
>>
>> I actually had not seen SparkLauncher before, that looks pretty great :)
>>
>> On Mon, Oct 10, 2016 at 10:17 AM Russell Spitzer <
>> russell.spitzer@gmail.com> wrote:
>>
>>> I'm definitely only talking about non-embedded uses here, as I also use
>>> embedded Spark (Cassandra and Kafka) to run tests. This is almost always
>>> safe since everything is in the same JVM. It's only once we get to
>>> launching against a real distributed env that we end up with issues.
>>>
>>> Since PySpark uses spark-submit in the Java gateway, I'm not sure if that
>>> matters :)
>>>
>>> The cases I see are usually going through main directly, adding
>>> jars programmatically.
>>>
>>> This usually ends up with classpath errors (Spark not on the CP, their jar
>>> not on the CP, dependencies not on the CP),
>>> conf errors (executors have the incorrect environment, executor
>>> classpath broken, not understanding that spark-defaults won't do anything),
>>> jar version mismatches,
>>> etc.
>>>
>>> On Mon, Oct 10, 2016 at 10:05 AM Sean Owen <so...@cloudera.com> wrote:
>>>
>>>> I have also 'embedded' a Spark driver without much trouble. It isn't
>>>> that it can't work.
>>>>
>>>> The Launcher API is probably the recommended way to do that, though.
>>>> spark-submit is the way to go for non-programmatic access.
>>>>
>>>> If you're not doing one of those things and it is not working, yeah I
>>>> think people would tell you you're on your own. I think that's consistent
>>>> with all the JIRA discussions I have seen over time.
>>>>
>>>>
>>>> On Mon, Oct 10, 2016, 17:33 Russell Spitzer <ru...@gmail.com>
>>>> wrote:
>>>>
>>>>> I've seen a variety of users attempting to work around using Spark
>>>>> Submit with at best middling levels of success. I think it would be helpful
>>>>> if the project had a clear statement that submitting an application without
>>>>> using Spark Submit is truly for experts only or is unsupported entirely.
>>>>>
>>>>> I know this is a pretty strong stance and other people have had
>>>>> different experiences than me so please let me know what you think :)
>>>>>
>>>>
>>

RE: Official Stance on Not Using Spark Submit

Posted by "assaf.mendelson" <as...@rsa.com>.
I actually do not use spark-submit for several use cases, all of which currently revolve around running Spark directly with Python.
One of the most important ones is developing in PyCharm.
Basically, I am using PyCharm configured with a remote interpreter which runs on the server, while PyCharm itself runs on my local Windows machine.
In order to debug effectively (stepping etc.), I want to define a run configuration in PyCharm which integrates fully with its debug tools. Unfortunately I couldn’t figure out a way to use spark-submit for this. Instead I chose the following solution:
I defined the project to use the remote interpreter running on the driver machine in the cluster.
I defined environment variables in the run configuration: setting PYTHONPATH to include pyspark and py4j manually, setting the relevant PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON, and setting PYSPARK_SUBMIT_ARGS to include the relevant configuration (e.g. relevant jars), making sure it ended with pyspark-shell.

With this setup I could debug Spark remotely as if it were local.

Similar use cases include using standard tools that know how to run a “python” script but are not aware of spark-submit.

I haven’t found similar reasons for Scala/Java code though (although I wish there were a similar “remote” setup for Scala).
Assaf.
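
For reference, a rough sketch of what a comparable "remote" setup can look
like on the Scala side: the driver's main is run straight from the IDE, with
the master URL and application jar set programmatically instead of via
spark-submit. The master URL and jar path below are placeholders, and this is
exactly the kind of spark-submit bypass discussed in this thread, so classpath
and version mismatches between the IDE and the cluster are the usual failure
mode:

import org.apache.spark.{SparkConf, SparkContext}

object IdeDebugDriver {
  def main(args: Array[String]): Unit = {
    // Assumes the application jar was built beforehand (e.g. sbt package) and
    // that the Spark version on the IDE classpath matches the cluster exactly.
    val conf = new SparkConf()
      .setAppName("ide-debug")
      .setMaster("spark://master:7077")                         // placeholder master URL
      .setJars(Seq("target/scala-2.11/our-app_2.11-0.1.jar"))   // placeholder jar, shipped to executors

    val sc = new SparkContext(conf)
    try {
      // Driver-side code can be stepped through in the IDE as usual; code that
      // runs inside the executors will not stop at these breakpoints.
      val total = sc.parallelize(1 to 100).map(_ * 2).sum()
      println(s"sum = $total")
    } finally {
      sc.stop()
    }
  }
}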


From: RussS [via Apache Spark Developers List] [mailto:ml-node+s1001551n19384h93@n3.nabble.com]
Sent: Monday, October 10, 2016 9:14 PM
To: Mendelson, Assaf
Subject: Re: Official Stance on Not Using Spark Submit

Just folks who don't want to use spark-submit, no real use-cases I've seen yet.

I didn't know about SparkLauncher myself and I don't think there are any official docs on that or launching spark as an embedded library for tests.

On Mon, Oct 10, 2016 at 11:09 AM Matei Zaharia <[hidden email]> wrote:
What are the main use cases you've seen for this? Maybe we can add a page to the docs about how to launch Spark as an embedded library.

Matei

On Oct 10, 2016, at 10:21 AM, Russell Spitzer <[hidden email]> wrote:

I actually had not seen SparkLauncher before, that looks pretty great :)

On Mon, Oct 10, 2016 at 10:17 AM Russell Spitzer <[hidden email]> wrote:
I'm definitely only talking about non-embedded uses here, as I also use embedded Spark (Cassandra and Kafka) to run tests. This is almost always safe since everything is in the same JVM. It's only once we get to launching against a real distributed env that we end up with issues.

Since PySpark uses spark-submit in the Java gateway, I'm not sure if that matters :)

The cases I see are usually going through main directly, adding jars programmatically.

This usually ends up with classpath errors (Spark not on the CP, their jar not on the CP, dependencies not on the CP),
conf errors (executors have the incorrect environment, executor classpath broken, not understanding that spark-defaults won't do anything),
jar version mismatches,
etc.

On Mon, Oct 10, 2016 at 10:05 AM Sean Owen <[hidden email]> wrote:
I have also 'embedded' a Spark driver without much trouble. It isn't that it can't work.

The Launcher API is probably the recommended way to do that, though. spark-submit is the way to go for non-programmatic access.

If you're not doing one of those things and it is not working, yeah I think people would tell you you're on your own. I think that's consistent with all the JIRA discussions I have seen over time.

On Mon, Oct 10, 2016, 17:33 Russell Spitzer <[hidden email]> wrote:
I've seen a variety of users attempting to work around using Spark Submit with at best middling levels of success. I think it would be helpful if the project had a clear statement that submitting an application without using Spark Submit is truly for experts only or is unsupported entirely.

I know this is a pretty strong stance and other people have had different experiences than me so please let me know what you think :)







Re: Official Stance on Not Using Spark Submit

Posted by Russell Spitzer <ru...@gmail.com>.
Just folks who don't want to use spark-submit, no real use-cases I've seen
yet.

I didn't know about SparkLauncher myself and I don't think there are any
official docs on that or launching spark as an embedded library for tests.

On Mon, Oct 10, 2016 at 11:09 AM Matei Zaharia <ma...@gmail.com>
wrote:

> What are the main use cases you've seen for this? Maybe we can add a page
> to the docs about how to launch Spark as an embedded library.
>
> Matei
>
> On Oct 10, 2016, at 10:21 AM, Russell Spitzer <ru...@gmail.com>
> wrote:
>
> I actually had not seen SparkLauncher before, that looks pretty great :)
>
> On Mon, Oct 10, 2016 at 10:17 AM Russell Spitzer <
> russell.spitzer@gmail.com> wrote:
>
> I'm definitely only talking about non-embedded uses here, as I also use
> embedded Spark (Cassandra and Kafka) to run tests. This is almost always
> safe since everything is in the same JVM. It's only once we get to
> launching against a real distributed env that we end up with issues.
>
> Since PySpark uses spark-submit in the Java gateway, I'm not sure if that
> matters :)
>
> The cases I see are usually going through main directly, adding
> jars programmatically.
>
> This usually ends up with classpath errors (Spark not on the CP, their jar
> not on the CP, dependencies not on the CP),
> conf errors (executors have the incorrect environment, executor classpath
> broken, not understanding that spark-defaults won't do anything),
> jar version mismatches,
> etc.
>
> On Mon, Oct 10, 2016 at 10:05 AM Sean Owen <so...@cloudera.com> wrote:
>
> I have also 'embedded' a Spark driver without much trouble. It isn't that
> it can't work.
>
> The Launcher API is probably the recommended way to do that, though.
> spark-submit is the way to go for non-programmatic access.
>
> If you're not doing one of those things and it is not working, yeah I
> think people would tell you you're on your own. I think that's consistent
> with all the JIRA discussions I have seen over time.
>
>
> On Mon, Oct 10, 2016, 17:33 Russell Spitzer <ru...@gmail.com>
> wrote:
>
> I've seen a variety of users attempting to work around using Spark Submit
> with at best middling levels of success. I think it would be helpful if the
> project had a clear statement that submitting an application without using
> Spark Submit is truly for experts only or is unsupported entirely.
>
> I know this is a pretty strong stance and other people have had different
> experiences than me so please let me know what you think :)
>
>
>

Re: Official Stance on Not Using Spark Submit

Posted by Matei Zaharia <ma...@gmail.com>.
What are the main use cases you've seen for this? Maybe we can add a page to the docs about how to launch Spark as an embedded library.

Matei

> On Oct 10, 2016, at 10:21 AM, Russell Spitzer <ru...@gmail.com> wrote:
> 
> I actually had not seen SparkLauncher before, that looks pretty great :)
> 
> On Mon, Oct 10, 2016 at 10:17 AM Russell Spitzer <russell.spitzer@gmail.com <ma...@gmail.com>> wrote:
> I'm definitely only talking about non-embedded uses here, as I also use embedded Spark (Cassandra and Kafka) to run tests. This is almost always safe since everything is in the same JVM. It's only once we get to launching against a real distributed env that we end up with issues.
> 
> Since PySpark uses spark-submit in the Java gateway, I'm not sure if that matters :)
> 
> The cases I see are usually going through main directly, adding jars programmatically.
> 
> This usually ends up with classpath errors (Spark not on the CP, their jar not on the CP, dependencies not on the CP),
> conf errors (executors have the incorrect environment, executor classpath broken, not understanding that spark-defaults won't do anything),
> jar version mismatches,
> etc.
> 
> On Mon, Oct 10, 2016 at 10:05 AM Sean Owen <sowen@cloudera.com <ma...@cloudera.com>> wrote:
> I have also 'embedded' a Spark driver without much trouble. It isn't that it can't work. 
> 
> The Launcher API is probably the recommended way to do that, though. spark-submit is the way to go for non-programmatic access.
> 
> If you're not doing one of those things and it is not working, yeah I think people would tell you you're on your own. I think that's consistent with all the JIRA discussions I have seen over time. 
> 
> 
> On Mon, Oct 10, 2016, 17:33 Russell Spitzer <russell.spitzer@gmail.com <ma...@gmail.com>> wrote:
> I've seen a variety of users attempting to work around using Spark Submit with at best middling levels of success. I think it would be helpful if the project had a clear statement that submitting an application without using Spark Submit is truly for experts only or is unsupported entirely.
> 
> I know this is a pretty strong stance and other people have had different experiences than me so please let me know what you think :)


Re: Official Stance on Not Using Spark Submit

Posted by Russell Spitzer <ru...@gmail.com>.
I actually had not seen SparkLauncher before, that looks pretty great :)

On Mon, Oct 10, 2016 at 10:17 AM Russell Spitzer <ru...@gmail.com>
wrote:

> I'm definitely only talking about non-embedded uses here, as I also use
> embedded Spark (Cassandra and Kafka) to run tests. This is almost always
> safe since everything is in the same JVM. It's only once we get to
> launching against a real distributed env that we end up with issues.
>
> Since PySpark uses spark-submit in the Java gateway, I'm not sure if that
> matters :)
>
> The cases I see are usually going through main directly, adding
> jars programmatically.
>
> This usually ends up with classpath errors (Spark not on the CP, their jar
> not on the CP, dependencies not on the CP),
> conf errors (executors have the incorrect environment, executor classpath
> broken, not understanding that spark-defaults won't do anything),
> jar version mismatches,
> etc.
>
> On Mon, Oct 10, 2016 at 10:05 AM Sean Owen <so...@cloudera.com> wrote:
>
> I have also 'embedded' a Spark driver without much trouble. It isn't that
> it can't work.
>
> The Launcher API is probably the recommended way to do that, though.
> spark-submit is the way to go for non-programmatic access.
>
> If you're not doing one of those things and it is not working, yeah I
> think people would tell you you're on your own. I think that's consistent
> with all the JIRA discussions I have seen over time.
>
>
> On Mon, Oct 10, 2016, 17:33 Russell Spitzer <ru...@gmail.com>
> wrote:
>
> I've seen a variety of users attempting to work around using Spark Submit
> with at best middling levels of success. I think it would be helpful if the
> project had a clear statement that submitting an application without using
> Spark Submit is truly for experts only or is unsupported entirely.
>
> I know this is a pretty strong stance and other people have had different
> experiences than me so please let me know what you think :)
>
>

Re: Official Stance on Not Using Spark Submit

Posted by Russell Spitzer <ru...@gmail.com>.
I'm definitely only talking about non-embedded uses here, as I also use
embedded Spark (Cassandra and Kafka) to run tests. This is almost always
safe since everything is in the same JVM. It's only once we get to
launching against a real distributed env that we end up with issues.

Since PySpark uses spark-submit in the Java gateway, I'm not sure if that
matters :)

The cases I see are usually going through main directly, adding
jars programmatically.

This usually ends up with classpath errors (Spark not on the CP, their jar
not on the CP, dependencies not on the CP),
conf errors (executors have the incorrect environment, executor classpath
broken, not understanding that spark-defaults won't do anything),
jar version mismatches,
etc.
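
For contrast with the failure cases above, the embedded, same-JVM usage
mentioned at the top of this message looks roughly like the following sketch
(local mode only, no cluster and no spark-submit involved):

import org.apache.spark.{SparkConf, SparkContext}

object EmbeddedLocalSpark {
  def main(args: Array[String]): Unit = {
    // Local mode: driver and executors share this JVM, so the test classpath
    // is the only classpath that matters.
    val conf = new SparkConf()
      .setAppName("embedded-test")
      .setMaster("local[2]")

    val sc = new SparkContext(conf)
    try {
      assert(sc.parallelize(1 to 10).count() == 10)
    } finally {
      sc.stop()
    }
  }
}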

On Mon, Oct 10, 2016 at 10:05 AM Sean Owen <so...@cloudera.com> wrote:

> I have also 'embedded' a Spark driver without much trouble. It isn't that
> it can't work.
>
> The Launcher API is probably the recommended way to do that, though.
> spark-submit is the way to go for non-programmatic access.
>
> If you're not doing one of those things and it is not working, yeah I
> think people would tell you you're on your own. I think that's consistent
> with all the JIRA discussions I have seen over time.
>
>
> On Mon, Oct 10, 2016, 17:33 Russell Spitzer <ru...@gmail.com>
> wrote:
>
> I've seen a variety of users attempting to work around using Spark Submit
> with at best middling levels of success. I think it would be helpful if the
> project had a clear statement that submitting an application without using
> Spark Submit is truly for experts only or is unsupported entirely.
>
> I know this is a pretty strong stance and other people have had different
> experiences than me so please let me know what you think :)
>
>

Re: Official Stance on Not Using Spark Submit

Posted by Sean Owen <so...@cloudera.com>.
I have also 'embedded' a Spark driver without much trouble. It isn't that
it can't work.

The Launcher API is probably the recommended way to do that, though.
spark-submit is the way to go for non-programmatic access.

If you're not doing one of those things and it is not working, yeah I think
people would tell you you're on your own. I think that's consistent with
all the JIRA discussions I have seen over time.

On Mon, Oct 10, 2016, 17:33 Russell Spitzer <ru...@gmail.com>
wrote:

> I've seen a variety of users attempting to work around using Spark Submit
> with at best middling levels of success. I think it would be helpful if the
> project had a clear statement that submitting an application without using
> Spark Submit is truly for experts only or is unsupported entirely.
>
> I know this is a pretty strong stance and other people have had different
> experiences than me so please let me know what you think :)
>

Re: Official Stance on Not Using Spark Submit

Posted by Marcin Tustin <mt...@handybook.com>.
I've done this for some pyspark stuff. I didn't find it especially
problematic.

On Mon, Oct 10, 2016 at 12:58 PM, Reynold Xin <rx...@databricks.com> wrote:

> How are they using it? Calling some main function directly?
>
>
> On Monday, October 10, 2016, Russell Spitzer <ru...@gmail.com>
> wrote:
>
>> I've seen a variety of users attempting to work around using Spark Submit
>> with at best middling levels of success. I think it would be helpful if the
>> project had a clear statement that submitting an application without using
>> Spark Submit is truly for experts only or is unsupported entirely.
>>
>> I know this is a pretty strong stance and other people have had different
>> experiences than me so please let me know what you think :)
>>
>



Re: Official Stance on Not Using Spark Submit

Posted by Reynold Xin <rx...@databricks.com>.
How are they using it? Calling some main function directly?

On Monday, October 10, 2016, Russell Spitzer <ru...@gmail.com>
wrote:

> I've seen a variety of users attempting to work around using Spark Submit
> with at best middling levels of success. I think it would be helpful if the
> project had a clear statement that submitting an application without using
> Spark Submit is truly for experts only or is unsupported entirely.
>
> I know this is a pretty strong stance and other people have had different
> experiences than me so please let me know what you think :)
>