You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by vanzin <gi...@git.apache.org> on 2014/04/24 23:35:25 UTC

[GitHub] spark pull request: Correctly set the Yarn app name when launching...

GitHub user vanzin opened a pull request:

    https://github.com/apache/spark/pull/539

    Correctly set the Yarn app name when launching the AM.

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vanzin/spark yarn-app-name

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/539.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #539
    
----
commit c1b29f0aea01a7b065e901e959bdd0c8c0298f9a
Author: Marcelo Vanzin <va...@cloudera.com>
Date:   2014-04-24T21:31:44Z

    Correctly set the Yarn app name when launching the AM.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: Correctly set the Yarn app name when launching...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the pull request:

    https://github.com/apache/spark/pull/539#issuecomment-41417156
  
    There were a bunch of things that kind of broke since the changes went in with pr299, especially around env variables with yarn.  I thought @pwendell was going to file a jira to fix them all up at once.
    
    Note we are keeping the env variables for backwards compatibility but we should be moving towards using the spark-submit script with config options.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: Correctly set the Yarn app name when launching...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/539#issuecomment-41418596
  
    I just inverted the condition in the latest patch, so that SPARK_YARN_APP_NAME overrides SparkConf if set. Tested both cases.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1631] Correctly set the Yarn app name w...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the pull request:

    https://github.com/apache/spark/pull/539#issuecomment-41703171
  
    This case you describe wasn't originally supported on yarn.  Perhaps it should have been fixed up when sparkConf was added but it looks like it wasn't so I would consider this more of an improvement rather then a bug.. minor nit.  This patch only makes it work for the yarn-client mode, I would like to see it work in both if we are changing it.  Would you be able to look into yarn-cluster mode also? 
    
    The thing that I noticed while looking at this that I do consider a bug (but not what you reported in this pr) is that if I specify --name with spark-submit script with yarn-client mode it doesn't properly set the name on the RM or in spark. the yarn-cluster mode does work fine though. I will file a separate jira for that.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: Correctly set the Yarn app name when launching...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/539#issuecomment-41335784
  
    Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: Correctly set the Yarn app name when launching...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/539#issuecomment-41351097
  
    Personal opinion: the less cluster-specific options we have the better, so unless there's a very compelling reason to keep SPARK_YARN_APP_NAME supported, I'd just remove it and use the configured app name always. It seems to me, from the code, that it was just a workaround for the real bug anyway (the app name not being read from the config, but from the JVM system properties).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1631] Correctly set the Yarn app name w...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/539#issuecomment-42630957
  
    I've merged this, thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: Correctly set the Yarn app name when launching...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/539#issuecomment-41349740
  
    Yeah, it looks like even if we set `SPARK_YARN_APP_NAME` it will never get propagated to the YARN client. I agree that switching around the two conditions will correct this, but how will the other configs be affected?
    
    More generally, what semantics should we provide here? Should env vars always take precedence over the corresponding value in `spark.*` configuration options?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1631] Correctly set the Yarn app name w...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/539#issuecomment-42512812
  
    I dug through the code a bunch. For those following this PR, here's a quick summary. Looks like the way we set Spark app name on YARN is actually quite convoluted because there are two types of app names:
    
    1. The one observed by the RM
    2. The one observed by SparkContext
    
    This PR is mainly concerned with (1) and only on the YARN client mode. Before the changes, there are two ways of affecting (1), in the order of decreasing priority.
    
    (a) set `spark.app.name` in spark-defaults
    (b) set `SPARK_YARN_APP_NAME` in spark-env
    
    After the changes, this becomes
    
    (a) set `SPARK_YARN_APP_NAME` in spark-env
    (b) set `spark.app.name` in SparkConf (through `conf.setAppName`)
    (c) set `spark.app.name` in spark-defaults
    
    This guarantees that if an environment variable is explicitly specified, then the RM should use that name. Otherwise, use whatever SparkContext is using for its app name, in which case (1) == (2). To me, the semantics are pretty clear.
    
    Notice however that in either before or after, there is now way for SparkSubmit's --name parameter to affect either (1) or (2). This is a separate bug with SparkSubmit that I have summarized in (SPARK-1755)[https://issues.apache.org/jira/browse/SPARK-1755].


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: Correctly set the Yarn app name when launching...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/539#issuecomment-41416489
  
    Well, the current patch does not work correctly because with it SPARK_YARN_APP_NAME becomes useless.
    
    But before the patch my app name (set using the conf, not SPARK_YARN_APP_NAME) was never set in yarn, it was always "Spark". To set my app name in Yarn that way I'd have to use "-Dspark.app.name" in the JVM options, and not SparkConf.setAppName().
    
    So the discussion is whether to keep SPARK_YARN_APP_NAME or get rid of it; since both are user-provided, I don't see a reason for keeping both, unless there's some other benefit that I'm not seeing.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1631] Correctly set the Yarn app name w...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/539


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: Correctly set the Yarn app name when launching...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the pull request:

    https://github.com/apache/spark/pull/539#issuecomment-41421041
  
    what are you using to launch your job?  can you please file a jira for this issue to link to this pr.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: Correctly set the Yarn app name when launching...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/539#issuecomment-41413122
  
    Pinging @tgravescs, since he seems to play around with Yarn a lot.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: Correctly set the Yarn app name when launching...

Posted by sryza <gi...@git.apache.org>.
Github user sryza commented on the pull request:

    https://github.com/apache/spark/pull/539#issuecomment-41415868
  
    Have you verified that it doesn't work correctly?  My reading of the code is that setting SPARK_YARN_APP_NAME causes the spark.app.name system property to be set, which gets picked up in the SparkConf. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: Correctly set the Yarn app name when launching...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/539#issuecomment-41356862
  
    I agree, we have 10 million env variables and spark config options, many of which are conflicting. It would be nice to cut down on the ones we don't actually use. (Though by that scale of exaggeration hadoop probably has order 1 billion configs)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1631] Correctly set the Yarn app name w...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/539#issuecomment-41698742
  
    Hi Tom.
    
    SPARK_YARN_APP_NAME works fine. What doesn't work is when you do not set that env variable; in that case, the app name in the RM will always be "Spark", instead of the app name set in your app's SparkConf instance. Unless you explicitly set the app name using a JVM system property ("-Dspark.app.name=foo").
    
    So:
    - set SPARK_YARN_APP_NAME: works fine
    - pass "-Dspark.app.name=foo" in command line: works fine
    - do "new SparkConf().setAppName("foo")": does not work (RM app name is "Spark")



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1631] Correctly set the Yarn app name w...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the pull request:

    https://github.com/apache/spark/pull/539#issuecomment-41699194
  
    Sorry somehow my browser wasn't showing a jira number and I missed the explanation there, thanks for adding an explanation here.   It makes sense now.   
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1631] Correctly set the Yarn app name w...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/539#issuecomment-42513184
  
    @vanzin Your proposal about having SparkSubmit calling `System.setProperty(spark.app.name)` can be made clean if we just always convert `--name` to `spark.app.name`, which is what the SparkSubmit usage currently suggests but does not fulfill. I will change this in a separate PR to address SPARK-1755.
    
    As for this PR, the changes LGTM.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1631] Correctly set the Yarn app name w...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/539#issuecomment-41864970
  
    Ah, I see what you mean. I don't see a clean way to make "--name" work in client mode; SparkSubmit could call System.setProperty("spark.app.name"), but that (i) looks hacky and (ii) might be overriden by the application when it sets the app name in SparkConf.
    
    Maybe stash that value somewhere else and check it in the code my patch is touching? It's less hacky in that it would allow overriding the SparkConf value, but still kinda ugly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1631] Correctly set the Yarn app name w...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the pull request:

    https://github.com/apache/spark/pull/539#issuecomment-41698198
  
    I just tried setting SPARK_YARN_APP_NAME and its working fine for me without this patch with the yarn-client mode.   I was just running the examples.   Note that the intent of this env is to set the app name in the ResourceManager, not in spark itself.  it was not originally equivalent to spark.app.name.  I do agree that we should make them match up though.  
    
    can you please file a jira for tracking this and give more details on how you are reproducing?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: Correctly set the Yarn app name when launching...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/539#issuecomment-41340061
  
    Question: if I understand the code correctly, SPARK_YARN_APP_NAME will never be used (since I don't think it's possible to instantiate a SparkContext without spark.app.name being set in the corresponding SparkConf).
    
    If that's true, is there any point in keeping SPARK_YARN_APP_NAME around? If there is, I'd have to switch the conditions around (so that it overrides the value in SparkConf).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1631] Correctly set the Yarn app name w...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/539#issuecomment-41703467
  
    I thought I looked at yarn-cluster mode and it was doing the right thing, but I'll look again.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---