You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by srowen <gi...@git.apache.org> on 2015/02/07 16:00:42 UTC

[GitHub] spark pull request: SPARK-4267 [YARN] Failing to launch jobs on Sp...

GitHub user srowen opened a pull request:

    https://github.com/apache/spark/pull/4452

    SPARK-4267 [YARN] Failing to launch jobs on Spark on YARN with Hadoop 2.5.0 or later

    Before passing to YARN, escape arguments in "extraJavaOptions" args, in order to correctly handle cases like -Dfoo="one two three". Also standardize how these args are handled and ensure that individual args are treated as stand-alone args, not one string.
    
    @vanzin @andrewor14

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/srowen/spark SPARK-4267.2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/4452.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4452
    
----
commit c8297d2b2029b3020c169d079fd09a198f4ef43b
Author: Sean Owen <so...@cloudera.com>
Date:   2015-02-07T14:59:47Z

    Before passing to YARN, escape arguments in "extraJavaOptions" args, in order to correctly handle cases like -Dfoo="one two three". Also standardize how these args are handled and ensure that individual args are treated as stand-alone args, not one string.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4267 [YARN] Failing to launch jobs on Sp...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/4452#issuecomment-74137303
  
    @andrewor14 In light of https://github.com/apache/spark/commit/466b1f671b21f575d28f9c103f51765790914fe3 which is of similar severity and impact, I'm wondering if it does in fact make sense to back port to 1.2? I'm still getting the feel of what the rules of thumb are for backporting to minor release n-1, n-2, etc.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4267 [YARN] Failing to launch jobs on Sp...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4452#issuecomment-73370483
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27003/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4267 [YARN] Failing to launch jobs on Sp...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/4452#issuecomment-73562788
  
    Alright, I'll merge this into 1.3 and master. We can decide whether to backport it later, but I believe the only difference in behavior is the very case that we are trying to fix, i.e. the cases that would've failed anyway without this patch.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4267 [YARN] Failing to launch jobs on Sp...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/4452#issuecomment-74140364
  
    In general there aren't really any strict rules for how far a patch should be back ported. One thing though is that it is much less likely for us to do more releases on older branches like 0.9, and people don't generally expect it either but instead may opt to upgrade to newer minor releases, which is desirable. So I'd say we usually just determine this on a case by case basis.
    
    For this particular patch it is definitely relevant for 1.2. For older branches, however, I believe a stable version of Hadoop 2.5 may not even have been released yet, so I wonder how much of a relevance there is. There will likely be nontrivial merge conflicts for branch-1.2 since we removed alpha support in 1.3 and refactored the YARN code quite a bit. Would you mind opening this PR against the 1.2 branch nevertheless?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4267 [YARN] Failing to launch jobs on Sp...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/4452#issuecomment-73386933
  
    LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4267 [YARN] Failing to launch jobs on Sp...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4452#issuecomment-73367642
  
      [Test build #27003 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27003/consoleFull) for   PR 4452 at commit [`c8297d2`](https://github.com/apache/spark/commit/c8297d2b2029b3020c169d079fd09a198f4ef43b).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4267 [YARN] Failing to launch jobs on Sp...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4452#discussion_r24295994
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
    @@ -435,10 +435,11 @@ private[spark] class Client(
     
         // Include driver-specific java options if we are launching a driver
         if (isClusterMode) {
    -      sparkConf.getOption("spark.driver.extraJavaOptions")
    +      val driverOpts = sparkConf.getOption("spark.driver.extraJavaOptions")
             .orElse(sys.env.get("SPARK_JAVA_OPTS"))
    -        .map(Utils.splitCommandString).getOrElse(Seq.empty)
    -        .foreach(opts => javaOpts += opts)
    +      driverOpts.foreach { opts =>
    +        javaOpts ++= Utils.splitCommandString(opts).map(YarnSparkHadoopUtil.escapeForShell)
    +      }
    --- End diff --
    
    ok that's fine


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4267 [YARN] Failing to launch jobs on Sp...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/4452#issuecomment-73556742
  
    LGTM. I assume that without the fix the tests would fail because the java command line would be wrong? (I'd add an explicit check for the system property in one of the apps, just in case.)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4267 [YARN] Failing to launch jobs on Sp...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4452#discussion_r24293657
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
    @@ -435,10 +435,11 @@ private[spark] class Client(
     
         // Include driver-specific java options if we are launching a driver
         if (isClusterMode) {
    -      sparkConf.getOption("spark.driver.extraJavaOptions")
    +      val driverOpts = sparkConf.getOption("spark.driver.extraJavaOptions")
             .orElse(sys.env.get("SPARK_JAVA_OPTS"))
    -        .map(Utils.splitCommandString).getOrElse(Seq.empty)
    -        .foreach(opts => javaOpts += opts)
    +      driverOpts.foreach { opts =>
    +        javaOpts ++= Utils.splitCommandString(opts).map(YarnSparkHadoopUtil.escapeForShell)
    +      }
    --- End diff --
    
    any reason you need to save this in a val? You can probably do away with it without sacrificing readability


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4267 [YARN] Failing to launch jobs on Sp...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/4452#issuecomment-73533792
  
    @vanzin Any more thoughts on this? I feel pretty good about it, as it does seem a lot right-er, and we have a test case that failed before and worked after. Also, I got bit by this directly today.
    
    @pwendell is this something I would back port to 1.2? further?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4267 [YARN] Failing to launch jobs on Sp...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4452#discussion_r24293685
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
    @@ -435,10 +435,11 @@ private[spark] class Client(
     
         // Include driver-specific java options if we are launching a driver
         if (isClusterMode) {
    -      sparkConf.getOption("spark.driver.extraJavaOptions")
    +      val driverOpts = sparkConf.getOption("spark.driver.extraJavaOptions")
             .orElse(sys.env.get("SPARK_JAVA_OPTS"))
    -        .map(Utils.splitCommandString).getOrElse(Seq.empty)
    -        .foreach(opts => javaOpts += opts)
    +      driverOpts.foreach { opts =>
    +        javaOpts ++= Utils.splitCommandString(opts).map(YarnSparkHadoopUtil.escapeForShell)
    +      }
    --- End diff --
    
    Just for readability, I thought... otherwise the continuation indent and indent of the foreach block are the same and it looked weird. And for consistency with the `amOpts` below I suppose.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4267 [YARN] Failing to launch jobs on Sp...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/4452


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4267 [YARN] Failing to launch jobs on Sp...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4452#issuecomment-73370476
  
      [Test build #27003 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27003/consoleFull) for   PR 4452 at commit [`c8297d2`](https://github.com/apache/spark/commit/c8297d2b2029b3020c169d079fd09a198f4ef43b).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org