You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by vanzin <gi...@git.apache.org> on 2014/09/04 01:45:25 UTC

[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...

GitHub user vanzin opened a pull request:

    https://github.com/apache/spark/pull/2257

    [SPARK-2778] [yarn] Add yarn integration tests.

    This patch adds a couple of, currently, very simple integration tests
    to make sure both client and cluster modes are working. The tests don't
    do much yet other than run a simple job, but the plan is to enhance
    them after we get the framework in.
    
    The cluster tests are noisy, so redirect all log output to a file
    like other tests do. Copying the conf around sucks but it's less
    work than messing with maven/sbt and having to clean up other
    projects.
    
    Note the test is only added for yarn-stable. The code compiles
    against yarn-alpha but there are two issues I ran into that I
    could not overcome:
    - and old netty dependency kept creeping into the classpath and
      causing akka to not work, when using sbt; the old netty was
      correctly suppressed under maven.
    - MiniYARNCluster kept failing to execute containers because it
      did not create the NM's local dir itself; this is apparently
      a known behavior, but I'm not sure how to work around it.
    
    None of those issues are present with the stable Yarn.
    
    Also, these tests are a little slow to run. Apparently Spark doesn't
    yet tag tests (so that these could be isolated in a "slow" batch),
    so this is something to keep in mind.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vanzin/spark yarn-tests

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2257.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2257
    
----
commit add84163ea0403d23dbbda540766d58329a27d76
Author: Marcelo Vanzin <va...@cloudera.com>
Date:   2014-06-09T23:15:24Z

    [SPARK-2778] [yarn] Add yarn integration tests.
    
    This patch adds a couple of, currently, very simple integration tests
    to make sure both client and cluster modes are working. The tests don't
    do much yet other than run a simple job, but the plan is to enhance
    them after we get the framework in.
    
    The cluster tests are noisy, so redirect all log output to a file
    like other tests do. Copying the conf around sucks but it's less
    work than messing with maven/sbt and having to clean up other
    projects.
    
    Note the test is only added for yarn-stable. The code compiles
    against yarn-alpha but there are two issues I ran into that I
    could not overcome:
    - and old netty dependency kept creeping into the classpath and
      causing akka to not work, when using sbt; the old netty was
      correctly suppressed under maven.
    - MiniYARNCluster kept failing to execute containers because it
      did not create the NM's local dir itself; this is apparently
      a known behavior, but I'm not sure how to work around it.
    
    None of those issues are present with the stable Yarn.
    
    Also, these tests are a little slow to run. Apparently Spark doesn't
    yet tag tests (so that these could be isolated in a "slow" batch),
    so this is something to keep in mind.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2257#issuecomment-56720812
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20758/consoleFull) for   PR 2257 at commit [`6d5b84e`](https://github.com/apache/spark/commit/6d5b84e8b5987683591d8c07b3ff8557d9581871).
     * This patch **passes** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/2257#issuecomment-56604168
  
    Having everything in the same file is ok. I think the trick in this case is to convince the child processes launched by Yarn to not use the common log4j configuration, and instead log to stderr (so that Yarn will take care of the logs). Let me look at doing that.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/2257#issuecomment-54876535
  
    Jenkins, test this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2257#issuecomment-54884759
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19993/consoleFull) for   PR 2257 at commit [`add8416`](https://github.com/apache/spark/commit/add84163ea0403d23dbbda540766d58329a27d76).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2257#issuecomment-56720822
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20758/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2257#issuecomment-56609722
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20733/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/2257#issuecomment-56413709
  
    Ping.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/2257#issuecomment-55464465
  
    It's great to see us adding tests here. @vanzin how long do these tests take, roughly? We might have to only run these in certain situations if they take a long time.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/2257#issuecomment-56603473
  
    I think you have to SSH into the machines to get them, but even if you have them they're kinda jumbled because we log everything to the same file. Actually here we might want to do something like what we did in #2108 so we know what's wrong when the test fails.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2257#issuecomment-56600035
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20731/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2257#discussion_r17938367
  
    --- Diff: yarn/stable/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala ---
    @@ -0,0 +1,133 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.deploy.yarn
    +
    +import java.io.File
    +
    +import scala.collection.JavaConversions._
    +
    +import com.google.common.base.Charsets
    +import com.google.common.io.Files
    +import org.scalatest.{BeforeAndAfterAll, FunSuite, Matchers}
    +
    +import org.apache.hadoop.yarn.conf.YarnConfiguration
    +import org.apache.hadoop.yarn.server.MiniYARNCluster
    +
    +import org.apache.spark.{Logging, SparkConf, SparkContext}
    +import org.apache.spark.deploy.SparkHadoopUtil
    +import org.apache.spark.util.Utils
    +
    +class YarnClusterSuite extends FunSuite with BeforeAndAfterAll with Matchers {
    +
    +  private var oldConf: Map[String, String] = _
    +  private var yarnCluster: MiniYARNCluster = _
    +  private var tempDir: File = _
    +  private var fakeSparkJar: File = _
    +
    +  override def beforeAll() {
    +    tempDir = Utils.createTempDir()
    +
    +    yarnCluster = new MiniYARNCluster(getClass().getName(), 1, 1, 1)
    +    yarnCluster.init(new YarnConfiguration())
    +    yarnCluster.start()
    +
    +    oldConf = sys.props.toMap
    +    yarnCluster.getConfig().foreach { e =>
    +      sys.props += ("spark.hadoop." + e.getKey() -> e.getValue())
    +    }
    +
    +    fakeSparkJar = File.createTempFile("sparkJar", null, tempDir)
    +    sys.props += ("spark.yarn.jar" -> ("local:" + fakeSparkJar.getAbsolutePath()))
    +    sys.props += ("spark.executor.instances" -> "1")
    +    sys.props += ("spark.driver.extraClassPath" -> sys.props("java.class.path"))
    +    sys.props += ("spark.executor.extraClassPath" -> sys.props("java.class.path"))
    +
    +    super.beforeAll()
    +  }
    +
    +  override def afterAll() {
    +    yarnCluster.stop()
    +
    +    val sysProps = sys.props.map { case (k, v) => (k, v) }
    +    sysProps.foreach { case (k, v) =>
    +      if (k.startsWith("spark.")) {
    +        sys.props -= k
    +      }
    +    }
    +
    +    sys.props ++= oldConf
    --- End diff --
    
    I don't see why we need to do this anymore, since we don't remove anything from `sys.props`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2257#discussion_r17937472
  
    --- Diff: yarn/pom.xml ---
    @@ -126,7 +126,6 @@
             <configuration>
               <environmentVariables>
                 <SPARK_HOME>${basedir}/../..</SPARK_HOME>
    -            <SPARK_CLASSPATH>${spark.classpath}</SPARK_CLASSPATH>
    --- End diff --
    
    why this?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2257#discussion_r17500277
  
    --- Diff: yarn/stable/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala ---
    @@ -0,0 +1,141 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.deploy.yarn
    +
    +import java.io.File
    +
    +import scala.collection.JavaConversions._
    +import scala.collection.mutable.HashMap
    +
    +import com.google.common.base.Charsets
    +import com.google.common.io.Files
    +import org.scalatest.{BeforeAndAfterAll, FunSuite, Matchers}
    +
    +import org.apache.hadoop.yarn.conf.YarnConfiguration
    +import org.apache.hadoop.yarn.server.MiniYARNCluster
    +
    +import org.apache.spark.{Logging, SparkConf, SparkContext}
    +import org.apache.spark.deploy.SparkHadoopUtil
    +import org.apache.spark.util.Utils
    +
    +class YarnClusterSuite extends FunSuite with BeforeAndAfterAll with Matchers {
    +
    +  private val oldConf = new HashMap[String, String]()
    +  private var yarnCluster: MiniYARNCluster = _
    +  private var tempDir: File = _
    +  private var fakeSparkJar: File = _
    +
    +  override def beforeAll() {
    +    tempDir = Utils.createTempDir()
    +
    +    yarnCluster = new MiniYARNCluster(getClass().getName(), 1, 1, 1)
    +    yarnCluster.init(new YarnConfiguration())
    +    yarnCluster.start()
    +
    +    val sysProps = sys.props.map { case (k, v) => (k, v) }
    +    sysProps.foreach { case (k, v) =>
    --- End diff --
    
    Is this just making a copy of `sys.props`? Why not just do `toMap`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the pull request:

    https://github.com/apache/spark/pull/2257#issuecomment-55465007
  
    @pwendell  see the comments above:
    
    They take about 1min to run on my machine. Most of the time is setting up and tearing down the MiniYARNCluster instance, which is only done once, so I hope adding more tests won't add so much on top of that.
    
    scalatest has tags that can be used to choose which tests to run, but Spark doesn't use them anywhere. I can look at adding them to these (and disabling them in the default run) if people think that's too long.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2257#issuecomment-56710658
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20758/consoleFull) for   PR 2257 at commit [`6d5b84e`](https://github.com/apache/spark/commit/6d5b84e8b5987683591d8c07b3ff8557d9581871).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2257#issuecomment-56605345
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20733/consoleFull) for   PR 2257 at commit [`5c2b56f`](https://github.com/apache/spark/commit/5c2b56fc1947f973525abd0f9aa7cd77267c8fb3).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the pull request:

    https://github.com/apache/spark/pull/2257#issuecomment-55895625
  
    @pwendell  @andrewor14   Any concerns with the time on this?  I know our unit tests already take a long time to run but this doesn't seem to bad.  I think going forward before we add a bunch more tests we might need to figure out how to run them separately.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2257#discussion_r17938804
  
    --- Diff: yarn/stable/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala ---
    @@ -0,0 +1,133 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.deploy.yarn
    +
    +import java.io.File
    +
    +import scala.collection.JavaConversions._
    +
    +import com.google.common.base.Charsets
    +import com.google.common.io.Files
    +import org.scalatest.{BeforeAndAfterAll, FunSuite, Matchers}
    +
    +import org.apache.hadoop.yarn.conf.YarnConfiguration
    +import org.apache.hadoop.yarn.server.MiniYARNCluster
    +
    +import org.apache.spark.{Logging, SparkConf, SparkContext}
    +import org.apache.spark.deploy.SparkHadoopUtil
    +import org.apache.spark.util.Utils
    +
    +class YarnClusterSuite extends FunSuite with BeforeAndAfterAll with Matchers {
    +
    +  private var oldConf: Map[String, String] = _
    +  private var yarnCluster: MiniYARNCluster = _
    +  private var tempDir: File = _
    +  private var fakeSparkJar: File = _
    +
    +  override def beforeAll() {
    +    tempDir = Utils.createTempDir()
    +
    +    yarnCluster = new MiniYARNCluster(getClass().getName(), 1, 1, 1)
    +    yarnCluster.init(new YarnConfiguration())
    +    yarnCluster.start()
    +
    +    oldConf = sys.props.toMap
    +    yarnCluster.getConfig().foreach { e =>
    +      sys.props += ("spark.hadoop." + e.getKey() -> e.getValue())
    +    }
    +
    +    fakeSparkJar = File.createTempFile("sparkJar", null, tempDir)
    +    sys.props += ("spark.yarn.jar" -> ("local:" + fakeSparkJar.getAbsolutePath()))
    +    sys.props += ("spark.executor.instances" -> "1")
    +    sys.props += ("spark.driver.extraClassPath" -> sys.props("java.class.path"))
    +    sys.props += ("spark.executor.extraClassPath" -> sys.props("java.class.path"))
    +
    +    super.beforeAll()
    +  }
    +
    +  override def afterAll() {
    +    yarnCluster.stop()
    +
    +    val sysProps = sys.props.map { case (k, v) => (k, v) }
    +    sysProps.foreach { case (k, v) =>
    +      if (k.startsWith("spark.")) {
    +        sys.props -= k
    +      }
    +    }
    +
    +    sys.props ++= oldConf
    +
    +    super.afterAll()
    +  }
    +
    +  test("run Spark in yarn-client mode") {
    +    var result = File.createTempFile("result", null, tempDir)
    +    YarnClusterDriver.main(Array("yarn-client", result.getAbsolutePath()))
    +    checkResult(result)
    +  }
    +
    +  test("run Spark in yarn-cluster mode") {
    +    val main = YarnClusterDriver.getClass.getName().stripSuffix("$")
    +    var result = File.createTempFile("result", null, tempDir)
    +
    +    // The Client object will call System.exit() after the job is done, and we don't want
    +    // that because it messes up the scalatest monitoring. So replicate some of what main()
    +    // does here.
    +    val args = Array("--class", main,
    +      "--jar", "file:" + fakeSparkJar.getAbsolutePath(),
    +      "--arg", "yarn-cluster",
    +      "--arg", result.getAbsolutePath(),
    +      "--num-executors", "1")
    +    val sparkConf = new SparkConf()
    +    val yarnConf = SparkHadoopUtil.get.newConfiguration(sparkConf)
    +    val clientArgs = new ClientArguments(args, sparkConf)
    +    new Client(clientArgs, yarnConf, sparkConf).run()
    +    checkResult(result)
    +  }
    +
    +  /**
    +   * This is a workaround for an issue with yarn-cluster mode: the Client class will not provide
    +   * any sort of error when the job process finishes successfully, but the job itself fails. So
    +   * the tests enforce that something is written to a file after everything is ok to indicate
    +   * that the job succeeded.
    +   */
    +  private def checkResult(result: File) = {
    +    var resultString = Files.toString(result, Charsets.UTF_8)
    +    resultString should be ("success")
    +  }
    +
    +}
    +
    +private object YarnClusterDriver extends Logging with Matchers {
    +
    +  def main(args: Array[String]) = {
    +    val sc = new SparkContext(new SparkConf().setMaster(args(0))
    +      .setAppName("yarn \"test app\" 'with quotes' and \\back\\slashes and $dollarSigns"))
    +    val status = new File(args(1))
    +    var result = "failure"
    +    try {
    +      val data = sc.parallelize(1 to 4).map(i => i).collect().toSet
    --- End diff --
    
    It was just a way to trigger an actual job. I guess I could do `parallelize(1 to 4, 4).collect()` to achieve the same thing.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/2257#issuecomment-56711888
  
    Ah good catch. The latest changes LGTM if you get the tests to pass.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/2257#issuecomment-54385497
  
    Suggestions about making this work in yarn-alpha are welcome; I was hoping this would allow me to test yarn-alpha changes, but I failed. :-/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2257#issuecomment-56593060
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20729/consoleFull) for   PR 2257 at commit [`67f5b02`](https://github.com/apache/spark/commit/67f5b02d6a3866685194e6fede43bafde35375cb).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2257#discussion_r17557475
  
    --- Diff: yarn/stable/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala ---
    @@ -0,0 +1,141 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.deploy.yarn
    +
    +import java.io.File
    +
    +import scala.collection.JavaConversions._
    +import scala.collection.mutable.HashMap
    +
    +import com.google.common.base.Charsets
    +import com.google.common.io.Files
    +import org.scalatest.{BeforeAndAfterAll, FunSuite, Matchers}
    +
    +import org.apache.hadoop.yarn.conf.YarnConfiguration
    +import org.apache.hadoop.yarn.server.MiniYARNCluster
    +
    +import org.apache.spark.{Logging, SparkConf, SparkContext}
    +import org.apache.spark.deploy.SparkHadoopUtil
    +import org.apache.spark.util.Utils
    +
    +class YarnClusterSuite extends FunSuite with BeforeAndAfterAll with Matchers {
    +
    +  private val oldConf = new HashMap[String, String]()
    +  private var yarnCluster: MiniYARNCluster = _
    +  private var tempDir: File = _
    +  private var fakeSparkJar: File = _
    +
    +  override def beforeAll() {
    +    tempDir = Utils.createTempDir()
    +
    +    yarnCluster = new MiniYARNCluster(getClass().getName(), 1, 1, 1)
    +    yarnCluster.init(new YarnConfiguration())
    +    yarnCluster.start()
    +
    +    val sysProps = sys.props.map { case (k, v) => (k, v) }
    +    sysProps.foreach { case (k, v) =>
    --- End diff --
    
    I guess I was trying to be paranoid and clean up any "spark.*" options from sys.props before running these tests. That may not be the best idea (since I'd be cleaning up "spark.test" properties set by the build scripts), so I went with the copy instead.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2257#discussion_r17938630
  
    --- Diff: yarn/pom.xml ---
    @@ -126,7 +126,6 @@
             <configuration>
               <environmentVariables>
                 <SPARK_HOME>${basedir}/../..</SPARK_HOME>
    -            <SPARK_CLASSPATH>${spark.classpath}</SPARK_CLASSPATH>
    --- End diff --
    
    Oh, and one more: I can't see `spark.classpath` being set anywhere.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2257#issuecomment-55413766
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20228/consoleFull) for   PR 2257 at commit [`68fbbbf`](https://github.com/apache/spark/commit/68fbbbfc03d9d18eec58c1a6dad058014157e9da).
     * This patch **passes** unit tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class YarnClusterSuite extends FunSuite with BeforeAndAfterAll with Matchers `



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/2257#issuecomment-56601784
  
    Jenkins, retest this please.
    
    Is there any way to access `unit-tests.log` from jenkins?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2257#discussion_r17938584
  
    --- Diff: yarn/pom.xml ---
    @@ -126,7 +126,6 @@
             <configuration>
               <environmentVariables>
                 <SPARK_HOME>${basedir}/../..</SPARK_HOME>
    -            <SPARK_CLASSPATH>${spark.classpath}</SPARK_CLASSPATH>
    --- End diff --
    
    Prior to this test it was not used. SPARK_CLASSPATH is also deprecated.
    
    The new test uses `spark.{driver|executor}.extraClassPath` instead, to make sure they work correctly, and those two settings also conflict with SPARK_CLASSPATH.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2257#issuecomment-56609714
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20733/consoleFull) for   PR 2257 at commit [`5c2b56f`](https://github.com/apache/spark/commit/5c2b56fc1947f973525abd0f9aa7cd77267c8fb3).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2257#issuecomment-54913778
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20004/consoleFull) for   PR 2257 at commit [`68fbbbf`](https://github.com/apache/spark/commit/68fbbbfc03d9d18eec58c1a6dad058014157e9da).
     * This patch **passes** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2257#discussion_r17938081
  
    --- Diff: yarn/stable/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala ---
    @@ -0,0 +1,133 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.deploy.yarn
    +
    +import java.io.File
    +
    +import scala.collection.JavaConversions._
    +
    +import com.google.common.base.Charsets
    +import com.google.common.io.Files
    +import org.scalatest.{BeforeAndAfterAll, FunSuite, Matchers}
    +
    +import org.apache.hadoop.yarn.conf.YarnConfiguration
    +import org.apache.hadoop.yarn.server.MiniYARNCluster
    +
    +import org.apache.spark.{Logging, SparkConf, SparkContext}
    +import org.apache.spark.deploy.SparkHadoopUtil
    +import org.apache.spark.util.Utils
    +
    +class YarnClusterSuite extends FunSuite with BeforeAndAfterAll with Matchers {
    +
    +  private var oldConf: Map[String, String] = _
    +  private var yarnCluster: MiniYARNCluster = _
    +  private var tempDir: File = _
    +  private var fakeSparkJar: File = _
    +
    +  override def beforeAll() {
    +    tempDir = Utils.createTempDir()
    +
    +    yarnCluster = new MiniYARNCluster(getClass().getName(), 1, 1, 1)
    +    yarnCluster.init(new YarnConfiguration())
    +    yarnCluster.start()
    +
    +    oldConf = sys.props.toMap
    +    yarnCluster.getConfig().foreach { e =>
    +      sys.props += ("spark.hadoop." + e.getKey() -> e.getValue())
    +    }
    +
    +    fakeSparkJar = File.createTempFile("sparkJar", null, tempDir)
    +    sys.props += ("spark.yarn.jar" -> ("local:" + fakeSparkJar.getAbsolutePath()))
    +    sys.props += ("spark.executor.instances" -> "1")
    +    sys.props += ("spark.driver.extraClassPath" -> sys.props("java.class.path"))
    +    sys.props += ("spark.executor.extraClassPath" -> sys.props("java.class.path"))
    +
    +    super.beforeAll()
    +  }
    +
    +  override def afterAll() {
    +    yarnCluster.stop()
    +
    +    val sysProps = sys.props.map { case (k, v) => (k, v) }
    +    sysProps.foreach { case (k, v) =>
    +      if (k.startsWith("spark.")) {
    +        sys.props -= k
    +      }
    +    }
    --- End diff --
    
    This whole chunk can be rewritten as:
    ```
    sys.props.retain { case (k, _) => !k.startsWith("spark.") }
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/2257#issuecomment-56777632
  
    Thanks Marcelo and @andrewor14  for review - I'll merge this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the pull request:

    https://github.com/apache/spark/pull/2257#issuecomment-55405175
  
    @vanzin this mostly looks good. I've been trying to run it locally but I have been having lots of issues with the unit tests lately.  Hopefully have this done and check in later today.
    
    Thanks for working on this, we definitely need more tests on the yarn side.  We may need to be careful about adding more in the future if they take to longer, perhaps we can look at supporting the different types as you mention.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/2257#issuecomment-54907066
  
    Jenkins, test this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/2257#issuecomment-56709844
  
    I found the problem - it was caused by a recent PR that basically broke yarn-cluster mode...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2257#issuecomment-54700294
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19879/consoleFull) for   PR 2257 at commit [`add8416`](https://github.com/apache/spark/commit/add84163ea0403d23dbbda540766d58329a27d76).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class YarnClusterSuite extends FunSuite with BeforeAndAfterAll with Matchers `



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2257#issuecomment-56599628
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20729/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2257#discussion_r17500397
  
    --- Diff: yarn/stable/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala ---
    @@ -0,0 +1,141 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.deploy.yarn
    +
    +import java.io.File
    +
    +import scala.collection.JavaConversions._
    +import scala.collection.mutable.HashMap
    +
    +import com.google.common.base.Charsets
    +import com.google.common.io.Files
    +import org.scalatest.{BeforeAndAfterAll, FunSuite, Matchers}
    +
    +import org.apache.hadoop.yarn.conf.YarnConfiguration
    +import org.apache.hadoop.yarn.server.MiniYARNCluster
    +
    +import org.apache.spark.{Logging, SparkConf, SparkContext}
    +import org.apache.spark.deploy.SparkHadoopUtil
    +import org.apache.spark.util.Utils
    +
    +class YarnClusterSuite extends FunSuite with BeforeAndAfterAll with Matchers {
    +
    +  private val oldConf = new HashMap[String, String]()
    +  private var yarnCluster: MiniYARNCluster = _
    +  private var tempDir: File = _
    +  private var fakeSparkJar: File = _
    +
    +  override def beforeAll() {
    +    tempDir = Utils.createTempDir()
    +
    +    yarnCluster = new MiniYARNCluster(getClass().getName(), 1, 1, 1)
    +    yarnCluster.init(new YarnConfiguration())
    +    yarnCluster.start()
    +
    +    val sysProps = sys.props.map { case (k, v) => (k, v) }
    +    sysProps.foreach { case (k, v) =>
    +      if (k.startsWith("spark.")) {
    +        oldConf += (k -> v)
    +        sys.props -= k
    +      }
    +    }
    +
    +    yarnCluster.getConfig().foreach { e =>
    +      sys.props += ("spark.hadoop." + e.getKey() -> e.getValue())
    +    }
    +
    +    fakeSparkJar = File.createTempFile("sparkJar", null, tempDir)
    +    sys.props += ("spark.yarn.jar" -> ("local:" + fakeSparkJar.getAbsolutePath()))
    +    sys.props += ("spark.executor.instances" -> "1")
    +    sys.props += ("spark.driver.extraClassPath" -> sys.props("java.class.path"))
    +    sys.props += ("spark.executor.extraClassPath" -> sys.props("java.class.path"))
    +
    +    super.beforeAll()
    +  }
    +
    +  override def afterAll() {
    +    yarnCluster.stop()
    +
    +    val sysProps = sys.props.map { case (k, v) => (k, v) }
    +    sysProps.foreach { case (k, v) =>
    +      if (k.startsWith("spark.")) {
    +        sys.props -= k
    +      }
    +    }
    +
    +    oldConf.foreach { case (k, v) => sys.props += (k -> v) }
    +
    +    super.afterAll()
    +  }
    +
    +  test("run Spark in yarn-client mode") {
    +    var result = File.createTempFile("result", null, tempDir)
    +    YarnClusterDriver.main(Array("yarn-client", result.getAbsolutePath()))
    +    checkResult(result)
    +  }
    +
    +  test("run Spark in yarn-cluster mode") {
    +    val main = YarnClusterDriver.getClass.getName().stripSuffix("$")
    +    var result = File.createTempFile("result", null, tempDir)
    +
    +    // The Client object will call System.exit() after the job is done, and we don't want
    +    // that because it messes up the scalatest monitoring. So replicate some of what main()
    +    // does here.
    +    val args = Array("--class", main,
    +      "--jar", "file:" + fakeSparkJar.getAbsolutePath(),
    +      "--arg", "yarn-cluster",
    +      "--arg", result.getAbsolutePath(),
    +      "--num-executors", "4")
    +    val sparkConf = new SparkConf()
    +    val yarnConf = SparkHadoopUtil.get.newConfiguration(sparkConf)
    +    val clientArgs = new ClientArguments(args, sparkConf)
    +    new Client(clientArgs, yarnConf, sparkConf).run()
    +    checkResult(result)
    +  }
    +
    +  /**
    +   * This is a workaround for an issue with yarn-cluster mode: the Client class will not provide
    +   * any sort of error when the job process finishes successfully, but the job itself fails. So
    +   * the tests enforce that something is written to a file after everything is ok to indicate
    +   * that the job succeeded.
    +   */
    +  private def checkResult(result: File) = {
    +    var resultString = Files.toString(result, Charsets.UTF_8)
    +    resultString should be ("success")
    +  }
    +
    +}
    +
    +private object YarnClusterDriver extends Logging with Matchers {
    +
    +  def main(args: Array[String]) = {
    +    val sc = new SparkContext(new SparkConf().setMaster(args(0))
    +      .setAppName("yarn \"test app\" 'with quotes'"))
    --- End diff --
    
    Nice. Can you also add backslashes


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the pull request:

    https://github.com/apache/spark/pull/2257#issuecomment-55405196
  
    Jenkins, test this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/2257#issuecomment-56746157
  
    Yay, and the test already found a bug before even being checked in.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2257#issuecomment-55629955
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20347/consoleFull) for   PR 2257 at commit [`f01517c`](https://github.com/apache/spark/commit/f01517c2916e1041086722916f50747bbb596134).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2257#issuecomment-55405736
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20228/consoleFull) for   PR 2257 at commit [`68fbbbf`](https://github.com/apache/spark/commit/68fbbbfc03d9d18eec58c1a6dad058014157e9da).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2257#discussion_r17938293
  
    --- Diff: yarn/stable/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala ---
    @@ -0,0 +1,133 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.deploy.yarn
    +
    +import java.io.File
    +
    +import scala.collection.JavaConversions._
    +
    +import com.google.common.base.Charsets
    +import com.google.common.io.Files
    +import org.scalatest.{BeforeAndAfterAll, FunSuite, Matchers}
    +
    +import org.apache.hadoop.yarn.conf.YarnConfiguration
    +import org.apache.hadoop.yarn.server.MiniYARNCluster
    +
    +import org.apache.spark.{Logging, SparkConf, SparkContext}
    +import org.apache.spark.deploy.SparkHadoopUtil
    +import org.apache.spark.util.Utils
    +
    +class YarnClusterSuite extends FunSuite with BeforeAndAfterAll with Matchers {
    +
    +  private var oldConf: Map[String, String] = _
    +  private var yarnCluster: MiniYARNCluster = _
    +  private var tempDir: File = _
    +  private var fakeSparkJar: File = _
    +
    +  override def beforeAll() {
    +    tempDir = Utils.createTempDir()
    +
    +    yarnCluster = new MiniYARNCluster(getClass().getName(), 1, 1, 1)
    +    yarnCluster.init(new YarnConfiguration())
    +    yarnCluster.start()
    +
    +    oldConf = sys.props.toMap
    +    yarnCluster.getConfig().foreach { e =>
    +      sys.props += ("spark.hadoop." + e.getKey() -> e.getValue())
    +    }
    +
    +    fakeSparkJar = File.createTempFile("sparkJar", null, tempDir)
    +    sys.props += ("spark.yarn.jar" -> ("local:" + fakeSparkJar.getAbsolutePath()))
    +    sys.props += ("spark.executor.instances" -> "1")
    +    sys.props += ("spark.driver.extraClassPath" -> sys.props("java.class.path"))
    +    sys.props += ("spark.executor.extraClassPath" -> sys.props("java.class.path"))
    +
    +    super.beforeAll()
    +  }
    +
    +  override def afterAll() {
    +    yarnCluster.stop()
    +
    +    val sysProps = sys.props.map { case (k, v) => (k, v) }
    +    sysProps.foreach { case (k, v) =>
    +      if (k.startsWith("spark.")) {
    +        sys.props -= k
    +      }
    +    }
    +
    +    sys.props ++= oldConf
    +
    +    super.afterAll()
    +  }
    +
    +  test("run Spark in yarn-client mode") {
    +    var result = File.createTempFile("result", null, tempDir)
    +    YarnClusterDriver.main(Array("yarn-client", result.getAbsolutePath()))
    +    checkResult(result)
    +  }
    +
    +  test("run Spark in yarn-cluster mode") {
    +    val main = YarnClusterDriver.getClass.getName().stripSuffix("$")
    +    var result = File.createTempFile("result", null, tempDir)
    +
    +    // The Client object will call System.exit() after the job is done, and we don't want
    +    // that because it messes up the scalatest monitoring. So replicate some of what main()
    +    // does here.
    +    val args = Array("--class", main,
    +      "--jar", "file:" + fakeSparkJar.getAbsolutePath(),
    +      "--arg", "yarn-cluster",
    +      "--arg", result.getAbsolutePath(),
    +      "--num-executors", "1")
    +    val sparkConf = new SparkConf()
    +    val yarnConf = SparkHadoopUtil.get.newConfiguration(sparkConf)
    +    val clientArgs = new ClientArguments(args, sparkConf)
    +    new Client(clientArgs, yarnConf, sparkConf).run()
    +    checkResult(result)
    +  }
    +
    +  /**
    +   * This is a workaround for an issue with yarn-cluster mode: the Client class will not provide
    +   * any sort of error when the job process finishes successfully, but the job itself fails. So
    +   * the tests enforce that something is written to a file after everything is ok to indicate
    +   * that the job succeeded.
    +   */
    +  private def checkResult(result: File) = {
    +    var resultString = Files.toString(result, Charsets.UTF_8)
    +    resultString should be ("success")
    +  }
    +
    +}
    +
    +private object YarnClusterDriver extends Logging with Matchers {
    +
    +  def main(args: Array[String]) = {
    --- End diff --
    
    Can you add an arg check here that prints usage if not enough args are provided?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/2257#issuecomment-55349832
  
    Thanks @vanzin, this is really cool. If it takes a long time to run this test, we can always enable it only if yarn files are modified. Then we don't have to worry about further inflating the test time if we want to test more features.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2257#issuecomment-55641526
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20347/consoleFull) for   PR 2257 at commit [`f01517c`](https://github.com/apache/spark/commit/f01517c2916e1041086722916f50747bbb596134).
     * This patch **passes** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/2257#issuecomment-54698546
  
    Jenkins, test this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2257#issuecomment-54891736
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19993/consoleFull) for   PR 2257 at commit [`add8416`](https://github.com/apache/spark/commit/add84163ea0403d23dbbda540766d58329a27d76).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/2257#issuecomment-55629883
  
    I reduced the number of executors from the cluster test (from 4 to 1, something I meant to do before but forgot) and shaved ~15s on my machine. Tests still run sort of slow, but it's a little better now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2257#discussion_r17500757
  
    --- Diff: yarn/stable/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala ---
    @@ -0,0 +1,141 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.deploy.yarn
    +
    +import java.io.File
    +
    +import scala.collection.JavaConversions._
    +import scala.collection.mutable.HashMap
    +
    +import com.google.common.base.Charsets
    +import com.google.common.io.Files
    +import org.scalatest.{BeforeAndAfterAll, FunSuite, Matchers}
    +
    +import org.apache.hadoop.yarn.conf.YarnConfiguration
    +import org.apache.hadoop.yarn.server.MiniYARNCluster
    +
    +import org.apache.spark.{Logging, SparkConf, SparkContext}
    +import org.apache.spark.deploy.SparkHadoopUtil
    +import org.apache.spark.util.Utils
    +
    +class YarnClusterSuite extends FunSuite with BeforeAndAfterAll with Matchers {
    +
    +  private val oldConf = new HashMap[String, String]()
    +  private var yarnCluster: MiniYARNCluster = _
    +  private var tempDir: File = _
    +  private var fakeSparkJar: File = _
    +
    +  override def beforeAll() {
    +    tempDir = Utils.createTempDir()
    +
    +    yarnCluster = new MiniYARNCluster(getClass().getName(), 1, 1, 1)
    +    yarnCluster.init(new YarnConfiguration())
    +    yarnCluster.start()
    +
    +    val sysProps = sys.props.map { case (k, v) => (k, v) }
    +    sysProps.foreach { case (k, v) =>
    --- End diff --
    
    Actually, why not just take a snapshot of `sys.props` here and restore it later? Might be simpler


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2257#discussion_r17987981
  
    --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
    @@ -401,17 +401,17 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments,
               // it has an uncaught exception thrown out.  It needs a shutdown hook to set SUCCEEDED.
               status = FinalApplicationStatus.SUCCEEDED
             } catch {
    -          case e: InvocationTargetException => {
    +          case e: InvocationTargetException =>
                 e.getCause match {
    -              case _: InterruptedException => {
    +              case _: InterruptedException =>
                     // Reporter thread can interrupt to stop user class
    -              }
    +
    +              case e => throw e
    --- End diff --
    
    I don't think you need this right?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the pull request:

    https://github.com/apache/spark/pull/2257#issuecomment-54975874
  
    I think its ok to not have them for yarn-alpha.  
    
    how slow are they?  You couldn't find a way to have them run separately then and not with normal unit tests? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2257#discussion_r17500631
  
    --- Diff: yarn/stable/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala ---
    @@ -0,0 +1,141 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.deploy.yarn
    +
    +import java.io.File
    +
    +import scala.collection.JavaConversions._
    +import scala.collection.mutable.HashMap
    +
    +import com.google.common.base.Charsets
    +import com.google.common.io.Files
    +import org.scalatest.{BeforeAndAfterAll, FunSuite, Matchers}
    +
    +import org.apache.hadoop.yarn.conf.YarnConfiguration
    +import org.apache.hadoop.yarn.server.MiniYARNCluster
    +
    +import org.apache.spark.{Logging, SparkConf, SparkContext}
    +import org.apache.spark.deploy.SparkHadoopUtil
    +import org.apache.spark.util.Utils
    +
    +class YarnClusterSuite extends FunSuite with BeforeAndAfterAll with Matchers {
    +
    +  private val oldConf = new HashMap[String, String]()
    +  private var yarnCluster: MiniYARNCluster = _
    +  private var tempDir: File = _
    +  private var fakeSparkJar: File = _
    +
    +  override def beforeAll() {
    +    tempDir = Utils.createTempDir()
    +
    +    yarnCluster = new MiniYARNCluster(getClass().getName(), 1, 1, 1)
    +    yarnCluster.init(new YarnConfiguration())
    +    yarnCluster.start()
    +
    +    val sysProps = sys.props.map { case (k, v) => (k, v) }
    +    sysProps.foreach { case (k, v) =>
    +      if (k.startsWith("spark.")) {
    +        oldConf += (k -> v)
    +        sys.props -= k
    +      }
    +    }
    +
    +    yarnCluster.getConfig().foreach { e =>
    +      sys.props += ("spark.hadoop." + e.getKey() -> e.getValue())
    +    }
    +
    +    fakeSparkJar = File.createTempFile("sparkJar", null, tempDir)
    +    sys.props += ("spark.yarn.jar" -> ("local:" + fakeSparkJar.getAbsolutePath()))
    +    sys.props += ("spark.executor.instances" -> "1")
    +    sys.props += ("spark.driver.extraClassPath" -> sys.props("java.class.path"))
    +    sys.props += ("spark.executor.extraClassPath" -> sys.props("java.class.path"))
    +
    +    super.beforeAll()
    +  }
    +
    +  override def afterAll() {
    +    yarnCluster.stop()
    +
    +    val sysProps = sys.props.map { case (k, v) => (k, v) }
    +    sysProps.foreach { case (k, v) =>
    +      if (k.startsWith("spark.")) {
    +        sys.props -= k
    +      }
    +    }
    +
    +    oldConf.foreach { case (k, v) => sys.props += (k -> v) }
    --- End diff --
    
    This can just be `oldConf ++= sys.props.toMap` I think


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/2257#issuecomment-56694970
  
    I'll merge with master and see if I can reproduce the failure...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/2257


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/2257#issuecomment-56587975
  
    Hey @vanzin I left a few minor comments, but this LGTM overall. I haven't verified the dependency logic, however.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/2257#issuecomment-56696661
  
    Yep, fails locally too after the merge. Let me look.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2257#discussion_r17938207
  
    --- Diff: yarn/stable/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala ---
    @@ -0,0 +1,133 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.deploy.yarn
    +
    +import java.io.File
    +
    +import scala.collection.JavaConversions._
    +
    +import com.google.common.base.Charsets
    +import com.google.common.io.Files
    +import org.scalatest.{BeforeAndAfterAll, FunSuite, Matchers}
    +
    +import org.apache.hadoop.yarn.conf.YarnConfiguration
    +import org.apache.hadoop.yarn.server.MiniYARNCluster
    +
    +import org.apache.spark.{Logging, SparkConf, SparkContext}
    +import org.apache.spark.deploy.SparkHadoopUtil
    +import org.apache.spark.util.Utils
    +
    +class YarnClusterSuite extends FunSuite with BeforeAndAfterAll with Matchers {
    +
    +  private var oldConf: Map[String, String] = _
    +  private var yarnCluster: MiniYARNCluster = _
    +  private var tempDir: File = _
    +  private var fakeSparkJar: File = _
    +
    +  override def beforeAll() {
    +    tempDir = Utils.createTempDir()
    +
    +    yarnCluster = new MiniYARNCluster(getClass().getName(), 1, 1, 1)
    +    yarnCluster.init(new YarnConfiguration())
    +    yarnCluster.start()
    +
    +    oldConf = sys.props.toMap
    +    yarnCluster.getConfig().foreach { e =>
    +      sys.props += ("spark.hadoop." + e.getKey() -> e.getValue())
    +    }
    +
    +    fakeSparkJar = File.createTempFile("sparkJar", null, tempDir)
    +    sys.props += ("spark.yarn.jar" -> ("local:" + fakeSparkJar.getAbsolutePath()))
    +    sys.props += ("spark.executor.instances" -> "1")
    +    sys.props += ("spark.driver.extraClassPath" -> sys.props("java.class.path"))
    +    sys.props += ("spark.executor.extraClassPath" -> sys.props("java.class.path"))
    +
    +    super.beforeAll()
    +  }
    +
    +  override def afterAll() {
    +    yarnCluster.stop()
    +
    +    val sysProps = sys.props.map { case (k, v) => (k, v) }
    +    sysProps.foreach { case (k, v) =>
    +      if (k.startsWith("spark.")) {
    +        sys.props -= k
    +      }
    +    }
    +
    +    sys.props ++= oldConf
    +
    +    super.afterAll()
    +  }
    +
    +  test("run Spark in yarn-client mode") {
    +    var result = File.createTempFile("result", null, tempDir)
    +    YarnClusterDriver.main(Array("yarn-client", result.getAbsolutePath()))
    +    checkResult(result)
    +  }
    +
    +  test("run Spark in yarn-cluster mode") {
    +    val main = YarnClusterDriver.getClass.getName().stripSuffix("$")
    +    var result = File.createTempFile("result", null, tempDir)
    +
    +    // The Client object will call System.exit() after the job is done, and we don't want
    +    // that because it messes up the scalatest monitoring. So replicate some of what main()
    +    // does here.
    +    val args = Array("--class", main,
    +      "--jar", "file:" + fakeSparkJar.getAbsolutePath(),
    +      "--arg", "yarn-cluster",
    +      "--arg", result.getAbsolutePath(),
    +      "--num-executors", "1")
    +    val sparkConf = new SparkConf()
    +    val yarnConf = SparkHadoopUtil.get.newConfiguration(sparkConf)
    +    val clientArgs = new ClientArguments(args, sparkConf)
    +    new Client(clientArgs, yarnConf, sparkConf).run()
    +    checkResult(result)
    +  }
    +
    +  /**
    +   * This is a workaround for an issue with yarn-cluster mode: the Client class will not provide
    +   * any sort of error when the job process finishes successfully, but the job itself fails. So
    +   * the tests enforce that something is written to a file after everything is ok to indicate
    +   * that the job succeeded.
    +   */
    +  private def checkResult(result: File) = {
    +    var resultString = Files.toString(result, Charsets.UTF_8)
    +    resultString should be ("success")
    +  }
    +
    +}
    +
    +private object YarnClusterDriver extends Logging with Matchers {
    +
    +  def main(args: Array[String]) = {
    +    val sc = new SparkContext(new SparkConf().setMaster(args(0))
    +      .setAppName("yarn \"test app\" 'with quotes' and \\back\\slashes and $dollarSigns"))
    +    val status = new File(args(1))
    +    var result = "failure"
    +    try {
    +      val data = sc.parallelize(1 to 4).map(i => i).collect().toSet
    --- End diff --
    
    Does the map to identity here do anything?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2257#discussion_r17500340
  
    --- Diff: yarn/stable/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala ---
    @@ -0,0 +1,141 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.deploy.yarn
    +
    +import java.io.File
    +
    +import scala.collection.JavaConversions._
    +import scala.collection.mutable.HashMap
    +
    +import com.google.common.base.Charsets
    +import com.google.common.io.Files
    +import org.scalatest.{BeforeAndAfterAll, FunSuite, Matchers}
    +
    +import org.apache.hadoop.yarn.conf.YarnConfiguration
    +import org.apache.hadoop.yarn.server.MiniYARNCluster
    +
    +import org.apache.spark.{Logging, SparkConf, SparkContext}
    +import org.apache.spark.deploy.SparkHadoopUtil
    +import org.apache.spark.util.Utils
    +
    +class YarnClusterSuite extends FunSuite with BeforeAndAfterAll with Matchers {
    +
    +  private val oldConf = new HashMap[String, String]()
    +  private var yarnCluster: MiniYARNCluster = _
    +  private var tempDir: File = _
    +  private var fakeSparkJar: File = _
    +
    +  override def beforeAll() {
    +    tempDir = Utils.createTempDir()
    +
    +    yarnCluster = new MiniYARNCluster(getClass().getName(), 1, 1, 1)
    +    yarnCluster.init(new YarnConfiguration())
    +    yarnCluster.start()
    +
    +    val sysProps = sys.props.map { case (k, v) => (k, v) }
    +    sysProps.foreach { case (k, v) =>
    --- End diff --
    
    Actually why remove the `spark.*` properties from `sys.props`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/2257#issuecomment-54994349
  
    They take about 1min to run on my machine. Most of the time is setting up and tearing down the MiniYARNCluster instance, which is only done once, so I hope adding more tests won't add so much on top of that.
    
    scalatest has tags that can be used to choose which tests to run, but Spark doesn't use them anywhere. I can look at adding them to these (and disabling them in the default run) if people think that's too long.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2257#issuecomment-54698934
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19879/consoleFull) for   PR 2257 at commit [`add8416`](https://github.com/apache/spark/commit/add84163ea0403d23dbbda540766d58329a27d76).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2257#issuecomment-56599622
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20729/consoleFull) for   PR 2257 at commit [`67f5b02`](https://github.com/apache/spark/commit/67f5b02d6a3866685194e6fede43bafde35375cb).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2257#issuecomment-54909180
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20004/consoleFull) for   PR 2257 at commit [`68fbbbf`](https://github.com/apache/spark/commit/68fbbbfc03d9d18eec58c1a6dad058014157e9da).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/2257#issuecomment-54648499
  
    Jenkins, test this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org