Posted to reviews@spark.apache.org by choochootrain <gi...@git.apache.org> on 2015/10/30 02:08:40 UTC

[GitHub] spark pull request: [SPARK-11195][CORE] Use correct classloader fo...

GitHub user choochootrain opened a pull request:

    https://github.com/apache/spark/pull/9367

    [SPARK-11195][CORE] Use correct classloader for TaskResultGetter

    Make sure we are using the context classloader when deserializing failed TaskResults instead of the Spark classloader.
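The fix concerns how the driver resolves classes while deserializing a failed task result: user-defined classes (e.g. a custom exception shipped in an application jar) are visible to the thread context classloader but not necessarily to the loader that loaded Spark itself. A minimal, self-contained Java sketch of that resolution pattern — the class names here are illustrative, not Spark's actual code:

```java
import java.io.*;

// Sketch: an ObjectInputStream that resolves classes through the thread
// context classloader first, falling back to the default behavior. This
// mirrors the idea of the fix, not Spark's implementation.
class ContextAwareObjectInputStream extends ObjectInputStream {
    ContextAwareObjectInputStream(InputStream in) throws IOException {
        super(in);
    }

    @Override
    protected Class<?> resolveClass(ObjectStreamClass desc)
            throws IOException, ClassNotFoundException {
        try {
            // Prefer the context classloader, which can see user-added jars.
            return Class.forName(desc.getName(), false,
                Thread.currentThread().getContextClassLoader());
        } catch (ClassNotFoundException e) {
            return super.resolveClass(desc); // fall back to the default loader
        }
    }
}

public class Demo {
    public static void main(String[] args) throws Exception {
        // Round-trip an object through the context-aware stream.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(new java.util.ArrayList<>(java.util.Arrays.asList(1, 2, 3)));
        }
        try (ObjectInputStream ois = new ContextAwareObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray()))) {
            System.out.println(ois.readObject());
        }
    }
}
```

With a plain `ObjectInputStream`, class resolution uses the caller's defining loader; when the failed `TaskResult` contains a user exception class, that loader cannot find it and the failure reason degrades to something unhelpful, which is what the test below checks for.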

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/choochootrain/spark spark-11195

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9367.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #9367
    
----
commit 90c47aa133de83dd49dea00b0949088e8600ab23
Author: Hurshal Patel <hp...@gmail.com>
Date:   2015-10-30T01:05:44Z

    [SPARK-11195][CORE] Use correct classloader for TaskResultGetter
    
    Make sure we are using the context classloader when deserializing failed
    TaskResults instead of the Spark classloader.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org



Posted by choochootrain <gi...@git.apache.org>.
Github user choochootrain commented on the pull request:

    https://github.com/apache/spark/pull/9367#issuecomment-152398230
  
    `SparkSubmitSuite` is helpful, but I want to catch and assert the type of exception that is thrown when the job fails. Would calling into `doMain` in `SparkSubmit.scala` be closer to what we need?



Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/9367#issuecomment-155353765
  
    @yhuai @choochootrain how is this one going? It does seem desirable to test this as much as possible. If we're having to introduce very complex mechanisms to do so and they aren't working, is there anything simpler, even if less effective, that we could do to test it?



Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/9367#issuecomment-152396959
  
    Will `SparkSubmitSuite` help?



Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9367#discussion_r44880250
  
    --- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskResultGetterSuite.scala ---
    @@ -17,8 +17,13 @@
     
     package org.apache.spark.scheduler
     
    +import java.io.File
    +import java.net.URL
     import java.nio.ByteBuffer
     
    +import org.apache.spark.TestUtils.JavaSourceFromString
    +import org.apache.spark.util.{MutableURLClassLoader, Utils}
    --- End diff --
    
    Let's order imports according to https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide#SparkCodeStyleGuide-Imports.



Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9367#discussion_r43840784
  
    --- Diff: core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala ---
    @@ -366,6 +367,72 @@ class SparkSubmitSuite
         }
       }
     
    +  // SPARK-11195
    +  test("classes are correctly loaded when tasks fail") {
    +    // Compile a simple jar that throws a user defined exception on the driver
    +    val tempDir = Utils.createTempDir()
    +    val srcDir = new File(tempDir, "repro/")
    +    srcDir.mkdirs()
    +    // scalastyle:off line.size.limit
    +    val mainSource = new JavaSourceFromString(new File(srcDir, "MyJob").getAbsolutePath,
    +      """package repro;
    +        |
    +        |import java.util.*;
    +        |import java.util.regex.*;
    +        |import org.apache.spark.*;
    +        |import org.apache.spark.api.java.*;
    +        |import org.apache.spark.api.java.function.*;
    +        |
    +        |public class MyJob {
    +        |  public static class MyException extends Exception {
    +        |  }
    +        |
    +        |  public static void main(String[] args) {
    +        |    SparkConf conf = new SparkConf();
    +        |    JavaSparkContext sc = new JavaSparkContext(conf);
    +        |
    +        |    JavaRDD rdd = sc.parallelize(Arrays.asList(new Integer[]{1}), 1).map(new Function<Integer, Boolean>() {
    +        |      public Boolean call(Integer x) throws MyException {
    +        |        throw new MyException();
    +        |      }
    +        |    });
    +        |
    +        |    try {
    +        |      rdd.collect();
    +        |
    +        |      assert(false); // should be unreachable
    +        |    } catch (Exception e) {
    +        |      // the driver should not have any problems resolving the exception class and determining
    +        |      // why the task failed.
    +        |
    +        |      Pattern unknownFailure = Pattern.compile(".*Lost task.*: UnknownReason.*", Pattern.DOTALL);
    +        |      Pattern expectedFailure = Pattern.compile(".*Lost task.*: repro.MyJob\\$MyException.*", Pattern.DOTALL);
    +        |
    +        |      assert(!unknownFailure.matcher(e.getMessage()).matches());
    +        |      assert(expectedFailure.matcher(e.getMessage()).matches());
    +        |    }
    +        |  }
    +        |}
    +      """.stripMargin)
    +    // scalastyle:on line.size.limit
    +    val sparkJar = "../assembly/target/scala-2.10/spark-assembly-1.5.1-hadoop2.2.0.jar"
    --- End diff --
    
    I probably missed something. Why do we need to put Spark's assembly jar here? When we run tests, Spark's classes are already on the classpath.



Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9367#issuecomment-152386718
  
      [Test build #44655 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44655/consoleFull) for   PR 9367 at commit [`90c47aa`](https://github.com/apache/spark/commit/90c47aa133de83dd49dea00b0949088e8600ab23).



Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9367#discussion_r44880369
  
    --- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskResultGetterSuite.scala ---
    @@ -119,5 +124,47 @@ class TaskResultGetterSuite extends SparkFunSuite with BeforeAndAfter with Local
         // Make sure two tasks were run (one failed one, and a second retried one).
         assert(scheduler.nextTaskId.get() === 2)
       }
    +
    +  // SPARK-11195
    +  // Make sure we are using the context classloader when deserializing failed TaskResults instead of
    +  // the Spark classloader.
    +  test("failed task deserialized with the correct classloader") {
    +    // compile a small jar containing an exception that will be thrown on an executor.
    +    val tempDir = Utils.createTempDir()
    +    val srcDir = new File(tempDir, "repro/")
    +    srcDir.mkdirs()
    +    val excSource = new JavaSourceFromString(new File(srcDir, "MyException").getAbsolutePath,
    +      """package repro;
    +        |
    +        |public class MyException extends Exception {
    +        |}
    +      """.stripMargin)
    +    val excFile = TestUtils.createCompiledClass("MyException", srcDir, excSource, Seq.empty)
    +    val jarFile = new File(tempDir, "testJar-%s.jar".format(System.currentTimeMillis()))
    +    TestUtils.createJar(Seq(excFile), jarFile, directoryPrefix = Some("repro"))
    +
    +    // load the exception from the jar
    +    val loader = new MutableURLClassLoader(new Array[URL](0), Thread.currentThread.getContextClassLoader)
    +    loader.addURL(jarFile.toURI.toURL)
    +    Thread.currentThread().setContextClassLoader(loader)
    +    val excClass: Class[_] = Class.forName("repro.MyException", false, loader)
    +
    +    sc = new SparkContext("local", "test", conf)
    --- End diff --
    
    I think we need to add a comment saying that we need to use `local` in order to make this test work. Otherwise, that jar will not be loaded on the executor side.



Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9367#issuecomment-153543840
  
      [Test build #44987 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44987/console) for   PR 9367 at commit [`c63ca09`](https://github.com/apache/spark/commit/c63ca091714c464f22e9e28b8605aa8c551eaa87).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.



Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/9367#issuecomment-152379661
  
    Jenkins, this is ok to test.



Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/9367#issuecomment-152822504
  
    @choochootrain are you able to put together a small test like the one @yhuai mentions? Then I think this is good to go.



Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9367#issuecomment-153572051
  
      [Test build #44991 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44991/console) for   PR 9367 at commit [`53f7c4c`](https://github.com/apache/spark/commit/53f7c4cda8216ce6db726ae88fbf89776da62b27).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.



Posted by choochootrain <gi...@git.apache.org>.
Github user choochootrain commented on the pull request:

    https://github.com/apache/spark/pull/9367#issuecomment-153221384
  
    @yhuai I can't write the test directly in `SparkSubmitSuite` because this is a classloader issue that (as far as I can tell) only reproduces when an external jar is loaded. I'm adding a test that uses `TestUtils.createCompiledClass` to compile and submit my external jar.
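    For illustration, the compile-at-test-time approach can be sketched with just the JDK's `javax.tools` API. The class and directory names below are hypothetical stand-ins for what `TestUtils.createCompiledClass` does, not Spark's implementation:

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;

public class CompileAndLoad {
    public static void main(String[] args) throws Exception {
        // Write a tiny exception class to a temp directory, mirroring the
        // "external jar" scenario: the class is unknown to the default classpath.
        Path srcDir = Files.createTempDirectory("repro-src");
        Path srcFile = srcDir.resolve("MyException.java");
        Files.write(srcFile, "public class MyException extends Exception {}".getBytes());

        // Requires a JDK (not just a JRE) at runtime.
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null || compiler.run(null, null, null, srcFile.toString()) != 0) {
            throw new IllegalStateException("compilation failed");
        }

        // Load the freshly compiled class through a URL classloader whose
        // parent is the current context classloader.
        try (URLClassLoader loader = new URLClassLoader(
                new URL[] { srcDir.toUri().toURL() },
                Thread.currentThread().getContextClassLoader())) {
            Class<?> excClass = Class.forName("MyException", false, loader);
            System.out.println(excClass.getName() + " loaded: "
                + Exception.class.isAssignableFrom(excClass));
        }
    }
}
```

    The point is that the loaded class exists only in the temp directory (or a jar built from it), so any code path that tries to resolve it through Spark's own classloader rather than the context classloader will fail.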



Posted by choochootrain <gi...@git.apache.org>.
Github user choochootrain commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9367#discussion_r43844511
  
    --- Diff: core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala ---
    @@ -366,6 +367,72 @@ class SparkSubmitSuite
         }
       }
     
    +  // SPARK-11195
    +  test("classes are correctly loaded when tasks fail") {
    +    // Compile a simple jar that throws a user defined exception on the driver
    +    val tempDir = Utils.createTempDir()
    +    val srcDir = new File(tempDir, "repro/")
    +    srcDir.mkdirs()
    +    // scalastyle:off line.size.limit
    +    val mainSource = new JavaSourceFromString(new File(srcDir, "MyJob").getAbsolutePath,
    +      """package repro;
    +        |
    +        |import java.util.*;
    +        |import java.util.regex.*;
    +        |import org.apache.spark.*;
    +        |import org.apache.spark.api.java.*;
    +        |import org.apache.spark.api.java.function.*;
    +        |
    +        |public class MyJob {
    +        |  public static class MyException extends Exception {
    +        |  }
    +        |
    +        |  public static void main(String[] args) {
    +        |    SparkConf conf = new SparkConf();
    +        |    JavaSparkContext sc = new JavaSparkContext(conf);
    +        |
    +        |    JavaRDD rdd = sc.parallelize(Arrays.asList(new Integer[]{1}), 1).map(new Function<Integer, Boolean>() {
    +        |      public Boolean call(Integer x) throws MyException {
    +        |        throw new MyException();
    +        |      }
    +        |    });
    +        |
    +        |    try {
    +        |      rdd.collect();
    +        |
    +        |      assert(false); // should be unreachable
    +        |    } catch (Exception e) {
    +        |      // the driver should not have any problems resolving the exception class and determining
    +        |      // why the task failed.
    +        |
    +        |      Pattern unknownFailure = Pattern.compile(".*Lost task.*: UnknownReason.*", Pattern.DOTALL);
    +        |      Pattern expectedFailure = Pattern.compile(".*Lost task.*: repro.MyJob\\$MyException.*", Pattern.DOTALL);
    +        |
    +        |      assert(!unknownFailure.matcher(e.getMessage()).matches());
    +        |      assert(expectedFailure.matcher(e.getMessage()).matches());
    +        |    }
    +        |  }
    +        |}
    +      """.stripMargin)
    +    // scalastyle:on line.size.limit
    +    val sparkJar = "../assembly/target/scala-2.10/spark-assembly-1.5.1-hadoop2.2.0.jar"
    --- End diff --
    
    So the test `classes are correctly loaded when tasks fail` is failing because of the above issue: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44991/consoleFull
    
    I'm more than happy to take another approach if I'm going about this the wrong way.



Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/9367#issuecomment-155879290
  
    I'd like to go with 3.



Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9367#discussion_r43840818
  
    --- Diff: core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala ---
    @@ -366,6 +367,72 @@ class SparkSubmitSuite
         }
       }
     
    +  // SPARK-11195
    +  test("classes are correctly loaded when tasks fail") {
    +    // Compile a simple jar that throws a user defined exception on the driver
    +    val tempDir = Utils.createTempDir()
    +    val srcDir = new File(tempDir, "repro/")
    +    srcDir.mkdirs()
    +    // scalastyle:off line.size.limit
    +    val mainSource = new JavaSourceFromString(new File(srcDir, "MyJob").getAbsolutePath,
    +      """package repro;
    +        |
    +        |import java.util.*;
    +        |import java.util.regex.*;
    +        |import org.apache.spark.*;
    +        |import org.apache.spark.api.java.*;
    +        |import org.apache.spark.api.java.function.*;
    +        |
    +        |public class MyJob {
    +        |  public static class MyException extends Exception {
    +        |  }
    +        |
    +        |  public static void main(String[] args) {
    +        |    SparkConf conf = new SparkConf();
    +        |    JavaSparkContext sc = new JavaSparkContext(conf);
    +        |
    +        |    JavaRDD rdd = sc.parallelize(Arrays.asList(new Integer[]{1}), 1).map(new Function<Integer, Boolean>() {
    +        |      public Boolean call(Integer x) throws MyException {
    +        |        throw new MyException();
    +        |      }
    +        |    });
    +        |
    +        |    try {
    +        |      rdd.collect();
    +        |
    +        |      assert(false); // should be unreachable
    +        |    } catch (Exception e) {
    +        |      // the driver should not have any problems resolving the exception class and determining
    +        |      // why the task failed.
    +        |
    +        |      Pattern unknownFailure = Pattern.compile(".*Lost task.*: UnknownReason.*", Pattern.DOTALL);
    +        |      Pattern expectedFailure = Pattern.compile(".*Lost task.*: repro.MyJob\\$MyException.*", Pattern.DOTALL);
    +        |
    +        |      assert(!unknownFailure.matcher(e.getMessage()).matches());
    +        |      assert(expectedFailure.matcher(e.getMessage()).matches());
    +        |    }
    +        |  }
    +        |}
    +      """.stripMargin)
    +    // scalastyle:on line.size.limit
    +    val sparkJar = "../assembly/target/scala-2.10/spark-assembly-1.5.1-hadoop2.2.0.jar"
    --- End diff --
    
    By the way, when you unit test `SparkSubmitSuite`, you have to run `assembly/assembly` before you can run any of these tests.



Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9367#issuecomment-153543287
  
     Merged build triggered.



Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9367#discussion_r44880215
  
    --- Diff: core/src/main/scala/org/apache/spark/TestUtils.scala ---
    @@ -78,15 +79,15 @@ private[spark] object TestUtils {
       }
     
       /**
    -   * Create a jar file that contains this set of files. All files will be located at the root
    -   * of the jar.
    +   * Create a jar file that contains this set of files. All files will be located in the specified
    +   * directory or at the root of the jar.
        */
    -  def createJar(files: Seq[File], jarFile: File): URL = {
    +  def createJar(files: Seq[File], jarFile: File, directoryPrefix: Option[String] = None): URL = {
    --- End diff --
    
    Do we need to add `directoryPrefix`?
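    
    For context, the reason a prefix is needed at all is that a class declared in package `repro` must sit at `repro/MyException.class` inside the jar for the JVM to find it. A hedged sketch of what such a `directoryPrefix` option might do, using only `java.util.jar` — the helper below is hypothetical, not Spark's `TestUtils.createJar`:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.nio.file.Files;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

public class JarWithPrefix {
    // Hypothetical helper mirroring the directoryPrefix idea: place each file
    // under the given directory inside the jar instead of at the root.
    static void createJar(File[] files, File jarFile, String directoryPrefix) throws Exception {
        try (JarOutputStream jar = new JarOutputStream(new FileOutputStream(jarFile))) {
            for (File f : files) {
                String name = (directoryPrefix == null)
                    ? f.getName()
                    : directoryPrefix + "/" + f.getName();
                jar.putNextEntry(new JarEntry(name));
                jar.write(Files.readAllBytes(f.toPath()));
                jar.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        File dir = Files.createTempDirectory("jar-demo").toFile();
        File clazz = new File(dir, "MyException.class");
        Files.write(clazz.toPath(), new byte[] { (byte) 0xCA, (byte) 0xFE });
        File jarFile = new File(dir, "test.jar");
        createJar(new File[] { clazz }, jarFile, "repro");
        // The entry should be findable under the package-shaped path.
        try (JarFile jf = new JarFile(jarFile)) {
            System.out.println(jf.getEntry("repro/MyException.class") != null);
        }
    }
}
```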



Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9367#issuecomment-152415572
  
    Merged build finished. Test PASSed.



Posted by choochootrain <gi...@git.apache.org>.
Github user choochootrain commented on the pull request:

    https://github.com/apache/spark/pull/9367#issuecomment-153244789
  
    @yhuai Yep, this should also be in master. Should I submit a PR against master when this one looks good, or will a maintainer be able to cherry-pick the commit?



Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9367#discussion_r44880055
  
    --- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskResultGetterSuite.scala ---
    @@ -119,5 +124,47 @@ class TaskResultGetterSuite extends SparkFunSuite with BeforeAndAfter with Local
         // Make sure two tasks were run (one failed one, and a second retried one).
         assert(scheduler.nextTaskId.get() === 2)
       }
    +
    +  // SPARK-11195
    +  // Make sure we are using the context classloader when deserializing failed TaskResults instead of
    +  // the Spark classloader.
    +  test("failed task deserialized with the correct classloader") {
    +    // compile a small jar containing an exception that will be thrown on an executor.
    +    val tempDir = Utils.createTempDir()
    +    val srcDir = new File(tempDir, "repro/")
    +    srcDir.mkdirs()
    +    val excSource = new JavaSourceFromString(new File(srcDir, "MyException").getAbsolutePath,
    +      """package repro;
    +        |
    +        |public class MyException extends Exception {
    +        |}
    +      """.stripMargin)
    +    val excFile = TestUtils.createCompiledClass("MyException", srcDir, excSource, Seq.empty)
    +    val jarFile = new File(tempDir, "testJar-%s.jar".format(System.currentTimeMillis()))
    +    TestUtils.createJar(Seq(excFile), jarFile, directoryPrefix = Some("repro"))
    +
    +    // load the exception from the jar
    +    val loader = new MutableURLClassLoader(new Array[URL](0), Thread.currentThread.getContextClassLoader)
    +    loader.addURL(jarFile.toURI.toURL)
    +    Thread.currentThread().setContextClassLoader(loader)
    --- End diff --
    
    Can we set the original loader back?
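    
    The save-and-restore pattern the reviewer is asking for is the standard try/finally idiom, independent of Spark; a minimal sketch:

```java
import java.net.URL;
import java.net.URLClassLoader;

public class RestoreLoader {
    public static void main(String[] args) throws Exception {
        ClassLoader original = Thread.currentThread().getContextClassLoader();
        URLClassLoader temporary = new URLClassLoader(new URL[0], original);
        Thread.currentThread().setContextClassLoader(temporary);
        try {
            // ... the body of the test would run here with the temporary loader ...
            System.out.println(Thread.currentThread().getContextClassLoader() == temporary);
        } finally {
            // Restore so later tests in the same JVM see the original loader.
            Thread.currentThread().setContextClassLoader(original);
        }
        System.out.println(Thread.currentThread().getContextClassLoader() == original);
    }
}
```

    Without the restore, the swapped-in loader leaks into any subsequent test running on the same thread.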



Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9367#issuecomment-152380893
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-11195][CORE] Use correct classloader fo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9367#issuecomment-153550807
  
    Merged build started.




[GitHub] spark pull request: [SPARK-11195][CORE] Use correct classloader fo...

Posted by choochootrain <gi...@git.apache.org>.
Github user choochootrain commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9367#discussion_r43844348
  
    --- Diff: core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala ---
    @@ -366,6 +367,72 @@ class SparkSubmitSuite
         }
       }
     
    +  // SPARK-11195
    +  test("classes are correctly loaded when tasks fail") {
    +    // Compile a simple jar that throws a user defined exception on the driver
    +    val tempDir = Utils.createTempDir()
    +    val srcDir = new File(tempDir, "repro/")
    +    srcDir.mkdirs()
    +    // scalastyle:off line.size.limit
    +    val mainSource = new JavaSourceFromString(new File(srcDir, "MyJob").getAbsolutePath,
    +      """package repro;
    +        |
    +        |import java.util.*;
    +        |import java.util.regex.*;
    +        |import org.apache.spark.*;
    +        |import org.apache.spark.api.java.*;
    +        |import org.apache.spark.api.java.function.*;
    +        |
    +        |public class MyJob {
    +        |  public static class MyException extends Exception {
    +        |  }
    +        |
    +        |  public static void main(String[] args) {
    +        |    SparkConf conf = new SparkConf();
    +        |    JavaSparkContext sc = new JavaSparkContext(conf);
    +        |
    +        |    JavaRDD rdd = sc.parallelize(Arrays.asList(new Integer[]{1}), 1).map(new Function<Integer, Boolean>() {
    +        |      public Boolean call(Integer x) throws MyException {
    +        |        throw new MyException();
    +        |      }
    +        |    });
    +        |
    +        |    try {
    +        |      rdd.collect();
    +        |
    +        |      assert(false); // should be unreachable
    +        |    } catch (Exception e) {
    +        |      // the driver should not have any problems resolving the exception class and determining
    +        |      // why the task failed.
    +        |
    +        |      Pattern unknownFailure = Pattern.compile(".*Lost task.*: UnknownReason.*", Pattern.DOTALL);
    +        |      Pattern expectedFailure = Pattern.compile(".*Lost task.*: repro.MyJob\\$MyException.*", Pattern.DOTALL);
    +        |
    +        |      assert(!unknownFailure.matcher(e.getMessage()).matches());
    +        |      assert(expectedFailure.matcher(e.getMessage()).matches());
    +        |    }
    +        |  }
    +        |}
    +      """.stripMargin)
    +    // scalastyle:on line.size.limit
    +    val sparkJar = "../assembly/target/scala-2.10/spark-assembly-1.5.1-hadoop2.2.0.jar"
    --- End diff --
    
    In order to test this issue, I need to submit an external jar to `spark-submit`. I can compile my job using `TestUtils`, but I need to put (some version of) Spark on the classpath so that `javac` can resolve `RDD` and so on.
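
For context, compiling a source file against an explicit classpath is what the JDK's `javax.tools` API does under the hood of helpers like `TestUtils.createCompiledClass`. A hedged sketch of the mechanics, using the current JVM classpath as a stand-in for the Spark assembly jar (`CompileWithClasspath` and the temp paths are illustrative):

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;

public class CompileWithClasspath {
    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("repro");
        Path src = dir.resolve("Hello.java");
        Files.write(src, "public class Hello {}".getBytes("UTF-8"));

        // Requires a JDK (returns null on a bare JRE).
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        // In the real test, "-classpath" would point at the Spark assembly jar
        // so that references like RDD resolve at compile time.
        int result = javac.run(null, null, null,
            "-classpath", System.getProperty("java.class.path"),
            "-d", dir.toString(),
            src.toString());
        System.out.println(result == 0
            && new File(dir.toFile(), "Hello.class").exists());
    }
}
```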




[GitHub] spark pull request: [SPARK-11195][CORE] Use correct classloader fo...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/9367#issuecomment-152379692
  
    @brkyvz might know more about testing Spark Submit.




[GitHub] spark pull request: [SPARK-11195][CORE] Use correct classloader fo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9367#issuecomment-153543842
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-11195][CORE] Use correct classloader fo...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9367#discussion_r44447058
  
    --- Diff: core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala ---
    @@ -366,6 +367,72 @@ class SparkSubmitSuite
    [... quoted hunk elided; it is quoted in full earlier in this thread ...]
     +    val sparkJar = "../assembly/target/scala-2.10/spark-assembly-1.5.1-hadoop2.2.0.jar"
    --- End diff --
    
    We should go with the simplest option that reproduces the issue. In other `SparkSubmitSuite` tests we used (2) but only out of necessity, where we just prepackage a jar and put it in the test resources dir. This makes it a little hard to maintain, e.g. it doesn't work with scala-2.11.
    
    In this case, maybe (3) is the simplest and most maintainable. It's unlikely that we'll ever have to modify `MyException`, but the reproduction code itself should be kept flexible. Could you give it a try?




[GitHub] spark pull request: [SPARK-11195][CORE] Use correct classloader fo...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/9367#issuecomment-152402475
  
    How about this: in the main method, catch the exception and, if it is the expected type, let the main method finish successfully. Otherwise, throw an exception, which causes a non-zero exit code. In `runSparkSubmit`, we will see that the exit code is not 0 and fail the test.
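
The suggestion above can be sketched as a tiny driver whose exit code carries the verdict; `MyException` and `runJob` are illustrative stand-ins for the real job and the failing `rdd.collect()`:

```java
public class ExitCodeJob {
    static class MyException extends Exception {}

    static void runJob() throws MyException {
        // Stands in for rdd.collect() propagating the task failure.
        throw new MyException();
    }

    public static void main(String[] args) {
        try {
            runJob();
            System.exit(1);          // should be unreachable
        } catch (Exception e) {
            if (e instanceof MyException) {
                // Driver resolved the user class and saw the expected failure.
                System.out.println("expected failure observed");
                System.exit(0);
            } else {
                System.exit(2);      // unexpected failure type: fail the test
            }
        }
    }
}
```

`runSparkSubmit` then only needs to check for a zero exit code, instead of pattern-matching the driver's log output.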




[GitHub] spark pull request: [SPARK-11195][CORE] Use correct classloader fo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9367#issuecomment-152372707
  
    Can one of the admins verify this patch?




[GitHub] spark pull request: [SPARK-11195][CORE] Use correct classloader fo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9367#issuecomment-156604552
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-11195][CORE] Use correct classloader fo...

Posted by choochootrain <gi...@git.apache.org>.
Github user choochootrain commented on the pull request:

    https://github.com/apache/spark/pull/9367#issuecomment-156603526
  
    Should be less invasive now :)
    Note that I still have to make some minor changes to `TestUtils` in order for the jar to be in the correct format.




[GitHub] spark pull request: [SPARK-11195][CORE] Use correct classloader fo...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9367#issuecomment-156604550
  
      [Test build #45913 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45913/console) for   PR 9367 at commit [`71f8df9`](https://github.com/apache/spark/commit/71f8df97e6cf3c477101c7fce8c195ae8e0c829f).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-11195][CORE] Use correct classloader fo...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9367#issuecomment-153551172
  
      [Test build #44991 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44991/consoleFull) for   PR 9367 at commit [`53f7c4c`](https://github.com/apache/spark/commit/53f7c4cda8216ce6db726ae88fbf89776da62b27).




[GitHub] spark pull request: [SPARK-11195][CORE] Use correct classloader fo...

Posted by choochootrain <gi...@git.apache.org>.
Github user choochootrain commented on the pull request:

    https://github.com/apache/spark/pull/9367#issuecomment-153191848
  
    I'm writing the test right now. Is there an easy way to get the relative path to the assembled Spark jar so I can compile my job against it?




[GitHub] spark pull request: [SPARK-11195][CORE] Use correct classloader fo...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9367#discussion_r43922931
  
    --- Diff: core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala ---
    @@ -366,6 +367,72 @@ class SparkSubmitSuite
    [... quoted hunk elided; it is quoted in full earlier in this thread ...]
     +    val sparkJar = "../assembly/target/scala-2.10/spark-assembly-1.5.1-hadoop2.2.0.jar"
    --- End diff --
    
    @JoshRosen @andrewor14 any idea on how to test this?




[GitHub] spark pull request: [SPARK-11195][CORE] Use correct classloader fo...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9367#issuecomment-153543543
  
      [Test build #44987 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44987/consoleFull) for   PR 9367 at commit [`c63ca09`](https://github.com/apache/spark/commit/c63ca091714c464f22e9e28b8605aa8c551eaa87).




[GitHub] spark pull request: [SPARK-11195][CORE] Use correct classloader fo...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/9367#issuecomment-153248208
  
    @choochootrain It would be great if you could submit a PR against master. Our merge script only cherry-picks commits from master to a branch.




[GitHub] spark pull request: [SPARK-11195][CORE] Use correct classloader fo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9367#issuecomment-153543299
  
    Merged build started.




[GitHub] spark pull request: [SPARK-11195][CORE] Use correct classloader fo...

Posted by brkyvz <gi...@git.apache.org>.
Github user brkyvz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9367#discussion_r43927722
  
    --- Diff: core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala ---
    @@ -366,6 +367,72 @@ class SparkSubmitSuite
    [... quoted hunk elided; it is quoted in full earlier in this thread ...]
     +    val sparkJar = "../assembly/target/scala-2.10/spark-assembly-1.5.1-hadoop2.2.0.jar"
    --- End diff --
    
    Or maybe I didn't understand the issue that this patch is trying to solve very well.




[GitHub] spark pull request: [SPARK-11195][CORE] Use correct classloader fo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9367#issuecomment-152380950
  
    Merged build started.




[GitHub] spark pull request: [SPARK-11195][CORE] Use correct classloader fo...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/9367#issuecomment-153192578
  
    @choochootrain I think you can write your app in SparkSubmitSuite (https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala#L590-L614 is an example).




[GitHub] spark pull request: [SPARK-11195][CORE] Use correct classloader fo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9367#issuecomment-153543844
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44987/
    Test FAILed.




[GitHub] spark pull request: [SPARK-11195][CORE] Use correct classloader fo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9367#issuecomment-153550780
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-11195][CORE] Use correct classloader fo...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9367#discussion_r43840823
  
    --- Diff: core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala ---
    @@ -366,6 +367,72 @@ class SparkSubmitSuite
    [... quoted hunk elided; it is quoted in full earlier in this thread ...]
     +    val sparkJar = "../assembly/target/scala-2.10/spark-assembly-1.5.1-hadoop2.2.0.jar"
    --- End diff --
    
    This is what our Jenkins does.




[GitHub] spark pull request: [SPARK-11195][CORE] Use correct classloader fo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9367#issuecomment-153572088
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-11195][CORE] Use correct classloader fo...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/9367#issuecomment-155502581
  
    I feel the easiest way is to have something like https://github.com/apache/spark/tree/master/sql/hive/src/test/resources/regression-test-SPARK-8489. That way we will not need to change anything in TestUtils; we would just have a jar containing your class and the main object.




[GitHub] spark pull request: [SPARK-11195][CORE] Use correct classloader fo...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9367#issuecomment-156604441
  
      [Test build #45913 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45913/consoleFull) for   PR 9367 at commit [`71f8df9`](https://github.com/apache/spark/commit/71f8df97e6cf3c477101c7fce8c195ae8e0c829f).




[GitHub] spark pull request: [SPARK-11195][CORE] Use correct classloader fo...

Posted by choochootrain <gi...@git.apache.org>.
Github user choochootrain commented on the pull request:

    https://github.com/apache/spark/pull/9367#issuecomment-152372967
  
    I have a manual test that exhibits this behavior in https://issues.apache.org/jira/browse/SPARK-11195 and I am working on adding a test to the repo.
    
    In order to test this I basically want to mirror the classloader hierarchy created by `spark-submit`. Are there any conventions or existing tests that do something like this that I can look at?
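For context, the shape of the hierarchy being mirrored can be sketched in plain Java. This is a hand-rolled analogue for illustration, not the code `spark-submit` actually runs; the empty URL list stands in for a real user jar:

```java
import java.net.URL;
import java.net.URLClassLoader;

public class HierarchyDemo {
    // Build a child loader standing in for the one spark-submit creates for
    // the user jar; its parent is the loader that owns Spark's own classes.
    public static URLClassLoader buildUserLoader() {
        ClassLoader sparkLoader = HierarchyDemo.class.getClassLoader();
        // An empty URL list keeps the sketch self-contained; a real test
        // would point this at the compiled user jar.
        return new URLClassLoader(new URL[0], sparkLoader);
    }

    public static void main(String[] args) {
        URLClassLoader userLoader = buildUserLoader();
        // User code (and, per this PR, failed-TaskResult deserialization)
        // should consult this loader via the thread's context classloader.
        Thread.currentThread().setContextClassLoader(userLoader);
        System.out.println(Thread.currentThread().getContextClassLoader() == userLoader);
    }
}
```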




[GitHub] spark pull request: [SPARK-11195][CORE] Use correct classloader fo...

Posted by choochootrain <gi...@git.apache.org>.
Github user choochootrain commented on the pull request:

    https://github.com/apache/spark/pull/9367#issuecomment-153550645
  
    Whoops, fixed the style error.




[GitHub] spark pull request: [SPARK-11195][CORE] Use correct classloader fo...

Posted by choochootrain <gi...@git.apache.org>.
Github user choochootrain commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9367#discussion_r43836051
  
    --- Diff: core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala ---
    +    val sparkJar = "../assembly/target/scala-2.10/spark-assembly-1.5.1-hadoop2.2.0.jar"
    --- End diff --
    
    The only remaining issue is that this jar is hardcoded for the Maven profiles that I am using. I was experimenting with putting the entire Maven `target/classes` directory on the classpath, but that seems equally janky. Any suggestions here?




[GitHub] spark pull request: [SPARK-11195][CORE] Use correct classloader fo...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/9367#issuecomment-152530948
  
    This looks good, as I'm not otherwise clear on why these two blocks would use different classloaders.




[GitHub] spark pull request: [SPARK-11195][CORE] Use correct classloader fo...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/9367#issuecomment-153237759
  
    @choochootrain I just noticed that this PR is for branch-1.5. Should we also fix it in master?




[GitHub] spark pull request: [SPARK-11195][CORE] Use correct classloader fo...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9367#discussion_r43936635
  
    --- Diff: core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala ---
    +    val sparkJar = "../assembly/target/scala-2.10/spark-assembly-1.5.1-hadoop2.2.0.jar"
    --- End diff --
    
    Maybe you can compile a jar that just includes your exception class, and put your app below it. In your app, use reflection to create an instance of your exception.




[GitHub] spark pull request: [SPARK-11195][CORE] Use correct classloader fo...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9367#discussion_r44880126
  
    --- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskResultGetterSuite.scala ---
    @@ -119,5 +124,47 @@ class TaskResultGetterSuite extends SparkFunSuite with BeforeAndAfter with Local
         // Make sure two tasks were run (one failed one, and a second retried one).
         assert(scheduler.nextTaskId.get() === 2)
       }
    +
    +  // SPARK-11195
    +  // Make sure we are using the context classloader when deserializing failed TaskResults instead of
    +  // the Spark classloader.
    +  test("failed task deserialized with the correct classloader") {
    --- End diff --
    
    Can we add comments to outline how this fix is tested?




[GitHub] spark pull request: [SPARK-11195][CORE] Use correct classloader fo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9367#issuecomment-153572089
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44991/
    Test FAILed.




[GitHub] spark pull request: [SPARK-11195][CORE] Use correct classloader fo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9367#issuecomment-156604554
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45913/
    Test FAILed.




[GitHub] spark pull request: [SPARK-11195][CORE] Use correct classloader fo...

Posted by brkyvz <gi...@git.apache.org>.
Github user brkyvz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9367#discussion_r43927641
  
    --- Diff: core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala ---
    +    val sparkJar = "../assembly/target/scala-2.10/spark-assembly-1.5.1-hadoop2.2.0.jar"
    --- End diff --
    
    Why do you need to compile using Spark? Can't you just create your own exception, compile it, and pass that to `spark-submit`? Then move all of the code here down to something like `SimpleApplicationTest`, where you create your exception through reflection and then throw it?
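The reflection idea can be sketched like this. The class `ReflectiveThrowDemo`, its method names, and the JDK exception standing in for the user-defined one are illustrative assumptions, not code from this PR:

```java
import java.lang.reflect.Constructor;

public class ReflectiveThrowDemo {
    // Look the exception class up by name, as code compiled without a
    // compile-time dependency on it would have to, and throw the instance.
    public static void throwByName(String className, String message) throws Throwable {
        Class<?> cls = Class.forName(className);
        Constructor<?> ctor = cls.getConstructor(String.class);
        throw (Throwable) ctor.newInstance(message);
    }

    public static void main(String[] args) {
        try {
            // A JDK exception stands in for the user-defined MyException here.
            throwByName("java.lang.UnsupportedOperationException", "boom");
        } catch (Throwable t) {
            System.out.println(t.getClass().getName() + ": " + t.getMessage());
        }
    }
}
```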




[GitHub] spark pull request: [SPARK-11195][CORE] Use correct classloader fo...

Posted by choochootrain <gi...@git.apache.org>.
Github user choochootrain commented on the pull request:

    https://github.com/apache/spark/pull/9367#issuecomment-155865241
  
    @srowen @andrewor14 @yhuai
    
    It seems like the SPARK-8489 approach would be less invasive, but also less maintainable. Any preferences? I'd like to stick with one and get this out ASAP.




[GitHub] spark pull request: [SPARK-11195][CORE] Use correct classloader fo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9367#issuecomment-152415573
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44655/
    Test PASSed.




[GitHub] spark pull request: [SPARK-11195][CORE] Use correct classloader fo...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9367#discussion_r44880391
  
    --- Diff: core/src/main/scala/org/apache/spark/TestUtils.scala ---
    @@ -130,7 +131,7 @@ private[spark] object TestUtils {
         val fileName = className + ".class"
         val result = new File(fileName)
         assert(result.exists(), "Compiled file not found: " + result.getAbsolutePath())
    -    val out = new File(destDir, fileName)
    +    val out = new File(destDir, result.getName)
    --- End diff --
    
    why?




[GitHub] spark pull request: [SPARK-11195][CORE] Use correct classloader fo...

Posted by choochootrain <gi...@git.apache.org>.
Github user choochootrain commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9367#discussion_r43942556
  
    --- Diff: core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala ---
    +    val sparkJar = "../assembly/target/scala-2.10/spark-assembly-1.5.1-hadoop2.2.0.jar"
    --- End diff --
    
    All this patch does is make `TaskResult` deserialization use `Utils.getContextOrSparkClassLoader` (the classloader which loaded the jar passed to `spark-submit`) instead of `Utils.getSparkClassLoader` (this is `AppClassLoader`, which only has Spark classes in it). Without this patch, a failed task would not be able to deserialize an exception if it did not exist in `Utils.getSparkClassLoader`.
    
    In order to reproduce this issue, I set up a situation where `Utils.getContextOrSparkClassLoader` contains `MyException` but `Utils.getSparkClassLoader` does not (see https://issues.apache.org/jira/browse/SPARK-11195). This is easy to test manually with `spark-submit` and a user-defined exception, but turning it into an automated test is proving to be much trickier. Here are the 3 options:
    
    * :x: If I place all of the code into `SparkSubmitSuite`, the bug won't be hit because `MyException` will be in the root classloader and my patch makes no difference.
    
    * :grey_question: If I place all of the code into an external jar and run `spark-submit`, I can set up the same situation as the repro which found this bug. The issue I am running into is that I need a Spark classpath in order to compile my jar. I can use the assembled jar, but its name changes depending on the Maven profiles that are enabled and so on.
    
    * :grey_question: I can try @brkyvz & @yhuai's hybrid approach of putting only the exception into a jar and the rest of the code into `SparkSubmitSuite`. I will have to do the following in order to repro this issue:
      * load the jar with `MyException` in a new classloader whose parent is the root classloader
      * somehow allow this classloader to be used by the driver and the executor *without* changing `Utils.getSparkClassLoader`.
    
     At this point am I not reimplementing `spark-submit`? :)
    
    The final approach is certainly worth trying; I'll take a look at it later today.
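The deserialization half of this can be illustrated with a self-contained sketch. This mirrors the general pattern of resolving classes against a supplied loader (as a serializer handed the context classloader would); it is not Spark's actual serializer code, and the class names are made up for the example:

```java
import java.io.*;

public class LoaderDemo {
    // An ObjectInputStream that resolves classes against a supplied loader,
    // instead of whatever loader ObjectInputStream would pick by default.
    static final class LoaderAwareInput extends ObjectInputStream {
        private final ClassLoader loader;
        LoaderAwareInput(InputStream in, ClassLoader loader) throws IOException {
            super(in);
            this.loader = loader;
        }
        @Override
        protected Class<?> resolveClass(ObjectStreamClass desc)
                throws IOException, ClassNotFoundException {
            try {
                return Class.forName(desc.getName(), false, loader);
            } catch (ClassNotFoundException e) {
                return super.resolveClass(desc); // primitives and fallbacks
            }
        }
    }

    // Serialize an object, then deserialize it using the given loader.
    public static Object roundTrip(Object obj, ClassLoader loader) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(obj);
        }
        try (LoaderAwareInput ois =
                 new LoaderAwareInput(new ByteArrayInputStream(bos.toByteArray()), loader)) {
            return ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        ClassLoader ctx = Thread.currentThread().getContextClassLoader();
        if (ctx == null) {
            ctx = ClassLoader.getSystemClassLoader();
        }
        // With a loader that cannot see the exception's class, readObject
        // fails with ClassNotFoundException; with the right loader it works.
        Exception e = (Exception) roundTrip(new IllegalStateException("task failed"), ctx);
        System.out.println(e.getMessage());
    }
}
```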




[GitHub] spark pull request: [SPARK-11195][CORE] Use correct classloader fo...

Posted by brkyvz <gi...@git.apache.org>.
Github user brkyvz commented on the pull request:

    https://github.com/apache/spark/pull/9367#issuecomment-155869908
  
    My vote is with (3). I feel it requires the least amount of new code that you have to write and is more maintainable.




[GitHub] spark pull request: [SPARK-11195][CORE] Use correct classloader fo...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/9367#issuecomment-152372506
  
    Jenkins, this is ok to test.




[GitHub] spark pull request: [SPARK-11195][CORE] Use correct classloader fo...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9367#issuecomment-152415520
  
      [Test build #44655 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44655/console) for   PR 9367 at commit [`90c47aa`](https://github.com/apache/spark/commit/90c47aa133de83dd49dea00b0949088e8600ab23).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.

