Posted to reviews@spark.apache.org by deshanxiao <gi...@git.apache.org> on 2018/08/13 14:54:16 UTC

[GitHub] spark pull request #22093: [SPARK-25100][CORE] Fix no registering TaskCommit...

GitHub user deshanxiao opened a pull request:

    https://github.com/apache/spark/pull/22093

    [SPARK-25100][CORE] Fix no registering TaskCommitMessage bug

    ## What changes were proposed in this pull request?
    
    Fix a bug where a job that invokes saveAsNewAPIHadoopDataset to store data fails because the class TaskCommitMessage has not been registered with Kryo, when the serializer is KryoSerializer and spark.kryo.registrationRequired is true.
    
    ## How was this patch tested?
    
    Ran all the existing tests in the project.
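
    For context, here is an editor's minimal reproduction sketch of the failure described above; it is not code from this patch. The output path and object name are made up, and `saveAsNewAPIHadoopFile` is used for brevity on the assumption that it goes through the same new-API Hadoop write path, whose task commit message is serialized with the configured KryoSerializer when it is sent back to the driver.

    ```scala
    import org.apache.hadoop.io.{IntWritable, Text}
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
    import org.apache.spark.{SparkConf, SparkContext}

    // Hypothetical standalone reproduction; names and paths are illustrative only.
    object Spark25100Repro {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setMaster("local")
          .setAppName("spark-25100-repro")
          .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
          .set("spark.kryo.registrationRequired", "true")
        val sc = new SparkContext(conf)
        try {
          val data = sc.parallelize(1 to 100)
            .map(i => (new IntWritable(i), new Text(i.toString)))
          // Expected to fail before the fix: the commit message returned by the
          // committer is not registered with Kryo, so registrationRequired trips.
          data.saveAsNewAPIHadoopFile(
            "/tmp/spark-25100-output",
            classOf[IntWritable],
            classOf[Text],
            classOf[TextOutputFormat[IntWritable, Text]])
        } finally {
          sc.stop()
        }
      }
    }
    ```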

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/deshanxiao/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22093.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22093
    
----
commit ab56a2e657d6820f99cb959bb9afc19578a855f3
Author: deshanxiao <de...@...>
Date:   2018-08-13T14:21:36Z

    Fix no registering TaskCommitMessage bug

----


---



[GitHub] spark issue #22093: [SPARK-25100][CORE] Fix no registering TaskCommitMessage...

Posted by deshanxiao <gi...@git.apache.org>.
Github user deshanxiao commented on the issue:

    https://github.com/apache/spark/pull/22093
  
    @xuanyuanking  I originally placed the test there, but KryoSerializerSuite extends SharedSparkContext, so I could not create my own SparkContext.


---



[GitHub] spark pull request #22093: [SPARK-25100][CORE] Fix no registering TaskCommit...

Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22093#discussion_r209650955
  
    --- Diff: core/src/test/scala/org/apache/spark/FileSuite.scala ---
    @@ -424,6 +425,39 @@ class FileSuite extends SparkFunSuite with LocalSparkContext {
         randomRDD.saveAsNewAPIHadoopDataset(jobConfig)
         assert(new File(tempDir.getPath + "/outputDataset_new/part-r-00000").exists() === true)
       }
    +  
    +  test("SPARK-25100: Using KryoSerializer and" +
    +      " setting registrationRequired true can lead job failed") {
    +    val tempDir = Utils.createTempDir()
    +    val inputDir = tempDir.getAbsolutePath + "/input"
    +    val outputDir = tempDir.getAbsolutePath + "/tmp"
    +    
    +    val writer = new PrintWriter(new File(inputDir))
    +    
    +    for(i <- 1 to 100) {
    +      writer.print(i)
    +      writer.write('\n')
    +    }
    +    
    +    writer.close()
    +    
    +    val conf = new SparkConf(false).setMaster("local").
    +        set("spark.kryo.registrationRequired", "true").setAppName("test")
    +    conf.set("spark.serializer", classOf[KryoSerializer].getName)
    +    conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    --- End diff --
    
    Why do we need to set 'spark.serializer' twice?


---



[GitHub] spark issue #22093: [SPARK-25100][CORE] Fix no registering TaskCommitMessage...

Posted by deshanxiao <gi...@git.apache.org>.
Github user deshanxiao commented on the issue:

    https://github.com/apache/spark/pull/22093
  
    @xuanyuanking I thought about it more carefully. You are right. The UT only needs to verify that KryoSerializer handles the class correctly, not the whole job. I will add it in `KryoSerializerSuite`. Should I delete the current UT from `FileSuite`?



---



[GitHub] spark issue #22093: [SPARK-25100][CORE] Fix no registering TaskCommitMessage...

Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on the issue:

    https://github.com/apache/spark/pull/22093
  
    Why should we create our own SparkContext here? Could we just add a UT like `registration of HighlyCompressedMapStatus` to check that `TaskCommitMessage` works?
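
    A hedged sketch of what such a serializer-level check might look like, modeled on the existing `registration of HighlyCompressedMapStatus` test. It is meant to sit inside `KryoSerializerSuite` (imports at the top of the file); the import path and the `null` constructor argument for `TaskCommitMessage` are illustrative assumptions, not taken from the patch.

    ```scala
    import org.apache.spark.SparkConf
    import org.apache.spark.internal.io.FileCommitProtocol.TaskCommitMessage
    import org.apache.spark.serializer.KryoSerializer

    // No SparkContext needed: build a serializer instance directly with
    // registrationRequired enabled and serialize the class in question.
    test("SPARK-25100: registration of TaskCommitMessage") {
      val conf = new SparkConf(false)
      conf.set("spark.kryo.registrationRequired", "true")
      val ser = new KryoSerializer(conf).newInstance()
      // Fails with an unregistered-class error if TaskCommitMessage is missing
      // from the Kryo registrations.
      ser.serialize(new TaskCommitMessage(null))
    }
    ```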


---



[GitHub] spark issue #22093: [SPARK-25100][CORE] Fix no registering TaskCommitMessage...

Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on the issue:

    https://github.com/apache/spark/pull/22093
  
    `Should I delete the current UT from FileSuite?`
    I think the current UT in `FileSuite` is unnecessary; you can leave it and wait for other reviewers' opinions.


---



[GitHub] spark issue #22093: [SPARK-25100][CORE] Fix no registering TaskCommitMessage...

Posted by deshanxiao <gi...@git.apache.org>.
Github user deshanxiao commented on the issue:

    https://github.com/apache/spark/pull/22093
  
    @xuanyuanking Thanks for your suggestion. We could indeed test whether `TaskCommitMessage` can be serialized by `KryoSerializer`, but that alone would not show why the framework has to serialize `TaskCommitMessage` in the first place. That is why we create our own `SparkContext`: an end-to-end test is more convincing in showing that the job fails because the framework does not register `TaskCommitMessage`.


---



[GitHub] spark issue #22093: [SPARK-25100][CORE] Fix no registering TaskCommitMessage...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22093
  
    Can one of the admins verify this patch?


---



[GitHub] spark pull request #22093: [SPARK-25100][CORE] Fix no registering TaskCommit...

Posted by deshanxiao <gi...@git.apache.org>.
Github user deshanxiao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22093#discussion_r209652992
  
    --- Diff: core/src/test/scala/org/apache/spark/FileSuite.scala ---
    @@ -424,6 +425,39 @@ class FileSuite extends SparkFunSuite with LocalSparkContext {
         randomRDD.saveAsNewAPIHadoopDataset(jobConfig)
         assert(new File(tempDir.getPath + "/outputDataset_new/part-r-00000").exists() === true)
       }
    +  
    +  test("SPARK-25100: Using KryoSerializer and" +
    +      " setting registrationRequired true can lead job failed") {
    +    val tempDir = Utils.createTempDir()
    +    val inputDir = tempDir.getAbsolutePath + "/input"
    +    val outputDir = tempDir.getAbsolutePath + "/tmp"
    +    
    +    val writer = new PrintWriter(new File(inputDir))
    +    
    +    for(i <- 1 to 100) {
    +      writer.print(i)
    +      writer.write('\n')
    +    }
    +    
    +    writer.close()
    +    
    +    val conf = new SparkConf(false).setMaster("local").
    +        set("spark.kryo.registrationRequired", "true").setAppName("test")
    +    conf.set("spark.serializer", classOf[KryoSerializer].getName)
    +    conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    --- End diff --
    
    yes, you are right.
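
    For reference, a sketch of the de-duplicated setup the comment points at: `spark.serializer` only needs to be set once, so one of the two `set` calls can simply be dropped (editor's illustration, not part of the patch).

    ```scala
    import org.apache.spark.SparkConf
    import org.apache.spark.serializer.KryoSerializer

    // Same configuration with the redundant second set("spark.serializer", ...) removed.
    val conf = new SparkConf(false)
      .setMaster("local")
      .setAppName("test")
      .set("spark.kryo.registrationRequired", "true")
      .set("spark.serializer", classOf[KryoSerializer].getName)
    ```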


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org