You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by viirya <gi...@git.apache.org> on 2018/10/21 01:00:24 UTC

[GitHub] spark pull request #22785: [SPARK-25791][SQL] Datatype of serializers in Row...

GitHub user viirya opened a pull request:

    https://github.com/apache/spark/pull/22785

    [SPARK-25791][SQL] Datatype of serializers in RowEncoder should be accessible

    ## What changes were proposed in this pull request?
    
    The serializers of `RowEncoder` use few `If` Catalyst expression which inherits `ComplexTypeMergingExpression` that will check input data types.
    
    It is possible to generate serializers which fail the check and can't to access the data type of serializers. When producing If expression, we should use the same data type at its input expressions.
    
    ## How was this patch tested?
    
    Added test.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 SPARK-25791

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22785.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22785
    
----
commit d6ed4b7cec98b34ff609df3fdfcd009c1f01c50a
Author: Liang-Chi Hsieh <vi...@...>
Date:   2018-10-21T00:57:18Z

    Datatype of serializers should be accessible.

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22785: [SPARK-25791][SQL] Datatype of serializers in RowEncoder...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22785
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97676/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22785: [SPARK-25791][SQL] Datatype of serializers in RowEncoder...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22785
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22785: [SPARK-25791][SQL] Datatype of serializers in Row...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22785#discussion_r227310290
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/RowEncoderSuite.scala ---
    @@ -273,6 +273,16 @@ class RowEncoderSuite extends CodegenInterpretedPlanTest {
         assert(e4.getMessage.contains("java.lang.String is not a valid external type"))
       }
     
    +  test("SPARK-25791: Datatype of serializers should be accessible") {
    +    val udtSQLType = new StructType().add("a", IntegerType)
    +    val pythonUDT = new PythonUserDefinedType(udtSQLType, "pyUDT", "serializedPyClass")
    --- End diff --
    
    For normal UDT, we create an `Invoke` with the original udt type. So it won't cause such problem.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22785: [SPARK-25791][SQL] Datatype of serializers in Row...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/22785


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22785: [SPARK-25791][SQL] Datatype of serializers in Row...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22785#discussion_r226978811
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/RowEncoderSuite.scala ---
    @@ -273,6 +273,16 @@ class RowEncoderSuite extends CodegenInterpretedPlanTest {
         assert(e4.getMessage.contains("java.lang.String is not a valid external type"))
       }
     
    +  test("SPARK-25791: Datatype of serializers should be accessible") {
    +    val udtSQLType = new StructType().add("a", IntegerType)
    +    val pythonUDT = new PythonUserDefinedType(udtSQLType, "pyUDT", "serializedPyClass")
    --- End diff --
    
    can we reproduce the bug with normal UDT?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22785: [SPARK-25791][SQL] Datatype of serializers in RowEncoder...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22785
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22785: [SPARK-25791][SQL] Datatype of serializers in RowEncoder...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22785
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97674/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22785: [SPARK-25791][SQL] Datatype of serializers in RowEncoder...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/22785
  
    thanks, merging to master!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22785: [SPARK-25791][SQL] Datatype of serializers in Row...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22785#discussion_r226979620
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/RowEncoderSuite.scala ---
    @@ -273,6 +273,16 @@ class RowEncoderSuite extends CodegenInterpretedPlanTest {
         assert(e4.getMessage.contains("java.lang.String is not a valid external type"))
       }
     
    +  test("SPARK-25791: Datatype of serializers should be accessible") {
    +    val udtSQLType = new StructType().add("a", IntegerType)
    +    val pythonUDT = new PythonUserDefinedType(udtSQLType, "pyUDT", "serializedPyClass")
    +    val schema = new StructType().add("pythonUDT", pythonUDT, true)
    +    val encoder = RowEncoder(schema)
    +    // scalastyle:off println
    +    encoder.serializer.foreach(s => println(s.dataType))
    --- End diff --
    
    Let me try it.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22785: [SPARK-25791][SQL] Datatype of serializers in Row...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22785#discussion_r226977895
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/RowEncoder.scala ---
    @@ -171,7 +171,7 @@ object RowEncoder {
     
           if (inputObject.nullable) {
             If(IsNull(inputObject),
    -          Literal.create(null, inputType),
    +          Literal.create(null, nonNullOutput.dataType),
    --- End diff --
    
    what's the difference?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22785: [SPARK-25791][SQL] Datatype of serializers in RowEncoder...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22785
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22785: [SPARK-25791][SQL] Datatype of serializers in RowEncoder...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22785
  
    **[Test build #97676 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97676/testReport)** for PR 22785 at commit [`6d41fe0`](https://github.com/apache/spark/commit/6d41fe00bfb62dea91632072a3e7abcb143c9183).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22785: [SPARK-25791][SQL] Datatype of serializers in Row...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22785#discussion_r226977568
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/RowEncoderSuite.scala ---
    @@ -273,6 +273,16 @@ class RowEncoderSuite extends CodegenInterpretedPlanTest {
         assert(e4.getMessage.contains("java.lang.String is not a valid external type"))
       }
     
    +  test("SPARK-25791: Datatype of serializers should be accessible") {
    +    val udtSQLType = new StructType().add("a", IntegerType)
    +    val pythonUDT = new PythonUserDefinedType(udtSQLType, "pyUDT", "serializedPyClass")
    +    val schema = new StructType().add("pythonUDT", pythonUDT, true)
    +    val encoder = RowEncoder(schema)
    +    // scalastyle:off println
    +    encoder.serializer.foreach(s => println(s.dataType))
    --- End diff --
    
    we shouldn't print in a test, can we use assert?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22785: [SPARK-25791][SQL] Datatype of serializers in RowEncoder...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22785
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4177/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22785: [SPARK-25791][SQL] Datatype of serializers in RowEncoder...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22785
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22785: [SPARK-25791][SQL] Datatype of serializers in RowEncoder...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22785
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22785: [SPARK-25791][SQL] Datatype of serializers in Row...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22785#discussion_r227314368
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/RowEncoder.scala ---
    @@ -187,7 +187,7 @@ object RowEncoder {
             val convertedField = if (field.nullable) {
               If(
                 Invoke(inputObject, "isNullAt", BooleanType, Literal(index) :: Nil),
    -            Literal.create(null, field.dataType),
    +            Literal.create(null, fieldValue.dataType),
    --- End diff --
    
    Added a comment.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22785: [SPARK-25791][SQL] Datatype of serializers in RowEncoder...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22785
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4178/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22785: [SPARK-25791][SQL] Datatype of serializers in Row...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22785#discussion_r226978705
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/RowEncoder.scala ---
    @@ -187,7 +187,7 @@ object RowEncoder {
             val convertedField = if (field.nullable) {
               If(
                 Invoke(inputObject, "isNullAt", BooleanType, Literal(index) :: Nil),
    -            Literal.create(null, field.dataType),
    +            Literal.create(null, fieldValue.dataType),
    --- End diff --
    
    can we add a comment to explain it? `field.dataType` can be different from `fieldValue.dataType`, because we strip UDT


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22785: [SPARK-25791][SQL] Datatype of serializers in RowEncoder...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22785
  
    **[Test build #97676 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97676/testReport)** for PR 22785 at commit [`6d41fe0`](https://github.com/apache/spark/commit/6d41fe00bfb62dea91632072a3e7abcb143c9183).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22785: [SPARK-25791][SQL] Datatype of serializers in RowEncoder...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22785
  
    **[Test build #97674 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97674/testReport)** for PR 22785 at commit [`d6ed4b7`](https://github.com/apache/spark/commit/d6ed4b7cec98b34ff609df3fdfcd009c1f01c50a).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22785: [SPARK-25791][SQL] Datatype of serializers in RowEncoder...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22785
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22785: [SPARK-25791][SQL] Datatype of serializers in RowEncoder...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22785
  
    **[Test build #97915 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97915/testReport)** for PR 22785 at commit [`61c8b2f`](https://github.com/apache/spark/commit/61c8b2f3488232d6ec6d134e21c4c5887d489065).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22785: [SPARK-25791][SQL] Datatype of serializers in RowEncoder...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22785
  
    **[Test build #97674 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97674/testReport)** for PR 22785 at commit [`d6ed4b7`](https://github.com/apache/spark/commit/d6ed4b7cec98b34ff609df3fdfcd009c1f01c50a).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22785: [SPARK-25791][SQL] Datatype of serializers in RowEncoder...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22785
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4400/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22785: [SPARK-25791][SQL] Datatype of serializers in RowEncoder...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/22785
  
    cc @cloud-fan 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22785: [SPARK-25791][SQL] Datatype of serializers in RowEncoder...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22785
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97915/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22785: [SPARK-25791][SQL] Datatype of serializers in RowEncoder...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22785
  
    **[Test build #97915 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97915/testReport)** for PR 22785 at commit [`61c8b2f`](https://github.com/apache/spark/commit/61c8b2f3488232d6ec6d134e21c4c5887d489065).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22785: [SPARK-25791][SQL] Datatype of serializers in Row...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22785#discussion_r226978368
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/RowEncoder.scala ---
    @@ -171,7 +171,7 @@ object RowEncoder {
     
           if (inputObject.nullable) {
             If(IsNull(inputObject),
    -          Literal.create(null, inputType),
    +          Literal.create(null, nonNullOutput.dataType),
    --- End diff --
    
    I see, for this case it has no difference, but makes the code clearer


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org