You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by viirya <gi...@git.apache.org> on 2018/10/16 15:30:12 UTC

[GitHub] spark pull request #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEnc...

GitHub user viirya opened a pull request:

    https://github.com/apache/spark/pull/22749

    [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to get rid of flat flag

    ## What changes were proposed in this pull request?
    
    This is inspired during implementing #21732. For now `ScalaReflection` needs to consider how `ExpressionEncoder` uses generated serializers and deserializers. And `ExpressionEncoder` has a weird `flat` flag. After discussion with @cloud-fan, it seems to be better to refactor `ExpressionEncoder`. It should make SPARK-24762 easier to do.
    
    ## How was this patch tested?
    
    Existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 SPARK-24762-refactor

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22749.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22749
    
----
commit e1b5deebe715479125c8878f0c90a55dc9ab3e85
Author: Liang-Chi Hsieh <vi...@...>
Date:   2018-07-09T03:42:04Z

    Aggregator should be able to use Option of Product encoder.

commit 80506f4e98184ccd66dbaac14ec52d69c358020d
Author: Liang-Chi Hsieh <vi...@...>
Date:   2018-07-13T04:40:55Z

    Enable top-level Option of Product encoders.

commit ed3d5cb697b10af2e2cf4c78ab521d4d0b2f3c9b
Author: Liang-Chi Hsieh <vi...@...>
Date:   2018-08-24T04:26:28Z

    Remove topLevel parameter.

commit 9fc3f6165156051142a8366a32726badaaa16bb7
Author: Liang-Chi Hsieh <vi...@...>
Date:   2018-08-24T04:37:39Z

    Merge remote-tracking branch 'upstream/master' into SPARK-24762

commit 5f95bd0cf1bd308c7df55c41caef7a9f19368f5d
Author: Liang-Chi Hsieh <vi...@...>
Date:   2018-08-24T04:42:33Z

    Remove useless change.

commit a4f04055b2ba22f371663565710328791942855a
Author: Liang-Chi Hsieh <vi...@...>
Date:   2018-08-24T14:38:16Z

    Add more tests.

commit c1f798f7e9cba0d04223eed06f1b1f547ec29dc5
Author: Liang-Chi Hsieh <vi...@...>
Date:   2018-08-25T01:52:01Z

    Add test.

commit 80e11d289d7775863cb9c28b2c1d4364292048a4
Author: Liang-Chi Hsieh <vi...@...>
Date:   2018-10-06T04:06:57Z

    Merge remote-tracking branch 'upstream/master' into SPARK-24762

commit 0f029b0a28700334dc6334f1ad89b3124f235a51
Author: Liang-Chi Hsieh <vi...@...>
Date:   2018-10-06T04:40:07Z

    Improve code comments.

commit d755e8406f06117ccc96b8f19debab6b2a736e10
Author: Liang-Chi Hsieh <vi...@...>
Date:   2018-10-15T09:55:03Z

    Refactoring ExpressionEncoder.

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEnc...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22749#discussion_r226844557
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/RowEncoder.scala ---
    @@ -198,7 +196,7 @@ object RowEncoder {
     
           if (inputObject.nullable) {
             If(IsNull(inputObject),
    -          Literal.create(null, inputType),
    +          Literal.create(null, nonNullOutput.dataType),
    --- End diff --
    
    Created a separate PR at #22785.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22749#discussion_r227681844
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -1087,7 +1087,7 @@ class Dataset[T] private[sql](
         // Note that we do this before joining them, to enable the join operator to return null for one
         // side, in cases like outer-join.
         val left = {
    -      val combined = if (this.exprEnc.flat) {
    +      val combined = if (!this.exprEnc.objSerializer.dataType.isInstanceOf[StructType]) {
    --- End diff --
    
    shall we create a method in `ExpressionEncoder` for this check?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97971/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    **[Test build #97645 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97645/testReport)** for PR 22749 at commit [`5b9abb6`](https://github.com/apache/spark/commit/5b9abb67907dfdb0c0c64751db3525564f832422).
     * This patch **fails PySpark pip packaging tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4219/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    **[Test build #97964 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97964/testReport)** for PR 22749 at commit [`ed4f4c9`](https://github.com/apache/spark/commit/ed4f4c90ec4c162f56373950089eec4632787817).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22749#discussion_r227745900
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/ScalaReflectionSuite.scala ---
    @@ -351,11 +347,15 @@ class ScalaReflectionSuite extends SparkFunSuite {
     
       test("SPARK-23835: add null check to non-nullable types in Tuples") {
         def numberOfCheckedArguments(deserializer: Expression): Int = {
    -      assert(deserializer.isInstanceOf[NewInstance])
    -      deserializer.asInstanceOf[NewInstance].arguments.count(_.isInstanceOf[AssertNotNull])
    +      val newInstance = deserializer.collect { case n: NewInstance => n}.head
    +      newInstance.arguments.count(_.isInstanceOf[AssertNotNull])
         }
    -    assert(numberOfCheckedArguments(deserializerFor[(Double, Double)]) == 2)
    -    assert(numberOfCheckedArguments(deserializerFor[(java.lang.Double, Int)]) == 1)
    -    assert(numberOfCheckedArguments(deserializerFor[(java.lang.Integer, java.lang.Integer)]) == 0)
    +    assert(numberOfCheckedArguments(
    +      deserializerForType(ScalaReflection.localTypeOf[(Double, Double)])) == 2)
    --- End diff --
    
    Sounds good.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22749#discussion_r227996228
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/ScalaReflectionSuite.scala ---
    @@ -280,59 +281,59 @@ class ScalaReflectionSuite extends SparkFunSuite {
     
       test("serialize and deserialize arbitrary sequence types") {
         import scala.collection.immutable.Queue
    -    val queueSerializer = serializerFor[Queue[Int]](BoundReference(
    -      0, ObjectType(classOf[Queue[Int]]), nullable = false))
    -    assert(queueSerializer.dataType.head.dataType ==
    +    val queueSerializer = serializerForType(ScalaReflection.localTypeOf[Queue[Int]])
    +    assert(queueSerializer.dataType ==
           ArrayType(IntegerType, containsNull = false))
         val queueDeserializer = deserializerFor[Queue[Int]]
         assert(queueDeserializer.dataType == ObjectType(classOf[Queue[_]]))
     
         import scala.collection.mutable.ArrayBuffer
    -    val arrayBufferSerializer = serializerFor[ArrayBuffer[Int]](BoundReference(
    -      0, ObjectType(classOf[ArrayBuffer[Int]]), nullable = false))
    -    assert(arrayBufferSerializer.dataType.head.dataType ==
    +    val arrayBufferSerializer = serializerForType(ScalaReflection.localTypeOf[ArrayBuffer[Int]])
    +    assert(arrayBufferSerializer.dataType ==
           ArrayType(IntegerType, containsNull = false))
         val arrayBufferDeserializer = deserializerFor[ArrayBuffer[Int]]
         assert(arrayBufferDeserializer.dataType == ObjectType(classOf[ArrayBuffer[_]]))
       }
     
       test("serialize and deserialize arbitrary map types") {
    -    val mapSerializer = serializerFor[Map[Int, Int]](BoundReference(
    -      0, ObjectType(classOf[Map[Int, Int]]), nullable = false))
    -    assert(mapSerializer.dataType.head.dataType ==
    +    val mapSerializer = serializerForType(ScalaReflection.localTypeOf[Map[Int, Int]])
    +    assert(mapSerializer.dataType ==
           MapType(IntegerType, IntegerType, valueContainsNull = false))
         val mapDeserializer = deserializerFor[Map[Int, Int]]
         assert(mapDeserializer.dataType == ObjectType(classOf[Map[_, _]]))
     
         import scala.collection.immutable.HashMap
    -    val hashMapSerializer = serializerFor[HashMap[Int, Int]](BoundReference(
    -      0, ObjectType(classOf[HashMap[Int, Int]]), nullable = false))
    -    assert(hashMapSerializer.dataType.head.dataType ==
    +    val hashMapSerializer = serializerForType(ScalaReflection.localTypeOf[HashMap[Int, Int]])
    +    assert(hashMapSerializer.dataType ==
           MapType(IntegerType, IntegerType, valueContainsNull = false))
         val hashMapDeserializer = deserializerFor[HashMap[Int, Int]]
         assert(hashMapDeserializer.dataType == ObjectType(classOf[HashMap[_, _]]))
     
         import scala.collection.mutable.{LinkedHashMap => LHMap}
    -    val linkedHashMapSerializer = serializerFor[LHMap[Long, String]](BoundReference(
    -      0, ObjectType(classOf[LHMap[Long, String]]), nullable = false))
    -    assert(linkedHashMapSerializer.dataType.head.dataType ==
    +    val linkedHashMapSerializer = serializerForType(
    +        ScalaReflection.localTypeOf[LHMap[Long, String]])
    +    assert(linkedHashMapSerializer.dataType ==
           MapType(LongType, StringType, valueContainsNull = true))
         val linkedHashMapDeserializer = deserializerFor[LHMap[Long, String]]
         assert(linkedHashMapDeserializer.dataType == ObjectType(classOf[LHMap[_, _]]))
       }
     
       test("SPARK-22442: Generate correct field names for special characters") {
    -    val serializer = serializerFor[SpecialCharAsFieldData](BoundReference(
    -      0, ObjectType(classOf[SpecialCharAsFieldData]), nullable = false))
    +    val serializer = serializerForType(ScalaReflection.localTypeOf[SpecialCharAsFieldData])
    --- End diff --
    
    Ok. I see.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    **[Test build #97885 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97885/testReport)** for PR 22749 at commit [`400f878`](https://github.com/apache/spark/commit/400f87817183640006140e2db1839f8d78a13856).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22749#discussion_r227796616
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/ScalaReflectionSuite.scala ---
    @@ -280,59 +281,59 @@ class ScalaReflectionSuite extends SparkFunSuite {
     
       test("serialize and deserialize arbitrary sequence types") {
         import scala.collection.immutable.Queue
    -    val queueSerializer = serializerFor[Queue[Int]](BoundReference(
    -      0, ObjectType(classOf[Queue[Int]]), nullable = false))
    -    assert(queueSerializer.dataType.head.dataType ==
    +    val queueSerializer = serializerForType(ScalaReflection.localTypeOf[Queue[Int]])
    +    assert(queueSerializer.dataType ==
           ArrayType(IntegerType, containsNull = false))
         val queueDeserializer = deserializerFor[Queue[Int]]
         assert(queueDeserializer.dataType == ObjectType(classOf[Queue[_]]))
     
         import scala.collection.mutable.ArrayBuffer
    -    val arrayBufferSerializer = serializerFor[ArrayBuffer[Int]](BoundReference(
    -      0, ObjectType(classOf[ArrayBuffer[Int]]), nullable = false))
    -    assert(arrayBufferSerializer.dataType.head.dataType ==
    +    val arrayBufferSerializer = serializerForType(ScalaReflection.localTypeOf[ArrayBuffer[Int]])
    +    assert(arrayBufferSerializer.dataType ==
           ArrayType(IntegerType, containsNull = false))
         val arrayBufferDeserializer = deserializerFor[ArrayBuffer[Int]]
         assert(arrayBufferDeserializer.dataType == ObjectType(classOf[ArrayBuffer[_]]))
       }
     
       test("serialize and deserialize arbitrary map types") {
    -    val mapSerializer = serializerFor[Map[Int, Int]](BoundReference(
    -      0, ObjectType(classOf[Map[Int, Int]]), nullable = false))
    -    assert(mapSerializer.dataType.head.dataType ==
    +    val mapSerializer = serializerForType(ScalaReflection.localTypeOf[Map[Int, Int]])
    +    assert(mapSerializer.dataType ==
           MapType(IntegerType, IntegerType, valueContainsNull = false))
         val mapDeserializer = deserializerFor[Map[Int, Int]]
         assert(mapDeserializer.dataType == ObjectType(classOf[Map[_, _]]))
     
         import scala.collection.immutable.HashMap
    -    val hashMapSerializer = serializerFor[HashMap[Int, Int]](BoundReference(
    -      0, ObjectType(classOf[HashMap[Int, Int]]), nullable = false))
    -    assert(hashMapSerializer.dataType.head.dataType ==
    +    val hashMapSerializer = serializerForType(ScalaReflection.localTypeOf[HashMap[Int, Int]])
    +    assert(hashMapSerializer.dataType ==
           MapType(IntegerType, IntegerType, valueContainsNull = false))
         val hashMapDeserializer = deserializerFor[HashMap[Int, Int]]
         assert(hashMapDeserializer.dataType == ObjectType(classOf[HashMap[_, _]]))
     
         import scala.collection.mutable.{LinkedHashMap => LHMap}
    -    val linkedHashMapSerializer = serializerFor[LHMap[Long, String]](BoundReference(
    -      0, ObjectType(classOf[LHMap[Long, String]]), nullable = false))
    -    assert(linkedHashMapSerializer.dataType.head.dataType ==
    +    val linkedHashMapSerializer = serializerForType(
    +        ScalaReflection.localTypeOf[LHMap[Long, String]])
    +    assert(linkedHashMapSerializer.dataType ==
           MapType(LongType, StringType, valueContainsNull = true))
         val linkedHashMapDeserializer = deserializerFor[LHMap[Long, String]]
         assert(linkedHashMapDeserializer.dataType == ObjectType(classOf[LHMap[_, _]]))
       }
     
       test("SPARK-22442: Generate correct field names for special characters") {
    -    val serializer = serializerFor[SpecialCharAsFieldData](BoundReference(
    -      0, ObjectType(classOf[SpecialCharAsFieldData]), nullable = false))
    +    val serializer = serializerForType(ScalaReflection.localTypeOf[SpecialCharAsFieldData])
    --- End diff --
    
    Ok.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22749#discussion_r227675871
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
    @@ -434,17 +426,34 @@ object ScalaReflection extends ScalaReflection {
        *  * the element type of [[Array]] or [[Seq]]: `array element class: "abc.xyz.MyClass"`
        *  * the field of [[Product]]: `field (class: "abc.xyz.MyClass", name: "myField")`
        */
    -  def serializerFor[T : TypeTag](inputObject: Expression): CreateNamedStruct = {
    -    val tpe = localTypeOf[T]
    +  def serializerForType(tpe: `Type`,
    +      cls: RuntimeClass): Expression = ScalaReflection.cleanUpReflectionObjects {
         val clsName = getClassNameFromType(tpe)
         val walkedTypePath = s"""- root class: "$clsName"""" :: Nil
    -    serializerFor(inputObject, tpe, walkedTypePath) match {
    -      case expressions.If(_, _, s: CreateNamedStruct) if definedByConstructorParams(tpe) => s
    -      case other => CreateNamedStruct(expressions.Literal("value") :: other :: Nil)
    -    }
    +
    +    // The input object to `ExpressionEncoder` is located at first column of an row.
    +    val inputObject = BoundReference(0, dataTypeFor(tpe),
    +      nullable = !cls.isPrimitive)
    --- End diff --
    
    Yes, we can check `tpe.typeSymbol.asClass.isPrimitive` instead.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97885/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Thanks @cloud-fan 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97570/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    thanks, merging to master!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4058/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97671/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22749#discussion_r226846925
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala ---
    @@ -43,10 +44,11 @@ import org.apache.spark.util.Utils
      *    to the name `value`.
      */
     object ExpressionEncoder {
    +
       def apply[T : TypeTag](): ExpressionEncoder[T] = {
    -    // We convert the not-serializable TypeTag into StructType and ClassTag.
         val mirror = ScalaReflection.mirror
    -    val tpe = typeTag[T].in(mirror).tpe
    +    val tpe = ScalaReflection.localTypeOf[T]
    --- End diff --
    
    I think it should be fine, but let me revert this change first.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4159/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22749#discussion_r227867735
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/ScalaReflectionSuite.scala ---
    @@ -280,59 +281,59 @@ class ScalaReflectionSuite extends SparkFunSuite {
     
       test("serialize and deserialize arbitrary sequence types") {
         import scala.collection.immutable.Queue
    -    val queueSerializer = serializerFor[Queue[Int]](BoundReference(
    -      0, ObjectType(classOf[Queue[Int]]), nullable = false))
    -    assert(queueSerializer.dataType.head.dataType ==
    +    val queueSerializer = serializerForType(ScalaReflection.localTypeOf[Queue[Int]])
    +    assert(queueSerializer.dataType ==
           ArrayType(IntegerType, containsNull = false))
         val queueDeserializer = deserializerFor[Queue[Int]]
         assert(queueDeserializer.dataType == ObjectType(classOf[Queue[_]]))
     
         import scala.collection.mutable.ArrayBuffer
    -    val arrayBufferSerializer = serializerFor[ArrayBuffer[Int]](BoundReference(
    -      0, ObjectType(classOf[ArrayBuffer[Int]]), nullable = false))
    -    assert(arrayBufferSerializer.dataType.head.dataType ==
    +    val arrayBufferSerializer = serializerForType(ScalaReflection.localTypeOf[ArrayBuffer[Int]])
    +    assert(arrayBufferSerializer.dataType ==
           ArrayType(IntegerType, containsNull = false))
         val arrayBufferDeserializer = deserializerFor[ArrayBuffer[Int]]
         assert(arrayBufferDeserializer.dataType == ObjectType(classOf[ArrayBuffer[_]]))
       }
     
       test("serialize and deserialize arbitrary map types") {
    -    val mapSerializer = serializerFor[Map[Int, Int]](BoundReference(
    -      0, ObjectType(classOf[Map[Int, Int]]), nullable = false))
    -    assert(mapSerializer.dataType.head.dataType ==
    +    val mapSerializer = serializerForType(ScalaReflection.localTypeOf[Map[Int, Int]])
    +    assert(mapSerializer.dataType ==
           MapType(IntegerType, IntegerType, valueContainsNull = false))
         val mapDeserializer = deserializerFor[Map[Int, Int]]
         assert(mapDeserializer.dataType == ObjectType(classOf[Map[_, _]]))
     
         import scala.collection.immutable.HashMap
    -    val hashMapSerializer = serializerFor[HashMap[Int, Int]](BoundReference(
    -      0, ObjectType(classOf[HashMap[Int, Int]]), nullable = false))
    -    assert(hashMapSerializer.dataType.head.dataType ==
    +    val hashMapSerializer = serializerForType(ScalaReflection.localTypeOf[HashMap[Int, Int]])
    +    assert(hashMapSerializer.dataType ==
           MapType(IntegerType, IntegerType, valueContainsNull = false))
         val hashMapDeserializer = deserializerFor[HashMap[Int, Int]]
         assert(hashMapDeserializer.dataType == ObjectType(classOf[HashMap[_, _]]))
     
         import scala.collection.mutable.{LinkedHashMap => LHMap}
    -    val linkedHashMapSerializer = serializerFor[LHMap[Long, String]](BoundReference(
    -      0, ObjectType(classOf[LHMap[Long, String]]), nullable = false))
    -    assert(linkedHashMapSerializer.dataType.head.dataType ==
    +    val linkedHashMapSerializer = serializerForType(
    +        ScalaReflection.localTypeOf[LHMap[Long, String]])
    +    assert(linkedHashMapSerializer.dataType ==
           MapType(LongType, StringType, valueContainsNull = true))
         val linkedHashMapDeserializer = deserializerFor[LHMap[Long, String]]
         assert(linkedHashMapDeserializer.dataType == ObjectType(classOf[LHMap[_, _]]))
       }
     
       test("SPARK-22442: Generate correct field names for special characters") {
    -    val serializer = serializerFor[SpecialCharAsFieldData](BoundReference(
    -      0, ObjectType(classOf[SpecialCharAsFieldData]), nullable = false))
    +    val serializer = serializerForType(ScalaReflection.localTypeOf[SpecialCharAsFieldData])
    --- End diff --
    
    like `deserializerFor` in this suite, let's also create a `serializerFor`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    **[Test build #97539 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97539/testReport)** for PR 22749 at commit [`85a9122`](https://github.com/apache/spark/commit/85a91220ec4eb00bd9d5020ecf980eac0301f716).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    **[Test build #97568 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97568/testReport)** for PR 22749 at commit [`35700f4`](https://github.com/apache/spark/commit/35700f4a0f36fb397ac028a68011a2753c5c2c75).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    **[Test build #97694 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97694/testReport)** for PR 22749 at commit [`400f878`](https://github.com/apache/spark/commit/400f87817183640006140e2db1839f8d78a13856).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    **[Test build #97630 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97630/testReport)** for PR 22749 at commit [`efbc3fc`](https://github.com/apache/spark/commit/efbc3fc05ce42bda932b54c78c7f7b7eca90419f).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    **[Test build #97671 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97671/testReport)** for PR 22749 at commit [`7432344`](https://github.com/apache/spark/commit/7432344143fb4889ed3d5cbde21872c8fdd6d3f1).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Let me rebase again.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    **[Test build #97971 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97971/testReport)** for PR 22749 at commit [`078a071`](https://github.com/apache/spark/commit/078a071a72e0d39cece49ff73c09ec65a387b8af).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97697/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    hmm, I don't touch PySpark files. Why the building fails at PySpark pip packaging tests...


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    **[Test build #97650 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97650/testReport)** for PR 22749 at commit [`5b9abb6`](https://github.com/apache/spark/commit/5b9abb67907dfdb0c0c64751db3525564f832422).
     * This patch **fails PySpark pip packaging tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    **[Test build #97967 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97967/testReport)** for PR 22749 at commit [`ed4f4c9`](https://github.com/apache/spark/commit/ed4f4c90ec4c162f56373950089eec4632787817).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    **[Test build #97459 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97459/testReport)** for PR 22749 at commit [`d755e84`](https://github.com/apache/spark/commit/d755e8406f06117ccc96b8f19debab6b2a736e10).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    cc @HyukjinKwon 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    **[Test build #97967 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97967/testReport)** for PR 22749 at commit [`ed4f4c9`](https://github.com/apache/spark/commit/ed4f4c90ec4c162f56373950089eec4632787817).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEnc...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22749#discussion_r226298803
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala ---
    @@ -212,21 +183,88 @@ object ExpressionEncoder {
      * A generic encoder for JVM objects that uses Catalyst Expressions for a `serializer`
      * and a `deserializer`.
      *
    - * @param schema The schema after converting `T` to a Spark SQL row.
    - * @param serializer A set of expressions, one for each top-level field that can be used to
    - *                   extract the values from a raw object into an [[InternalRow]].
    - * @param deserializer An expression that will construct an object given an [[InternalRow]].
    + * @param objSerializer An expression that can be used to encode a raw object to corresponding
    + *                   Spark SQL representation that can be a primitive column, array, map or a
    + *                   struct. This represents how Spark SQL generally serializes an object of
    + *                   type `T`.
    + * @param objDeserializer An expression that will construct an object given a Spark SQL
    + *                        representation. This represents how Spark SQL generally deserializes
    + *                        a serialized value in Spark SQL representation back to an object of
    + *                        type `T`.
      * @param clsTag A classtag for `T`.
      */
     case class ExpressionEncoder[T](
    -    schema: StructType,
    -    flat: Boolean,
    -    serializer: Seq[Expression],
    -    deserializer: Expression,
    +    objSerializer: Expression,
    +    objDeserializer: Expression,
         clsTag: ClassTag[T])
       extends Encoder[T] {
     
    -  if (flat) require(serializer.size == 1)
    +  /**
    +   * A set of expressions, one for each top-level field that can be used to
    +   * extract the values from a raw object into an [[InternalRow]]:
    +   * 1. If `serializer` encodes a raw object to a struct, we directly use the `serializer`.
    +   * 2. For other cases, we create a struct to wrap the `serializer`.
    +   */
    +  val serializer: Seq[NamedExpression] = {
    +    val serializedAsStruct = objSerializer.dataType.isInstanceOf[StructType]
    +    val clsName = Utils.getSimpleName(clsTag.runtimeClass)
    +
    +    if (serializedAsStruct) {
    +      val nullSafeSerializer = objSerializer.transformUp {
    +        case r: BoundReference =>
    +          // For input object of Product type, we can't encode it to row if it's null, as Spark SQL
    +          // doesn't allow top-level row to be null, only its columns can be null.
    +          AssertNotNull(r, Seq("top level Product or row object"))
    +      }
    +      nullSafeSerializer match {
    +        case If(_, _, s: CreateNamedStruct) => s
    --- End diff --
    
    let's also make sure the if condition is `IsNull`, which better explains why we strip it(it can't be null)


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22749#discussion_r227783724
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/ScalaReflectionSuite.scala ---
    @@ -280,59 +281,59 @@ class ScalaReflectionSuite extends SparkFunSuite {
     
       test("serialize and deserialize arbitrary sequence types") {
         import scala.collection.immutable.Queue
    -    val queueSerializer = serializerFor[Queue[Int]](BoundReference(
    -      0, ObjectType(classOf[Queue[Int]]), nullable = false))
    -    assert(queueSerializer.dataType.head.dataType ==
    +    val queueSerializer = serializerForType(ScalaReflection.localTypeOf[Queue[Int]])
    +    assert(queueSerializer.dataType ==
           ArrayType(IntegerType, containsNull = false))
         val queueDeserializer = deserializerFor[Queue[Int]]
         assert(queueDeserializer.dataType == ObjectType(classOf[Queue[_]]))
     
         import scala.collection.mutable.ArrayBuffer
    -    val arrayBufferSerializer = serializerFor[ArrayBuffer[Int]](BoundReference(
    -      0, ObjectType(classOf[ArrayBuffer[Int]]), nullable = false))
    -    assert(arrayBufferSerializer.dataType.head.dataType ==
    +    val arrayBufferSerializer = serializerForType(ScalaReflection.localTypeOf[ArrayBuffer[Int]])
    +    assert(arrayBufferSerializer.dataType ==
           ArrayType(IntegerType, containsNull = false))
         val arrayBufferDeserializer = deserializerFor[ArrayBuffer[Int]]
         assert(arrayBufferDeserializer.dataType == ObjectType(classOf[ArrayBuffer[_]]))
       }
     
       test("serialize and deserialize arbitrary map types") {
    -    val mapSerializer = serializerFor[Map[Int, Int]](BoundReference(
    -      0, ObjectType(classOf[Map[Int, Int]]), nullable = false))
    -    assert(mapSerializer.dataType.head.dataType ==
    +    val mapSerializer = serializerForType(ScalaReflection.localTypeOf[Map[Int, Int]])
    +    assert(mapSerializer.dataType ==
           MapType(IntegerType, IntegerType, valueContainsNull = false))
         val mapDeserializer = deserializerFor[Map[Int, Int]]
         assert(mapDeserializer.dataType == ObjectType(classOf[Map[_, _]]))
     
         import scala.collection.immutable.HashMap
    -    val hashMapSerializer = serializerFor[HashMap[Int, Int]](BoundReference(
    -      0, ObjectType(classOf[HashMap[Int, Int]]), nullable = false))
    -    assert(hashMapSerializer.dataType.head.dataType ==
    +    val hashMapSerializer = serializerForType(ScalaReflection.localTypeOf[HashMap[Int, Int]])
    +    assert(hashMapSerializer.dataType ==
           MapType(IntegerType, IntegerType, valueContainsNull = false))
         val hashMapDeserializer = deserializerFor[HashMap[Int, Int]]
         assert(hashMapDeserializer.dataType == ObjectType(classOf[HashMap[_, _]]))
     
         import scala.collection.mutable.{LinkedHashMap => LHMap}
    -    val linkedHashMapSerializer = serializerFor[LHMap[Long, String]](BoundReference(
    -      0, ObjectType(classOf[LHMap[Long, String]]), nullable = false))
    -    assert(linkedHashMapSerializer.dataType.head.dataType ==
    +    val linkedHashMapSerializer = serializerForType(
    +        ScalaReflection.localTypeOf[LHMap[Long, String]])
    +    assert(linkedHashMapSerializer.dataType ==
           MapType(LongType, StringType, valueContainsNull = true))
         val linkedHashMapDeserializer = deserializerFor[LHMap[Long, String]]
         assert(linkedHashMapDeserializer.dataType == ObjectType(classOf[LHMap[_, _]]))
       }
     
       test("SPARK-22442: Generate correct field names for special characters") {
    -    val serializer = serializerFor[SpecialCharAsFieldData](BoundReference(
    -      0, ObjectType(classOf[SpecialCharAsFieldData]), nullable = false))
    +    val serializer = serializerForType(ScalaReflection.localTypeOf[SpecialCharAsFieldData])
    --- End diff --
    
    can we replace all the `serializerForType` with `serializerFor` in this suite?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEnc...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22749#discussion_r226506593
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala ---
    @@ -103,75 +88,61 @@ object ExpressionEncoder {
        * name/positional binding is preserved.
        */
       def tuple(encoders: Seq[ExpressionEncoder[_]]): ExpressionEncoder[_] = {
    +    if (encoders.length > 22) {
    --- End diff --
    
    sure. ok.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/22749


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    **[Test build #97650 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97650/testReport)** for PR 22749 at commit [`5b9abb6`](https://github.com/apache/spark/commit/5b9abb67907dfdb0c0c64751db3525564f832422).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97651/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4051/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22749#discussion_r227682672
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
    @@ -434,17 +426,34 @@ object ScalaReflection extends ScalaReflection {
        *  * the element type of [[Array]] or [[Seq]]: `array element class: "abc.xyz.MyClass"`
        *  * the field of [[Product]]: `field (class: "abc.xyz.MyClass", name: "myField")`
        */
    -  def serializerFor[T : TypeTag](inputObject: Expression): CreateNamedStruct = {
    -    val tpe = localTypeOf[T]
    +  def serializerForType(tpe: `Type`,
    +      cls: RuntimeClass): Expression = ScalaReflection.cleanUpReflectionObjects {
         val clsName = getClassNameFromType(tpe)
         val walkedTypePath = s"""- root class: "$clsName"""" :: Nil
    -    serializerFor(inputObject, tpe, walkedTypePath) match {
    -      case expressions.If(_, _, s: CreateNamedStruct) if definedByConstructorParams(tpe) => s
    -      case other => CreateNamedStruct(expressions.Literal("value") :: other :: Nil)
    -    }
    +
    +    // The input object to `ExpressionEncoder` is located at first column of an row.
    +    val inputObject = BoundReference(0, dataTypeFor(tpe),
    +      nullable = !cls.isPrimitive)
    --- End diff --
    
    good, then we don't need `cls` as a parameter.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    **[Test build #97694 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97694/testReport)** for PR 22749 at commit [`400f878`](https://github.com/apache/spark/commit/400f87817183640006140e2db1839f8d78a13856).
     * This patch **fails due to an unknown error code, -9**.
     * This patch **does not merge cleanly**.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4154/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97568/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22749#discussion_r227742062
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/ScalaReflectionSuite.scala ---
    @@ -351,11 +347,15 @@ class ScalaReflectionSuite extends SparkFunSuite {
     
       test("SPARK-23835: add null check to non-nullable types in Tuples") {
         def numberOfCheckedArguments(deserializer: Expression): Int = {
    -      assert(deserializer.isInstanceOf[NewInstance])
    -      deserializer.asInstanceOf[NewInstance].arguments.count(_.isInstanceOf[AssertNotNull])
    +      val newInstance = deserializer.collect { case n: NewInstance => n}.head
    +      newInstance.arguments.count(_.isInstanceOf[AssertNotNull])
         }
    -    assert(numberOfCheckedArguments(deserializerFor[(Double, Double)]) == 2)
    -    assert(numberOfCheckedArguments(deserializerFor[(java.lang.Double, Int)]) == 1)
    -    assert(numberOfCheckedArguments(deserializerFor[(java.lang.Integer, java.lang.Integer)]) == 0)
    +    assert(numberOfCheckedArguments(
    +      deserializerForType(ScalaReflection.localTypeOf[(Double, Double)])) == 2)
    --- End diff --
    
    shall we create a `deserializerFor` method in this test suite to save some code diff?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    **[Test build #97953 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97953/testReport)** for PR 22749 at commit [`552e8dd`](https://github.com/apache/spark/commit/552e8dd3f5031b97cf5158ed07c77ff923233c79).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `public class JavaPrefixSpanExample `
      * `trait ScalaReflection extends Logging `
      * `    // TODO: make sure this class is only instantiated through `SparkUserDefinedFunction.create()``


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4436/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22749#discussion_r227675880
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/RowEncoder.scala ---
    @@ -58,12 +58,10 @@ object RowEncoder {
       def apply(schema: StructType): ExpressionEncoder[Row] = {
         val cls = classOf[Row]
         val inputObject = BoundReference(0, ObjectType(cls), nullable = true)
    -    val serializer = serializerFor(AssertNotNull(inputObject, Seq("top level row object")), schema)
    -    val deserializer = deserializerFor(schema)
    +    val serializer = serializerFor(inputObject, schema)
    +    val deserializer = deserializerFor(GetColumnByOrdinal(0, serializer.dataType), schema)
    --- End diff --
    
    in  `ScalaReflection`, we create `GetColumnByOrdinal` in `deserializeFor`, shall we follow it here?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEnc...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22749#discussion_r226301139
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala ---
    @@ -212,21 +183,88 @@ object ExpressionEncoder {
      * A generic encoder for JVM objects that uses Catalyst Expressions for a `serializer`
      * and a `deserializer`.
      *
    - * @param schema The schema after converting `T` to a Spark SQL row.
    - * @param serializer A set of expressions, one for each top-level field that can be used to
    - *                   extract the values from a raw object into an [[InternalRow]].
    - * @param deserializer An expression that will construct an object given an [[InternalRow]].
    + * @param objSerializer An expression that can be used to encode a raw object to corresponding
    + *                   Spark SQL representation that can be a primitive column, array, map or a
    + *                   struct. This represents how Spark SQL generally serializes an object of
    + *                   type `T`.
    + * @param objDeserializer An expression that will construct an object given a Spark SQL
    + *                        representation. This represents how Spark SQL generally deserializes
    + *                        a serialized value in Spark SQL representation back to an object of
    + *                        type `T`.
      * @param clsTag A classtag for `T`.
      */
     case class ExpressionEncoder[T](
    -    schema: StructType,
    -    flat: Boolean,
    -    serializer: Seq[Expression],
    -    deserializer: Expression,
    +    objSerializer: Expression,
    +    objDeserializer: Expression,
         clsTag: ClassTag[T])
       extends Encoder[T] {
     
    -  if (flat) require(serializer.size == 1)
    +  /**
    +   * A set of expressions, one for each top-level field that can be used to
    +   * extract the values from a raw object into an [[InternalRow]]:
    +   * 1. If `serializer` encodes a raw object to a struct, we directly use the `serializer`.
    +   * 2. For other cases, we create a struct to wrap the `serializer`.
    +   */
    +  val serializer: Seq[NamedExpression] = {
    +    val serializedAsStruct = objSerializer.dataType.isInstanceOf[StructType]
    +    val clsName = Utils.getSimpleName(clsTag.runtimeClass)
    +
    +    if (serializedAsStruct) {
    +      val nullSafeSerializer = objSerializer.transformUp {
    +        case r: BoundReference =>
    +          // For input object of Product type, we can't encode it to row if it's null, as Spark SQL
    +          // doesn't allow top-level row to be null, only its columns can be null.
    +          AssertNotNull(r, Seq("top level Product or row object"))
    +      }
    +      nullSafeSerializer match {
    +        case If(_, _, s: CreateNamedStruct) => s
    +        case s: CreateNamedStruct => s
    +        case _ =>
    +          throw new RuntimeException(s"class $clsName has unexpected serializer: $objSerializer")
    +      }
    +    } else {
    +      // For other input objects like primitive, array, map, etc., we construct a struct to wrap
    +      // the serializer which is a column of an row.
    +      CreateNamedStruct(Literal("value") :: objSerializer :: Nil)
    +    }
    +  }.flatten
    +
    +  /**
    +   * Returns an expression that can be used to deserialize an input row to an object of type `T`
    +   * with a compatible schema. Fields of the row will be extracted using `UnresolvedAttribute`.
    +   * of the same name as the constructor arguments.
    +   *
    +   * For complex objects that are encoded to structs, Fields of the struct will be extracted using
    +   * `GetColumnByOrdinal` with corresponding ordinal.
    +   */
    +  val deserializer: Expression = {
    +    val serializedAsStruct = objSerializer.dataType.isInstanceOf[StructType]
    +
    +    if (serializedAsStruct) {
    +      // We serialized this kind of objects to root-level row. The input of general deserializer
    +      // is a `GetColumnByOrdinal(0)` expression to extract first column of a row. We need to
    +      // transform attributes accessors.
    +      objDeserializer.transform {
    +        case UnresolvedExtractValue(GetColumnByOrdinal(0, _),
    +            Literal(part: UTF8String, StringType)) =>
    +          UnresolvedAttribute.quoted(part.toString)
    +        case GetStructField(GetColumnByOrdinal(0, dt), ordinal, _) =>
    +          GetColumnByOrdinal(ordinal, dt)
    +        case If(IsNull(GetColumnByOrdinal(0, _)), _, n: NewInstance) => n
    +        case If(IsNull(GetColumnByOrdinal(0, _)), _, i: InitializeJavaBean) => i
    +      }
    +    } else {
    +      // For other input objects like primitive, array, map, etc., we deserialize the first column
    +      // of a row to the object.
    +      objDeserializer
    +    }
    +  }
    +
    +  // The schema after converting `T` to a Spark SQL row. This schema is dependent on the given
    +  // serialier.
    +  val schema: StructType = StructType(serializer.map { s =>
    +    StructField(s.name, s.dataType, s.nullable)
    --- End diff --
    
    can we call `dataType` before serializer is analyzed?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22749#discussion_r228132840
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala ---
    @@ -212,21 +181,91 @@ object ExpressionEncoder {
      * A generic encoder for JVM objects that uses Catalyst Expressions for a `serializer`
      * and a `deserializer`.
      *
    - * @param schema The schema after converting `T` to a Spark SQL row.
    - * @param serializer A set of expressions, one for each top-level field that can be used to
    - *                   extract the values from a raw object into an [[InternalRow]].
    - * @param deserializer An expression that will construct an object given an [[InternalRow]].
    + * @param objSerializer An expression that can be used to encode a raw object to corresponding
    + *                   Spark SQL representation that can be a primitive column, array, map or a
    + *                   struct. This represents how Spark SQL generally serializes an object of
    + *                   type `T`.
    + * @param objDeserializer An expression that will construct an object given a Spark SQL
    + *                        representation. This represents how Spark SQL generally deserializes
    + *                        a serialized value in Spark SQL representation back to an object of
    + *                        type `T`.
      * @param clsTag A classtag for `T`.
      */
     case class ExpressionEncoder[T](
    -    schema: StructType,
    -    flat: Boolean,
    -    serializer: Seq[Expression],
    -    deserializer: Expression,
    +    objSerializer: Expression,
    +    objDeserializer: Expression,
         clsTag: ClassTag[T])
       extends Encoder[T] {
     
    -  if (flat) require(serializer.size == 1)
    +  /**
    +   * A sequence of expressions, one for each top-level field that can be used to
    +   * extract the values from a raw object into an [[InternalRow]]:
    +   * 1. If `serializer` encodes a raw object to a struct, strip the outer If-IsNull and get
    +   *    the `CreateNamedStruct`.
    +   * 2. For other cases, wrap the single serializer with `CreateNamedStruct`.
    +   */
    +  val serializer: Seq[NamedExpression] = {
    +    val clsName = Utils.getSimpleName(clsTag.runtimeClass)
    +
    +    if (isSerializedAsStruct) {
    +      val nullSafeSerializer = objSerializer.transformUp {
    +        case r: BoundReference =>
    +          // For input object of Product type, we can't encode it to row if it's null, as Spark SQL
    +          // doesn't allow top-level row to be null, only its columns can be null.
    +          AssertNotNull(r, Seq("top level Product or row object"))
    +      }
    +      nullSafeSerializer match {
    +        case If(_: IsNull, _, s: CreateNamedStruct) => s
    +        case s: CreateNamedStruct => s
    --- End diff --
    
    when will we hit this?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    **[Test build #97969 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97969/testReport)** for PR 22749 at commit [`8cb710b`](https://github.com/apache/spark/commit/8cb710b5c7b329468c320b59bb0625866fd8d836).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97485/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    retest this please.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4140/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    retest this please.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    **[Test build #97759 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97759/testReport)** for PR 22749 at commit [`400f878`](https://github.com/apache/spark/commit/400f87817183640006140e2db1839f8d78a13856).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEnc...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22749#discussion_r226507104
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala ---
    @@ -212,21 +183,88 @@ object ExpressionEncoder {
      * A generic encoder for JVM objects that uses Catalyst Expressions for a `serializer`
      * and a `deserializer`.
      *
    - * @param schema The schema after converting `T` to a Spark SQL row.
    - * @param serializer A set of expressions, one for each top-level field that can be used to
    - *                   extract the values from a raw object into an [[InternalRow]].
    - * @param deserializer An expression that will construct an object given an [[InternalRow]].
    + * @param objSerializer An expression that can be used to encode a raw object to corresponding
    + *                   Spark SQL representation that can be a primitive column, array, map or a
    + *                   struct. This represents how Spark SQL generally serializes an object of
    + *                   type `T`.
    + * @param objDeserializer An expression that will construct an object given a Spark SQL
    + *                        representation. This represents how Spark SQL generally deserializes
    + *                        a serialized value in Spark SQL representation back to an object of
    + *                        type `T`.
      * @param clsTag A classtag for `T`.
      */
     case class ExpressionEncoder[T](
    -    schema: StructType,
    -    flat: Boolean,
    -    serializer: Seq[Expression],
    -    deserializer: Expression,
    +    objSerializer: Expression,
    +    objDeserializer: Expression,
         clsTag: ClassTag[T])
       extends Encoder[T] {
     
    -  if (flat) require(serializer.size == 1)
    +  /**
    +   * A set of expressions, one for each top-level field that can be used to
    +   * extract the values from a raw object into an [[InternalRow]]:
    +   * 1. If `serializer` encodes a raw object to a struct, we directly use the `serializer`.
    +   * 2. For other cases, we create a struct to wrap the `serializer`.
    +   */
    +  val serializer: Seq[NamedExpression] = {
    +    val serializedAsStruct = objSerializer.dataType.isInstanceOf[StructType]
    +    val clsName = Utils.getSimpleName(clsTag.runtimeClass)
    +
    +    if (serializedAsStruct) {
    +      val nullSafeSerializer = objSerializer.transformUp {
    +        case r: BoundReference =>
    +          // For input object of Product type, we can't encode it to row if it's null, as Spark SQL
    +          // doesn't allow top-level row to be null, only its columns can be null.
    +          AssertNotNull(r, Seq("top level Product or row object"))
    +      }
    +      nullSafeSerializer match {
    +        case If(_, _, s: CreateNamedStruct) => s
    --- End diff --
    
    ok.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEnc...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22749#discussion_r226519284
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala ---
    @@ -43,10 +44,11 @@ import org.apache.spark.util.Utils
      *    to the name `value`.
      */
     object ExpressionEncoder {
    +
       def apply[T : TypeTag](): ExpressionEncoder[T] = {
    -    // We convert the not-serializable TypeTag into StructType and ClassTag.
         val mirror = ScalaReflection.mirror
    -    val tpe = typeTag[T].in(mirror).tpe
    +    val tpe = ScalaReflection.localTypeOf[T]
    --- End diff --
    
    `localTypeOf` has a `dealias` at the end.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22749#discussion_r228134985
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala ---
    @@ -212,21 +181,91 @@ object ExpressionEncoder {
      * A generic encoder for JVM objects that uses Catalyst Expressions for a `serializer`
      * and a `deserializer`.
      *
    - * @param schema The schema after converting `T` to a Spark SQL row.
    - * @param serializer A set of expressions, one for each top-level field that can be used to
    - *                   extract the values from a raw object into an [[InternalRow]].
    - * @param deserializer An expression that will construct an object given an [[InternalRow]].
    + * @param objSerializer An expression that can be used to encode a raw object to corresponding
    + *                   Spark SQL representation that can be a primitive column, array, map or a
    + *                   struct. This represents how Spark SQL generally serializes an object of
    + *                   type `T`.
    + * @param objDeserializer An expression that will construct an object given a Spark SQL
    + *                        representation. This represents how Spark SQL generally deserializes
    + *                        a serialized value in Spark SQL representation back to an object of
    + *                        type `T`.
      * @param clsTag A classtag for `T`.
      */
     case class ExpressionEncoder[T](
    -    schema: StructType,
    -    flat: Boolean,
    -    serializer: Seq[Expression],
    -    deserializer: Expression,
    +    objSerializer: Expression,
    +    objDeserializer: Expression,
         clsTag: ClassTag[T])
       extends Encoder[T] {
     
    -  if (flat) require(serializer.size == 1)
    +  /**
    +   * A sequence of expressions, one for each top-level field that can be used to
    +   * extract the values from a raw object into an [[InternalRow]]:
    +   * 1. If `serializer` encodes a raw object to a struct, strip the outer If-IsNull and get
    +   *    the `CreateNamedStruct`.
    +   * 2. For other cases, wrap the single serializer with `CreateNamedStruct`.
    +   */
    +  val serializer: Seq[NamedExpression] = {
    +    val clsName = Utils.getSimpleName(clsTag.runtimeClass)
    +
    +    if (isSerializedAsStruct) {
    +      val nullSafeSerializer = objSerializer.transformUp {
    +        case r: BoundReference =>
    +          // For input object of Product type, we can't encode it to row if it's null, as Spark SQL
    +          // doesn't allow top-level row to be null, only its columns can be null.
    +          AssertNotNull(r, Seq("top level Product or row object"))
    +      }
    +      nullSafeSerializer match {
    +        case If(_: IsNull, _, s: CreateNamedStruct) => s
    +        case s: CreateNamedStruct => s
    --- End diff --
    
    this is minor, we can update it in another PR. We don't need to wait for another jenkins QA round.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97694/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    **[Test build #97569 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97569/testReport)** for PR 22749 at commit [`b211ed0`](https://github.com/apache/spark/commit/b211ed069dceb33c45cf6caf12c19527334d4ad8).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEnc...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22749#discussion_r226296369
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala ---
    @@ -103,75 +88,61 @@ object ExpressionEncoder {
        * name/positional binding is preserved.
        */
       def tuple(encoders: Seq[ExpressionEncoder[_]]): ExpressionEncoder[_] = {
    --- End diff --
    
    cool, this method is simplified a lot with the new abstraction.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    **[Test build #97697 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97697/testReport)** for PR 22749 at commit [`7432344`](https://github.com/apache/spark/commit/7432344143fb4889ed3d5cbde21872c8fdd6d3f1).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22749#discussion_r227739775
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala ---
    @@ -212,21 +181,90 @@ object ExpressionEncoder {
      * A generic encoder for JVM objects that uses Catalyst Expressions for a `serializer`
      * and a `deserializer`.
      *
    - * @param schema The schema after converting `T` to a Spark SQL row.
    - * @param serializer A set of expressions, one for each top-level field that can be used to
    - *                   extract the values from a raw object into an [[InternalRow]].
    - * @param deserializer An expression that will construct an object given an [[InternalRow]].
    + * @param objSerializer An expression that can be used to encode a raw object to corresponding
    + *                   Spark SQL representation that can be a primitive column, array, map or a
    + *                   struct. This represents how Spark SQL generally serializes an object of
    + *                   type `T`.
    + * @param objDeserializer An expression that will construct an object given a Spark SQL
    + *                        representation. This represents how Spark SQL generally deserializes
    + *                        a serialized value in Spark SQL representation back to an object of
    + *                        type `T`.
      * @param clsTag A classtag for `T`.
      */
     case class ExpressionEncoder[T](
    -    schema: StructType,
    -    flat: Boolean,
    -    serializer: Seq[Expression],
    -    deserializer: Expression,
    +    objSerializer: Expression,
    +    objDeserializer: Expression,
         clsTag: ClassTag[T])
       extends Encoder[T] {
     
    -  if (flat) require(serializer.size == 1)
    +  /**
    +   * A sequence of expressions, one for each top-level field that can be used to
    +   * extract the values from a raw object into an [[InternalRow]]:
    +   * 1. If `serializer` encodes a raw object to a struct, we directly use the `serializer`.
    +   * 2. For other cases, we create a struct to wrap the `serializer`.
    --- End diff --
    
    Let's make these 2 comments more precise
    ```
    1. If `serializer` encodes a raw object to a struct, strip the outer if-IsNull and get the CreateNamedStruct
    2. For other cases, wrap the single serializer with CreateNamedStruct
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97479/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    **[Test build #97480 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97480/testReport)** for PR 22749 at commit [`25a6162`](https://github.com/apache/spark/commit/25a616286075ca4f0a7d528095b387172b05c6c3).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEnc...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22749#discussion_r226828791
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/RowEncoder.scala ---
    @@ -198,7 +196,7 @@ object RowEncoder {
     
           if (inputObject.nullable) {
             If(IsNull(inputObject),
    -          Literal.create(null, inputType),
    +          Literal.create(null, nonNullOutput.dataType),
    --- End diff --
    
    This might be worth a separate PR. I'm considering to create one for it.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEnc...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22749#discussion_r226506566
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala ---
    @@ -43,10 +44,11 @@ import org.apache.spark.util.Utils
      *    to the name `value`.
      */
     object ExpressionEncoder {
    +
       def apply[T : TypeTag](): ExpressionEncoder[T] = {
    -    // We convert the not-serializable TypeTag into StructType and ClassTag.
         val mirror = ScalaReflection.mirror
    -    val tpe = typeTag[T].in(mirror).tpe
    +    val tpe = ScalaReflection.localTypeOf[T]
    --- End diff --
    
    `localTypeOf` is actually doing the same thing. I think it is better to use ScalaReflection for such thing.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    **[Test build #97480 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97480/testReport)** for PR 22749 at commit [`25a6162`](https://github.com/apache/spark/commit/25a616286075ca4f0a7d528095b387172b05c6c3).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    **[Test build #97459 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97459/testReport)** for PR 22749 at commit [`d755e84`](https://github.com/apache/spark/commit/d755e8406f06117ccc96b8f19debab6b2a736e10).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `          throw new RuntimeException(s\"class $clsName has unexpected serializer: $objSerializer\")`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4057/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97569/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    LGTM except 2 minor comments


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    **[Test build #97645 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97645/testReport)** for PR 22749 at commit [`5b9abb6`](https://github.com/apache/spark/commit/5b9abb67907dfdb0c0c64751db3525564f832422).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    **[Test build #97569 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97569/testReport)** for PR 22749 at commit [`b211ed0`](https://github.com/apache/spark/commit/b211ed069dceb33c45cf6caf12c19527334d4ad8).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97964/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97480/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97967/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4176/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEnc...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22749#discussion_r226294017
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala ---
    @@ -43,10 +44,11 @@ import org.apache.spark.util.Utils
      *    to the name `value`.
      */
     object ExpressionEncoder {
    +
       def apply[T : TypeTag](): ExpressionEncoder[T] = {
    -    // We convert the not-serializable TypeTag into StructType and ClassTag.
         val mirror = ScalaReflection.mirror
    -    val tpe = typeTag[T].in(mirror).tpe
    +    val tpe = ScalaReflection.localTypeOf[T]
    --- End diff --
    
    why change it from `typeTag[T].in(mirror).tpe`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    **[Test build #97991 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97991/testReport)** for PR 22749 at commit [`c00d5e4`](https://github.com/apache/spark/commit/c00d5e44a21f8053a97db755f7a705872d4121eb).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEnc...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22749#discussion_r226295859
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala ---
    @@ -103,75 +88,61 @@ object ExpressionEncoder {
        * name/positional binding is preserved.
        */
       def tuple(encoders: Seq[ExpressionEncoder[_]]): ExpressionEncoder[_] = {
    +    if (encoders.length > 22) {
    +      throw new RuntimeException("Can't construct a tuple encoder for more than 22 encoders.")
    +    }
    +
         encoders.foreach(_.assertUnresolved())
     
         val schema = StructType(encoders.zipWithIndex.map {
           case (e, i) =>
    -        val (dataType, nullable) = if (e.flat) {
    -          e.schema.head.dataType -> e.schema.head.nullable
    -        } else {
    -          e.schema -> true
    -        }
    -        StructField(s"_${i + 1}", dataType, nullable)
    +        StructField(s"_${i + 1}", e.objSerializer.dataType, e.objSerializer.nullable)
         })
     
         val cls = Utils.getContextOrSparkClassLoader.loadClass(s"scala.Tuple${encoders.size}")
     
    -    val serializer = encoders.zipWithIndex.map { case (enc, index) =>
    -      val originalInputObject = enc.serializer.head.collect { case b: BoundReference => b }.head
    +    val serializers = encoders.zipWithIndex.map { case (enc, index) =>
    +      val boundRefs = enc.objSerializer.collect { case b: BoundReference => b }.distinct
    +      assert(boundRefs.size == 1, "object serializer should have only one bound reference but " +
    +        s"there are ${boundRefs.size}")
    +
    +      val originalInputObject = boundRefs.head
           val newInputObject = Invoke(
             BoundReference(0, ObjectType(cls), nullable = true),
             s"_${index + 1}",
    -        originalInputObject.dataType)
    +        originalInputObject.dataType,
    +        returnNullable = originalInputObject.nullable)
     
    -      val newSerializer = enc.serializer.map(_.transformUp {
    +      val newSerializer = enc.objSerializer.transformUp {
             case b: BoundReference if b == originalInputObject => newInputObject
    --- End diff --
    
    Since there is only one distinct `BoundReference`, we can just write `case b: BoundReference  => newInputObject`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    **[Test build #97485 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97485/testReport)** for PR 22749 at commit [`25a6162`](https://github.com/apache/spark/commit/25a616286075ca4f0a7d528095b387172b05c6c3).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    **[Test build #97991 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97991/testReport)** for PR 22749 at commit [`c00d5e4`](https://github.com/apache/spark/commit/c00d5e44a21f8053a97db755f7a705872d4121eb).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    **[Test build #97795 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97795/testReport)** for PR 22749 at commit [`7432344`](https://github.com/apache/spark/commit/7432344143fb4889ed3d5cbde21872c8fdd6d3f1).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    hmm, seems even re-open still can't re-trigger AppVeyor... cc @HyukjinKwon do you know why?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    **[Test build #97630 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97630/testReport)** for PR 22749 at commit [`efbc3fc`](https://github.com/apache/spark/commit/efbc3fc05ce42bda932b54c78c7f7b7eca90419f).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    **[Test build #97971 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97971/testReport)** for PR 22749 at commit [`078a071`](https://github.com/apache/spark/commit/078a071a72e0d39cece49ff73c09ec65a387b8af).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22749#discussion_r227672066
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
    @@ -434,17 +426,34 @@ object ScalaReflection extends ScalaReflection {
        *  * the element type of [[Array]] or [[Seq]]: `array element class: "abc.xyz.MyClass"`
        *  * the field of [[Product]]: `field (class: "abc.xyz.MyClass", name: "myField")`
        */
    -  def serializerFor[T : TypeTag](inputObject: Expression): CreateNamedStruct = {
    -    val tpe = localTypeOf[T]
    +  def serializerForType(tpe: `Type`,
    +      cls: RuntimeClass): Expression = ScalaReflection.cleanUpReflectionObjects {
         val clsName = getClassNameFromType(tpe)
         val walkedTypePath = s"""- root class: "$clsName"""" :: Nil
    -    serializerFor(inputObject, tpe, walkedTypePath) match {
    -      case expressions.If(_, _, s: CreateNamedStruct) if definedByConstructorParams(tpe) => s
    -      case other => CreateNamedStruct(expressions.Literal("value") :: other :: Nil)
    -    }
    +
    +    // The input object to `ExpressionEncoder` is located at first column of an row.
    +    val inputObject = BoundReference(0, dataTypeFor(tpe),
    +      nullable = !cls.isPrimitive)
    --- End diff --
    
    we just check isPrimitive of the given `cls`, can we check `tpe` directly?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    **[Test build #97953 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97953/testReport)** for PR 22749 at commit [`552e8dd`](https://github.com/apache/spark/commit/552e8dd3f5031b97cf5158ed07c77ff923233c79).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4438/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEnc...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22749#discussion_r226506718
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala ---
    @@ -103,75 +88,61 @@ object ExpressionEncoder {
        * name/positional binding is preserved.
        */
       def tuple(encoders: Seq[ExpressionEncoder[_]]): ExpressionEncoder[_] = {
    +    if (encoders.length > 22) {
    +      throw new RuntimeException("Can't construct a tuple encoder for more than 22 encoders.")
    +    }
    +
         encoders.foreach(_.assertUnresolved())
     
         val schema = StructType(encoders.zipWithIndex.map {
           case (e, i) =>
    -        val (dataType, nullable) = if (e.flat) {
    -          e.schema.head.dataType -> e.schema.head.nullable
    -        } else {
    -          e.schema -> true
    -        }
    -        StructField(s"_${i + 1}", dataType, nullable)
    +        StructField(s"_${i + 1}", e.objSerializer.dataType, e.objSerializer.nullable)
         })
     
         val cls = Utils.getContextOrSparkClassLoader.loadClass(s"scala.Tuple${encoders.size}")
     
    -    val serializer = encoders.zipWithIndex.map { case (enc, index) =>
    -      val originalInputObject = enc.serializer.head.collect { case b: BoundReference => b }.head
    +    val serializers = encoders.zipWithIndex.map { case (enc, index) =>
    +      val boundRefs = enc.objSerializer.collect { case b: BoundReference => b }.distinct
    +      assert(boundRefs.size == 1, "object serializer should have only one bound reference but " +
    +        s"there are ${boundRefs.size}")
    +
    +      val originalInputObject = boundRefs.head
           val newInputObject = Invoke(
             BoundReference(0, ObjectType(cls), nullable = true),
             s"_${index + 1}",
    -        originalInputObject.dataType)
    +        originalInputObject.dataType,
    +        returnNullable = originalInputObject.nullable)
     
    -      val newSerializer = enc.serializer.map(_.transformUp {
    +      val newSerializer = enc.objSerializer.transformUp {
             case b: BoundReference if b == originalInputObject => newInputObject
    --- End diff --
    
    yes, right.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4107/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya closed the pull request at:

    https://github.com/apache/spark/pull/22749


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    retest this please.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97460/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    **[Test build #97671 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97671/testReport)** for PR 22749 at commit [`7432344`](https://github.com/apache/spark/commit/7432344143fb4889ed3d5cbde21872c8fdd6d3f1).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97991/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4160/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97539/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    **[Test build #97568 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97568/testReport)** for PR 22749 at commit [`35700f4`](https://github.com/apache/spark/commit/35700f4a0f36fb397ac028a68011a2753c5c2c75).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22749#discussion_r228135443
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala ---
    @@ -212,21 +181,91 @@ object ExpressionEncoder {
      * A generic encoder for JVM objects that uses Catalyst Expressions for a `serializer`
      * and a `deserializer`.
      *
    - * @param schema The schema after converting `T` to a Spark SQL row.
    - * @param serializer A set of expressions, one for each top-level field that can be used to
    - *                   extract the values from a raw object into an [[InternalRow]].
    - * @param deserializer An expression that will construct an object given an [[InternalRow]].
    + * @param objSerializer An expression that can be used to encode a raw object to corresponding
    + *                   Spark SQL representation that can be a primitive column, array, map or a
    + *                   struct. This represents how Spark SQL generally serializes an object of
    + *                   type `T`.
    + * @param objDeserializer An expression that will construct an object given a Spark SQL
    + *                        representation. This represents how Spark SQL generally deserializes
    + *                        a serialized value in Spark SQL representation back to an object of
    + *                        type `T`.
      * @param clsTag A classtag for `T`.
      */
     case class ExpressionEncoder[T](
    -    schema: StructType,
    -    flat: Boolean,
    -    serializer: Seq[Expression],
    -    deserializer: Expression,
    +    objSerializer: Expression,
    +    objDeserializer: Expression,
         clsTag: ClassTag[T])
       extends Encoder[T] {
     
    -  if (flat) require(serializer.size == 1)
    +  /**
    +   * A sequence of expressions, one for each top-level field that can be used to
    +   * extract the values from a raw object into an [[InternalRow]]:
    +   * 1. If `serializer` encodes a raw object to a struct, strip the outer If-IsNull and get
    +   *    the `CreateNamedStruct`.
    +   * 2. For other cases, wrap the single serializer with `CreateNamedStruct`.
    +   */
    +  val serializer: Seq[NamedExpression] = {
    +    val clsName = Utils.getSimpleName(clsTag.runtimeClass)
    +
    +    if (isSerializedAsStruct) {
    +      val nullSafeSerializer = objSerializer.transformUp {
    +        case r: BoundReference =>
    +          // For input object of Product type, we can't encode it to row if it's null, as Spark SQL
    +          // doesn't allow top-level row to be null, only its columns can be null.
    +          AssertNotNull(r, Seq("top level Product or row object"))
    +      }
    +      nullSafeSerializer match {
    +        case If(_: IsNull, _, s: CreateNamedStruct) => s
    +        case s: CreateNamedStruct => s
    --- End diff --
    
    Ok. Sounds good to me.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    **[Test build #97570 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97570/testReport)** for PR 22749 at commit [`0c78b73`](https://github.com/apache/spark/commit/0c78b73e5abce2a51763c860e43aab214c8634d9).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4052/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    **[Test build #97570 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97570/testReport)** for PR 22749 at commit [`0c78b73`](https://github.com/apache/spark/commit/0c78b73e5abce2a51763c860e43aab214c8634d9).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    I like this idea! waiting for tests pass


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22749#discussion_r227682823
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/RowEncoder.scala ---
    @@ -58,12 +58,10 @@ object RowEncoder {
       def apply(schema: StructType): ExpressionEncoder[Row] = {
         val cls = classOf[Row]
         val inputObject = BoundReference(0, ObjectType(cls), nullable = true)
    -    val serializer = serializerFor(AssertNotNull(inputObject, Seq("top level row object")), schema)
    -    val deserializer = deserializerFor(schema)
    +    val serializer = serializerFor(inputObject, schema)
    +    val deserializer = deserializerFor(GetColumnByOrdinal(0, serializer.dataType), schema)
    --- End diff --
    
    ah i see, then let's leave it.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97650/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    **[Test build #97759 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97759/testReport)** for PR 22749 at commit [`400f878`](https://github.com/apache/spark/commit/400f87817183640006140e2db1839f8d78a13856).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97953/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    **[Test build #97460 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97460/testReport)** for PR 22749 at commit [`84f3ce0`](https://github.com/apache/spark/commit/84f3ce07f2f6a9236bd27f927fbb877e937f6917).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22749#discussion_r227677176
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/RowEncoder.scala ---
    @@ -58,12 +58,10 @@ object RowEncoder {
       def apply(schema: StructType): ExpressionEncoder[Row] = {
         val cls = classOf[Row]
         val inputObject = BoundReference(0, ObjectType(cls), nullable = true)
    -    val serializer = serializerFor(AssertNotNull(inputObject, Seq("top level row object")), schema)
    -    val deserializer = deserializerFor(schema)
    +    val serializer = serializerFor(inputObject, schema)
    +    val deserializer = deserializerFor(GetColumnByOrdinal(0, serializer.dataType), schema)
    --- End diff --
    
    Ok. Sounds better.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4380/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    retest this please.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    **[Test build #97764 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97764/testReport)** for PR 22749 at commit [`7432344`](https://github.com/apache/spark/commit/7432344143fb4889ed3d5cbde21872c8fdd6d3f1).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    hmm, it still has conflict...


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4215/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEnc...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22749#discussion_r226299441
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala ---
    @@ -212,21 +183,88 @@ object ExpressionEncoder {
      * A generic encoder for JVM objects that uses Catalyst Expressions for a `serializer`
      * and a `deserializer`.
      *
    - * @param schema The schema after converting `T` to a Spark SQL row.
    - * @param serializer A set of expressions, one for each top-level field that can be used to
    - *                   extract the values from a raw object into an [[InternalRow]].
    - * @param deserializer An expression that will construct an object given an [[InternalRow]].
    + * @param objSerializer An expression that can be used to encode a raw object to corresponding
    + *                   Spark SQL representation that can be a primitive column, array, map or a
    + *                   struct. This represents how Spark SQL generally serializes an object of
    + *                   type `T`.
    + * @param objDeserializer An expression that will construct an object given a Spark SQL
    + *                        representation. This represents how Spark SQL generally deserializes
    + *                        a serialized value in Spark SQL representation back to an object of
    + *                        type `T`.
      * @param clsTag A classtag for `T`.
      */
     case class ExpressionEncoder[T](
    -    schema: StructType,
    -    flat: Boolean,
    -    serializer: Seq[Expression],
    -    deserializer: Expression,
    +    objSerializer: Expression,
    +    objDeserializer: Expression,
         clsTag: ClassTag[T])
       extends Encoder[T] {
     
    -  if (flat) require(serializer.size == 1)
    +  /**
    +   * A set of expressions, one for each top-level field that can be used to
    --- End diff --
    
    set -> sequence


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    **[Test build #97539 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97539/testReport)** for PR 22749 at commit [`85a9122`](https://github.com/apache/spark/commit/85a91220ec4eb00bd9d5020ecf980eac0301f716).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22749#discussion_r227797378
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/ScalaReflectionSuite.scala ---
    @@ -280,59 +281,59 @@ class ScalaReflectionSuite extends SparkFunSuite {
     
       test("serialize and deserialize arbitrary sequence types") {
         import scala.collection.immutable.Queue
    -    val queueSerializer = serializerFor[Queue[Int]](BoundReference(
    -      0, ObjectType(classOf[Queue[Int]]), nullable = false))
    -    assert(queueSerializer.dataType.head.dataType ==
    +    val queueSerializer = serializerForType(ScalaReflection.localTypeOf[Queue[Int]])
    +    assert(queueSerializer.dataType ==
           ArrayType(IntegerType, containsNull = false))
         val queueDeserializer = deserializerFor[Queue[Int]]
         assert(queueDeserializer.dataType == ObjectType(classOf[Queue[_]]))
     
         import scala.collection.mutable.ArrayBuffer
    -    val arrayBufferSerializer = serializerFor[ArrayBuffer[Int]](BoundReference(
    -      0, ObjectType(classOf[ArrayBuffer[Int]]), nullable = false))
    -    assert(arrayBufferSerializer.dataType.head.dataType ==
    +    val arrayBufferSerializer = serializerForType(ScalaReflection.localTypeOf[ArrayBuffer[Int]])
    +    assert(arrayBufferSerializer.dataType ==
           ArrayType(IntegerType, containsNull = false))
         val arrayBufferDeserializer = deserializerFor[ArrayBuffer[Int]]
         assert(arrayBufferDeserializer.dataType == ObjectType(classOf[ArrayBuffer[_]]))
       }
     
       test("serialize and deserialize arbitrary map types") {
    -    val mapSerializer = serializerFor[Map[Int, Int]](BoundReference(
    -      0, ObjectType(classOf[Map[Int, Int]]), nullable = false))
    -    assert(mapSerializer.dataType.head.dataType ==
    +    val mapSerializer = serializerForType(ScalaReflection.localTypeOf[Map[Int, Int]])
    +    assert(mapSerializer.dataType ==
           MapType(IntegerType, IntegerType, valueContainsNull = false))
         val mapDeserializer = deserializerFor[Map[Int, Int]]
         assert(mapDeserializer.dataType == ObjectType(classOf[Map[_, _]]))
     
         import scala.collection.immutable.HashMap
    -    val hashMapSerializer = serializerFor[HashMap[Int, Int]](BoundReference(
    -      0, ObjectType(classOf[HashMap[Int, Int]]), nullable = false))
    -    assert(hashMapSerializer.dataType.head.dataType ==
    +    val hashMapSerializer = serializerForType(ScalaReflection.localTypeOf[HashMap[Int, Int]])
    +    assert(hashMapSerializer.dataType ==
           MapType(IntegerType, IntegerType, valueContainsNull = false))
         val hashMapDeserializer = deserializerFor[HashMap[Int, Int]]
         assert(hashMapDeserializer.dataType == ObjectType(classOf[HashMap[_, _]]))
     
         import scala.collection.mutable.{LinkedHashMap => LHMap}
    -    val linkedHashMapSerializer = serializerFor[LHMap[Long, String]](BoundReference(
    -      0, ObjectType(classOf[LHMap[Long, String]]), nullable = false))
    -    assert(linkedHashMapSerializer.dataType.head.dataType ==
    +    val linkedHashMapSerializer = serializerForType(
    +        ScalaReflection.localTypeOf[LHMap[Long, String]])
    +    assert(linkedHashMapSerializer.dataType ==
           MapType(LongType, StringType, valueContainsNull = true))
         val linkedHashMapDeserializer = deserializerFor[LHMap[Long, String]]
         assert(linkedHashMapDeserializer.dataType == ObjectType(classOf[LHMap[_, _]]))
       }
     
       test("SPARK-22442: Generate correct field names for special characters") {
    -    val serializer = serializerFor[SpecialCharAsFieldData](BoundReference(
    -      0, ObjectType(classOf[SpecialCharAsFieldData]), nullable = false))
    +    val serializer = serializerForType(ScalaReflection.localTypeOf[SpecialCharAsFieldData])
    --- End diff --
    
    Do you mean to create a method `serializerFor` in this suite? Or replace `serializerForType` with `ScalaReflection.serializerFor`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97630/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEnc...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22749#discussion_r226294255
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala ---
    @@ -103,75 +88,61 @@ object ExpressionEncoder {
        * name/positional binding is preserved.
        */
       def tuple(encoders: Seq[ExpressionEncoder[_]]): ExpressionEncoder[_] = {
    +    if (encoders.length > 22) {
    --- End diff --
    
    can we do it in a separated PR with a test?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4454/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4037/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    **[Test build #97651 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97651/testReport)** for PR 22749 at commit [`7432344`](https://github.com/apache/spark/commit/7432344143fb4889ed3d5cbde21872c8fdd6d3f1).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4105/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEnc...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22749#discussion_r226301402
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala ---
    @@ -212,21 +183,88 @@ object ExpressionEncoder {
      * A generic encoder for JVM objects that uses Catalyst Expressions for a `serializer`
      * and a `deserializer`.
      *
    - * @param schema The schema after converting `T` to a Spark SQL row.
    - * @param serializer A set of expressions, one for each top-level field that can be used to
    - *                   extract the values from a raw object into an [[InternalRow]].
    - * @param deserializer An expression that will construct an object given an [[InternalRow]].
    + * @param objSerializer An expression that can be used to encode a raw object to corresponding
    + *                   Spark SQL representation that can be a primitive column, array, map or a
    + *                   struct. This represents how Spark SQL generally serializes an object of
    + *                   type `T`.
    + * @param objDeserializer An expression that will construct an object given a Spark SQL
    + *                        representation. This represents how Spark SQL generally deserializes
    + *                        a serialized value in Spark SQL representation back to an object of
    + *                        type `T`.
      * @param clsTag A classtag for `T`.
      */
     case class ExpressionEncoder[T](
    -    schema: StructType,
    -    flat: Boolean,
    -    serializer: Seq[Expression],
    -    deserializer: Expression,
    +    objSerializer: Expression,
    +    objDeserializer: Expression,
         clsTag: ClassTag[T])
       extends Encoder[T] {
     
    -  if (flat) require(serializer.size == 1)
    +  /**
    +   * A set of expressions, one for each top-level field that can be used to
    +   * extract the values from a raw object into an [[InternalRow]]:
    +   * 1. If `serializer` encodes a raw object to a struct, we directly use the `serializer`.
    +   * 2. For other cases, we create a struct to wrap the `serializer`.
    +   */
    +  val serializer: Seq[NamedExpression] = {
    +    val serializedAsStruct = objSerializer.dataType.isInstanceOf[StructType]
    +    val clsName = Utils.getSimpleName(clsTag.runtimeClass)
    +
    +    if (serializedAsStruct) {
    +      val nullSafeSerializer = objSerializer.transformUp {
    +        case r: BoundReference =>
    +          // For input object of Product type, we can't encode it to row if it's null, as Spark SQL
    +          // doesn't allow top-level row to be null, only its columns can be null.
    +          AssertNotNull(r, Seq("top level Product or row object"))
    +      }
    +      nullSafeSerializer match {
    +        case If(_, _, s: CreateNamedStruct) => s
    +        case s: CreateNamedStruct => s
    +        case _ =>
    +          throw new RuntimeException(s"class $clsName has unexpected serializer: $objSerializer")
    +      }
    +    } else {
    +      // For other input objects like primitive, array, map, etc., we construct a struct to wrap
    +      // the serializer which is a column of an row.
    +      CreateNamedStruct(Literal("value") :: objSerializer :: Nil)
    +    }
    +  }.flatten
    +
    +  /**
    +   * Returns an expression that can be used to deserialize an input row to an object of type `T`
    +   * with a compatible schema. Fields of the row will be extracted using `UnresolvedAttribute`.
    +   * of the same name as the constructor arguments.
    +   *
    +   * For complex objects that are encoded to structs, Fields of the struct will be extracted using
    +   * `GetColumnByOrdinal` with corresponding ordinal.
    +   */
    +  val deserializer: Expression = {
    +    val serializedAsStruct = objSerializer.dataType.isInstanceOf[StructType]
    +
    +    if (serializedAsStruct) {
    +      // We serialized this kind of objects to root-level row. The input of general deserializer
    +      // is a `GetColumnByOrdinal(0)` expression to extract first column of a row. We need to
    +      // transform attributes accessors.
    +      objDeserializer.transform {
    +        case UnresolvedExtractValue(GetColumnByOrdinal(0, _),
    +            Literal(part: UTF8String, StringType)) =>
    +          UnresolvedAttribute.quoted(part.toString)
    +        case GetStructField(GetColumnByOrdinal(0, dt), ordinal, _) =>
    +          GetColumnByOrdinal(ordinal, dt)
    +        case If(IsNull(GetColumnByOrdinal(0, _)), _, n: NewInstance) => n
    +        case If(IsNull(GetColumnByOrdinal(0, _)), _, i: InitializeJavaBean) => i
    +      }
    +    } else {
    +      // For other input objects like primitive, array, map, etc., we deserialize the first column
    +      // of a row to the object.
    +      objDeserializer
    +    }
    +  }
    +
    +  // The schema after converting `T` to a Spark SQL row. This schema is dependent on the given
    +  // serialier.
    +  val schema: StructType = StructType(serializer.map { s =>
    +    StructField(s.name, s.dataType, s.nullable)
    --- End diff --
    
    nvm, serializer don't need analysis


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4425/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22749#discussion_r227678714
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/RowEncoder.scala ---
    @@ -58,12 +58,10 @@ object RowEncoder {
       def apply(schema: StructType): ExpressionEncoder[Row] = {
         val cls = classOf[Row]
         val inputObject = BoundReference(0, ObjectType(cls), nullable = true)
    -    val serializer = serializerFor(AssertNotNull(inputObject, Seq("top level row object")), schema)
    -    val deserializer = deserializerFor(schema)
    +    val serializer = serializerFor(inputObject, schema)
    +    val deserializer = deserializerFor(GetColumnByOrdinal(0, serializer.dataType), schema)
    --- End diff --
    
    Ah, we need to access `serializer.dataType` here. So if we want to create `GetColumnByOrdinal` in `deserializeFor`, we need to pass this data type too. What do you think?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    **[Test build #97764 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97764/testReport)** for PR 22749 at commit [`7432344`](https://github.com/apache/spark/commit/7432344143fb4889ed3d5cbde21872c8fdd6d3f1).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22749#discussion_r227695574
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
    @@ -434,17 +426,34 @@ object ScalaReflection extends ScalaReflection {
        *  * the element type of [[Array]] or [[Seq]]: `array element class: "abc.xyz.MyClass"`
        *  * the field of [[Product]]: `field (class: "abc.xyz.MyClass", name: "myField")`
        */
    -  def serializerFor[T : TypeTag](inputObject: Expression): CreateNamedStruct = {
    -    val tpe = localTypeOf[T]
    +  def serializerForType(tpe: `Type`,
    +      cls: RuntimeClass): Expression = ScalaReflection.cleanUpReflectionObjects {
         val clsName = getClassNameFromType(tpe)
         val walkedTypePath = s"""- root class: "$clsName"""" :: Nil
    -    serializerFor(inputObject, tpe, walkedTypePath) match {
    -      case expressions.If(_, _, s: CreateNamedStruct) if definedByConstructorParams(tpe) => s
    -      case other => CreateNamedStruct(expressions.Literal("value") :: other :: Nil)
    -    }
    +
    +    // The input object to `ExpressionEncoder` is located at first column of an row.
    +    val inputObject = BoundReference(0, dataTypeFor(tpe),
    +      nullable = !cls.isPrimitive)
    +
    +    serializerFor(inputObject, tpe, walkedTypePath)
       }
     
    -  /** Helper for extracting internal fields from a case class. */
    +  /**
    +   * Returns an expression for serializing the value of an input expression into Spark SQL
    --- End diff --
    
    I did simplify a lot of it.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97969/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4439/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Close and reopen this to re-trigger AppVeyor.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    **[Test build #97460 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97460/testReport)** for PR 22749 at commit [`84f3ce0`](https://github.com/apache/spark/commit/84f3ce07f2f6a9236bd27f927fbb877e937f6917).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `          throw new RuntimeException(s\"class $clsName has unexpected serializer: $objSerializer\")`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97645/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    **[Test build #97479 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97479/testReport)** for PR 22749 at commit [`6a6fa45`](https://github.com/apache/spark/commit/6a6fa454e22728cc2ad8e5515cd587fe0be84b26).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4434/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    **[Test build #97651 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97651/testReport)** for PR 22749 at commit [`7432344`](https://github.com/apache/spark/commit/7432344143fb4889ed3d5cbde21872c8fdd6d3f1).
     * This patch **fails PySpark pip packaging tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    **[Test build #97964 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97964/testReport)** for PR 22749 at commit [`ed4f4c9`](https://github.com/apache/spark/commit/ed4f4c90ec4c162f56373950089eec4632787817).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    **[Test build #97885 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97885/testReport)** for PR 22749 at commit [`400f878`](https://github.com/apache/spark/commit/400f87817183640006140e2db1839f8d78a13856).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    **[Test build #97969 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97969/testReport)** for PR 22749 at commit [`8cb710b`](https://github.com/apache/spark/commit/8cb710b5c7b329468c320b59bb0625866fd8d836).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder ...

Posted by viirya <gi...@git.apache.org>.
GitHub user viirya reopened a pull request:

    https://github.com/apache/spark/pull/22749

    [SPARK-25746][SQL] Refactoring ExpressionEncoder to get rid of flat flag

    ## What changes were proposed in this pull request?
    
    This is inspired during implementing #21732. For now `ScalaReflection` needs to consider how `ExpressionEncoder` uses generated serializers and deserializers. And `ExpressionEncoder` has a weird `flat` flag. After discussion with @cloud-fan, it seems to be better to refactor `ExpressionEncoder`. It should make SPARK-24762 easier to do.
    
    To summarize the proposed changes:
    
    1. `serializerFor` and `deserializerFor` return expressions for serializing/deserializing an input expression for a given type. They are private and should not be called directly.
    2. `serializerForType` and `deserializerForType` returns an expression for serializing/deserializing for an object of type T to/from Spark SQL representation. It assumes the input object/Spark SQL representation is located at ordinal 0 of a row.
    
    So in other words, `serializerForType` and `deserializerForType` return expressions for atomically serializing/deserializing JVM object to/from Spark SQL value.
    
    A serializer returned by `serializerForType` will serialize an object at `row(0)` to a corresponding Spark SQL representation, e.g. primitive type, array, map, struct.
    
    A deserializer returned by `deserializerForType` will deserialize an input field at `row(0)` to an object with given type.
    
    3. The construction of `ExpressionEncoder` takes a pair of serializer and deserializer for type `T`. It uses them to create serializer and deserializer for T <-> row serialization. Now `ExpressionEncoder` dones't need to remember if serializer is flat or not. When we need to construct new `ExpressionEncoder` based on existing ones, we only need to change input location in the atomic serializer and deserializer.
    
    ## How was this patch tested?
    
    Existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 SPARK-24762-refactor

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22749.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22749
    
----
commit e1b5deebe715479125c8878f0c90a55dc9ab3e85
Author: Liang-Chi Hsieh <vi...@...>
Date:   2018-07-09T03:42:04Z

    Aggregator should be able to use Option of Product encoder.

commit 80506f4e98184ccd66dbaac14ec52d69c358020d
Author: Liang-Chi Hsieh <vi...@...>
Date:   2018-07-13T04:40:55Z

    Enable top-level Option of Product encoders.

commit ed3d5cb697b10af2e2cf4c78ab521d4d0b2f3c9b
Author: Liang-Chi Hsieh <vi...@...>
Date:   2018-08-24T04:26:28Z

    Remove topLevel parameter.

commit 9fc3f6165156051142a8366a32726badaaa16bb7
Author: Liang-Chi Hsieh <vi...@...>
Date:   2018-08-24T04:37:39Z

    Merge remote-tracking branch 'upstream/master' into SPARK-24762

commit 5f95bd0cf1bd308c7df55c41caef7a9f19368f5d
Author: Liang-Chi Hsieh <vi...@...>
Date:   2018-08-24T04:42:33Z

    Remove useless change.

commit a4f04055b2ba22f371663565710328791942855a
Author: Liang-Chi Hsieh <vi...@...>
Date:   2018-08-24T14:38:16Z

    Add more tests.

commit c1f798f7e9cba0d04223eed06f1b1f547ec29dc5
Author: Liang-Chi Hsieh <vi...@...>
Date:   2018-08-25T01:52:01Z

    Add test.

commit 80e11d289d7775863cb9c28b2c1d4364292048a4
Author: Liang-Chi Hsieh <vi...@...>
Date:   2018-10-06T04:06:57Z

    Merge remote-tracking branch 'upstream/master' into SPARK-24762

commit 0f029b0a28700334dc6334f1ad89b3124f235a51
Author: Liang-Chi Hsieh <vi...@...>
Date:   2018-10-06T04:40:07Z

    Improve code comments.

commit 84f3ce07f2f6a9236bd27f927fbb877e937f6917
Author: Liang-Chi Hsieh <vi...@...>
Date:   2018-10-15T09:55:03Z

    Refactoring ExpressionEncoder.

commit 6a6fa454e22728cc2ad8e5515cd587fe0be84b26
Author: Liang-Chi Hsieh <vi...@...>
Date:   2018-10-17T02:07:40Z

    Fix Malformed class name.

commit 25a616286075ca4f0a7d528095b387172b05c6c3
Author: Liang-Chi Hsieh <vi...@...>
Date:   2018-10-17T05:11:10Z

    Fix error message.

commit 295ecde8103c26dda169d931f939f8a2fe641c4c
Author: Liang-Chi Hsieh <vi...@...>
Date:   2018-10-18T15:58:03Z

    Fix test.

commit 85a91220ec4eb00bd9d5020ecf980eac0301f716
Author: Liang-Chi Hsieh <vi...@...>
Date:   2018-10-18T16:05:22Z

    Merge remote-tracking branch 'upstream/master' into SPARK-24762-refactor

commit 35700f4a0f36fb397ac028a68011a2753c5c2c75
Author: Liang-Chi Hsieh <vi...@...>
Date:   2018-10-19T00:07:29Z

    Fix rebase error.

commit b211ed069dceb33c45cf6caf12c19527334d4ad8
Author: Liang-Chi Hsieh <vi...@...>
Date:   2018-10-19T00:16:24Z

    Fix unintentional style change.

commit 0c78b73e5abce2a51763c860e43aab214c8634d9
Author: Liang-Chi Hsieh <vi...@...>
Date:   2018-10-19T00:51:52Z

    Address comments.

commit 5b9abb67907dfdb0c0c64751db3525564f832422
Author: Liang-Chi Hsieh <vi...@...>
Date:   2018-10-20T02:26:07Z

    Address ComplexTypeMergingExpression issue.

commit 7432344143fb4889ed3d5cbde21872c8fdd6d3f1
Author: Liang-Chi Hsieh <vi...@...>
Date:   2018-10-20T12:47:37Z

    Try more reasonable solution.

commit 400f87817183640006140e2db1839f8d78a13856
Author: Liang-Chi Hsieh <vi...@...>
Date:   2018-10-22T02:56:20Z

    Address comment.

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4094/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder to get ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22749#discussion_r227673675
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
    @@ -434,17 +426,34 @@ object ScalaReflection extends ScalaReflection {
        *  * the element type of [[Array]] or [[Seq]]: `array element class: "abc.xyz.MyClass"`
        *  * the field of [[Product]]: `field (class: "abc.xyz.MyClass", name: "myField")`
        */
    -  def serializerFor[T : TypeTag](inputObject: Expression): CreateNamedStruct = {
    -    val tpe = localTypeOf[T]
    +  def serializerForType(tpe: `Type`,
    +      cls: RuntimeClass): Expression = ScalaReflection.cleanUpReflectionObjects {
         val clsName = getClassNameFromType(tpe)
         val walkedTypePath = s"""- root class: "$clsName"""" :: Nil
    -    serializerFor(inputObject, tpe, walkedTypePath) match {
    -      case expressions.If(_, _, s: CreateNamedStruct) if definedByConstructorParams(tpe) => s
    -      case other => CreateNamedStruct(expressions.Literal("value") :: other :: Nil)
    -    }
    +
    +    // The input object to `ExpressionEncoder` is located at first column of an row.
    +    val inputObject = BoundReference(0, dataTypeFor(tpe),
    +      nullable = !cls.isPrimitive)
    +
    +    serializerFor(inputObject, tpe, walkedTypePath)
       }
     
    -  /** Helper for extracting internal fields from a case class. */
    +  /**
    +   * Returns an expression for serializing the value of an input expression into Spark SQL
    --- End diff --
    
    do we really need to duplicate the doc in this private method?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22749: [SPARK-25746][SQL] Refactoring ExpressionEncoder ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22749#discussion_r228133724
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala ---
    @@ -212,21 +181,91 @@ object ExpressionEncoder {
      * A generic encoder for JVM objects that uses Catalyst Expressions for a `serializer`
      * and a `deserializer`.
      *
    - * @param schema The schema after converting `T` to a Spark SQL row.
    - * @param serializer A set of expressions, one for each top-level field that can be used to
    - *                   extract the values from a raw object into an [[InternalRow]].
    - * @param deserializer An expression that will construct an object given an [[InternalRow]].
    + * @param objSerializer An expression that can be used to encode a raw object to corresponding
    + *                   Spark SQL representation that can be a primitive column, array, map or a
    + *                   struct. This represents how Spark SQL generally serializes an object of
    + *                   type `T`.
    + * @param objDeserializer An expression that will construct an object given a Spark SQL
    + *                        representation. This represents how Spark SQL generally deserializes
    + *                        a serialized value in Spark SQL representation back to an object of
    + *                        type `T`.
      * @param clsTag A classtag for `T`.
      */
     case class ExpressionEncoder[T](
    -    schema: StructType,
    -    flat: Boolean,
    -    serializer: Seq[Expression],
    -    deserializer: Expression,
    +    objSerializer: Expression,
    +    objDeserializer: Expression,
         clsTag: ClassTag[T])
       extends Encoder[T] {
     
    -  if (flat) require(serializer.size == 1)
    +  /**
    +   * A sequence of expressions, one for each top-level field that can be used to
    +   * extract the values from a raw object into an [[InternalRow]]:
    +   * 1. If `serializer` encodes a raw object to a struct, strip the outer If-IsNull and get
    +   *    the `CreateNamedStruct`.
    +   * 2. For other cases, wrap the single serializer with `CreateNamedStruct`.
    +   */
    +  val serializer: Seq[NamedExpression] = {
    +    val clsName = Utils.getSimpleName(clsTag.runtimeClass)
    +
    +    if (isSerializedAsStruct) {
    +      val nullSafeSerializer = objSerializer.transformUp {
    +        case r: BoundReference =>
    +          // For input object of Product type, we can't encode it to row if it's null, as Spark SQL
    +          // doesn't allow top-level row to be null, only its columns can be null.
    +          AssertNotNull(r, Seq("top level Product or row object"))
    +      }
    +      nullSafeSerializer match {
    +        case If(_: IsNull, _, s: CreateNamedStruct) => s
    +        case s: CreateNamedStruct => s
    --- End diff --
    
    oh, good catch! I think this is redundant pattern.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4036/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    **[Test build #97479 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97479/testReport)** for PR 22749 at commit [`6a6fa45`](https://github.com/apache/spark/commit/6a6fa454e22728cc2ad8e5515cd587fe0be84b26).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97459/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    retest this please.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    **[Test build #97485 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97485/testReport)** for PR 22749 at commit [`25a6162`](https://github.com/apache/spark/commit/25a616286075ca4f0a7d528095b387172b05c6c3).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22749: [WIP][SPARK-25746][SQL] Refactoring ExpressionEncoder to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22749
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4106/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org