You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by viirya <gi...@git.apache.org> on 2017/01/11 07:41:45 UTC

[GitHub] spark pull request #16546: [WIP][SQL] Put check in ExpressionEncoder.fromRow...

GitHub user viirya opened a pull request:

    https://github.com/apache/spark/pull/16546

    [WIP][SQL] Put check in ExpressionEncoder.fromRow to ensure we can convert deserialized object to required type

    ## What changes were proposed in this pull request?
    
    Two problems are addressed in this patch.
    
    1. Serialize subclass of `Seq[_]` which doesn't have element type
    
    Currently, in `ScalaReflection.serializerFor`, we try to serialize all sub types of `Seq[_]`. But for `Range` which is a `Seq[Int]` and doesn't have element type, `serializerFor` will fail and show mystery messages:
    
        scala.MatchError: scala.collection.immutable.Range.Inclusive (of class scala.reflect.internal.Types$ClassNoArgsTypeRef)
          at org.apache.spark.sql.catalyst.ScalaReflection$.org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor(ScalaReflection.scala:520)
          at org.apache.spark.sql.catalyst.ScalaReflection$.serializerFor(ScalaReflection.scala:463)
          at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.apply(ExpressionEncoder.scala:71) 
    
    This patch tries to fix this by considering the types without element type.
    
    2. Encoder can't deserialize internal row to required type
    
    We serialize the objects with common super class such as `Seq[_]` to a common internal data. But when we want to deserialize the internal data back to the original objects, we will encounter the problem of initialization of different types of objects.
    
    For example, we deserialize the data serialized from `Seq[_]` to `WrappedArray`. It works when we serialize data of `Seq[_]`. If we try to serialize data of subclass of `Seq[_]` (for example `Range`) which is not assignable from `WrappedArray`, there will be runtime error when converting deserialized data to the required subclass of `Seq[_]`.
    
    Except for explicitly writing down the rule to deserialize each subclass of `Seq[_]`, I think the feasible solution is to check if we can convert deserialized data to the required type. This patch puts the check into `ExpressionEncoder.fromRow`. Once the requirement is not matched, we show a reasonable message to users.
    
    ## How was this patch tested?
    
    Jenkins tests.
    
    Please review http://spark.apache.org/contributing.html before opening a pull request.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 encoder-range

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16546.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16546
    
----
commit 190fb6222d84991000f91735952579b1e0686a61
Author: Liang-Chi Hsieh <vi...@gmail.com>
Date:   2017-01-11T03:38:44Z

    Put check in ExpressionEncoder.fromRow to ensure we can convert deserialized object to required type.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16546: [WIP][SQL] Put check in ExpressionEncoder.fromRow to ens...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16546
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16546: [WIP][SQL] Put check in ExpressionEncoder.fromRow to ens...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/16546
  
    retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16546: [WIP][SQL] Put check in ExpressionEncoder.fromRow to ens...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16546
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16546: [WIP][SQL] Put check in ExpressionEncoder.fromRow to ens...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16546
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16546: [WIP][SQL] Put check in ExpressionEncoder.fromRow to ens...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16546
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71218/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16546: [WIP][SQL] Put check in ExpressionEncoder.fromRow...

Posted by viirya <gi...@git.apache.org>.
Github user viirya closed the pull request at:

    https://github.com/apache/spark/pull/16546


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16546: [WIP][SQL] Put check in ExpressionEncoder.fromRow...

Posted by michalsenkyr <gi...@git.apache.org>.
Github user michalsenkyr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16546#discussion_r95927063
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
    @@ -120,6 +120,32 @@ object ScalaReflection extends ScalaReflection {
       }
     
       /**
    +   * Returns the element type for Seq[_] and its subclass.
    +   */
    +  def getElementTypeForSeq(t: `Type`): Option[`Type`] = {
    +    if (!(t <:< localTypeOf[Seq[_]])) {
    +      return None
    +    }
    +    val TypeRef(_, _, elementTypeList) = t
    --- End diff --
    
    This would probably be better solved by using `match` and `typeParams`. Something like this:
    
    ```scala
    t match {
      case TypeRef(_, _, Seq(elementType)) => Some(elementType)
      case _ =>
        t.baseClasses.find { c =>
          val cType = c.asClass.toType
          cType <:< localTypeOf[Seq[_]] && cType.typeParams.nonEmpty
        }.map(t.baseType(_).typeParams.head)
    }
    ```
    
    Also not sure whether types with more than one type parameter are handled correctly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16546: [WIP][SQL] Put check in ExpressionEncoder.fromRow to ens...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/16546
  
    Just noticed that there is a related and interesting pr #16240 merged recently.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16546: [WIP][SQL] Put check in ExpressionEncoder.fromRow to ens...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16546
  
    **[Test build #71218 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71218/testReport)** for PR 16546 at commit [`0af969e`](https://github.com/apache/spark/commit/0af969ea42aee1eabb4de0595c3b25fec899062c).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16546: [WIP][SQL] Put check in ExpressionEncoder.fromRow to ens...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16546
  
    **[Test build #71196 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71196/testReport)** for PR 16546 at commit [`190fb62`](https://github.com/apache/spark/commit/190fb6222d84991000f91735952579b1e0686a61).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16546: [WIP][SQL] Put check in ExpressionEncoder.fromRow to ens...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16546
  
    **[Test build #71218 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71218/testReport)** for PR 16546 at commit [`0af969e`](https://github.com/apache/spark/commit/0af969ea42aee1eabb4de0595c3b25fec899062c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16546: [WIP][SQL] Put check in ExpressionEncoder.fromRow to ens...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16546
  
    **[Test build #71192 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71192/testReport)** for PR 16546 at commit [`190fb62`](https://github.com/apache/spark/commit/190fb6222d84991000f91735952579b1e0686a61).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16546: [WIP][SQL] Put check in ExpressionEncoder.fromRow to ens...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16546
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71192/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16546: [WIP][SQL] Put check in ExpressionEncoder.fromRow to ens...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16546
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71196/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16546: [WIP][SQL] Put check in ExpressionEncoder.fromRow to ens...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16546
  
    **[Test build #71196 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71196/testReport)** for PR 16546 at commit [`190fb62`](https://github.com/apache/spark/commit/190fb6222d84991000f91735952579b1e0686a61).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org