You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by cloud-fan <gi...@git.apache.org> on 2016/01/20 23:16:19 UTC

[GitHub] spark pull request: [SPARK-12939][SQL] migrate encoder resolution ...

GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/10852

    [SPARK-12939][SQL] migrate encoder resolution logic to Analyzer

    Now we will catch `ObjectOperator` in `Analyzer` and resolve the `fromRowExpression/deserializer` inside it.  Also update the `MapGroups` and `CoGroup` to pass in `dataAttributes`, so that we can correctly resolve value deserializer(the `child.output` contains both groupking key and values, which may mess things up if they have same-name attribtues). End-to-end tests are added.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark bug

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10852.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #10852
    
----
commit f34d60240c255d3b7933aeb5f6747e5f611007ff
Author: Wenchen Fan <we...@databricks.com>
Date:   2016-01-20T19:51:15Z

    migrate encoder resolution logic to Analyzer

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12939][SQL] migrate encoder resolution ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10852#discussion_r50777324
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/objects.scala ---
    @@ -142,12 +143,14 @@ case class MapGroups(
      */
     case class CoGroup(
         func: (Any, Iterator[Any], Iterator[Any]) => TraversableOnce[Any],
    -    keyObject: Expression,
    -    leftObject: Expression,
    -    rightObject: Expression,
    +    keyDeserializer: Expression,
    +    leftDeserializer: Expression,
    +    rightDeserializer: Expression,
         serializer: Seq[NamedExpression],
         leftGroup: Seq[Attribute],
         rightGroup: Seq[Attribute],
    +    leftAttr: Seq[Attribute],
    +    rightAttr: Seq[Attribute],
    --- End diff --
    
    yea there is only one place where we create a `CoGroup`, but we do pass `GroupedDataset.dataAttributes` instead of left/right output attributes, https://github.com/apache/spark/pull/10852/files#diff-ed737c05b572cbec98ae5264d2c799b4R314


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12939][SQL] migrate encoder resolution ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10852#issuecomment-174793028
  
    **[Test build #50054 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50054/consoleFull)** for PR 10852 at commit [`aae7162`](https://github.com/apache/spark/commit/aae7162f55b27f6446ca7ed965bd5e187a5dddd6).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12939][SQL] migrate encoder resolution ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/10852#issuecomment-173379617
  
    cc @marmbrus @yhuai @rxin 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12939][SQL] migrate encoder resolution ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10852#discussion_r50774569
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/objects.scala ---
    @@ -142,12 +143,14 @@ case class MapGroups(
      */
     case class CoGroup(
         func: (Any, Iterator[Any], Iterator[Any]) => TraversableOnce[Any],
    -    keyObject: Expression,
    -    leftObject: Expression,
    -    rightObject: Expression,
    +    keyDeserializer: Expression,
    +    leftDeserializer: Expression,
    +    rightDeserializer: Expression,
         serializer: Seq[NamedExpression],
         leftGroup: Seq[Attribute],
         rightGroup: Seq[Attribute],
    +    leftAttr: Seq[Attribute],
    +    rightAttr: Seq[Attribute],
    --- End diff --
    
    No they are different. Like `MapGroups`, the child.output also contains grouping attributes, which may cause problems if we have same name attribute in grouping attributes and data attributes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12939][SQL] migrate encoder resolution ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10852#issuecomment-178967236
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50623/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12939][SQL] migrate encoder resolution ...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10852#discussion_r50768613
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala ---
    @@ -44,18 +53,30 @@ trait ObjectOperator extends LogicalPlan {
       def withObjectOutput: LogicalPlan = if (output.head.dataType.isInstanceOf[ObjectType]) {
         this
       } else {
    -    withNewSerializer(outputObject)
    +    withNewSerializer(outputObject :: Nil)
       }
     
       /** Returns a copy of this operator with a different serializer. */
    -  def withNewSerializer(newSerializer: NamedExpression): LogicalPlan = makeCopy {
    +  def withNewSerializer(newSerializer: Seq[NamedExpression]): LogicalPlan = makeCopy {
    --- End diff --
    
    nit: `withNewSerializers`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12939][SQL] migrate encoder resolution ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10852#issuecomment-174793253
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12939][SQL] migrate encoder resolution ...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10852#discussion_r50785660
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/objects.scala ---
    @@ -142,12 +143,14 @@ case class MapGroups(
      */
     case class CoGroup(
         func: (Any, Iterator[Any], Iterator[Any]) => TraversableOnce[Any],
    -    keyObject: Expression,
    -    leftObject: Expression,
    -    rightObject: Expression,
    +    keyDeserializer: Expression,
    +    leftDeserializer: Expression,
    +    rightDeserializer: Expression,
         serializer: Seq[NamedExpression],
         leftGroup: Seq[Attribute],
         rightGroup: Seq[Attribute],
    +    leftAttr: Seq[Attribute],
    +    rightAttr: Seq[Attribute],
    --- End diff --
    
    I see, thanks for explaining.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12939][SQL] migrate encoder resolution ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10852#issuecomment-173432904
  
    **[Test build #49830 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49830/consoleFull)** for PR 10852 at commit [`60c3c85`](https://github.com/apache/spark/commit/60c3c85b37b7ace325e55917e81e8a8b60f8973b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12939][SQL] migrate encoder resolution ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10852#issuecomment-173406830
  
    **[Test build #49815 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49815/consoleFull)** for PR 10852 at commit [`f34d602`](https://github.com/apache/spark/commit/f34d60240c255d3b7933aeb5f6747e5f611007ff).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `              s\"Unable to generate an encoder for inner class `$`
      * `case class DummyObjectOperator(deserializer: Expression, attributes: Seq[Attribute])`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12939][SQL] migrate encoder resolution ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10852#issuecomment-174758440
  
    **[Test build #50054 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50054/consoleFull)** for PR 10852 at commit [`aae7162`](https://github.com/apache/spark/commit/aae7162f55b27f6446ca7ed965bd5e187a5dddd6).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12939][SQL] migrate encoder resolution ...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/10852#issuecomment-180601920
  
    Thanks, merging to master.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12939][SQL] migrate encoder resolution ...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10852#discussion_r50767337
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala ---
    @@ -30,6 +30,15 @@ trait ObjectOperator extends LogicalPlan {
       /** The serializer that is used to produce the output of this operator. */
       def serializer: Seq[NamedExpression]
     
    +  override def output: Seq[Attribute] = serializer.map(_.toAttribute)
    +
    +  /**
    +   * An [[ObjectOperator]] may have one or more deserializers to convert internal rows to objects.
    +   * For each deserializer, [[ObjectOperator]] should also provide the attributes that can resolve
    +   * this deserializer.
    --- End diff --
    
    "An [[ObjectOperator]] must also provide the attributes that are available during the resolution of each deserializer."


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12939][SQL] migrate encoder resolution ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10852#issuecomment-174793254
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50054/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12939][SQL] migrate encoder resolution ...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10852#discussion_r50768561
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala ---
    @@ -44,18 +53,30 @@ trait ObjectOperator extends LogicalPlan {
       def withObjectOutput: LogicalPlan = if (output.head.dataType.isInstanceOf[ObjectType]) {
         this
       } else {
    -    withNewSerializer(outputObject)
    +    withNewSerializer(outputObject :: Nil)
       }
     
       /** Returns a copy of this operator with a different serializer. */
    -  def withNewSerializer(newSerializer: NamedExpression): LogicalPlan = makeCopy {
    +  def withNewSerializer(newSerializer: Seq[NamedExpression]): LogicalPlan = makeCopy {
         productIterator.map {
    -      case c if c == serializer => newSerializer :: Nil
    +      case c if c == serializer => newSerializer
           case other: AnyRef => other
         }.toArray
       }
     }
     
    +/**
    + * A dummy [[ObjectOperator]] that is used to trigger deserializer resolution in Analyzer and we
    + * can get back the resolved deserializer later.
    + */
    +case class DummyObjectOperator(deserializer: Expression, attributes: Seq[Attribute])
    +  extends LeafNode with ObjectOperator {
    +
    +  override def inputSet: AttributeSet = AttributeSet(attributes)
    +  override def serializer: Seq[NamedExpression] = Nil
    +  override def deserializers: Seq[(Expression, Seq[Attribute])] = Seq(deserializer -> attributes)
    +}
    --- End diff --
    
    Could we instead wrap the resolution logic in a public function that is defined on `ResolveReferences` and just call that directly when we need to?  The flow is pretty hard to follow here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12939][SQL] migrate encoder resolution ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10852#issuecomment-173407051
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12939][SQL] migrate encoder resolution ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10852#issuecomment-178936657
  
    **[Test build #50623 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50623/consoleFull)** for PR 10852 at commit [`6bad213`](https://github.com/apache/spark/commit/6bad213f1bcfb1bb59013af3eef70618c15d85b7).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12939][SQL] migrate encoder resolution ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10852#issuecomment-178966833
  
    **[Test build #50623 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50623/consoleFull)** for PR 10852 at commit [`6bad213`](https://github.com/apache/spark/commit/6bad213f1bcfb1bb59013af3eef70618c15d85b7).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12939][SQL] migrate encoder resolution ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10852#discussion_r50774731
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala ---
    @@ -44,18 +53,30 @@ trait ObjectOperator extends LogicalPlan {
       def withObjectOutput: LogicalPlan = if (output.head.dataType.isInstanceOf[ObjectType]) {
         this
       } else {
    -    withNewSerializer(outputObject)
    +    withNewSerializer(outputObject :: Nil)
       }
     
       /** Returns a copy of this operator with a different serializer. */
    -  def withNewSerializer(newSerializer: NamedExpression): LogicalPlan = makeCopy {
    +  def withNewSerializer(newSerializer: Seq[NamedExpression]): LogicalPlan = makeCopy {
    --- End diff --
    
    according to the definition of serializer: `def serializer: Seq[NamedExpression]`, a seq of named expressions work together as a serialize, so `withNewSerializer` seems more accuracy here?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12939][SQL] migrate encoder resolution ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/10852


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12939][SQL] migrate encoder resolution ...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10852#discussion_r50775945
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/objects.scala ---
    @@ -142,12 +143,14 @@ case class MapGroups(
      */
     case class CoGroup(
         func: (Any, Iterator[Any], Iterator[Any]) => TraversableOnce[Any],
    -    keyObject: Expression,
    -    leftObject: Expression,
    -    rightObject: Expression,
    +    keyDeserializer: Expression,
    +    leftDeserializer: Expression,
    +    rightDeserializer: Expression,
         serializer: Seq[NamedExpression],
         leftGroup: Seq[Attribute],
         rightGroup: Seq[Attribute],
    +    leftAttr: Seq[Attribute],
    +    rightAttr: Seq[Attribute],
    --- End diff --
    
    I might be missing something here, but it looks like there is only one place where we create a `CoGroup` operator and it is always using just the left/right output attributes.  Lets not over generalize unless it is useful.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12939][SQL] migrate encoder resolution ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10852#issuecomment-173433127
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12939][SQL] migrate encoder resolution ...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10852#discussion_r50767909
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/objects.scala ---
    @@ -142,12 +143,14 @@ case class MapGroups(
      */
     case class CoGroup(
         func: (Any, Iterator[Any], Iterator[Any]) => TraversableOnce[Any],
    -    keyObject: Expression,
    -    leftObject: Expression,
    -    rightObject: Expression,
    +    keyDeserializer: Expression,
    +    leftDeserializer: Expression,
    +    rightDeserializer: Expression,
         serializer: Seq[NamedExpression],
         leftGroup: Seq[Attribute],
         rightGroup: Seq[Attribute],
    +    leftAttr: Seq[Attribute],
    +    rightAttr: Seq[Attribute],
    --- End diff --
    
    Why add these arguments when in practice they are always the same as `left.output`/`right.output`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12939][SQL] migrate encoder resolution ...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10852#discussion_r50768962
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -443,25 +444,32 @@ class Analyzer(
           // When resolve `SortOrder`s in Sort based on child, don't report errors as
           // we still have chance to resolve it based on grandchild
           case s @ Sort(ordering, global, child) if child.resolved && !s.resolved =>
    -        val newOrdering = resolveSortOrders(ordering, child, throws = false)
    +        val newOrdering = ordering.map(order =>
    +          resolveExpression(order, child, throws = false).asInstanceOf[SortOrder])
             Sort(newOrdering, global, child)
     
           // A special case for Generate, because the output of Generate should not be resolved by
           // ResolveReferences. Attributes in the output will be resolved by ResolveGenerate.
           case g @ Generate(generator, join, outer, qualifier, output, child)
             if child.resolved && !generator.resolved =>
    -        val newG = generator transformUp {
    -          case u @ UnresolvedAttribute(nameParts) =>
    -            withPosition(u) { child.resolve(nameParts, resolver).getOrElse(u) }
    -          case UnresolvedExtractValue(child, fieldExpr) =>
    -            ExtractValue(child, fieldExpr, resolver)
    -        }
    +        val newG = resolveExpression(generator, child)
             if (newG.fastEquals(generator)) {
               g
             } else {
               Generate(newG.asInstanceOf[Generator], join, outer, qualifier, output, child)
             }
     
    +      case o: ObjectOperator if containsUnresolvedDeserializer(o.deserializers.map(_._1)) =>
    --- End diff --
    
    Add a comment about why this case is special.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12939][SQL] migrate encoder resolution ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10852#issuecomment-173407053
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49815/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12939][SQL] migrate encoder resolution ...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/10852#issuecomment-178769783
  
    I'm sorry... I let this go stale.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12939][SQL] migrate encoder resolution ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10852#issuecomment-178967232
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12939][SQL] migrate encoder resolution ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10852#issuecomment-173415466
  
    **[Test build #49830 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49830/consoleFull)** for PR 10852 at commit [`60c3c85`](https://github.com/apache/spark/commit/60c3c85b37b7ace325e55917e81e8a8b60f8973b).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12939][SQL] migrate encoder resolution ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/10852#issuecomment-178776116
  
    oh, didn't notice it. Will update it soon


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12939][SQL] migrate encoder resolution ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10852#issuecomment-173380744
  
    **[Test build #49815 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49815/consoleFull)** for PR 10852 at commit [`f34d602`](https://github.com/apache/spark/commit/f34d60240c255d3b7933aeb5f6747e5f611007ff).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12939][SQL] migrate encoder resolution ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10852#issuecomment-173433129
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49830/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org