Posted to reviews@spark.apache.org by viirya <gi...@git.apache.org> on 2016/05/05 07:45:00 UTC

[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Add ObjectProj...

GitHub user viirya opened a pull request:

    https://github.com/apache/spark/pull/12926

    [SPARK-15094][SPARK-14803][SQL] Add ObjectProject for EliminateSerialization

    ## What changes were proposed in this pull request?
    
    The `Optimizer` eliminates the pair of `DeserializeToObject` and `SerializeFromObject` and adds an extra `Project` in its place. However, when `DeserializeToObject`'s `outputObjectType` is an `ObjectType` whose `cls` cannot be processed by an unsafe projection, this fails.
    
    To fix this, we can add a plan node that projects the object while preserving `DeserializeToObject`'s output expression ID, as the extra `Project` did.
    
    ## How was this patch tested?
    `DatasetSuite`, `EliminateSerializationSuite`.
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 fix-eliminate-serialization-projection

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12926.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #12926
    
----
commit cf53a434b893293041f73414f50d7f0918a01d49
Author: Liang-Chi Hsieh <si...@tw.ibm.com>
Date:   2016-05-04T09:49:27Z

    Avoid extra Project when DeserializeToObject outputs an unsupported class for Project.

commit 48e6b6d3bc4d41d808db43b888e6b17a17a77d1f
Author: Liang-Chi Hsieh <si...@tw.ibm.com>
Date:   2016-05-05T07:38:06Z

    Add ObjectProject.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Remove extra P...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/12926#issuecomment-218660355
  
    test this please.


[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Remove extra P...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12926#discussion_r62857441
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -156,22 +157,67 @@ object SamplePushDown extends Rule[LogicalPlan] {
     }
     
     /**
    + * Removes the Project only conducting Alias of its child node.
    + * It is created mainly for removing extra Project added in EliminateSerialization rule,
    + * but can also benefit other operators.
    + */
    +object RemoveAliasOnlyProject extends Rule[LogicalPlan] {
    +  // Check if projectList in the Project node has the same attribute names and ordering
    +  // as its child node.
    +  private def checkAliasOnly(
    --- End diff --
    
    nit: `isAliasOnly`


[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Add ObjectProj...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12926#issuecomment-217113083
  
    **[Test build #57873 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57873/consoleFull)** for PR 12926 at commit [`737c518`](https://github.com/apache/spark/commit/737c5187f7e9db1a08407246416cbc967fec7d30).


[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Remove extra P...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12926#issuecomment-218649791
  
    **[Test build #58440 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58440/consoleFull)** for PR 12926 at commit [`c3748ba`](https://github.com/apache/spark/commit/c3748bac348e30dc87cb41fcdb3ae9086acec66f).


[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Remove extra P...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/12926


[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Remove extra P...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12926#issuecomment-218670776
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Add ObjectProj...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12926#issuecomment-217347901
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57955/
    Test PASSed.


[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Remove extra P...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12926#discussion_r62971980
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala ---
    @@ -659,6 +659,16 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
         checkDataset(DatasetTransform.addOne(dataset), 2, 3, 4)
       }
     
    +  test("dataset.rdd with generic case class") {
    --- End diff --
    
    what is this test for?


[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Remove extra P...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12926#discussion_r62947348
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -156,22 +157,67 @@ object SamplePushDown extends Rule[LogicalPlan] {
     }
     
     /**
    + * Removes the Project only conducting Alias of its child node.
    + * It is created mainly for removing extra Project added in EliminateSerialization rule,
    + * but can also benefit other operators.
    + */
    +object RemoveAliasOnlyProject extends Rule[LogicalPlan] {
    +  // Check if projectList in the Project node has the same attribute names and ordering
    +  // as its child node.
    +  private def checkAliasOnly(
    +      projectList: Seq[NamedExpression],
    +      childOutput: Seq[Attribute]): Boolean = {
    +    if (!projectList.forall(_.isInstanceOf[Alias]) || projectList.length != childOutput.length) {
    +      return false
    +    } else {
    +      projectList.map(_.asInstanceOf[Alias]).zip(childOutput).forall { case (a, o) =>
    +        a.child match {
    +          case attr: Attribute if a.name == attr.name && attr.semanticEquals(o) => true
    +          case _ => false
    +        }
    +      }
    +    }
    +  }
    +
    +  def apply(plan: LogicalPlan): LogicalPlan = {
    +    val processedPlan = plan.find { p =>
    +      p match {
    +        case Project(pList, child) if checkAliasOnly(pList, child.output) => true
    +        case _ => false
    +      }
    +    }.map { case p: Project =>
    +      val attrMap = p.projectList.map { a =>
    +        val alias = a.asInstanceOf[Alias]
    +        val replaceFrom = alias.toAttribute.exprId
    +        val replaceTo = alias.child.asInstanceOf[Attribute]
    +        (replaceFrom, replaceTo)
    +      }.toMap
    +      plan.transformAllExpressions {
    +        case a: Attribute if attrMap.contains(a.exprId) => attrMap(a.exprId)
    +      }.transform {
    +        case op: Project if op == p => op.child
    +      }
    +    }
    +    if (processedPlan.isDefined) {
    --- End diff --
    
    +1


[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Add ObjectProj...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/12926#issuecomment-217363557
  
    I think a simpler and better approach is to just remove this alias-only project in the next batch.
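
The idea of dropping an alias-only project can be sketched with a toy plan tree. Everything below is an assumption made for illustration: `Attr`, `Alias`, `Relation`, `Project`, `Filter`, and `RemoveAliasOnlySketch` are hypothetical stand-ins rather than Catalyst's classes, and the sketch does not model optimizer batches. It only shows the check (each projection aliases the child's attribute at the same position under the same name) and the rewrite (references to the alias IDs are redirected to the child's attributes before the project is dropped).

```scala
// Toy model only: these types are hypothetical and are not Catalyst's classes.
sealed trait Expr
final case class Attr(name: String, id: Int) extends Expr
final case class Alias(child: Expr, name: String, id: Int) extends Expr

sealed trait Plan { def output: Seq[Attr] }
final case class Relation(output: Seq[Attr]) extends Plan
final case class Project(projectList: Seq[Expr], child: Plan) extends Plan {
  // An alias exposes a new attribute carrying the alias's own name and id.
  def output: Seq[Attr] = projectList.map {
    case a: Attr => a
    case Alias(_, name, id) => Attr(name, id)
  }
}
final case class Filter(condition: Attr, child: Plan) extends Plan {
  def output: Seq[Attr] = child.output
}

object RemoveAliasOnlySketch {
  // True when every projection aliases the child's attribute at the same
  // position without changing its name.
  def isAliasOnly(projectList: Seq[Expr], childOutput: Seq[Attr]): Boolean =
    projectList.length == childOutput.length &&
      projectList.zip(childOutput).forall {
        case (Alias(attr: Attr, name, _), out) => attr.name == name && attr == out
        case _ => false
      }

  // Finds the first alias-only Project, top-down.
  private def find(plan: Plan): Option[Project] = plan match {
    case p @ Project(list, child) if isAliasOnly(list, child.output) => Some(p)
    case Project(_, child) => find(child)
    case Filter(_, child) => find(child)
    case _: Relation => None
  }

  // Replaces references to alias ids with the aliased child attributes.
  private def subst(e: Expr, mapping: Map[Int, Attr]): Expr = e match {
    case a: Attr => mapping.getOrElse(a.id, a)
    case Alias(child, name, id) => Alias(subst(child, mapping), name, id)
  }

  // Rewrites references throughout the plan and drops the target Project.
  private def rewrite(plan: Plan, target: Project, mapping: Map[Int, Attr]): Plan =
    plan match {
      case p: Project if p eq target => p.child
      case Project(list, child) =>
        Project(list.map(subst(_, mapping)), rewrite(child, target, mapping))
      case Filter(cond, child) =>
        Filter(mapping.getOrElse(cond.id, cond), rewrite(child, target, mapping))
      case r: Relation => r
    }

  def apply(plan: Plan): Plan = find(plan) match {
    case None => plan
    case Some(p) =>
      val mapping: Map[Int, Attr] = p.projectList.collect {
        case Alias(attr: Attr, _, id) => id -> attr
      }.toMap
      rewrite(plan, p, mapping)
  }
}
```

For example, a `Filter` over `Project(Seq(Alias(a, "a", 2)), Relation(Seq(a)))` whose condition refers to id 2 would, under these assumptions, become a `Filter` directly over the `Relation` with its condition pointing back at `a`.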


[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Remove extra P...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/12926#issuecomment-218477575
  
    mostly LGTM except some style comments


[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Remove extra P...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12926#issuecomment-218658145
  
    **[Test build #58440 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58440/consoleFull)** for PR 12926 at commit [`c3748ba`](https://github.com/apache/spark/commit/c3748bac348e30dc87cb41fcdb3ae9086acec66f).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Remove extra P...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12926#discussion_r62995004
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala ---
    @@ -659,6 +659,16 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
         checkDataset(DatasetTransform.addOne(dataset), 2, 3, 4)
       }
     
    +  test("dataset.rdd with generic case class") {
    +    val ds = Seq(Generic(1, 1.0), Generic(2, 2.0)).toDS
    +    val ds2 = ds.map(g => Generic(g.id, g.value))
    +    ds.rdd.map(r => r.id).count
    --- End diff --
    
    Because it failed on `rdd` operations. I will check the count result instead.


[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Add ObjectProj...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12926#issuecomment-217124820
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Remove extra P...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12926#issuecomment-218660446
  
    **[Test build #58447 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58447/consoleFull)** for PR 12926 at commit [`c3748ba`](https://github.com/apache/spark/commit/c3748bac348e30dc87cb41fcdb3ae9086acec66f).


[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Add ObjectProj...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12926#issuecomment-217106222
  
    Merged build finished. Test FAILed.


[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Remove extra P...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12926#discussion_r62858466
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -156,22 +157,67 @@ object SamplePushDown extends Rule[LogicalPlan] {
     }
     
     /**
    + * Removes the Project only conducting Alias of its child node.
    + * It is created mainly for removing extra Project added in EliminateSerialization rule,
    + * but can also benefit other operators.
    + */
    +object RemoveAliasOnlyProject extends Rule[LogicalPlan] {
    +  // Check if projectList in the Project node has the same attribute names and ordering
    +  // as its child node.
    +  private def checkAliasOnly(
    +      projectList: Seq[NamedExpression],
    +      childOutput: Seq[Attribute]): Boolean = {
    +    if (!projectList.forall(_.isInstanceOf[Alias]) || projectList.length != childOutput.length) {
    +      return false
    +    } else {
    +      projectList.map(_.asInstanceOf[Alias]).zip(childOutput).forall { case (a, o) =>
    +        a.child match {
    +          case attr: Attribute if a.name == attr.name && attr.semanticEquals(o) => true
    +          case _ => false
    +        }
    +      }
    +    }
    +  }
    +
    +  def apply(plan: LogicalPlan): LogicalPlan = {
    +    val processedPlan = plan.find { p =>
    +      p match {
    +        case Project(pList, child) if checkAliasOnly(pList, child.output) => true
    +        case _ => false
    +      }
    +    }.map { case p: Project =>
    +      val attrMap = p.projectList.map { a =>
    +        val alias = a.asInstanceOf[Alias]
    +        val replaceFrom = alias.toAttribute.exprId
    +        val replaceTo = alias.child.asInstanceOf[Attribute]
    +        (replaceFrom, replaceTo)
    +      }.toMap
    +      plan.transformAllExpressions {
    +        case a: Attribute if attrMap.contains(a.exprId) => attrMap(a.exprId)
    +      }.transform {
    +        case op: Project if op == p => op.child
    --- End diff --
    
    use `eq` to compare reference, which is safer.
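
A minimal sketch of the difference this comment relies on: for Scala case classes, `==` compares structure, while `eq` compares object identity. `Node` and `EqVersusEquals` are hypothetical names used only for illustration and are not part of Spark. With `==`, a second node that merely looks the same as the one found earlier would also match; with `eq`, only that exact instance does.

```scala
// Hypothetical stand-in for a plan node; not Spark's Project class.
final case class Node(name: String, children: List[Node] = Nil)

object EqVersusEquals {
  val first: Node = Node("project", List(Node("relation")))
  val second: Node = Node("project", List(Node("relation")))

  // Case classes get structural equality, so two separate instances match.
  val structurallyEqual: Boolean = first == second

  // `eq` is reference identity: true only for the very same instance.
  val sameInstance: Boolean = first eq second
  val selfIdentity: Boolean = first eq first

  def main(args: Array[String]): Unit = {
    println(s"first == second: $structurallyEqual")
    println(s"first eq second: $sameInstance")
    println(s"first eq first:  $selfIdentity")
  }
}
```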


[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Remove extra P...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12926#issuecomment-218658214
  
    Merged build finished. Test FAILed.


[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Add ObjectProj...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12926#issuecomment-217347900
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Remove extra P...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12926#issuecomment-218416795
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58353/
    Test PASSed.


[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Remove extra P...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12926#discussion_r62935527
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -156,22 +157,67 @@ object SamplePushDown extends Rule[LogicalPlan] {
     }
     
     /**
    + * Removes the Project only conducting Alias of its child node.
    + * It is created mainly for removing extra Project added in EliminateSerialization rule,
    + * but can also benefit other operators.
    + */
    +object RemoveAliasOnlyProject extends Rule[LogicalPlan] {
    +  // Check if projectList in the Project node has the same attribute names and ordering
    +  // as its child node.
    +  private def checkAliasOnly(
    +      projectList: Seq[NamedExpression],
    +      childOutput: Seq[Attribute]): Boolean = {
    +    if (!projectList.forall(_.isInstanceOf[Alias]) || projectList.length != childOutput.length) {
    +      return false
    +    } else {
    +      projectList.map(_.asInstanceOf[Alias]).zip(childOutput).forall { case (a, o) =>
    +        a.child match {
    +          case attr: Attribute if a.name == attr.name && attr.semanticEquals(o) => true
    +          case _ => false
    +        }
    +      }
    +    }
    +  }
    +
    +  def apply(plan: LogicalPlan): LogicalPlan = {
    +    val processedPlan = plan.find { p =>
    +      p match {
    +        case Project(pList, child) if checkAliasOnly(pList, child.output) => true
    +        case _ => false
    +      }
    +    }.map { case p: Project =>
    +      val attrMap = p.projectList.map { a =>
    +        val alias = a.asInstanceOf[Alias]
    +        val replaceFrom = alias.toAttribute.exprId
    +        val replaceTo = alias.child.asInstanceOf[Attribute]
    +        (replaceFrom, replaceTo)
    +      }.toMap
    +      plan.transformAllExpressions {
    +        case a: Attribute if attrMap.contains(a.exprId) => attrMap(a.exprId)
    +      }.transform {
    +        case op: Project if op == p => op.child
    +      }
    +    }
    +    if (processedPlan.isDefined) {
    --- End diff --
    
    nit: Why not `processedPlan.getOrElse(plan)`?
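
A small sketch of the suggested simplification, assuming the `isDefined` branch in the diff returns the rewritten plan when one exists and the original plan otherwise (the quoted diff is cut off before the branch body, so that is an assumption). `GetOrElseSketch` and its string-based `rewrite` are hypothetical and stand in for the real plan rewrite.

```scala
object GetOrElseSketch {
  // Hypothetical rewrite step: returns Some(result) only when it applies.
  def rewrite(plan: String): Option[String] =
    if (plan.startsWith("Project(") && plan.endsWith(")")) {
      Some(plan.stripPrefix("Project(").stripSuffix(")"))
    } else {
      None
    }

  // Verbose form: explicit isDefined check followed by get.
  def applyVerbose(plan: String): String = {
    val processed = rewrite(plan)
    if (processed.isDefined) processed.get else plan
  }

  // Concise form: fall back to the input when no rewrite happened.
  def applyConcise(plan: String): String = rewrite(plan).getOrElse(plan)
}
```

Both forms behave the same here; `getOrElse` avoids the separate `isDefined` and `get` calls.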


[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Remove extra P...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12926#issuecomment-218392575
  
    **[Test build #58351 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58351/consoleFull)** for PR 12926 at commit [`85fba17`](https://github.com/apache/spark/commit/85fba173b871a1c8bc24f1a781ddbf59f77db645).


[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Remove extra P...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/12926#issuecomment-218744033
  
    LGTM


[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Add ObjectProj...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the pull request:

    https://github.com/apache/spark/pull/12926#issuecomment-217688312
  
    @cloud-fan ok. let me try it.


[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Remove extra P...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12926#issuecomment-218412003
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58351/
    Test PASSed.


[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Remove extra P...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12926#discussion_r62858747
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -156,22 +157,67 @@ object SamplePushDown extends Rule[LogicalPlan] {
     }
     
     /**
    + * Removes the Project only conducting Alias of its child node.
    + * It is created mainly for removing extra Project added in EliminateSerialization rule,
    + * but can also benefit other operators.
    + */
    +object RemoveAliasOnlyProject extends Rule[LogicalPlan] {
    +  // Check if projectList in the Project node has the same attribute names and ordering
    +  // as its child node.
    +  private def checkAliasOnly(
    +      projectList: Seq[NamedExpression],
    +      childOutput: Seq[Attribute]): Boolean = {
    +    if (!projectList.forall(_.isInstanceOf[Alias]) || projectList.length != childOutput.length) {
    +      return false
    +    } else {
    +      projectList.map(_.asInstanceOf[Alias]).zip(childOutput).forall { case (a, o) =>
    +        a.child match {
    +          case attr: Attribute if a.name == attr.name && attr.semanticEquals(o) => true
    +          case _ => false
    +        }
    +      }
    +    }
    +  }
    +
    +  def apply(plan: LogicalPlan): LogicalPlan = {
    +    val processedPlan = plan.find { p =>
    +      p match {
    +        case Project(pList, child) if checkAliasOnly(pList, child.output) => true
    +        case _ => false
    +      }
    +    }.map { case p: Project =>
    +      val attrMap = p.projectList.map { a =>
    +        val alias = a.asInstanceOf[Alias]
    +        val replaceFrom = alias.toAttribute.exprId
    +        val replaceTo = alias.child.asInstanceOf[Attribute]
    +        (replaceFrom, replaceTo)
    +      }.toMap
    +      plan.transformAllExpressions {
    +        case a: Attribute if attrMap.contains(a.exprId) => attrMap(a.exprId)
    +      }.transform {
    +        case op: Project if op == p => op.child
    +      }
    +    }
    +    if (processedPlan.isDefined) {
    --- End diff --
    
    code style suggestion:
    ```
    val aliasOnlyProject = ...
    if (aliasOnlyProject.isDefined) {
        val p = aliasOnlyProject.get.asInstanceOf[Project]
        ...
    } else {
      plan
    }
    ```
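
The shape suggested above keeps the search for the node separate from the rewrite. A minimal sketch of that shape, next to an equivalent `map`/`getOrElse` form, might look as follows. The `Node`, `Leaf`, and `Proj` types and the `findProj` helper are hypothetical stand-ins for illustration, not Spark's classes.

```scala
object OptionStyleSketch {
  // Toy stand-ins for plan nodes (assumptions, not Spark's API).
  sealed trait Node
  case class Leaf(name: String) extends Node
  case class Proj(child: Node) extends Node

  // Hypothetical helper: returns the root node if it is a Proj.
  def findProj(n: Node): Option[Proj] = n match {
    case p: Proj => Some(p)
    case _: Leaf => None
  }

  // Style from the suggestion: test isDefined, then unwrap with get.
  def stripExplicit(plan: Node): Node = {
    val found = findProj(plan)
    if (found.isDefined) {
      val p = found.get
      p.child
    } else {
      plan
    }
  }

  // Equivalent form without get: map over the Option, fall back to the input.
  def stripWithMap(plan: Node): Node =
    findProj(plan).map(_.child).getOrElse(plan)
}
```

Both functions return the input unchanged when no matching node is found; the second avoids calling `get` on the `Option`.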




[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Remove extra P...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12926#issuecomment-218731654
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58474/
    Test PASSed.




[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Add ObjectProj...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12926#issuecomment-217791576
  
    **[Test build #58127 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58127/consoleFull)** for PR 12926 at commit [`4b0773a`](https://github.com/apache/spark/commit/4b0773adcad8b6d6f0be6f5ca2287fb877e502c9).




[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Remove extra P...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/12926#issuecomment-218822942
  
    Thanks. Merging to master and branch 2.0.




[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Add ObjectProj...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12926#issuecomment-217807423
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Remove extra P...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12926#issuecomment-218670777
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58447/
    Test PASSed.




[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Add ObjectProj...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12926#issuecomment-217347818
  
    **[Test build #57955 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57955/consoleFull)** for PR 12926 at commit [`3d0554d`](https://github.com/apache/spark/commit/3d0554d3ba92d2c65202aef530d1ecdfbce11657).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Add ObjectProj...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/12926#issuecomment-217378090
  
    Sorry, when I say "remove", I mean a safe removal: we transform the plan tree and replace the attributes produced by the alias with the original attributes.
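
A safe removal of this kind can be sketched on a toy plan tree. Everything below (`Attr`, `Relation`, `AliasProject`, `Filter`, and the `remove` function) is a hypothetical model for illustration and not Spark's API. The toy `AliasProject` is alias-only by construction, so the sketch skips the name and ordering checks a real rule would need.

```scala
object SafeRemovalSketch {
  // Toy attribute: a display name plus a unique id (assumed model).
  case class Attr(name: String, id: Int)

  sealed trait Plan { def output: Seq[Attr] }
  case class Relation(output: Seq[Attr]) extends Plan
  // Each entry re-exposes child attribute `from` as a new attribute `to`.
  case class AliasProject(aliases: Seq[(Attr, Attr)], child: Plan) extends Plan {
    def output: Seq[Attr] = aliases.map(_._2)
  }
  case class Filter(references: Seq[Attr], child: Plan) extends Plan {
    def output: Seq[Attr] = child.output
  }

  // Drops every AliasProject and rewrites references in the operators above it
  // so that they point at the original attributes instead of the aliased ones.
  // Returns the rewritten plan and the alias-id -> original-attribute mapping.
  def remove(plan: Plan): (Plan, Map[Int, Attr]) = plan match {
    case r: Relation => (r, Map.empty)
    case AliasProject(aliases, child) =>
      val (newChild, below) = remove(child)
      val here = aliases.map { case (from, to) =>
        to.id -> below.getOrElse(from.id, from)
      }
      (newChild, below ++ here)
    case Filter(refs, child) =>
      val (newChild, mapping) = remove(child)
      (Filter(refs.map(a => mapping.getOrElse(a.id, a)), newChild), mapping)
  }
}
```

Dropping the project without rewriting the `Filter` references would leave them pointing at an attribute id that no node produces any more, which is what makes the plain removal unsafe.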




[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Add ObjectProj...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12926#issuecomment-217339830
  
    **[Test build #57955 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57955/consoleFull)** for PR 12926 at commit [`3d0554d`](https://github.com/apache/spark/commit/3d0554d3ba92d2c65202aef530d1ecdfbce11657).




[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Remove extra P...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the pull request:

    https://github.com/apache/spark/pull/12926#issuecomment-218607549
  
    Looks pretty good!




[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Remove extra P...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the pull request:

    https://github.com/apache/spark/pull/12926#issuecomment-218649870
  
    @cloud-fan @zsxwing Thanks! I've addressed your comments now.




[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Add ObjectProj...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12926#issuecomment-217807426
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58127/
    Test PASSed.




[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Add ObjectProj...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the pull request:

    https://github.com/apache/spark/pull/12926#issuecomment-217357870
  
    cc @cloud-fan @yhuai @rxin 




[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Remove extra P...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12926#issuecomment-218670633
  
    **[Test build #58447 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58447/consoleFull)** for PR 12926 at commit [`c3748ba`](https://github.com/apache/spark/commit/c3748bac348e30dc87cb41fcdb3ae9086acec66f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Remove extra P...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12926#discussion_r62972085
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala ---
    @@ -659,6 +659,16 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
         checkDataset(DatasetTransform.addOne(dataset), 2, 3, 4)
       }
     
    +  test("dataset.rdd with generic case class") {
    +    val ds = Seq(Generic(1, 1.0), Generic(2, 2.0)).toDS
    +    val ds2 = ds.map(g => Generic(g.id, g.value))
    +    ds.rdd.map(r => r.id).count
    --- End diff --
    
    it's better to use `checkDataset` to check the answer




[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Add ObjectProj...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12926#issuecomment-217106224
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57858/
    Test FAILed.




[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Remove extra P...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12926#issuecomment-218411736
  
    **[Test build #58351 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58351/consoleFull)** for PR 12926 at commit [`85fba17`](https://github.com/apache/spark/commit/85fba173b871a1c8bc24f1a781ddbf59f77db645).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Add ObjectProj...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the pull request:

    https://github.com/apache/spark/pull/12926#issuecomment-217375374
  
    @cloud-fan But how do we preserve `DeserializeToObject`'s output expr id?
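
The concern about the output expr id can be illustrated with a toy model. In the sketch below, `Source`, `Serialize`, `Deserialize`, and `Rename` are hypothetical stand-ins and not Spark's operators. Collapsing a deserialize-over-serialize pair inserts a `Rename` node that reuses the deserializer's output id, so anything above it that refers to that id still resolves. The toy omits the object type check that the real rule performs.

```scala
object PreserveIdSketch {
  case class Attr(name: String, id: Int)

  sealed trait Plan { def output: Seq[Attr] }
  case class Source(output: Seq[Attr]) extends Plan
  case class Serialize(child: Plan, out: Attr) extends Plan {
    def output: Seq[Attr] = Seq(out)
  }
  case class Deserialize(child: Plan, out: Attr) extends Plan {
    def output: Seq[Attr] = Seq(out)
  }
  // Re-exposes the child attribute `from` under the attribute `to`.
  case class Rename(from: Attr, to: Attr, child: Plan) extends Plan {
    def output: Seq[Attr] = Seq(to)
  }

  // Collapses Deserialize(Serialize(x)) into Rename(x), keeping the id that the
  // removed Deserialize node used to expose.
  def eliminate(plan: Plan): Plan = plan match {
    case Deserialize(Serialize(inner, _), out) =>
      val rewrittenInner = eliminate(inner)
      Rename(rewrittenInner.output.head, Attr("obj", out.id), rewrittenInner)
    case Deserialize(child, out) => Deserialize(eliminate(child), out)
    case Serialize(child, out) => Serialize(eliminate(child), out)
    case Rename(from, to, child) => Rename(from, to, eliminate(child))
    case s: Source => s
  }
}
```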




[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Remove extra P...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12926#discussion_r62973641
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -156,22 +157,60 @@ object SamplePushDown extends Rule[LogicalPlan] {
     }
     
     /**
    + * Removes the Project only conducting Alias of its child node.
    + * It is created mainly for removing extra Project added in EliminateSerialization rule,
    + * but can also benefit other operators.
    + */
    +object RemoveAliasOnlyProject extends Rule[LogicalPlan] {
    +  // Check if projectList in the Project node has the same attribute names and ordering
    +  // as its child node.
    +  private def isAliasOnly(
    +      projectList: Seq[NamedExpression],
    +      childOutput: Seq[Attribute]): Boolean = {
    +    if (!projectList.forall(_.isInstanceOf[Alias]) || projectList.length != childOutput.length) {
    +      return false
    +    } else {
    +      projectList.map(_.asInstanceOf[Alias]).zip(childOutput).forall { case (a, o) =>
    +        a.child match {
    +          case attr: Attribute if a.name == attr.name && attr.semanticEquals(o) => true
    +          case _ => false
    +        }
    +      }
    +    }
    +  }
    +
    +  def apply(plan: LogicalPlan): LogicalPlan = {
    +    val aliasOnlyProject = plan.find { p =>
    +      p match {
    +        case Project(pList, child) if isAliasOnly(pList, child.output) => true
    +        case _ => false
    +      }
    +    }
    +
    +    aliasOnlyProject.map { case p: Project =>
    +      val aliases = p.projectList.map(_.asInstanceOf[Alias])
    +      val attrMap = AttributeMap(aliases.map(a => (a.toAttribute, a.child)))
    +      plan.transformAllExpressions {
    +        case a: Attribute if attrMap.contains(a) => attrMap(a)
    +      }.transform {
    +        case op: Project if op.eq(p) => op.child
    +      }
    +    }.getOrElse(plan)
    +  }
    +}
    +
    +/**
      * Removes cases where we are unnecessarily going between the object and serialized (InternalRow)
      * representation of data item.  For example back to back map operations.
      */
     object EliminateSerialization extends Rule[LogicalPlan] {
       def apply(plan: LogicalPlan): LogicalPlan = plan transform {
         case d @ DeserializeToObject(_, _, s: SerializeFromObject)
             if d.outputObjectType == s.inputObjectType =>
    -      // A workaround for SPARK-14803. Remove this after it is fixed.
    -      if (d.outputObjectType.isInstanceOf[ObjectType] &&
    -          d.outputObjectType.asInstanceOf[ObjectType].cls == classOf[org.apache.spark.sql.Row]) {
    -        s.child
    -      } else {
    -        // Adds an extra Project here, to preserve the output expr id of `DeserializeToObject`.
    -        val objAttr = Alias(s.child.output.head, "obj")(exprId = d.output.head.exprId)
    -        Project(objAttr :: Nil, s.child)
    -      }
    +      // Adds an extra Project here, to preserve the output expr id of `DeserializeToObject`.
    +      // We will remove it later in RemoveAliasOnlyProject rule.
    +      val objAttr = Alias(s.child.output.head, "obj")(exprId = d.output.head.exprId)
    --- End diff --
    
    OK, I'll update it later.




[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Remove extra P...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12926#discussion_r62808018
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -156,22 +157,70 @@ object SamplePushDown extends Rule[LogicalPlan] {
     }
     
     /**
    + * Removes the Project only conducting Alias of its child node.
    + * It is created mainly for removing extra Project added in EliminateSerialization rule,
    + * but can also benefit other operators.
    + */
    +object RemoveAliasOnlyProject extends Rule[LogicalPlan] {
    +  // Check if projectList in the Project node has the same attribute names and ordering
    +  // as its child node.
    +  private def checkAliasOnly(
    +      projectList: Seq[NamedExpression],
    +      childOutput: Seq[Attribute]): Boolean = {
    +    if (!projectList.forall(_.isInstanceOf[Alias]) || projectList.length != childOutput.length) {
    +      return false
    +    } else {
    +      projectList.map(_.asInstanceOf[Alias]).zip(childOutput).forall { case (a, o) =>
    +        a.child match {
    +          case attr: Attribute
    --- End diff --
    
    isn't it just `a semantic equals attr`?
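
For readers unfamiliar with the check under discussion, here is a rough sketch of an alias-only test on toy expression types. `Expr`, `Attr`, `Alias`, `Add`, and the id-based `semanticEquals` are assumptions made for this example; they only imitate the idea of comparing attributes by identity rather than by cosmetic details, and are not Spark's implementations.

```scala
object AliasOnlySketch {
  sealed trait Expr
  case class Attr(name: String, id: Int) extends Expr {
    // Assumed rule for this sketch: two attributes are the same if their ids
    // match, regardless of cosmetic differences such as capitalization.
    def semanticEquals(other: Attr): Boolean = id == other.id
  }
  case class Alias(child: Expr, name: String) extends Expr
  case class Add(left: Expr, right: Expr) extends Expr

  // True when every project entry is an Alias that keeps the name of, and
  // directly wraps, the child output attribute at the same position.
  def isAliasOnly(projectList: Seq[Expr], childOutput: Seq[Attr]): Boolean =
    projectList.length == childOutput.length &&
      projectList.zip(childOutput).forall {
        case (Alias(attr: Attr, name), out) =>
          name == attr.name && attr.semanticEquals(out)
        case _ => false
      }
}
```

Pattern matching on `Alias(attr: Attr, name)` covers both the "is it an alias" and the "is the child a plain attribute" conditions in one step, so no casts are needed.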




[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Remove extra P...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12926#issuecomment-218731651
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Remove extra P...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12926#discussion_r62973597
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala ---
    @@ -659,6 +659,16 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
         checkDataset(DatasetTransform.addOne(dataset), 2, 3, 4)
       }
     
    +  test("dataset.rdd with generic case class") {
    --- End diff --
    
    This is the code that causes the problem reported in JIRA SPARK-15094.




[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Remove extra P...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12926#issuecomment-218396830
  
    **[Test build #58353 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58353/consoleFull)** for PR 12926 at commit [`ea55398`](https://github.com/apache/spark/commit/ea553983006f744e3c6563f9e04139ad1371f65e).




[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Remove extra P...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12926#issuecomment-218416793
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Remove extra P...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12926#discussion_r62858354
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -156,22 +157,67 @@ object SamplePushDown extends Rule[LogicalPlan] {
     }
     
     /**
    + * Removes the Project only conducting Alias of its child node.
    + * It is created mainly for removing extra Project added in EliminateSerialization rule,
    + * but can also benefit other operators.
    + */
    +object RemoveAliasOnlyProject extends Rule[LogicalPlan] {
    +  // Check if projectList in the Project node has the same attribute names and ordering
    +  // as its child node.
    +  private def checkAliasOnly(
    +      projectList: Seq[NamedExpression],
    +      childOutput: Seq[Attribute]): Boolean = {
    +    if (!projectList.forall(_.isInstanceOf[Alias]) || projectList.length != childOutput.length) {
    +      return false
    +    } else {
    +      projectList.map(_.asInstanceOf[Alias]).zip(childOutput).forall { case (a, o) =>
    +        a.child match {
    +          case attr: Attribute if a.name == attr.name && attr.semanticEquals(o) => true
    +          case _ => false
    +        }
    +      }
    +    }
    +  }
    +
    +  def apply(plan: LogicalPlan): LogicalPlan = {
    +    val processedPlan = plan.find { p =>
    +      p match {
    +        case Project(pList, child) if checkAliasOnly(pList, child.output) => true
    +        case _ => false
    +      }
    +    }.map { case p: Project =>
    +      val attrMap = p.projectList.map { a =>
    --- End diff --
    
    We can use `AttributeMap` here
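
The idea behind keying the replacement map on attributes rather than on raw ids can be sketched with a toy class. `AttrMap` below is a hypothetical, simplified imitation written for this example and is not Spark's `AttributeMap`; it assumes that attribute identity is carried by a numeric `id` field.

```scala
object AttrMapSketch {
  case class Attr(name: String, id: Int)

  // Toy map that looks attributes up by id only, so a lookup succeeds even if
  // the probing attribute carries a different name than the stored key.
  class AttrMap[A](entries: Seq[(Attr, A)]) {
    private val byId: Map[Int, A] = entries.map { case (k, v) => k.id -> v }.toMap
    def contains(a: Attr): Boolean = byId.contains(a.id)
    def apply(a: Attr): A = byId(a.id)
    def get(a: Attr): Option[A] = byId.get(a.id)
  }

  // Replaces every attribute that has an entry in the map; leaves others as-is.
  def rewrite(refs: Seq[Attr], m: AttrMap[Attr]): Seq[Attr] =
    refs.map(a => m.get(a).getOrElse(a))
}
```

With a wrapper like this, call sites can write `m.contains(a)` and `m(a)` directly on attributes instead of extracting the id at each use.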




[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Add ObjectProj...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12926#issuecomment-218364032
  
    **[Test build #58326 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58326/consoleFull)** for PR 12926 at commit [`29a0c70`](https://github.com/apache/spark/commit/29a0c70488fc8d3f7157679d8f41f7ceb5af9bc4).




[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Remove extra P...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12926#issuecomment-218416561
  
    **[Test build #58353 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58353/consoleFull)** for PR 12926 at commit [`ea55398`](https://github.com/apache/spark/commit/ea553983006f744e3c6563f9e04139ad1371f65e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Add ObjectProj...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12926#issuecomment-217106127
  
    **[Test build #57858 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57858/consoleFull)** for PR 12926 at commit [`48e6b6d`](https://github.com/apache/spark/commit/48e6b6d3bc4d41d808db43b888e6b17a17a77d1f).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class ObjectProject(`
      * `case class ObjectProject(`




[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Add ObjectProj...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12926#issuecomment-217095508
  
    **[Test build #57858 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57858/consoleFull)** for PR 12926 at commit [`48e6b6d`](https://github.com/apache/spark/commit/48e6b6d3bc4d41d808db43b888e6b17a17a77d1f).




[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Add ObjectProj...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12926#discussion_r62460198
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -156,22 +157,38 @@ object SamplePushDown extends Rule[LogicalPlan] {
     }
     
     /**
    + * Removes extra Project added in EliminateSerialization rule.
    + */
    +object RemoveExtraProjectForSerialization extends Rule[LogicalPlan] {
    --- End diff --
    
    How about making it more general, e.g. `RemoveAliasOnlyProject`, so that operators other than the object operators can benefit from it too?




[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Remove extra P...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12926#issuecomment-218731449
  
    **[Test build #58474 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58474/consoleFull)** for PR 12926 at commit [`882fc66`](https://github.com/apache/spark/commit/882fc666c1efb2d8313d5f3b944b779651045d59).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Add ObjectProj...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the pull request:

    https://github.com/apache/spark/pull/12926#issuecomment-217095256
  
    This is another approach to #12898.




[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Add ObjectProj...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12926#issuecomment-217124821
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57873/
    Test PASSed.




[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Remove extra P...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12926#issuecomment-218412000
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Remove extra P...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12926#discussion_r62971440
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -156,22 +157,60 @@ object SamplePushDown extends Rule[LogicalPlan] {
     }
     
     /**
    + * Removes the Project only conducting Alias of its child node.
    + * It is created mainly for removing extra Project added in EliminateSerialization rule,
    + * but can also benefit other operators.
    + */
    +object RemoveAliasOnlyProject extends Rule[LogicalPlan] {
    +  // Check if projectList in the Project node has the same attribute names and ordering
    +  // as its child node.
    +  private def isAliasOnly(
    +      projectList: Seq[NamedExpression],
    +      childOutput: Seq[Attribute]): Boolean = {
    +    if (!projectList.forall(_.isInstanceOf[Alias]) || projectList.length != childOutput.length) {
    +      return false
    +    } else {
    +      projectList.map(_.asInstanceOf[Alias]).zip(childOutput).forall { case (a, o) =>
    +        a.child match {
    +          case attr: Attribute if a.name == attr.name && attr.semanticEquals(o) => true
    --- End diff --
    
    Why do we need `a.name == attr.name`? I think that even if the alias name differs from the attribute name, we can still remove the `Project`, as long as it is not the root node.
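
To make the question concrete, here is a small sketch of the check with the name condition made optional. It is a simplification under stated assumptions: `Attr` and `Alias` are hypothetical stand-ins for the Catalyst classes, `semanticEquals` is modeled as plain expression-id equality, and the `requireSameName` flag does not exist in the patch.

```scala
object AliasOnlyCheck {
  // Hypothetical stand-ins for Catalyst's Attribute and Alias.
  final case class Attr(name: String, exprId: Long)
  final case class Alias(child: Attr, name: String, exprId: Long)

  // `requireSameName` toggles the `a.name == attr.name` condition under discussion.
  // `semanticEquals` is modeled as expression-id equality (an assumption of this sketch).
  def isAliasOnly(
      projectList: Seq[Alias],
      childOutput: Seq[Attr],
      requireSameName: Boolean): Boolean =
    projectList.length == childOutput.length &&
      projectList.zip(childOutput).forall { case (a, o) =>
        a.child.exprId == o.exprId && (!requireSameName || a.name == a.child.name)
      }

  def main(args: Array[String]): Unit = {
    val value = Attr("value", 1L)
    // An alias that renames `value` to `obj`, as the extra Project in the patch does.
    val renamed = Seq(Alias(value, "obj", 2L))
    println(isAliasOnly(renamed, Seq(value), requireSameName = true))  // false
    println(isAliasOnly(renamed, Seq(value), requireSameName = false)) // true
  }
}
```

With the name condition in place, a renaming alias is never treated as alias-only. Dropping the condition would let the rule fire, which is only safe where the renamed output is not visible to the user, hence the caveat about the root node.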




[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Add ObjectProj...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12926#issuecomment-218371339
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58326/
    Test FAILed.




[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Remove extra P...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12926#issuecomment-218658216
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58440/
    Test FAILed.




[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Add ObjectProj...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12926#issuecomment-217124674
  
    **[Test build #57873 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57873/consoleFull)** for PR 12926 at commit [`737c518`](https://github.com/apache/spark/commit/737c5187f7e9db1a08407246416cbc967fec7d30).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Add ObjectProj...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12926#issuecomment-218371230
  
    **[Test build #58326 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58326/consoleFull)** for PR 12926 at commit [`29a0c70`](https://github.com/apache/spark/commit/29a0c70488fc8d3f7157679d8f41f7ceb5af9bc4).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Remove extra P...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12926#discussion_r62971803
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -156,22 +157,60 @@ object SamplePushDown extends Rule[LogicalPlan] {
     }
     
     /**
    + * Removes the Project only conducting Alias of its child node.
    + * It is created mainly for removing extra Project added in EliminateSerialization rule,
    + * but can also benefit other operators.
    + */
    +object RemoveAliasOnlyProject extends Rule[LogicalPlan] {
    +  // Check if projectList in the Project node has the same attribute names and ordering
    +  // as its child node.
    +  private def isAliasOnly(
    +      projectList: Seq[NamedExpression],
    +      childOutput: Seq[Attribute]): Boolean = {
    +    if (!projectList.forall(_.isInstanceOf[Alias]) || projectList.length != childOutput.length) {
    +      return false
    +    } else {
    +      projectList.map(_.asInstanceOf[Alias]).zip(childOutput).forall { case (a, o) =>
    +        a.child match {
    +          case attr: Attribute if a.name == attr.name && attr.semanticEquals(o) => true
    +          case _ => false
    +        }
    +      }
    +    }
    +  }
    +
    +  def apply(plan: LogicalPlan): LogicalPlan = {
    +    val aliasOnlyProject = plan.find { p =>
    +      p match {
    +        case Project(pList, child) if isAliasOnly(pList, child.output) => true
    +        case _ => false
    +      }
    +    }
    +
    +    aliasOnlyProject.map { case p: Project =>
    +      val aliases = p.projectList.map(_.asInstanceOf[Alias])
    +      val attrMap = AttributeMap(aliases.map(a => (a.toAttribute, a.child)))
    +      plan.transformAllExpressions {
    +        case a: Attribute if attrMap.contains(a) => attrMap(a)
    +      }.transform {
    +        case op: Project if op.eq(p) => op.child
    +      }
    +    }.getOrElse(plan)
    +  }
    +}
    +
    +/**
      * Removes cases where we are unnecessarily going between the object and serialized (InternalRow)
      * representation of data item.  For example back to back map operations.
      */
     object EliminateSerialization extends Rule[LogicalPlan] {
       def apply(plan: LogicalPlan): LogicalPlan = plan transform {
         case d @ DeserializeToObject(_, _, s: SerializeFromObject)
             if d.outputObjectType == s.inputObjectType =>
    -      // A workaround for SPARK-14803. Remove this after it is fixed.
    -      if (d.outputObjectType.isInstanceOf[ObjectType] &&
    -          d.outputObjectType.asInstanceOf[ObjectType].cls == classOf[org.apache.spark.sql.Row]) {
    -        s.child
    -      } else {
    -        // Adds an extra Project here, to preserve the output expr id of `DeserializeToObject`.
    -        val objAttr = Alias(s.child.output.head, "obj")(exprId = d.output.head.exprId)
    -        Project(objAttr :: Nil, s.child)
    -      }
    +      // Adds an extra Project here, to preserve the output expr id of `DeserializeToObject`.
    +      // We will remove it later in RemoveAliasOnlyProject rule.
    +      val objAttr = Alias(s.child.output.head, "obj")(exprId = d.output.head.exprId)
    --- End diff --
    
    nit: use `Alias(s.child.output.head, s.child.output.head.name)(exprId = d.output.head.exprId)` to make sure the alias name is the same as the attribute name.
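
The `apply` method in the diff above works in two steps: find a `Project` whose list only aliases its child's output, then rewrite references to the aliased attributes and splice that `Project` out. A toy sketch of this flow follows. `Plan`, `Relation`, `Filter` and this `Project` are hypothetical simplifications that track expression ids only, and the sketch drops the node before rewriting references, which is a reordering chosen for the toy model and not what the patch does.

```scala
object RemoveAliasOnlySketch {
  sealed trait Plan
  // Leaf producing attributes, identified by expression id.
  final case class Relation(output: Seq[Long]) extends Plan
  // Each pair is (alias expression id, child expression id).
  final case class Project(aliases: Seq[(Long, Long)], child: Plan) extends Plan
  // An operator that references attributes by expression id.
  final case class Filter(refs: Seq[Long], child: Plan) extends Plan

  private def output(p: Plan): Seq[Long] = p match {
    case Relation(o)    => o
    case Project(as, _) => as.map(_._1)
    case Filter(_, c)   => output(c)
  }

  // First Project (top-down) that aliases exactly its child's output, in order.
  private def findAliasOnly(p: Plan): Option[Project] = p match {
    case pr @ Project(as, c) if as.map(_._2) == output(c) => Some(pr)
    case Project(_, c) => findAliasOnly(c)
    case Filter(_, c)  => findAliasOnly(c)
    case _: Relation   => None
  }

  // Replace the target node (compared by reference) with its child.
  private def drop(p: Plan, target: Project): Plan = p match {
    case pr: Project if pr eq target => pr.child
    case Project(as, c) => Project(as, drop(c, target))
    case Filter(rs, c)  => Filter(rs, drop(c, target))
    case r: Relation    => r
  }

  // Point every reference to an alias id at the aliased child id instead.
  private def rewriteRefs(p: Plan, m: Map[Long, Long]): Plan = p match {
    case r: Relation => r
    case Project(as, c) =>
      Project(as.map { case (id, ch) => (id, m.getOrElse(ch, ch)) }, rewriteRefs(c, m))
    case Filter(rs, c) => Filter(rs.map(id => m.getOrElse(id, id)), rewriteRefs(c, m))
  }

  def apply(plan: Plan): Plan = findAliasOnly(plan) match {
    case Some(p) => rewriteRefs(drop(plan, p), p.aliases.toMap)
    case None    => plan
  }

  def main(args: Array[String]): Unit = {
    // Filter refers to id 10, which only aliases id 1 from the relation.
    val plan = Filter(Seq(10L), Project(Seq((10L, 1L)), Relation(Seq(1L))))
    println(apply(plan))
  }
}
```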




[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Remove extra P...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12926#issuecomment-218714670
  
    **[Test build #58474 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58474/consoleFull)** for PR 12926 at commit [`882fc66`](https://github.com/apache/spark/commit/882fc666c1efb2d8313d5f3b944b779651045d59).




[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Add ObjectProj...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12926#issuecomment-218371338
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-15094][SPARK-14803][SQL] Add ObjectProj...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12926#issuecomment-217807249
  
    **[Test build #58127 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58127/consoleFull)** for PR 12926 at commit [`4b0773a`](https://github.com/apache/spark/commit/4b0773adcad8b6d6f0be6f5ca2287fb877e502c9).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.

