You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by marmbrus <gi...@git.apache.org> on 2014/08/06 03:21:20 UTC

[GitHub] spark pull request: [SPARK-2866][SQL] Support attributes in ORDER ...

GitHub user marmbrus opened a pull request:

    https://github.com/apache/spark/pull/1795

    [SPARK-2866][SQL] Support attributes in ORDER BY that aren't in SELECT

    Minor refactoring to allow resolution either using a nodes input or output.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/marmbrus/spark ordering

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1795.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1795
    
----
commit 82cabdada860e959f12229bcd200dd0a46bafcfb
Author: Michael Armbrust <mi...@databricks.com>
Date:   2014-08-05T22:46:19Z

    Generalize attribute resolution.

commit 705d9631f0d5f1942219fb1e3beff7280c0b0dee
Author: Michael Armbrust <mi...@databricks.com>
Date:   2014-08-05T22:46:52Z

    Add a rule for resolving ORDER BY expressions that reference attributes not present in the SELECT clause.

commit 74d833b815191a4f0ec5fd5be723034534538ba0
Author: Michael Armbrust <mi...@databricks.com>
Date:   2014-08-06T01:20:14Z

    newline

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2866][SQL] Support attributes in ORDER ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/1795


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2866][SQL] Support attributes in ORDER ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1795#issuecomment-51287156
  
    QA tests have started for PR 1795. This patch merges cleanly. <br>View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17979/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2866][SQL] Support attributes in ORDER ...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1795#discussion_r15853322
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -121,6 +122,51 @@ class Analyzer(catalog: Catalog, registry: FunctionRegistry, caseSensitive: Bool
       }
     
       /**
    +   * In many dialects of SQL is it valid to sort by attributes that are not present in the SELECT
    +   * clause.  This rule detects such queries and adds the required attributes to the original
    +   * projection, so that they will be available during sorting. Another projection is added to
    +   * remove these attributes after sorting.
    +   */
    +  object ResolveSortReferences extends Rule[LogicalPlan] {
    +    def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
    +      case s@Sort(ordering, p@Project(projectList, child)) if !s.resolved && p.resolved =>
    --- End diff --
    
    Spaces before and after `@`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2866][SQL] Support attributes in ORDER ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1795#issuecomment-51291017
  
    QA results for PR 1795:<br>- This patch PASSES unit tests.<br>- This patch merges cleanly<br>- This patch adds no public classes<br><br>For more information see test ouptut:<br>https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17979/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2866][SQL] Support attributes in ORDER ...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1795#discussion_r15853348
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -121,6 +122,51 @@ class Analyzer(catalog: Catalog, registry: FunctionRegistry, caseSensitive: Bool
       }
     
       /**
    +   * In many dialects of SQL is it valid to sort by attributes that are not present in the SELECT
    +   * clause.  This rule detects such queries and adds the required attributes to the original
    +   * projection, so that they will be available during sorting. Another projection is added to
    +   * remove these attributes after sorting.
    +   */
    +  object ResolveSortReferences extends Rule[LogicalPlan] {
    +    def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
    +      case s@Sort(ordering, p@Project(projectList, child)) if !s.resolved && p.resolved =>
    +        val unresolved = ordering.flatMap(_.collect { case UnresolvedAttribute(name) => name})
    +        val resolved = unresolved.flatMap(child.resolveChildren)
    +        val requiredAttributes = resolved.collect { case a: Attribute => a }.toSet
    +
    +        val missingInProject = requiredAttributes -- p.output
    +        if (missingInProject.nonEmpty) {
    +          // Add missing attributes and then project them away after the sort.
    +          Project(projectList,
    +            Sort(ordering,
    +              Project(projectList ++ missingInProject, child)))
    +        } else {
    +          s // Nothing we can do here. Return original plan.
    +        }
    +      case s@Sort(ordering, a@Aggregate(grouping, aggs, child)) if !s.resolved && a.resolved =>
    +        val unresolved = ordering.flatMap(_.collect { case UnresolvedAttribute(name) => name})
    --- End diff --
    
    Space before `}`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2866][SQL] Support attributes in ORDER ...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/1795#issuecomment-51291104
  
    Thanks for the comments.  I've merged this into master and branch 1.1


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2866][SQL] Support attributes in ORDER ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1795#issuecomment-51283333
  
    QA tests have started for PR 1795. This patch merges cleanly. <br>View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17967/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2866][SQL] Support attributes in ORDER ...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1795#discussion_r15853338
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -121,6 +122,51 @@ class Analyzer(catalog: Catalog, registry: FunctionRegistry, caseSensitive: Bool
       }
     
       /**
    +   * In many dialects of SQL is it valid to sort by attributes that are not present in the SELECT
    +   * clause.  This rule detects such queries and adds the required attributes to the original
    +   * projection, so that they will be available during sorting. Another projection is added to
    +   * remove these attributes after sorting.
    +   */
    +  object ResolveSortReferences extends Rule[LogicalPlan] {
    +    def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
    +      case s@Sort(ordering, p@Project(projectList, child)) if !s.resolved && p.resolved =>
    +        val unresolved = ordering.flatMap(_.collect { case UnresolvedAttribute(name) => name})
    +        val resolved = unresolved.flatMap(child.resolveChildren)
    +        val requiredAttributes = resolved.collect { case a: Attribute => a }.toSet
    +
    +        val missingInProject = requiredAttributes -- p.output
    +        if (missingInProject.nonEmpty) {
    +          // Add missing attributes and then project them away after the sort.
    +          Project(projectList,
    +            Sort(ordering,
    +              Project(projectList ++ missingInProject, child)))
    +        } else {
    +          s // Nothing we can do here. Return original plan.
    +        }
    +      case s@Sort(ordering, a@Aggregate(grouping, aggs, child)) if !s.resolved && a.resolved =>
    --- End diff --
    
    Spaces before and after `@`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2866][SQL] Support attributes in ORDER ...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1795#discussion_r15853457
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala ---
    @@ -89,15 +102,15 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] {
         }
     
         options.distinct match {
    -      case (a, Nil) :: Nil => Some(a) // One match, no nested fields, use it.
    +      case Seq((a, Nil)) => Some(a) // One match, no nested fields, use it.
           // One match, but we also need to extract the requested nested field.
    -      case (a, nestedFields) :: Nil =>
    +      case Seq((a, nestedFields)) =>
             a.dataType match {
               case StructType(fields) =>
                 Some(Alias(nestedFields.foldLeft(a: Expression)(GetField), nestedFields.last)())
               case _ => None // Don't know how to resolve these field references
             }
    -      case Nil => None         // No matches.
    +      case Seq() => None         // No matches.
    --- End diff --
    
    No, Nil will only match things of type `List`, where as Seq is more general.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2866][SQL] Support attributes in ORDER ...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1795#discussion_r15853356
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -121,6 +122,51 @@ class Analyzer(catalog: Catalog, registry: FunctionRegistry, caseSensitive: Bool
       }
     
       /**
    +   * In many dialects of SQL is it valid to sort by attributes that are not present in the SELECT
    +   * clause.  This rule detects such queries and adds the required attributes to the original
    +   * projection, so that they will be available during sorting. Another projection is added to
    +   * remove these attributes after sorting.
    +   */
    +  object ResolveSortReferences extends Rule[LogicalPlan] {
    +    def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
    +      case s@Sort(ordering, p@Project(projectList, child)) if !s.resolved && p.resolved =>
    +        val unresolved = ordering.flatMap(_.collect { case UnresolvedAttribute(name) => name})
    +        val resolved = unresolved.flatMap(child.resolveChildren)
    +        val requiredAttributes = resolved.collect { case a: Attribute => a }.toSet
    +
    +        val missingInProject = requiredAttributes -- p.output
    +        if (missingInProject.nonEmpty) {
    +          // Add missing attributes and then project them away after the sort.
    +          Project(projectList,
    +            Sort(ordering,
    +              Project(projectList ++ missingInProject, child)))
    +        } else {
    +          s // Nothing we can do here. Return original plan.
    +        }
    +      case s@Sort(ordering, a@Aggregate(grouping, aggs, child)) if !s.resolved && a.resolved =>
    +        val unresolved = ordering.flatMap(_.collect { case UnresolvedAttribute(name) => name})
    +        // A small hack to create an object that will allow us to resolve any references that
    +        // refer to named expressions that are present in the grouping expressions.
    +        val groupingRelation = LocalRelation(
    +          grouping.collect { case ne: NamedExpression => ne.toAttribute}
    +        )
    +
    +        logWarning(s"Grouping expressions: $groupingRelation")
    +        val resolved = unresolved.flatMap(groupingRelation.resolve).toSet
    +        val missingInAggs = resolved -- a.outputSet
    +        logWarning(s"Resolved: $resolved Missing in aggs: $missingInAggs")
    +        if(missingInAggs.nonEmpty) {
    --- End diff --
    
    Space after `if`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2866][SQL] Support attributes in ORDER ...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1795#discussion_r15853329
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -121,6 +122,51 @@ class Analyzer(catalog: Catalog, registry: FunctionRegistry, caseSensitive: Bool
       }
     
       /**
    +   * In many dialects of SQL is it valid to sort by attributes that are not present in the SELECT
    +   * clause.  This rule detects such queries and adds the required attributes to the original
    +   * projection, so that they will be available during sorting. Another projection is added to
    +   * remove these attributes after sorting.
    +   */
    +  object ResolveSortReferences extends Rule[LogicalPlan] {
    +    def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
    +      case s@Sort(ordering, p@Project(projectList, child)) if !s.resolved && p.resolved =>
    +        val unresolved = ordering.flatMap(_.collect { case UnresolvedAttribute(name) => name})
    --- End diff --
    
    Space before `}`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2866][SQL] Support attributes in ORDER ...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1795#discussion_r15853395
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala ---
    @@ -89,15 +102,15 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] {
         }
     
         options.distinct match {
    -      case (a, Nil) :: Nil => Some(a) // One match, no nested fields, use it.
    +      case Seq((a, Nil)) => Some(a) // One match, no nested fields, use it.
           // One match, but we also need to extract the requested nested field.
    -      case (a, nestedFields) :: Nil =>
    +      case Seq((a, nestedFields)) =>
             a.dataType match {
               case StructType(fields) =>
                 Some(Alias(nestedFields.foldLeft(a: Expression)(GetField), nestedFields.last)())
               case _ => None // Don't know how to resolve these field references
             }
    -      case Nil => None         // No matches.
    +      case Seq() => None         // No matches.
    --- End diff --
    
    Is the same thing for `Nil` and `Seq()`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2866][SQL] Support attributes in ORDER ...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1795#discussion_r15853350
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -121,6 +122,51 @@ class Analyzer(catalog: Catalog, registry: FunctionRegistry, caseSensitive: Bool
       }
     
       /**
    +   * In many dialects of SQL is it valid to sort by attributes that are not present in the SELECT
    +   * clause.  This rule detects such queries and adds the required attributes to the original
    +   * projection, so that they will be available during sorting. Another projection is added to
    +   * remove these attributes after sorting.
    +   */
    +  object ResolveSortReferences extends Rule[LogicalPlan] {
    +    def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
    +      case s@Sort(ordering, p@Project(projectList, child)) if !s.resolved && p.resolved =>
    +        val unresolved = ordering.flatMap(_.collect { case UnresolvedAttribute(name) => name})
    +        val resolved = unresolved.flatMap(child.resolveChildren)
    +        val requiredAttributes = resolved.collect { case a: Attribute => a }.toSet
    +
    +        val missingInProject = requiredAttributes -- p.output
    +        if (missingInProject.nonEmpty) {
    +          // Add missing attributes and then project them away after the sort.
    +          Project(projectList,
    +            Sort(ordering,
    +              Project(projectList ++ missingInProject, child)))
    +        } else {
    +          s // Nothing we can do here. Return original plan.
    +        }
    +      case s@Sort(ordering, a@Aggregate(grouping, aggs, child)) if !s.resolved && a.resolved =>
    +        val unresolved = ordering.flatMap(_.collect { case UnresolvedAttribute(name) => name})
    +        // A small hack to create an object that will allow us to resolve any references that
    +        // refer to named expressions that are present in the grouping expressions.
    +        val groupingRelation = LocalRelation(
    +          grouping.collect { case ne: NamedExpression => ne.toAttribute}
    --- End diff --
    
    Space before `}`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2866][SQL] Support attributes in ORDER ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1795#issuecomment-51288059
  
    QA results for PR 1795:<br>- This patch PASSES unit tests.<br>- This patch merges cleanly<br>- This patch adds no public classes<br><br>For more information see test ouptut:<br>https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17967/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org