Posted to reviews@spark.apache.org by kevinyu98 <gi...@git.apache.org> on 2016/01/28 08:50:38 UTC

[GitHub] spark pull request: [SPARK-10777] [SQL] Resolve Aliases in the Gro...

GitHub user kevinyu98 opened a pull request:

    https://github.com/apache/spark/pull/10967

    [SPARK-10777] [SQL] Resolve Aliases in the Group By clause 

    @gatorsmile @yhuai @marmbrus @cloud-fan: Hello all, I ran the failing query with PR 10678 (from SPARK-12705) applied and still got the same failure.
    
    Actually, I can reproduce this JIRA problem without ORDER BY or a window function. It only requires selecting a column with an alias together with an aggregate function, and then grouping by that alias.
    
    The query looks like this:
    
    SELECT a r, sum(b) s FROM testData2 GROUP BY r
    
    (If I replace r in the GROUP BY with a, it works.)
    
    I think this JIRA is different from Xiao's JIRA.
    
    For this JIRA, it looks like the alias in the GROUP BY clause (r) can't be resolved by the ResolveReferences rule.
    
    Currently, ResolveReferences only handles an aggregate specially if its arguments contain stars. Any other aggregate falls into the generic case (case q: LogicalPlan), which tries to resolve its references against the child. Here the GROUP BY contains the alias r, while the child is a LogicalRDD with columns a and b, which is why r can't be found in the child.
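
The lookup failure described above can be sketched with a toy model. This is plain Python, not Catalyst code: the `Attribute` class and the `resolve_against_child` helper are hypothetical names made up for illustration, and the only data taken from this report is the child output `[a#4, b#5]`.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Attribute:
    """Hypothetical stand-in for a resolved Catalyst attribute such as a#4."""
    name: str
    expr_id: int


def resolve_against_child(names: List[str],
                          child_output: List[Attribute]) -> Dict[str, Optional[Attribute]]:
    """Look each name up in the child's output only; None marks an unresolved name."""
    by_name = {attr.name: attr for attr in child_output}
    return {name: by_name.get(name) for name in names}


# Child output taken from the plan in this report: LogicalRDD [a#4, b#5].
child_output = [Attribute("a", 4), Attribute("b", 5)]

group_by_a = resolve_against_child(["a"], child_output)  # column name: found in the child
group_by_r = resolve_against_child(["r"], child_output)  # alias: the child has no such column

print(group_by_a)
print(group_by_r)
```

In this model `r` stays unresolved because the alias only exists in the aggregate expressions, which the child-only lookup never consults.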
    
    Here is what the plan looks like:
    
    plan = {Aggregate@9173} "'Aggregate ['r], [a#4 AS r#43,(sum(cast(b#5 as bigint)),mode=Complete,isDistinct=false) AS s#44L]\n+- Subquery testData2\n   +- LogicalRDD [a#4,b#5], MapPartitionsRDD[5] at beforeAll at BeforeAndAfterAll.scala:187\n"
     groupingExpressions = {$colon$colon@9176} "::" size = 1
      (0)  = {UnresolvedAttribute@9190} "'r"
     aggregateExpressions = {$colon$colon@9177} "::" size = 2
      (0)  = {Alias@9110} "a#4 AS r#43"
      (1)  = {Alias@9196} "(sum(cast(b#5 as bigint)),mode=Complete,isDistinct=false) AS s#44L"
     child = {Subquery@7456} "Subquery testData2\n+- LogicalRDD [a#4,b#5], MapPartitionsRDD[5] at beforeAll at BeforeAndAfterAll.scala:187\n"
      alias = {String@9201} "testData2"
      child = {LogicalRDD@9202} "LogicalRDD [a#4,b#5], MapPartitionsRDD[5] at beforeAll at BeforeAndAfterAll.scala:187\n"
      _analyzed = false
      resolved = true
      cleanArgs = null
      org$apache$spark$Logging$$log_ = null
      bitmap$0 = 1
      schema = null
      bitmap$0 = false
      origin = {Origin@9203} "Origin(Some(1),Some(27))"
      containsChild = {Set$Set1@9204} "Set$Set1" size = 1
      bitmap$0 = true
     resolved = false
     bitmap$0 = true
     _analyzed = false
     resolved = false
    
    The proposed fix is to add another case for aggregates: if groupingExpressions contains an unresolved attribute and all the attributes in aggregateExpressions are resolved, we search for the unresolved attribute in aggregateExpressions first.
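
A minimal sketch of the lookup order proposed above, written as a toy model in plain Python rather than as the actual patch. `Attribute`, `Alias`, and `resolve_grouping` are hypothetical names chosen for illustration, and the sketch does not check whether the aliased expression is legal to group by.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass(frozen=True)
class Attribute:
    """Hypothetical stand-in for a resolved attribute such as a#4."""
    name: str
    expr_id: int


@dataclass(frozen=True)
class Alias:
    """Hypothetical stand-in for `child AS name` in the aggregate expressions."""
    child: Any
    name: str


def resolve_grouping(names: List[str], aggregate_exprs: List[Any],
                     child_output: List[Attribute]) -> List[Any]:
    """Resolve GROUP BY names: aliases in the aggregate expressions first, then the child."""
    aliases = {e.name: e.child for e in aggregate_exprs if isinstance(e, Alias)}
    columns = {attr.name: attr for attr in child_output}
    resolved = []
    for name in names:
        if name in aliases:
            resolved.append(aliases[name])   # alias found: group by the aliased expression
        elif name in columns:
            resolved.append(columns[name])   # otherwise fall back to the child's columns
        else:
            raise LookupError(f"cannot resolve '{name}'")
    return resolved


a, b = Attribute("a", 4), Attribute("b", 5)
# Mirrors the aggregate expressions in the plan: a#4 AS r, sum(b#5) AS s.
aggregate_exprs = [Alias(a, "r"), Alias(("sum", b), "s")]

print(resolve_grouping(["r"], aggregate_exprs, [a, b]))
```

With this ordering, `GROUP BY r` and `GROUP BY a` both end up grouping on the same underlying attribute in the model.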
    
    Thanks for reviewing. 


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kevinyu98/spark working_on_spark-10777

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10967.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #10967
    
----
commit c2fcaa8e488d12419c7b7c5032ccadab38f20b68
Author: gatorsmile <ga...@gmail.com>
Date:   2016-01-10T03:21:14Z

    window function: Sorting columns are not in Project

commit 5ca463035bc6eaebd15e7cf332faeea157e5593e
Author: gatorsmile <ga...@gmail.com>
Date:   2016-01-10T03:30:58Z

    style fix.

commit da6baf25488767ce6e73538b03f9195bba92b84e
Author: gatorsmile <ga...@gmail.com>
Date:   2016-01-10T06:23:48Z

    code cleaning and address comments.

commit b5de0799650a86b8479eb053d7e3e65b23e5d75b
Author: gatorsmile <ga...@gmail.com>
Date:   2016-01-10T16:31:09Z

    Merge remote-tracking branch 'upstream/master' into sortWindows

commit d164342747502b09686c1802cf9d24d8ed4c899e
Author: gatorsmile <ga...@gmail.com>
Date:   2016-01-13T06:15:31Z

    address comments.

commit 27fcaa5ad6a3b4228ef4fc46b963c1e818d2f5c4
Author: gatorsmile <ga...@gmail.com>
Date:   2016-01-13T08:30:12Z

    address comments.

commit 7fc98e49a26fd03f398b2241b4cfd19e969b770e
Author: gatorsmile <ga...@gmail.com>
Date:   2016-01-17T05:03:23Z

    added a support to more operators.

commit 03112397437cf0f49eea8a347383d9d642e0995b
Author: gatorsmile <ga...@gmail.com>
Date:   2016-01-17T05:24:14Z

    Merge remote-tracking branch 'upstream/master' into sortWindows

commit 522626bbd483054f441d2ca49bc06512901258ea
Author: gatorsmile <ga...@gmail.com>
Date:   2016-01-17T05:25:56Z

    style fix.

commit 26945fa63809a8671461404eb2e661e1605dc196
Author: gatorsmile <ga...@gmail.com>
Date:   2016-01-17T07:14:38Z

    fixed the test case that might fail sometimes due to the sorted values are duplicate

commit bd3ed13b9e78d59274cda6c243acc5e704bb2821
Author: gatorsmile <ga...@gmail.com>
Date:   2016-01-17T18:21:54Z

    added test cases

commit 831baf515faae0f12fae0f8b50297c05292e9e16
Author: gatorsmile <ga...@gmail.com>
Date:   2016-01-18T07:44:27Z

    fixed bugs.

commit e2db989f15a8bc2465b8476c211261fb385d201d
Author: Kevin Yu <qy...@us.ibm.com>
Date:   2016-01-28T06:22:46Z

    resolve the UnresolvedAttribute for aliases in GROUP By clause

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-10777] [SQL] Resolve Aliases in the Gro...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the pull request:

    https://github.com/apache/spark/pull/10967#issuecomment-176239256
  
    This is a separate issue. It happens when an alias defined in the aggregate expressions is used in the GROUP BY. Thus, you do not need to merge my fix, which is still changing to address review comments. Thank you!



[GitHub] spark pull request: [SPARK-10777] [SQL] Resolve Aliases in the Gro...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/10967



[GitHub] spark pull request: [SPARK-10777] [SQL] Resolve Aliases in the Gro...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10967#issuecomment-176040823
  
    Can one of the admins verify this patch?



[GitHub] spark pull request: [SPARK-10777] [SQL] Resolve Aliases in the Gro...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/10967#issuecomment-176439624
  
    I'm not sure we want this. Neither Oracle nor SQL Server supports it, and you can already use numbers to refer to things from the select clause in a GROUP BY.
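
For readers unfamiliar with the positional form mentioned here, a small sketch follows. It runs against SQLite through Python's `sqlite3` module rather than Spark, so it only illustrates the SQL semantics of grouping by select-list position. The sample rows are assumed and are not taken from this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE testData2 (a INTEGER, b INTEGER)")
# Assumed sample rows, for illustration only.
rows = [(1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2)]
conn.executemany("INSERT INTO testData2 VALUES (?, ?)", rows)

# Group by the underlying column name.
by_column = conn.execute(
    "SELECT a r, sum(b) s FROM testData2 GROUP BY a ORDER BY 1"
).fetchall()

# Group by position: 1 refers to the first item in the select list (a AS r).
by_ordinal = conn.execute(
    "SELECT a r, sum(b) s FROM testData2 GROUP BY 1 ORDER BY 1"
).fetchall()

conn.close()

print(by_column)
print(by_ordinal)
```

Both queries return the same groups here, which is the sense in which the positional form can stand in for naming the alias.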

