You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by zhichao-li <gi...@git.apache.org> on 2016/01/13 01:48:57 UTC
[GitHub] spark pull request: [SPARK-12789]Support order by index
GitHub user zhichao-li opened a pull request:
https://github.com/apache/spark/pull/10731
[SPARK-12789]Support order by index
Num in Order by is treated as constant expression at the moment. I guess it would be good to enable user to specify column by index which has been supported in Hive 0.11.0 and later. Hive use hive.groupby.orderby.position.alias to turn on/off this feature, not sure if we need to follow this as well.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/zhichao-li/spark orderby
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/10731.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #10731
----
commit 5a2270be260c635772fb4d46a7084eecf7166d87
Author: zhichao.li <zh...@intel.com>
Date: 2016-01-12T08:03:26Z
support order by index
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by zhichao-li <gi...@git.apache.org>.
Github user zhichao-li commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-171113667
cc @chenghao-intel @adrian-wang
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:
https://github.com/apache/spark/pull/10731#discussion_r49688289
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -441,7 +442,7 @@ class Analyzer(
// When resolve `SortOrder`s in Sort based on child, don't report errors as
// we still have chance to resolve it based on grandchild
- case s @ Sort(ordering, global, child) if child.resolved && !s.resolved =>
--- End diff --
But if we have `order by a` and after resolve the ordering column, we will still hit this case, which is not what we want, right?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-172488215
This is similar to: https://github.com/apache/spark/pull/10052
That PR also implements this idea for ```GROUP BY``` clauses.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-172458292
Merged build finished. Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-171249634
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49311/
Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index and group ...
Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/10731#discussion_r56454169
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala ---
@@ -202,3 +203,14 @@ object Unions {
}
}
}
+
+/**
+ * Extractor for retrieving Int value.
+ */
+object IntegerIndex {
+ def unapply(a: Any): Option[Int] = a match {
+ case Literal(a: Int, IntegerType) => Some(a)
+ case UnaryMinus(IntegerLiteral(v)) => Some(-v)
--- End diff --
is it standard to support -(-1)? I see postgres support it, but somewhat strange to me.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index and group ...
Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/10731#discussion_r56454789
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -497,6 +498,41 @@ class Analyzer(
ordering.map(order => resolveExpression(order, child).asInstanceOf[SortOrder])
Sort(newOrdering, global, child)
+ // Replace the index with the related attribute for ORDER BY
+ // which is a 1-base position of the projection list.
+ case s @ Sort(orders, global, child) if child.resolved &&
--- End diff --
this rule is getting pretty long -- i wonder if there are ways to break it down
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index and group ...
Posted by zhichao-li <gi...@git.apache.org>.
Github user zhichao-li commented on a diff in the pull request:
https://github.com/apache/spark/pull/10731#discussion_r52096762
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -322,6 +323,62 @@ class Analyzer(
}
/**
+ * Replace the index with the related attribute for order by and group by.
+ */
+ object ResolveIndexReferences extends Rule[LogicalPlan] {
+
+ def indexToColumn(
+ sortOrder: SortOrder,
+ child: LogicalPlan,
+ index: Int,
+ direction: SortDirection): SortOrder = {
+ val orderNodes = child.output
+ if (index > 0 && index <= orderNodes.size) {
+ SortOrder(orderNodes(index - 1), direction)
+ } else {
+ throw new UnresolvedException(sortOrder,
+ s"""Order by position: $index does not exist \n
+ |The Select List is indexed from 1 to ${orderNodes.size}""".stripMargin)
+ }
+ }
+
+ def indexToColumn(
--- End diff --
it indeed similar from the code structure and name, but internally it wants to construct two different things. And also the first parameter of `UnresolvedException` require concrete type, so it cannot be a type like TreeNode[_] to unify the api. How about I rename those to be `indexToSortOrder` and `indexToAggregate` ?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index and group ...
Posted by zhichao-li <gi...@git.apache.org>.
Github user zhichao-li commented on a diff in the pull request:
https://github.com/apache/spark/pull/10731#discussion_r52093518
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -456,21 +455,32 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
test("literal in agg grouping expressions") {
checkAnswer(
- sql("SELECT a, count(1) FROM testData2 GROUP BY a, 1"),
- Seq(Row(1, 2), Row(2, 2), Row(3, 2)))
+ sql("SELECT SUM(a) FROM testData2 GROUP BY 2"),
+ Seq(Row(6), Row(6)))
+
checkAnswer(
- sql("SELECT a, count(2) FROM testData2 GROUP BY a, 2"),
+ sql("SELECT a, SUM(b) FROM testData2 GROUP BY 1"),
+ Seq(Row(1, 3), Row(2, 3), Row(3, 3)))
+
+ checkAnswer(
+ sql("SELECT a, count(1) FROM testData2 GROUP BY a, 1"),
Seq(Row(1, 2), Row(2, 2), Row(3, 2)))
checkAnswer(
sql("SELECT a, 1, sum(b) FROM testData2 GROUP BY a, 1"),
--- End diff --
it would throw exception like "Invalid call to Aggregate by position: 2 does not exist", let me add that as an unit-test.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-173493053
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index and group ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-185115655
**[Test build #51419 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51419/consoleFull)** for PR 10731 at commit [`66c54b1`](https://github.com/apache/spark/commit/66c54b14f8e57d181bf78f293fe0de0d899cb2ba).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index and group ...
Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-197698035
Also I'd say "by position", not "by index", since index usually refers to something else in databases.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-173444293
**[Test build #49844 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49844/consoleFull)** for PR 10731 at commit [`e61429f`](https://github.com/apache/spark/commit/e61429fec35c0f0983ff5e1bfeea11a1cef42690).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-171538142
**[Test build #49383 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49383/consoleFull)** for PR 10731 at commit [`fe99e00`](https://github.com/apache/spark/commit/fe99e0043598e6af37c691aeb00d1204a37d9158).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by adrian-wang <gi...@git.apache.org>.
Github user adrian-wang commented on a diff in the pull request:
https://github.com/apache/spark/pull/10731#discussion_r49539497
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -22,8 +22,9 @@ import java.sql.Timestamp
import org.apache.spark.AccumulatorSuite
import org.apache.spark.sql.catalyst.DefaultParserDialect
-import org.apache.spark.sql.catalyst.analysis.FunctionRegistry
+import org.apache.spark.sql.catalyst.analysis.{UnresolvedException, FunctionRegistry}
--- End diff --
import order
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by zhichao-li <gi...@git.apache.org>.
Github user zhichao-li commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-180249809
Sure, would provide more description and comments later.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by zhichao-li <gi...@git.apache.org>.
Github user zhichao-li commented on a diff in the pull request:
https://github.com/apache/spark/pull/10731#discussion_r49962166
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -438,11 +438,25 @@ class Analyzer(
}
j.copy(right = newRight)
}
-
// When resolve `SortOrder`s in Sort based on child, don't report errors as
// we still have chance to resolve it based on grandchild
- case s @ Sort(ordering, global, child) if child.resolved && !s.resolved =>
- val newOrdering = resolveSortOrders(ordering, child, throws = false)
+ case s @ Sort(ordering, global, child) if child.resolved =>
--- End diff --
Would update that shortly. I was thinking it would be more efficient by combining those in one past.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-172423347
a quick question. If I do `ORDER BY a, 2, b, c`, `2` means the third column?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-172448588
Merged build finished. Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-171249633
Merged build finished. Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-171175160
**[Test build #49292 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49292/consoleFull)** for PR 10731 at commit [`b72547b`](https://github.com/apache/spark/commit/b72547b04923862579935344f6cd9af1814b77f5).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by zhichao-li <gi...@git.apache.org>.
Github user zhichao-li commented on a diff in the pull request:
https://github.com/apache/spark/pull/10731#discussion_r51985561
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -322,6 +323,62 @@ class Analyzer(
}
/**
+ * Replace the index with the related attribute for order by and group by.
+ */
+ object ResolveIndexReferences extends Rule[LogicalPlan] {
+
+ def indexToColumn(
+ sortOrder: SortOrder,
+ child: LogicalPlan,
+ index: Int,
+ direction: SortDirection): SortOrder = {
+ val orderNodes = child.output
+ if (index > 0 && index <= orderNodes.size) {
+ SortOrder(orderNodes(index - 1), direction)
+ } else {
+ throw new UnresolvedException(sortOrder,
+ s"""Order by position: $index does not exist \n
+ |The Select List is indexed from 1 to ${orderNodes.size}""".stripMargin)
+ }
+ }
+
+ def indexToColumn(
+ agg: Aggregate,
+ child: LogicalPlan,
+ index: Int): Expression = {
+ val attributes = child.output
+ if (index > 0 && index <= attributes.size) {
+ attributes(index - 1)
+ } else {
+ throw new UnresolvedException(agg,
+ s"""Aggregate by position: $index does not exist \n
+ |The Select List is indexed from 1 to ${attributes.size}""".stripMargin)
+ }
+ }
+
+ def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
+ case s @ Sort(orders, global, child) if child.resolved =>
--- End diff --
Have thought of this for a while, but still cannot come up with a concrete solution for such nested type. Not sure if you can provide any snippet as a suggestion?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:
https://github.com/apache/spark/pull/10731#discussion_r51984528
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -322,6 +323,62 @@ class Analyzer(
}
/**
+ * Replace the index with the related attribute for order by and group by.
+ */
+ object ResolveIndexReferences extends Rule[LogicalPlan] {
+
+ def indexToColumn(
+ sortOrder: SortOrder,
+ child: LogicalPlan,
+ index: Int,
+ direction: SortDirection): SortOrder = {
+ val orderNodes = child.output
+ if (index > 0 && index <= orderNodes.size) {
+ SortOrder(orderNodes(index - 1), direction)
+ } else {
+ throw new UnresolvedException(sortOrder,
+ s"""Order by position: $index does not exist \n
+ |The Select List is indexed from 1 to ${orderNodes.size}""".stripMargin)
+ }
+ }
+
+ def indexToColumn(
+ agg: Aggregate,
+ child: LogicalPlan,
+ index: Int): Expression = {
+ val attributes = child.output
+ if (index > 0 && index <= attributes.size) {
+ attributes(index - 1)
+ } else {
+ throw new UnresolvedException(agg,
+ s"""Aggregate by position: $index does not exist \n
+ |The Select List is indexed from 1 to ${attributes.size}""".stripMargin)
+ }
+ }
+
+ def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
+ case s @ Sort(orders, global, child) if child.resolved =>
--- End diff --
Can we use a narrower condition on when this case will be triggered?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-172750914
**[Test build #49663 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49663/consoleFull)** for PR 10731 at commit [`0daa766`](https://github.com/apache/spark/commit/0daa766d03538c806175c3389106a83b0fe977e3).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by zhichao-li <gi...@git.apache.org>.
Github user zhichao-li commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-171486258
retest this please.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-174397390
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49972/
Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index and group ...
Posted by zhichao-li <gi...@git.apache.org>.
Github user zhichao-li commented on a diff in the pull request:
https://github.com/apache/spark/pull/10731#discussion_r52092731
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -322,6 +323,62 @@ class Analyzer(
}
/**
+ * Replace the index with the related attribute for order by and group by.
+ */
+ object ResolveIndexReferences extends Rule[LogicalPlan] {
+
+ def indexToColumn(
+ sortOrder: SortOrder,
+ child: LogicalPlan,
+ index: Int,
+ direction: SortDirection): SortOrder = {
+ val orderNodes = child.output
+ if (index > 0 && index <= orderNodes.size) {
+ SortOrder(orderNodes(index - 1), direction)
+ } else {
+ throw new UnresolvedException(sortOrder,
+ s"""Order by position: $index does not exist \n
+ |The Select List is indexed from 1 to ${orderNodes.size}""".stripMargin)
+ }
+ }
+
+ def indexToColumn(
+ agg: Aggregate,
+ child: LogicalPlan,
+ index: Int): Expression = {
+ val attributes = child.output
+ if (index > 0 && index <= attributes.size) {
+ attributes(index - 1)
+ } else {
+ throw new UnresolvedException(agg,
+ s"""Aggregate by position: $index does not exist \n
+ |The Select List is indexed from 1 to ${attributes.size}""".stripMargin)
+ }
+ }
+
+ def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
+ case s @ Sort(orders, global, child) if child.resolved =>
--- End diff --
That's exactly what I trying to achieve. The thing is `expression` is hided within Seq[SortOrder], somehow we need to iterate `order` to do the checking and the conversion if match. Cann't see the chance to compact them into one line condition, and there's no performance difference in this way.
``` scala
case class Sort(
order: Seq[SortOrder],
global: Boolean,
child: LogicalPlan)
case class SortOrder(child: Expression, direction: SortDirection)
```
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index and group ...
Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-197699013
@gatorsmile do you think you can take over this and create two prs based on this?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index and group ...
Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-198009418
One question, if I have
```
table1: a: int, b: int, c: int
SELECT b, c FROM table1 ORDER BY 1, 2
```
what are columns used in ORDER BY? I guess `b, c`, right?
For GROUP BY, I feel it is not always obvious what are columns referred by the positions (I mean not as obvious as ORDER BY). What do you think?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by adrian-wang <gi...@git.apache.org>.
Github user adrian-wang commented on a diff in the pull request:
https://github.com/apache/spark/pull/10731#discussion_r49539463
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -22,8 +22,9 @@ import java.sql.Timestamp
import org.apache.spark.AccumulatorSuite
import org.apache.spark.sql.catalyst.DefaultParserDialect
-import org.apache.spark.sql.catalyst.analysis.FunctionRegistry
+import org.apache.spark.sql.catalyst.analysis.{UnresolvedException, FunctionRegistry}
import org.apache.spark.sql.catalyst.errors.DialectException
+import org.apache.spark.sql.catalyst.expressions.SortOrder
--- End diff --
unused?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-174397389
Merged build finished. Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-174415158
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index and group ...
Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:
https://github.com/apache/spark/pull/10731#discussion_r51989906
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -456,21 +455,32 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
test("literal in agg grouping expressions") {
checkAnswer(
- sql("SELECT a, count(1) FROM testData2 GROUP BY a, 1"),
- Seq(Row(1, 2), Row(2, 2), Row(3, 2)))
+ sql("SELECT SUM(a) FROM testData2 GROUP BY 2"),
+ Seq(Row(6), Row(6)))
+
checkAnswer(
- sql("SELECT a, count(2) FROM testData2 GROUP BY a, 2"),
+ sql("SELECT a, SUM(b) FROM testData2 GROUP BY 1"),
+ Seq(Row(1, 3), Row(2, 3), Row(3, 3)))
+
+ checkAnswer(
+ sql("SELECT a, count(1) FROM testData2 GROUP BY a, 1"),
Seq(Row(1, 2), Row(2, 2), Row(3, 2)))
checkAnswer(
sql("SELECT a, 1, sum(b) FROM testData2 GROUP BY a, 1"),
--- End diff --
If we have a table with a single column, what will happen if we call `GROUP BY 1, 2`?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by adrian-wang <gi...@git.apache.org>.
Github user adrian-wang commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-172424340
order by 2 should be the second column, I think
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-173444379
Merged build finished. Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by zhichao-li <gi...@git.apache.org>.
Github user zhichao-li commented on a diff in the pull request:
https://github.com/apache/spark/pull/10731#discussion_r49540626
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -441,7 +442,7 @@ class Analyzer(
// When resolve `SortOrder`s in Sort based on child, don't report errors as
// we still have chance to resolve it based on grandchild
- case s @ Sort(ordering, global, child) if child.resolved && !s.resolved =>
--- End diff --
Literal is always resolved, so remove this to ensure `Order by 1` can be matched by this case checking.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:
https://github.com/apache/spark/pull/10731#discussion_r51984456
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -322,6 +323,62 @@ class Analyzer(
}
/**
+ * Replace the index with the related attribute for order by and group by.
+ */
+ object ResolveIndexReferences extends Rule[LogicalPlan] {
+
+ def indexToColumn(
+ sortOrder: SortOrder,
+ child: LogicalPlan,
+ index: Int,
+ direction: SortDirection): SortOrder = {
+ val orderNodes = child.output
+ if (index > 0 && index <= orderNodes.size) {
+ SortOrder(orderNodes(index - 1), direction)
+ } else {
+ throw new UnresolvedException(sortOrder,
+ s"""Order by position: $index does not exist \n
+ |The Select List is indexed from 1 to ${orderNodes.size}""".stripMargin)
+ }
+ }
+
+ def indexToColumn(
+ agg: Aggregate,
+ child: LogicalPlan,
+ index: Int): Expression = {
+ val attributes = child.output
+ if (index > 0 && index <= attributes.size) {
+ attributes(index - 1)
+ } else {
+ throw new UnresolvedException(agg,
+ s"""Aggregate by position: $index does not exist \n
+ |The Select List is indexed from 1 to ${attributes.size}""".stripMargin)
+ }
+ }
+
+ def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
+ case s @ Sort(orders, global, child) if child.resolved =>
+ val newOrders = orders map {
+ case s @ SortOrder(IntegerLiteral(index), direction) =>
+ indexToColumn(s, child, index, direction)
+ case s @ SortOrder(UnaryMinus(IntegerLiteral(index)), direction) =>
+ indexToColumn(s, child, -index, direction)
+ case other => other
--- End diff --
Please add comments to explain the semantic.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-171489576
**[Test build #49359 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49359/consoleFull)** for PR 10731 at commit [`1cb6752`](https://github.com/apache/spark/commit/1cb675262ed3b90afdff5ffd8b1bfa4965e37df4).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index and group ...
Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-198034957
select * with group by is definitely not valid (with or without position)
select * with order by should work, since * here is just an expansion.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-171118500
**[Test build #49276 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49276/consoleFull)** for PR 10731 at commit [`5a2270b`](https://github.com/apache/spark/commit/5a2270be260c635772fb4d46a7084eecf7166d87).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by zhichao-li <gi...@git.apache.org>.
Github user zhichao-li commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-178273065
cc @liancheng
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-174415163
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49973/
Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-171118509
Merged build finished. Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by adrian-wang <gi...@git.apache.org>.
Github user adrian-wang commented on a diff in the pull request:
https://github.com/apache/spark/pull/10731#discussion_r49540540
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -22,8 +22,9 @@ import java.sql.Timestamp
import org.apache.spark.AccumulatorSuite
import org.apache.spark.sql.catalyst.DefaultParserDialect
-import org.apache.spark.sql.catalyst.analysis.FunctionRegistry
+import org.apache.spark.sql.catalyst.analysis.{UnresolvedException, FunctionRegistry}
import org.apache.spark.sql.catalyst.errors.DialectException
+import org.apache.spark.sql.catalyst.expressions.SortOrder
--- End diff --
ok
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-172448589
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49575/
Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-171133381
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49283/
Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-173433597
**[Test build #49844 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49844/consoleFull)** for PR 10731 at commit [`e61429f`](https://github.com/apache/spark/commit/e61429fec35c0f0983ff5e1bfeea11a1cef42690).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-172453630
Merged build finished. Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-172482297
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49584/
Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index and group ...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-198082449
Just confirmed what @hvanhovell said, Netezza and Terradata support "group by position".
- https://www.ibm.com/support/knowledgecenter/api/content/nl/en-us/SSULQD_7.2.0/com.ibm.nz.dbu.doc/r_dbuser_select.html
- http://tunweb.teradata.ws/tunstudent/TeradataUserManuals/Teradata_SQL_Quick_Reference.pdf
Also confirmed what @rxin said, in Group By, the position is based on the output columns (select expression).
Thus, I think the integer in `groupingExpressions` should be resolved based on `aggregateExpressions` of `Aggregate`. Please let me know if my understanding is wrong. Thanks!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-171118031
**[Test build #49276 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49276/consoleFull)** for PR 10731 at commit [`5a2270b`](https://github.com/apache/spark/commit/5a2270be260c635772fb4d46a7084eecf7166d87).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-171195516
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49292/
Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-172482295
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by zhichao-li <gi...@git.apache.org>.
Github user zhichao-li commented on a diff in the pull request:
https://github.com/apache/spark/pull/10731#discussion_r50072543
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/joins/InnerJoinSuite.scala ---
@@ -17,6 +17,7 @@
package org.apache.spark.sql.execution.joins
+import org.apache.spark.sql.{DataFrame, Row, SQLConf}
--- End diff --
This is for passing the style check
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-172453461
**[Test build #49579 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49579/consoleFull)** for PR 10731 at commit [`2746e0f`](https://github.com/apache/spark/commit/2746e0f632d7799f0d49d2d4bb8719b73c1716fd).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-171118510
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49276/
Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-172482036
**[Test build #49584 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49584/consoleFull)** for PR 10731 at commit [`acd00be`](https://github.com/apache/spark/commit/acd00be8d92c0cea72db3e63a1580b515660d61c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-172453632
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49579/
Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-172458295
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49583/
Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index and group ...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-197709327
Sure, will do it. Thanks!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-172453627
**[Test build #49579 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49579/consoleFull)** for PR 10731 at commit [`2746e0f`](https://github.com/apache/spark/commit/2746e0f632d7799f0d49d2d4bb8719b73c1716fd).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by zhichao-li <gi...@git.apache.org>.
Github user zhichao-li commented on a diff in the pull request:
https://github.com/apache/spark/pull/10731#discussion_r50072670
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -445,6 +445,26 @@ class Analyzer(
val newOrdering = resolveSortOrders(ordering, child, throws = false)
Sort(newOrdering, global, child)
+ // Resolve the order index to be a specific column
+ case s @ Sort(ordering, global, child) if child.resolved && s.resolved =>
+ def indexToColumn(index: Int, direction: SortDirection) = {
+ val orderNodes = child.output
+ if (index > 0 && index <= orderNodes.size) {
+ SortOrder(orderNodes(index - 1), direction)
+ } else {
+ throw new UnresolvedException(s,
+ s"""Order by position: $index does not exist \n
+ |The Select List is indexed from 1 to ${orderNodes.size}""".stripMargin)
+ }
+ }
+ val newOrdering = ordering map {
--- End diff --
Seems like it might be more reasonable from the semantic point of view to override the `resolved` method and move the logic to resolveSortOrders.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index and group ...
Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-197698882
two other comments:
1. it'd be better to separate order by position and group by position into two prs.
2. we should have config options to allow turning this off.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-171559922
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index and group ...
Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-197979155
@gatorsmile Thank you for the investigation. Yea let's not use negative integer.
Regarding group by clause, do other systems support using integer numbers?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-171128219
**[Test build #49283 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49283/consoleFull)** for PR 10731 at commit [`d5cb4e2`](https://github.com/apache/spark/commit/d5cb4e2d90b3a64ca3a28163b3b073272eada333).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-171513340
**[Test build #49359 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49359/consoleFull)** for PR 10731 at commit [`1cb6752`](https://github.com/apache/spark/commit/1cb675262ed3b90afdff5ffd8b1bfa4965e37df4).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-171559923
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49383/
Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-173492642
**[Test build #49864 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49864/consoleFull)** for PR 10731 at commit [`e61429f`](https://github.com/apache/spark/commit/e61429fec35c0f0983ff5e1bfeea11a1cef42690).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index and group ...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-198004720
@yhuai "group by position" is not supported by Oracle, DB2 and SQL Server. I am unable to find it in any SQL standard.
Should we continue to support it? Also CC @rxin
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index and group ...
Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/10731#discussion_r56454692
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -497,6 +498,41 @@ class Analyzer(
ordering.map(order => resolveExpression(order, child).asInstanceOf[SortOrder])
Sort(newOrdering, global, child)
+ // Replace the index with the related attribute for ORDER BY
+ // which is a 1-base position of the projection list.
+ case s @ Sort(orders, global, child) if child.resolved &&
+ orders.exists(o => IntegerIndex.unapply(o.child).nonEmpty) =>
+ val newOrders = orders map {
+ case s @ SortOrder(IntegerIndex(index), direction) =>
+ if (index > 0 && index <= child.output.size) {
+ SortOrder(child.output(index - 1), direction)
+ } else {
+ throw new UnresolvedException(s,
+ s"""Order by position: $index does not exist \n
+ |The Select List is indexed from 1 to ${child.output.size}""".stripMargin)
+ }
+ case other => other
+ }
+ Sort(newOrders, global, child)
+
+ // Replace the index with the related attribute for Group BY
+ // which is a 1-base position of the underlying columns.
+ case a @ Aggregate(groups, aggs, child) if child.resolved &&
+ groups.exists(IntegerIndex.unapply(_).nonEmpty) =>
+ val newGroups = groups map {
+ case IntegerIndex(index) =>
+ val attributes = child.output
--- End diff --
we also need to check whether the position is an aggregate function. in postgres
```
rxin=# select 'one', 'two', count(*) from r1 group by 1, 3;
ERROR: aggregate functions are not allowed in GROUP BY
LINE 1: select 'one', 'two', count(*) from r1 group by 1, 3;
^
```
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index and group ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-185086718
**[Test build #51419 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51419/consoleFull)** for PR 10731 at commit [`66c54b1`](https://github.com/apache/spark/commit/66c54b14f8e57d181bf78f293fe0de0d899cb2ba).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index and group ...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/10731#discussion_r56537624
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -497,6 +498,41 @@ class Analyzer(
ordering.map(order => resolveExpression(order, child).asInstanceOf[SortOrder])
Sort(newOrdering, global, child)
+ // Replace the index with the related attribute for ORDER BY
+ // which is a 1-base position of the projection list.
+ case s @ Sort(orders, global, child) if child.resolved &&
--- End diff --
I will move it to the rule `ResolveSortReferences`
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-171249631
**[Test build #49311 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49311/consoleFull)** for PR 10731 at commit [`1cb6752`](https://github.com/apache/spark/commit/1cb675262ed3b90afdff5ffd8b1bfa4965e37df4).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:
https://github.com/apache/spark/pull/10731#discussion_r50026179
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -445,6 +445,26 @@ class Analyzer(
val newOrdering = resolveSortOrders(ordering, child, throws = false)
Sort(newOrdering, global, child)
+ // Resolve the order index to be a specific column
+ case s @ Sort(ordering, global, child) if child.resolved && s.resolved =>
+ def indexToColumn(index: Int, direction: SortDirection) = {
+ val orderNodes = child.output
+ if (index > 0 && index <= orderNodes.size) {
+ SortOrder(orderNodes(index - 1), direction)
+ } else {
+ throw new UnresolvedException(s,
+ s"""Order by position: $index does not exist \n
+ |The Select List is indexed from 1 to ${orderNodes.size}""".stripMargin)
+ }
+ }
+ val newOrdering = ordering map {
--- End diff --
oh, actually, I meant if we can just check if there is any integer literal. If so, we create a new `Sort`. Otherwise, we keep the old one.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by zhichao-li <gi...@git.apache.org>.
Github user zhichao-li commented on a diff in the pull request:
https://github.com/apache/spark/pull/10731#discussion_r49540331
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -22,8 +22,9 @@ import java.sql.Timestamp
import org.apache.spark.AccumulatorSuite
import org.apache.spark.sql.catalyst.DefaultParserDialect
-import org.apache.spark.sql.catalyst.analysis.FunctionRegistry
+import org.apache.spark.sql.catalyst.analysis.{UnresolvedException, FunctionRegistry}
import org.apache.spark.sql.catalyst.errors.DialectException
+import org.apache.spark.sql.catalyst.expressions.SortOrder
--- End diff --
I'ts for `intercept[UnresolvedException[SortOrder]]`
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by zhichao-li <gi...@git.apache.org>.
Github user zhichao-li commented on a diff in the pull request:
https://github.com/apache/spark/pull/10731#discussion_r49967047
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -445,6 +445,26 @@ class Analyzer(
val newOrdering = resolveSortOrders(ordering, child, throws = false)
Sort(newOrdering, global, child)
+ // Resolve the order index to be a specific column
--- End diff --
@yhuai not sure if it's the style you prefer. mind giving suggestions?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-171195513
Merged build finished. Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index and group ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-185116188
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51419/
Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-171426678
oh, nvm. It is pretty common in other databases.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index and group ...
Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:
https://github.com/apache/spark/pull/10731#discussion_r51990476
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -322,6 +323,62 @@ class Analyzer(
}
/**
+ * Replace the index with the related attribute for order by and group by.
+ */
+ object ResolveIndexReferences extends Rule[LogicalPlan] {
+
+ def indexToColumn(
+ sortOrder: SortOrder,
+ child: LogicalPlan,
+ index: Int,
+ direction: SortDirection): SortOrder = {
+ val orderNodes = child.output
+ if (index > 0 && index <= orderNodes.size) {
+ SortOrder(orderNodes(index - 1), direction)
+ } else {
+ throw new UnresolvedException(sortOrder,
+ s"""Order by position: $index does not exist \n
+ |The Select List is indexed from 1 to ${orderNodes.size}""".stripMargin)
+ }
+ }
+
+ def indexToColumn(
+ agg: Aggregate,
+ child: LogicalPlan,
+ index: Int): Expression = {
+ val attributes = child.output
+ if (index > 0 && index <= attributes.size) {
+ attributes(index - 1)
+ } else {
+ throw new UnresolvedException(agg,
+ s"""Aggregate by position: $index does not exist \n
+ |The Select List is indexed from 1 to ${attributes.size}""".stripMargin)
+ }
+ }
+
+ def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
+ case s @ Sort(orders, global, child) if child.resolved =>
--- End diff --
I guess that we only need to trigger this rule if there is any integer literal, right?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-171133379
Merged build finished. Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by zhichao-li <gi...@git.apache.org>.
Github user zhichao-li commented on a diff in the pull request:
https://github.com/apache/spark/pull/10731#discussion_r51985716
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -456,21 +455,32 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
test("literal in agg grouping expressions") {
checkAnswer(
- sql("SELECT a, count(1) FROM testData2 GROUP BY a, 1"),
- Seq(Row(1, 2), Row(2, 2), Row(3, 2)))
+ sql("SELECT SUM(a) FROM testData2 GROUP BY 2"),
+ Seq(Row(6), Row(6)))
+
checkAnswer(
- sql("SELECT a, count(2) FROM testData2 GROUP BY a, 2"),
+ sql("SELECT a, SUM(b) FROM testData2 GROUP BY 1"),
+ Seq(Row(1, 3), Row(2, 3), Row(3, 3)))
+
+ checkAnswer(
+ sql("SELECT a, count(1) FROM testData2 GROUP BY a, 1"),
Seq(Row(1, 2), Row(2, 2), Row(3, 2)))
checkAnswer(
sql("SELECT a, 1, sum(b) FROM testData2 GROUP BY a, 1"),
--- End diff --
`1` would be refer to the first index of columns for group by which is `a` for this case. kind of redundant here.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index and group ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-185116185
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by adrian-wang <gi...@git.apache.org>.
Github user adrian-wang commented on a diff in the pull request:
https://github.com/apache/spark/pull/10731#discussion_r50063958
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -445,6 +445,26 @@ class Analyzer(
val newOrdering = resolveSortOrders(ordering, child, throws = false)
Sort(newOrdering, global, child)
+ // Resolve the order index to be a specific column
+ case s @ Sort(ordering, global, child) if child.resolved && s.resolved =>
+ def indexToColumn(index: Int, direction: SortDirection) = {
+ val orderNodes = child.output
+ if (index > 0 && index <= orderNodes.size) {
+ SortOrder(orderNodes(index - 1), direction)
+ } else {
+ throw new UnresolvedException(s,
+ s"""Order by position: $index does not exist \n
+ |The Select List is indexed from 1 to ${orderNodes.size}""".stripMargin)
+ }
+ }
+ val newOrdering = ordering map {
--- End diff --
If there are nested attributes, integer literal could be used to reference nested fields, so we the integer literal should be on top level of order list.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-171513682
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49359/
Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-172736637
**[Test build #49663 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49663/consoleFull)** for PR 10731 at commit [`0daa766`](https://github.com/apache/spark/commit/0daa766d03538c806175c3389106a83b0fe977e3).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by zhichao-li <gi...@git.apache.org>.
Github user zhichao-li commented on a diff in the pull request:
https://github.com/apache/spark/pull/10731#discussion_r51985837
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -322,6 +323,62 @@ class Analyzer(
}
/**
+ * Replace the index with the related attribute for order by and group by.
+ */
+ object ResolveIndexReferences extends Rule[LogicalPlan] {
--- End diff --
What's your preference? It's not a must to be a new Rule. How about embedded it into `ResolvedReferences`? I was thinking of putting this similar logic into one place.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by zhichao-li <gi...@git.apache.org>.
Github user zhichao-li commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-172736625
@hvanhovell didn't aware of #10052, would be happy if @dereksabryfb can pick up that.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-172448586
**[Test build #49575 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49575/consoleFull)** for PR 10731 at commit [`d9f548c`](https://github.com/apache/spark/commit/d9f548c4e0c62a08de429ce9bf96b3b7aa11b792).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by zhichao-li <gi...@git.apache.org>.
Github user zhichao-li commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-178271829
@yhuai any more comments?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-171195234
**[Test build #49292 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49292/consoleFull)** for PR 10731 at commit [`b72547b`](https://github.com/apache/spark/commit/b72547b04923862579935344f6cd9af1814b77f5).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by zhichao-li <gi...@git.apache.org>.
Github user zhichao-li commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-172424905
yes, It's a 1-based indexing for the projection list.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-171559762
**[Test build #49383 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49383/consoleFull)** for PR 10731 at commit [`fe99e00`](https://github.com/apache/spark/commit/fe99e0043598e6af37c691aeb00d1204a37d9158).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-173493055
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49864/
Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-174397530
**[Test build #49973 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49973/consoleFull)** for PR 10731 at commit [`1a36a2a`](https://github.com/apache/spark/commit/1a36a2a83c9fa977aa344eeb08fa544f82d80fdc).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:
https://github.com/apache/spark/pull/10731#discussion_r50652470
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -446,6 +503,10 @@ class Analyzer(
val newOrdering = resolveSortOrders(ordering, child, throws = false)
Sort(newOrdering, global, child)
+ case s @ Sort(ordering, global, child) if child.resolved && !s.resolved =>
+ val newOrdering = resolveSortOrders(ordering, child, throws = false)
+ Sort(newOrdering, global, child)
--- End diff --
Is this rule the same as the above one?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-172751017
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49663/
Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index and group ...
Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-198030823
Thanks. Then, let's add the support to ORDER BY.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index and group ...
Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/10731#discussion_r56454667
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -497,6 +498,41 @@ class Analyzer(
ordering.map(order => resolveExpression(order, child).asInstanceOf[SortOrder])
Sort(newOrdering, global, child)
+ // Replace the index with the related attribute for ORDER BY
+ // which is a 1-base position of the projection list.
+ case s @ Sort(orders, global, child) if child.resolved &&
+ orders.exists(o => IntegerIndex.unapply(o.child).nonEmpty) =>
+ val newOrders = orders map {
+ case s @ SortOrder(IntegerIndex(index), direction) =>
+ if (index > 0 && index <= child.output.size) {
+ SortOrder(child.output(index - 1), direction)
+ } else {
+ throw new UnresolvedException(s,
+ s"""Order by position: $index does not exist \n
+ |The Select List is indexed from 1 to ${child.output.size}""".stripMargin)
+ }
+ case other => other
+ }
+ Sort(newOrders, global, child)
+
+ // Replace the index with the related attribute for Group BY
+ // which is a 1-base position of the underlying columns.
+ case a @ Aggregate(groups, aggs, child) if child.resolved &&
+ groups.exists(IntegerIndex.unapply(_).nonEmpty) =>
+ val newGroups = groups map {
+ case IntegerIndex(index) =>
+ val attributes = child.output
--- End diff --
I'm not sure if this is correct. 1 and 2 in the group by here refers to the position 1 and 2 in the select list, not the underlying query output.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-172458551
**[Test build #49584 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49584/consoleFull)** for PR 10731 at commit [`acd00be`](https://github.com/apache/spark/commit/acd00be8d92c0cea72db3e63a1580b515660d61c).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-171513680
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:
https://github.com/apache/spark/pull/10731#discussion_r51984605
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -456,21 +455,32 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
test("literal in agg grouping expressions") {
checkAnswer(
- sql("SELECT a, count(1) FROM testData2 GROUP BY a, 1"),
- Seq(Row(1, 2), Row(2, 2), Row(3, 2)))
+ sql("SELECT SUM(a) FROM testData2 GROUP BY 2"),
+ Seq(Row(6), Row(6)))
+
checkAnswer(
- sql("SELECT a, count(2) FROM testData2 GROUP BY a, 2"),
+ sql("SELECT a, SUM(b) FROM testData2 GROUP BY 1"),
+ Seq(Row(1, 3), Row(2, 3), Row(3, 3)))
+
+ checkAnswer(
+ sql("SELECT a, count(1) FROM testData2 GROUP BY a, 1"),
Seq(Row(1, 2), Row(2, 2), Row(3, 2)))
checkAnswer(
sql("SELECT a, 1, sum(b) FROM testData2 GROUP BY a, 1"),
--- End diff --
what is the meaning of 1 in this case?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index and group ...
Posted by zhichao-li <gi...@git.apache.org>.
Github user zhichao-li closed the pull request at:
https://github.com/apache/spark/pull/10731
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-174414550
**[Test build #49973 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49973/consoleFull)** for PR 10731 at commit [`1a36a2a`](https://github.com/apache/spark/commit/1a36a2a83c9fa977aa344eeb08fa544f82d80fdc).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index and group ...
Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:
https://github.com/apache/spark/pull/10731#discussion_r51990408
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -322,6 +323,62 @@ class Analyzer(
}
/**
+ * Replace the index with the related attribute for order by and group by.
+ */
+ object ResolveIndexReferences extends Rule[LogicalPlan] {
--- End diff --
I think `ResolvedReferences` is good.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by tejasapatil <gi...@git.apache.org>.
Github user tejasapatil commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-177531893
I would like to have this patch in trunk. If @zhichao-li is not planning to do more changes, can one of the admins review ?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by adrian-wang <gi...@git.apache.org>.
Github user adrian-wang commented on a diff in the pull request:
https://github.com/apache/spark/pull/10731#discussion_r49539533
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -441,7 +442,7 @@ class Analyzer(
// When resolve `SortOrder`s in Sort based on child, don't report errors as
// we still have chance to resolve it based on grandchild
- case s @ Sort(ordering, global, child) if child.resolved && !s.resolved =>
--- End diff --
why remove it?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-171249380
**[Test build #49311 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49311/consoleFull)** for PR 10731 at commit [`1cb6752`](https://github.com/apache/spark/commit/1cb675262ed3b90afdff5ffd8b1bfa4965e37df4).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-172751016
Merged build finished. Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index and group ...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/10731#discussion_r56610085
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala ---
@@ -202,3 +203,14 @@ object Unions {
}
}
}
+
+/**
+ * Extractor for retrieving Int value.
+ */
+object IntegerIndex {
+ def unapply(a: Any): Option[Int] = a match {
+ case Literal(a: Int, IntegerType) => Some(a)
+ case UnaryMinus(IntegerLiteral(v)) => Some(-v)
--- End diff --
This line is used for catching the illegal case:
```scala
sql("SELECT * FROM testData2 ORDER BY -1 DESC, b ASC").collect()
```
I plan to keep it untouched in the PR. Thanks!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index and group ...
Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-198031998
It is pretty obvious isn't it even for group by? It is just the project list, not the underlying table.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-171133352
**[Test build #49283 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49283/consoleFull)** for PR 10731 at commit [`d5cb4e2`](https://github.com/apache/spark/commit/d5cb4e2d90b3a64ca3a28163b3b073272eada333).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by zhichao-li <gi...@git.apache.org>.
Github user zhichao-li commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-173474664
retest this please.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:
https://github.com/apache/spark/pull/10731#discussion_r51984436
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -322,6 +323,62 @@ class Analyzer(
}
/**
+ * Replace the index with the related attribute for order by and group by.
+ */
+ object ResolveIndexReferences extends Rule[LogicalPlan] {
+
+ def indexToColumn(
+ sortOrder: SortOrder,
+ child: LogicalPlan,
+ index: Int,
+ direction: SortDirection): SortOrder = {
+ val orderNodes = child.output
+ if (index > 0 && index <= orderNodes.size) {
+ SortOrder(orderNodes(index - 1), direction)
+ } else {
+ throw new UnresolvedException(sortOrder,
+ s"""Order by position: $index does not exist \n
+ |The Select List is indexed from 1 to ${orderNodes.size}""".stripMargin)
+ }
+ }
+
+ def indexToColumn(
--- End diff --
Can we combine this method the one above?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-173444380
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49844/
Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by zhichao-li <gi...@git.apache.org>.
Github user zhichao-li commented on a diff in the pull request:
https://github.com/apache/spark/pull/10731#discussion_r50653035
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -446,6 +503,10 @@ class Analyzer(
val newOrdering = resolveSortOrders(ordering, child, throws = false)
Sort(newOrdering, global, child)
+ case s @ Sort(ordering, global, child) if child.resolved && !s.resolved =>
+ val newOrdering = resolveSortOrders(ordering, child, throws = false)
+ Sort(newOrdering, global, child)
--- End diff --
oh. my bad , had been aware of this, but forgot to push out the code.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:
https://github.com/apache/spark/pull/10731#discussion_r51984399
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -322,6 +323,62 @@ class Analyzer(
}
/**
+ * Replace the index with the related attribute for order by and group by.
+ */
+ object ResolveIndexReferences extends Rule[LogicalPlan] {
--- End diff --
Do we need to have a new rule?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index and group ...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/10731#discussion_r56752434
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -497,6 +498,41 @@ class Analyzer(
ordering.map(order => resolveExpression(order, child).asInstanceOf[SortOrder])
Sort(newOrdering, global, child)
+ // Replace the index with the related attribute for ORDER BY
+ // which is a 1-base position of the projection list.
+ case s @ Sort(orders, global, child) if child.resolved &&
--- End diff --
I am unable to find a good place for `group by ordinal resolution`, after placing `order by ordinal resolution` in `ResolveSortReferences`. Two options are in my mind:
- Assuming we can merge https://github.com/apache/spark/pull/11208, which splits `ResolveReferences` to two rules: `ResolveStar` and `ResolveReferences`. Then, `ResolveReferences` is not very long, maybe we can keep resolution of ordinal here.
- Create a separate rule `ResolveOrdinal` for both cases.
In the next PR, I will first pick the second option, if nobody is against it. : ) Thanks!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-172448431
**[Test build #49575 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49575/consoleFull)** for PR 10731 at commit [`d9f548c`](https://github.com/apache/spark/commit/d9f548c4e0c62a08de429ce9bf96b3b7aa11b792).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-180248455
@zhichao-li can you also put the explanation of the semantic in the description?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index and group ...
Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-198032884
GROUP BY position is supported by a few major analytical databases: Terradata & Netezza
I am not sure if you should even allow the combination of a `SELECT *` with a positional ORDER BY/GROUP BY clause
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:
https://github.com/apache/spark/pull/10731#discussion_r49961674
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -438,11 +438,25 @@ class Analyzer(
}
j.copy(right = newRight)
}
-
// When resolve `SortOrder`s in Sort based on child, don't report errors as
// we still have chance to resolve it based on grandchild
- case s @ Sort(ordering, global, child) if child.resolved && !s.resolved =>
- val newOrdering = resolveSortOrders(ordering, child, throws = false)
+ case s @ Sort(ordering, global, child) if child.resolved =>
--- End diff --
Should we keep the `&& !s.resolved` in the condition and have another case to handle the case of all literals?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by zhichao-li <gi...@git.apache.org>.
Github user zhichao-li commented on a diff in the pull request:
https://github.com/apache/spark/pull/10731#discussion_r49689992
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -441,7 +442,7 @@ class Analyzer(
// When resolve `SortOrder`s in Sort based on child, don't report errors as
// we still have chance to resolve it based on grandchild
- case s @ Sort(ordering, global, child) if child.resolved && !s.resolved =>
--- End diff --
ah.. right, let me add it back.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-171382217
@zhichao-li It will be good if you can take a look and see if other databases (other than hive) support this. I am not sure if it is really useful.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index and group ...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-198018058
What you said about order by is right. The most tricky part is `*`. When we are doing select (*) in DB2, the position number is based on the table definition in catalog table.
Regarding Group By, I do not know which behavior is right. Different from Order By, Group By is below Project/SELECT. Thus, personally, I do not know what is the expected behavior when resolving position number in group by. Just like, the alias defined in Project/Select cannot be used in Group By.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index and group ...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-198804070
"Group By Ordinal" will throw an exception if the corresponding position of the select list is an `AggregateFunction`. This is not allowed. I believe this PR misses this point. Please correct me if my understanding is wrong. Thanks!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-173475388
**[Test build #49864 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49864/consoleFull)** for PR 10731 at commit [`e61429f`](https://github.com/apache/spark/commit/e61429fec35c0f0983ff5e1bfeea11a1cef42690).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-12789]Support order by index and group ...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the pull request:
https://github.com/apache/spark/pull/10731#issuecomment-197727832
Just did a quick search.
> SQL92 allowed the use of ordinal positions for sort_expressions, but this functionality has been deprecated and should not be used in SQL2003 queries.
However, the mainstream RDBMS still support it.
- Oracle: https://docs.oracle.com/cd/B28359_01/server.111/b28286/statements_10002.htm#i2171079
- DB2: https://www.ibm.com/support/knowledgecenter/SSEPEK_10.0.0/com.ibm.db2z10.doc.sqlref/src/tpc/db2z_sql_orderbyclause.dita
- SQL Server: https://msdn.microsoft.com/en-us/library/ms188385.aspx
None of these top 3 enterprise RDBMS are allowing negative positions. I think we should not support the negative integer in Order by.
Thanks!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org