You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by cloud-fan <gi...@git.apache.org> on 2015/09/24 01:13:37 UTC
[GitHub] spark pull request: [SPARK-10741][SQL] Hive Query Having/OrderBy a...
GitHub user cloud-fan opened a pull request:
https://github.com/apache/spark/pull/8889
[SPARK-10741][SQL] Hive Query Having/OrderBy against Parquet table is not working
https://issues.apache.org/jira/browse/SPARK-10741
I choose the second approach: do not change output exprIds when convert MetastoreRelation to LogicalRelation
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/cloud-fan/spark hot-bug
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/8889.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #8889
----
commit 798bc73c78142f725820202adcfcfe85fe14005b
Author: Wenchen Fan <cl...@163.com>
Date: 2015-09-23T23:11:45Z
do not change output exprIds when convert MetastoreRelation to LogicalRelation
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-10741][SQL] Hive Query Having/OrderBy a...
Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:
https://github.com/apache/spark/pull/8889#issuecomment-143358045
LGTM except for a few minor issues.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-10741][SQL] Hive Query Having/OrderBy a...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/8889#issuecomment-143379490
Merged build triggered.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-10741][SQL] Hive Query Having/OrderBy a...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:
https://github.com/apache/spark/pull/8889#issuecomment-142774336
cc @liancheng
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-10741][SQL] Hive Query Having/OrderBy a...
Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:
https://github.com/apache/spark/pull/8889#discussion_r40340100
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/LogicalRelation.scala ---
@@ -17,23 +17,44 @@
package org.apache.spark.sql.execution.datasources
import org.apache.spark.sql.catalyst.analysis.MultiInstanceRelation
-import org.apache.spark.sql.catalyst.expressions.{AttributeMap, AttributeReference}
+import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeMap, AttributeReference}
import org.apache.spark.sql.catalyst.plans.logical.{LeafNode, LogicalPlan, Statistics}
import org.apache.spark.sql.sources.BaseRelation
/**
* Used to link a [[BaseRelation]] in to a logical query plan.
+ *
+ * Note that sometimes we need to use `LogicalRelation` to replace an existing leaf node without
+ * changing the output attributes' IDs. The `expectedOutputAttributes` parameter is used for
+ * this purpose.
*/
-private[sql] case class LogicalRelation(relation: BaseRelation)
- extends LeafNode
- with MultiInstanceRelation {
+private[sql] case class LogicalRelation(
+ relation: BaseRelation,
+ expectedOutputAttributes: Seq[Attribute] = Nil)
+ extends LeafNode with MultiInstanceRelation {
+
+ assert(expectedOutputAttributes == Nil ||
+ relation.schema.length == expectedOutputAttributes.length)
- override val output: Seq[AttributeReference] = relation.schema.toAttributes
+ override val output: Seq[AttributeReference] = {
+ val attrs = relation.schema.toAttributes
+ if (expectedOutputAttributes == Nil) {
+ attrs
+ } else {
+ attrs.zip(expectedOutputAttributes).map {
--- End diff --
Ah, i see. You check the length at line 36.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-10741][SQL] Hive Query Having/OrderBy a...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/8889#issuecomment-142997788
[Test build #42975 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42975/consoleFull) for PR 8889 at commit [`56efdc2`](https://github.com/apache/spark/commit/56efdc27369f22d041ceb6756eda5e38620d2f98).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-10741][SQL] Hive Query Having/OrderBy a...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/8889#issuecomment-143383598
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43036/
Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-10741][SQL] Hive Query Having/OrderBy a...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/8889#issuecomment-142815578
[Test build #42962 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42962/consoleFull) for PR 8889 at commit [`d317fb8`](https://github.com/apache/spark/commit/d317fb8b0d21ba387894962a168afdf9346b74d2).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-10741][SQL] Hive Query Having/OrderBy a...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/8889#issuecomment-143515149
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-10741][SQL] Hive Query Having/OrderBy a...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/8889#issuecomment-142775585
Merged build finished. Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-10741][SQL] Hive Query Having/OrderBy a...
Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:
https://github.com/apache/spark/pull/8889#discussion_r40339049
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/LogicalRelation.scala ---
@@ -17,23 +17,44 @@
package org.apache.spark.sql.execution.datasources
import org.apache.spark.sql.catalyst.analysis.MultiInstanceRelation
-import org.apache.spark.sql.catalyst.expressions.{AttributeMap, AttributeReference}
+import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeMap, AttributeReference}
import org.apache.spark.sql.catalyst.plans.logical.{LeafNode, LogicalPlan, Statistics}
import org.apache.spark.sql.sources.BaseRelation
/**
* Used to link a [[BaseRelation]] in to a logical query plan.
+ *
+ * Note that sometimes we need to use `LogicalRelation` to replace an existing leaf node without
+ * changing the output attributes' IDs. The `expectedOutputAttributes` parameter is used for
+ * this purpose.
*/
-private[sql] case class LogicalRelation(relation: BaseRelation)
- extends LeafNode
- with MultiInstanceRelation {
+private[sql] case class LogicalRelation(
+ relation: BaseRelation,
+ expectedOutputAttributes: Seq[Attribute] = Nil)
--- End diff --
Can we use `Option[Seq[Attribute]]`? I think the semantic of having a `Option` at here is more clearer than just using a `Seq` and the meaning of it is self-explanatory. `None` means that `expectedOutputAttributes` is not defined.
If we use `Seq`, `Nil` does not necessarily mean that `expectedOutputAttributes` is not defined. Without other comments, it can mean that we have defined `expectedOutputAttributes` and it is a `Nil`. Readers have to take a look at our comments to understand its meaning.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-10741][SQL] Hive Query Having/OrderBy a...
Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:
https://github.com/apache/spark/pull/8889#discussion_r40342337
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/LogicalRelation.scala ---
@@ -17,23 +17,44 @@
package org.apache.spark.sql.execution.datasources
import org.apache.spark.sql.catalyst.analysis.MultiInstanceRelation
-import org.apache.spark.sql.catalyst.expressions.{AttributeMap, AttributeReference}
+import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeMap, AttributeReference}
import org.apache.spark.sql.catalyst.plans.logical.{LeafNode, LogicalPlan, Statistics}
import org.apache.spark.sql.sources.BaseRelation
/**
* Used to link a [[BaseRelation]] in to a logical query plan.
+ *
+ * Note that sometimes we need to use `LogicalRelation` to replace an existing leaf node without
+ * changing the output attributes' IDs. The `expectedOutputAttributes` parameter is used for
+ * this purpose.
--- End diff --
Probably it is better to have a link to the jira at here.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-10741][SQL] Hive Query Having/OrderBy a...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/8889#issuecomment-142774597
Merged build triggered.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-10741][SQL] Hive Query Having/OrderBy a...
Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/8889#discussion_r40273324
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/LogicalRelation.scala ---
@@ -17,23 +17,35 @@
package org.apache.spark.sql.execution.datasources
import org.apache.spark.sql.catalyst.analysis.MultiInstanceRelation
-import org.apache.spark.sql.catalyst.expressions.{AttributeMap, AttributeReference}
+import org.apache.spark.sql.catalyst.expressions.{ExprId, AttributeMap, AttributeReference}
import org.apache.spark.sql.catalyst.plans.logical.{LeafNode, LogicalPlan, Statistics}
import org.apache.spark.sql.sources.BaseRelation
/**
* Used to link a [[BaseRelation]] in to a logical query plan.
+ *
+ * Note that sometimes we need to use `LogicalRelation` to replace an existing leaf node without
+ * changing the output attributes' IDs. We can pass in the `expectedOutputIds` parameter for
+ * this purpose.
*/
-private[sql] case class LogicalRelation(relation: BaseRelation)
- extends LeafNode
- with MultiInstanceRelation {
+private[sql] final class LogicalRelation(val relation: BaseRelation, expectedOutputIds: Seq[ExprId])
--- End diff --
You can also manually implement product as is done here, but if possible I'd like to avoid that.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-10741][SQL] Hive Query Having/OrderBy a...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/8889#issuecomment-143511251
Merged build triggered.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-10741][SQL] Hive Query Having/OrderBy a...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/8889#issuecomment-142996116
Merged build started.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-10741][SQL] Hive Query Having/OrderBy a...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/8889#issuecomment-143063695
Merged build triggered.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-10741][SQL] Hive Query Having/OrderBy a...
Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:
https://github.com/apache/spark/pull/8889#discussion_r40273156
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/LogicalRelation.scala ---
@@ -17,23 +17,35 @@
package org.apache.spark.sql.execution.datasources
import org.apache.spark.sql.catalyst.analysis.MultiInstanceRelation
-import org.apache.spark.sql.catalyst.expressions.{AttributeMap, AttributeReference}
+import org.apache.spark.sql.catalyst.expressions.{ExprId, AttributeMap, AttributeReference}
import org.apache.spark.sql.catalyst.plans.logical.{LeafNode, LogicalPlan, Statistics}
import org.apache.spark.sql.sources.BaseRelation
/**
* Used to link a [[BaseRelation]] in to a logical query plan.
+ *
+ * Note that sometimes we need to use `LogicalRelation` to replace an existing leaf node without
+ * changing the output attributes' IDs. We can pass in the `expectedOutputIds` parameter for
+ * this purpose.
*/
-private[sql] case class LogicalRelation(relation: BaseRelation)
- extends LeafNode
- with MultiInstanceRelation {
+private[sql] final class LogicalRelation(val relation: BaseRelation, expectedOutputIds: Seq[ExprId])
--- End diff --
I think we need to make it a case class because when a TreeNode makes copy, it copies parameters in its `productIterator`.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-10741][SQL] Hive Query Having/OrderBy a...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/8889#issuecomment-142837430
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42962/
Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-10741][SQL] Hive Query Having/OrderBy a...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/8889#issuecomment-142758645
[Test build #42934 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42934/consoleFull) for PR 8889 at commit [`798bc73`](https://github.com/apache/spark/commit/798bc73c78142f725820202adcfcfe85fe14005b).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-10741][SQL] Hive Query Having/OrderBy a...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/8889#issuecomment-143064844
[Test build #42986 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42986/consoleFull) for PR 8889 at commit [`30e930e`](https://github.com/apache/spark/commit/30e930e01a2329dcecf1c899de06d3cdc52a7744).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-10741][SQL] Hive Query Having/OrderBy a...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/8889#issuecomment-142774719
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42934/
Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-10741][SQL] Hive Query Having/OrderBy a...
Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:
https://github.com/apache/spark/pull/8889#discussion_r40273351
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/LogicalRelation.scala ---
@@ -17,23 +17,35 @@
package org.apache.spark.sql.execution.datasources
import org.apache.spark.sql.catalyst.analysis.MultiInstanceRelation
-import org.apache.spark.sql.catalyst.expressions.{AttributeMap, AttributeReference}
+import org.apache.spark.sql.catalyst.expressions.{ExprId, AttributeMap, AttributeReference}
import org.apache.spark.sql.catalyst.plans.logical.{LeafNode, LogicalPlan, Statistics}
import org.apache.spark.sql.sources.BaseRelation
/**
* Used to link a [[BaseRelation]] in to a logical query plan.
+ *
+ * Note that sometimes we need to use `LogicalRelation` to replace an existing leaf node without
+ * changing the output attributes' IDs. We can pass in the `expectedOutputIds` parameter for
+ * this purpose.
*/
-private[sql] case class LogicalRelation(relation: BaseRelation)
- extends LeafNode
- with MultiInstanceRelation {
+private[sql] final class LogicalRelation(val relation: BaseRelation, expectedOutputIds: Seq[ExprId])
--- End diff --
How about we use the following?
```
private[sql] case class LogicalRelation(
relation: BaseRelation,
override val output: Seq[AttributeReference])
extends LeafNode
with MultiInstanceRelation {
def this(relation: BaseRelation) = this(relation, relation.schema.toAttributes)
```
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-10741][SQL] Hive Query Having/OrderBy a...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/8889#issuecomment-143022548
Merged build finished. Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-10741][SQL] Hive Query Having/OrderBy a...
Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/8889#discussion_r40476942
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -553,60 +553,28 @@ private[hive] class HiveMetastoreCatalog(val client: ClientInterface, hive: Hive
return plan
}
- // Collects all `MetastoreRelation`s which should be replaced
- val toBeReplaced = plan.collect {
+ plan transformUp {
// Write path
- case InsertIntoTable(relation: MetastoreRelation, _, _, _, _)
- // Inserting into partitioned table is not supported in Parquet data source (yet).
- if !relation.hiveQlTable.isPartitioned &&
- hive.convertMetastoreParquet &&
- relation.tableDesc.getSerdeClassName.toLowerCase.contains("parquet") =>
- val parquetRelation = convertToParquetRelation(relation)
- val attributedRewrites = relation.output.zip(parquetRelation.output)
- (relation, parquetRelation, attributedRewrites)
+ case i @ InsertIntoTable(r: MetastoreRelation, partition, child, overwrite, ifNotExists)
+ // Inserting into partitioned table is not supported in Parquet data source (yet).
+ if !r.hiveQlTable.isPartitioned && hive.convertMetastoreParquet &&
+ r.tableDesc.getSerdeClassName.toLowerCase.contains("parquet") =>
+ val parquetRelation = convertToParquetRelation(r)
+ InsertIntoTable(parquetRelation, partition, child, overwrite, ifNotExists)
// Write path
- case InsertIntoHiveTable(relation: MetastoreRelation, _, _, _, _)
+ case i @ InsertIntoHiveTable(r: MetastoreRelation, partition, child, overwrite, ifNotExists)
// Inserting into partitioned table is not supported in Parquet data source (yet).
- if !relation.hiveQlTable.isPartitioned &&
- hive.convertMetastoreParquet &&
- relation.tableDesc.getSerdeClassName.toLowerCase.contains("parquet") =>
- val parquetRelation = convertToParquetRelation(relation)
- val attributedRewrites = relation.output.zip(parquetRelation.output)
- (relation, parquetRelation, attributedRewrites)
+ if !r.hiveQlTable.isPartitioned && hive.convertMetastoreParquet &&
+ r.tableDesc.getSerdeClassName.toLowerCase.contains("parquet") =>
+ val parquetRelation = convertToParquetRelation(r)
+ InsertIntoTable(parquetRelation, partition, child, overwrite, ifNotExists)
--- End diff --
Or we may preserve `i` and replace this line with `i.copy(table = parquetRelation)`.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-10741][SQL] Hive Query Having/OrderBy a...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/8889#issuecomment-142837379
[Test build #42962 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42962/console) for PR 8889 at commit [`d317fb8`](https://github.com/apache/spark/commit/d317fb8b0d21ba387894962a168afdf9346b74d2).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-10741][SQL] Hive Query Having/OrderBy a...
Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:
https://github.com/apache/spark/pull/8889#discussion_r40338157
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/LogicalRelation.scala ---
@@ -17,23 +17,44 @@
package org.apache.spark.sql.execution.datasources
import org.apache.spark.sql.catalyst.analysis.MultiInstanceRelation
-import org.apache.spark.sql.catalyst.expressions.{AttributeMap, AttributeReference}
+import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeMap, AttributeReference}
import org.apache.spark.sql.catalyst.plans.logical.{LeafNode, LogicalPlan, Statistics}
import org.apache.spark.sql.sources.BaseRelation
/**
* Used to link a [[BaseRelation]] in to a logical query plan.
+ *
+ * Note that sometimes we need to use `LogicalRelation` to replace an existing leaf node without
+ * changing the output attributes' IDs. The `expectedOutputAttributes` parameter is used for
+ * this purpose.
*/
-private[sql] case class LogicalRelation(relation: BaseRelation)
- extends LeafNode
- with MultiInstanceRelation {
+private[sql] case class LogicalRelation(
+ relation: BaseRelation,
+ expectedOutputAttributes: Seq[Attribute] = Nil)
+ extends LeafNode with MultiInstanceRelation {
+
+ assert(expectedOutputAttributes == Nil ||
+ relation.schema.length == expectedOutputAttributes.length)
- override val output: Seq[AttributeReference] = relation.schema.toAttributes
+ override val output: Seq[AttributeReference] = {
+ val attrs = relation.schema.toAttributes
+ if (expectedOutputAttributes == Nil) {
+ attrs
+ } else {
+ attrs.zip(expectedOutputAttributes).map {
--- End diff --
Let's add a check to make sure that `attrs` and `expectedOutputAttributes` have the same length. Otherwise, `zip` will silently drop elements.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-10741][SQL] Hive Query Having/OrderBy a...
Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:
https://github.com/apache/spark/pull/8889#issuecomment-143112924
@liancheng can you also take a look?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-10741][SQL] Hive Query Having/OrderBy a...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/8889#issuecomment-143383583
[Test build #43036 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43036/console) for PR 8889 at commit [`8fb386a`](https://github.com/apache/spark/commit/8fb386ab8846fc57045a10aa2bf1245996f506c8).
* This patch **fails MiMa tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-10741][SQL] Hive Query Having/OrderBy a...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/8889#issuecomment-142775586
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42945/
Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-10741][SQL] Hive Query Having/OrderBy a...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/8889#issuecomment-143381186
[Test build #43036 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43036/consoleFull) for PR 8889 at commit [`8fb386a`](https://github.com/apache/spark/commit/8fb386ab8846fc57045a10aa2bf1245996f506c8).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-10741][SQL] Hive Query Having/OrderBy a...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/8889#issuecomment-142757727
Merged build started.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-10741][SQL] Hive Query Having/OrderBy a...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/8889#issuecomment-143086881
[Test build #42986 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42986/console) for PR 8889 at commit [`30e930e`](https://github.com/apache/spark/commit/30e930e01a2329dcecf1c899de06d3cdc52a7744).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-10741][SQL] Hive Query Having/OrderBy a...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/8889
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-10741][SQL] Hive Query Having/OrderBy a...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/8889#issuecomment-143063721
Merged build started.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-10741][SQL] Hive Query Having/OrderBy a...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/8889#issuecomment-143383597
Merged build finished. Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-10741][SQL] Hive Query Having/OrderBy a...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/8889#issuecomment-143086950
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-10741][SQL] Hive Query Having/OrderBy a...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:
https://github.com/apache/spark/pull/8889#issuecomment-142758762
cc @yhuai
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-10741][SQL] Hive Query Having/OrderBy a...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/8889#issuecomment-142996070
Merged build triggered.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-10741][SQL] Hive Query Having/OrderBy a...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/8889#issuecomment-143022477
[Test build #42975 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42975/console) for PR 8889 at commit [`56efdc2`](https://github.com/apache/spark/commit/56efdc27369f22d041ceb6756eda5e38620d2f98).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-10741][SQL] Hive Query Having/OrderBy a...
Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/8889#discussion_r40476708
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -553,60 +553,28 @@ private[hive] class HiveMetastoreCatalog(val client: ClientInterface, hive: Hive
return plan
}
- // Collects all `MetastoreRelation`s which should be replaced
- val toBeReplaced = plan.collect {
+ plan transformUp {
// Write path
- case InsertIntoTable(relation: MetastoreRelation, _, _, _, _)
- // Inserting into partitioned table is not supported in Parquet data source (yet).
- if !relation.hiveQlTable.isPartitioned &&
- hive.convertMetastoreParquet &&
- relation.tableDesc.getSerdeClassName.toLowerCase.contains("parquet") =>
- val parquetRelation = convertToParquetRelation(relation)
- val attributedRewrites = relation.output.zip(parquetRelation.output)
- (relation, parquetRelation, attributedRewrites)
+ case i @ InsertIntoTable(r: MetastoreRelation, partition, child, overwrite, ifNotExists)
--- End diff --
Same as below.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-10741][SQL] Hive Query Having/OrderBy a...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/8889#issuecomment-143380515
Merged build started.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-10741][SQL] Hive Query Having/OrderBy a...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/8889#issuecomment-143086951
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42986/
Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-10741][SQL] Hive Query Having/OrderBy a...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/8889#issuecomment-142837429
Merged build finished. Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-10741][SQL] Hive Query Having/OrderBy a...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/8889#issuecomment-142814865
Merged build triggered.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-10741][SQL] Hive Query Having/OrderBy a...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/8889#issuecomment-142775581
[Test build #42945 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42945/console) for PR 8889 at commit [`2e21513`](https://github.com/apache/spark/commit/2e21513c123c6b56b68204ea1cb60a3be62f77cc).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-10741][SQL] Hive Query Having/OrderBy a...
Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/8889#discussion_r40477126
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala ---
@@ -1223,4 +1223,26 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
checkAnswer(df, (0 until 5).map(i => Row(i + "#", i + "#")))
}
+
+ test("SPARK-10741: Sort on Aggregate using parquet") {
+ withTable("test10741") {
+ sql("CREATE TABLE test10741(c1 STRING, c2 INT) STORED AS PARQUET")
+
+ checkAnswer(sql(
+ """
+ |SELECT c1, AVG(c2) AS c_avg
+ |FROM test10741
+ |GROUP BY c1
+ |HAVING (AVG(c2) > 5) ORDER BY c1
+ """.stripMargin), Nil)
+
+ checkAnswer(sql(
+ """
+ |SELECT c1, AVG(c2) AS c_avg
+ |FROM test10741
+ |GROUP BY c1
+ |ORDER BY AVG(c2)
+ """.stripMargin), Nil)
--- End diff --
Would be better to insert some testing data into the table so that we can assert that the `ORDER BY` actually works.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-10741][SQL] Hive Query Having/OrderBy a...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/8889#issuecomment-142774609
Merged build started.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-10741][SQL] Hive Query Having/OrderBy a...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/8889#issuecomment-142775374
[Test build #42945 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42945/consoleFull) for PR 8889 at commit [`2e21513`](https://github.com/apache/spark/commit/2e21513c123c6b56b68204ea1cb60a3be62f77cc).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-10741][SQL] Hive Query Having/OrderBy a...
Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:
https://github.com/apache/spark/pull/8889#discussion_r40340163
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/LogicalRelation.scala ---
@@ -17,23 +17,44 @@
package org.apache.spark.sql.execution.datasources
import org.apache.spark.sql.catalyst.analysis.MultiInstanceRelation
-import org.apache.spark.sql.catalyst.expressions.{AttributeMap, AttributeReference}
+import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeMap, AttributeReference}
import org.apache.spark.sql.catalyst.plans.logical.{LeafNode, LogicalPlan, Statistics}
import org.apache.spark.sql.sources.BaseRelation
/**
* Used to link a [[BaseRelation]] in to a logical query plan.
+ *
+ * Note that sometimes we need to use `LogicalRelation` to replace an existing leaf node without
+ * changing the output attributes' IDs. The `expectedOutputAttributes` parameter is used for
+ * this purpose.
*/
-private[sql] case class LogicalRelation(relation: BaseRelation)
- extends LeafNode
- with MultiInstanceRelation {
+private[sql] case class LogicalRelation(
+ relation: BaseRelation,
+ expectedOutputAttributes: Seq[Attribute] = Nil)
+ extends LeafNode with MultiInstanceRelation {
+
+ assert(expectedOutputAttributes == Nil ||
+ relation.schema.length == expectedOutputAttributes.length)
- override val output: Seq[AttributeReference] = relation.schema.toAttributes
+ override val output: Seq[AttributeReference] = {
+ val attrs = relation.schema.toAttributes
+ if (expectedOutputAttributes == Nil) {
+ attrs
+ } else {
+ attrs.zip(expectedOutputAttributes).map {
--- End diff --
Then, let's add a comment to say where we check the length.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-10741][SQL] Hive Query Having/OrderBy a...
Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:
https://github.com/apache/spark/pull/8889#issuecomment-143572620
LGTM. Merging to master and branch 1.5.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-10741][SQL] Hive Query Having/OrderBy a...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/8889#issuecomment-143515129
[Test build #43053 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43053/console) for PR 8889 at commit [`8fb386a`](https://github.com/apache/spark/commit/8fb386ab8846fc57045a10aa2bf1245996f506c8).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-10741][SQL] Hive Query Having/OrderBy a...
Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:
https://github.com/apache/spark/pull/8889#discussion_r40342268
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala ---
@@ -1223,4 +1223,26 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
checkAnswer(df, (0 until 5).map(i => Row(i + "#", i + "#")))
}
+
+ test("SPARK-10741: Sort on Aggregate using parquet") {
+ withTable("test") {
--- End diff --
Let's use a unique name like `test10741`.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-10741][SQL] Hive Query Having/OrderBy a...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/8889#issuecomment-143022552
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42975/
Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-10741][SQL] Hive Query Having/OrderBy a...
Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:
https://github.com/apache/spark/pull/8889#discussion_r40274896
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/LogicalRelation.scala ---
@@ -17,23 +17,35 @@
package org.apache.spark.sql.execution.datasources
import org.apache.spark.sql.catalyst.analysis.MultiInstanceRelation
-import org.apache.spark.sql.catalyst.expressions.{AttributeMap, AttributeReference}
+import org.apache.spark.sql.catalyst.expressions.{ExprId, AttributeMap, AttributeReference}
import org.apache.spark.sql.catalyst.plans.logical.{LeafNode, LogicalPlan, Statistics}
import org.apache.spark.sql.sources.BaseRelation
/**
* Used to link a [[BaseRelation]] in to a logical query plan.
+ *
+ * Note that sometimes we need to use `LogicalRelation` to replace an existing leaf node without
+ * changing the output attributes' IDs. We can pass in the `expectedOutputIds` parameter for
+ * this purpose.
*/
-private[sql] case class LogicalRelation(relation: BaseRelation)
- extends LeafNode
- with MultiInstanceRelation {
+private[sql] final class LogicalRelation(val relation: BaseRelation, expectedOutputIds: Seq[ExprId])
--- End diff --
Had a discussion with @marmbrus and @cloud-fan, we will keep it as a case class and change other places where we need to use `unapply`. We will also pass in a `Seq[AttributeReference]` and do the work of adding the correct `exprId`s inside `LogicalRelation`.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-10741][SQL] Hive Query Having/OrderBy a...
Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/8889#discussion_r40476685
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -553,60 +553,28 @@ private[hive] class HiveMetastoreCatalog(val client: ClientInterface, hive: Hive
return plan
}
- // Collects all `MetastoreRelation`s which should be replaced
- val toBeReplaced = plan.collect {
+ plan transformUp {
// Write path
- case InsertIntoTable(relation: MetastoreRelation, _, _, _, _)
- // Inserting into partitioned table is not supported in Parquet data source (yet).
- if !relation.hiveQlTable.isPartitioned &&
- hive.convertMetastoreParquet &&
- relation.tableDesc.getSerdeClassName.toLowerCase.contains("parquet") =>
- val parquetRelation = convertToParquetRelation(relation)
- val attributedRewrites = relation.output.zip(parquetRelation.output)
- (relation, parquetRelation, attributedRewrites)
+ case i @ InsertIntoTable(r: MetastoreRelation, partition, child, overwrite, ifNotExists)
+ // Inserting into partitioned table is not supported in Parquet data source (yet).
+ if !r.hiveQlTable.isPartitioned && hive.convertMetastoreParquet &&
+ r.tableDesc.getSerdeClassName.toLowerCase.contains("parquet") =>
+ val parquetRelation = convertToParquetRelation(r)
+ InsertIntoTable(parquetRelation, partition, child, overwrite, ifNotExists)
// Write path
- case InsertIntoHiveTable(relation: MetastoreRelation, _, _, _, _)
+ case i @ InsertIntoHiveTable(r: MetastoreRelation, partition, child, overwrite, ifNotExists)
--- End diff --
Nit: seems that we don't use `i` here.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-10741][SQL] Hive Query Having/OrderBy a...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/8889#issuecomment-143511588
[Test build #43053 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43053/consoleFull) for PR 8889 at commit [`8fb386a`](https://github.com/apache/spark/commit/8fb386ab8846fc57045a10aa2bf1245996f506c8).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-10741][SQL] Hive Query Having/OrderBy a...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/8889#issuecomment-142757717
Merged build triggered.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-10741][SQL] Hive Query Having/OrderBy a...
Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:
https://github.com/apache/spark/pull/8889#issuecomment-143511190
test this please
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-10741][SQL] Hive Query Having/OrderBy a...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/8889#issuecomment-143515150
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43053/
Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-10741][SQL] Hive Query Having/OrderBy a...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/8889#issuecomment-143511258
Merged build started.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-10741][SQL] Hive Query Having/OrderBy a...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/8889#issuecomment-142774667
[Test build #42934 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42934/console) for PR 8889 at commit [`798bc73`](https://github.com/apache/spark/commit/798bc73c78142f725820202adcfcfe85fe14005b).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-10741][SQL] Hive Query Having/OrderBy a...
Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:
https://github.com/apache/spark/pull/8889#issuecomment-143027674
Seems it failed some tests.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-10741][SQL] Hive Query Having/OrderBy a...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/8889#issuecomment-142774718
Merged build finished. Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-10741][SQL] Hive Query Having/OrderBy a...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/8889#issuecomment-142814881
Merged build started.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org