You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by xwu0226 <gi...@git.apache.org> on 2015/10/28 09:06:00 UTC
[GitHub] spark pull request: [SPARK-11246] [SQL] Table cache for Parquet br...
GitHub user xwu0226 opened a pull request:
https://github.com/apache/spark/pull/9326
[SPARK-11246] [SQL] Table cache for Parquet broken in 1.5
The root cause is that when spark.sql.hive.convertMetastoreParquet=true by default, the cached InMemoryRelation of the ParquetRelation can not be looked up from the cachedData of CacheManager because the key comparison fails even though it is the same LogicalPlan representing the Subquery that wraps the ParquetRelation.
The solution in this PR is overriding the LogicalPlan.sameResult function in Subquery case class to eliminate subquery node first before directly comparing the child (ParquetRelation), which will find the key to the cached InMemoryRelation.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/xwu0226/spark spark-11246-commit
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/9326.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #9326
----
commit 402d8e495d0fec01c3b7bb7fc8dcdf4efa56d1d2
Author: xin Wu <xi...@us.ibm.com>
Date: 2015-10-28T06:26:19Z
[SPARK-11246] Table cache for Parquet broken in 1.5
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-11246] [SQL] Table cache for Parquet br...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9326#issuecomment-151873941
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-11246] [SQL] Table cache for Parquet br...
Posted by xwu0226 <gi...@git.apache.org>.
Github user xwu0226 commented on the pull request:
https://github.com/apache/spark/pull/9326#issuecomment-152075529
@yhuai Just pushed again and also removed some unnecessary imports that I used for debugging today and forgot to remove.
Thanks!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-11246] [SQL] Table cache for Parquet br...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9326#issuecomment-152096108
**[Test build #44562 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44562/consoleFull)** for PR 9326 at commit [`5ed1429`](https://github.com/apache/spark/commit/5ed1429dc598bb929d98a7447db05d73c6356ffe).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-11246] [SQL] Table cache for Parquet br...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9326#issuecomment-152080910
**[Test build #44564 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44564/consoleFull)** for PR 9326 at commit [`3f25d6a`](https://github.com/apache/spark/commit/3f25d6a1b330ea2c1d0db4c9e9aeb09ce3c61311).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-11246] [SQL] Table cache for Parquet br...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9326#issuecomment-152079142
Merged build started.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-11246] [SQL] Table cache for Parquet br...
Posted by xwu0226 <gi...@git.apache.org>.
Github user xwu0226 commented on a diff in the pull request:
https://github.com/apache/spark/pull/9326#discussion_r43349387
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/LogicalRelation.scala ---
@@ -62,6 +62,8 @@ case class LogicalRelation(
case _ => false
}
+ override lazy val cleanArgs: Seq[Any] = Seq(relation)
--- End diff --
Yes. let me add it. Thanks!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-11246] [SQL] Table cache for Parquet br...
Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:
https://github.com/apache/spark/pull/9326#discussion_r43349153
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/LogicalRelation.scala ---
@@ -62,6 +62,8 @@ case class LogicalRelation(
case _ => false
}
+ override lazy val cleanArgs: Seq[Any] = Seq(relation)
--- End diff --
Can you also add a comment to explain why we need to override it?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-11246] [SQL] Table cache for Parquet br...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9326#issuecomment-152105147
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44568/
Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-11246] [SQL] Table cache for Parquet br...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9326#issuecomment-151758255
Can one of the admins verify this patch?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-11246] [SQL] Table cache for Parquet br...
Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:
https://github.com/apache/spark/pull/9326#issuecomment-151934042
Actually, `sameResult` will remove `SubQuery` nodes. I think the problem is that `cleanRight.cleanArgs == cleanLeft.cleanArgs` returns false when two `LogicalRelation`s are considered generating the same result (because of `expectedOutputAttributes` that we added in #8889 ). I think the right fix is to correctly override the `cleanArgs` field of `LogicalRelation` to exclude `expectedOutputAttributes`.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-11246] [SQL] Table cache for Parquet br...
Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:
https://github.com/apache/spark/pull/9326#discussion_r43349135
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala ---
@@ -17,6 +17,7 @@
package org.apache.spark.sql.catalyst.plans.logical
+import org.apache.spark.sql.catalyst.analysis.EliminateSubQueries
--- End diff --
also remove this?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-11246] [SQL] Table cache for Parquet br...
Posted by xwu0226 <gi...@git.apache.org>.
Github user xwu0226 commented on the pull request:
https://github.com/apache/spark/pull/9326#issuecomment-152079599
@yhuai Just pushed. Please help take a look, thanks!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-11246] [SQL] Table cache for Parquet br...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9326#issuecomment-151839373
Merged build started.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-11246] [SQL] Table cache for Parquet br...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9326#issuecomment-151841140
**[Test build #44527 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44527/consoleFull)** for PR 9326 at commit [`402d8e4`](https://github.com/apache/spark/commit/402d8e495d0fec01c3b7bb7fc8dcdf4efa56d1d2).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-11246] [SQL] Table cache for Parquet br...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9326#issuecomment-152082877
**[Test build #44568 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44568/consoleFull)** for PR 9326 at commit [`88edd14`](https://github.com/apache/spark/commit/88edd14e6cb6f25d15e83bcb8e24c3c3e5306ed8).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-11246] [SQL] Table cache for Parquet br...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9326#issuecomment-152075516
Merged build started.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-11246] [SQL] Table cache for Parquet br...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9326#issuecomment-152104914
**[Test build #44568 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44568/consoleFull)** for PR 9326 at commit [`88edd14`](https://github.com/apache/spark/commit/88edd14e6cb6f25d15e83bcb8e24c3c3e5306ed8).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-11246] [SQL] Table cache for Parquet br...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9326#issuecomment-152098190
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44563/
Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-11246] [SQL] Table cache for Parquet br...
Posted by xwu0226 <gi...@git.apache.org>.
Github user xwu0226 commented on a diff in the pull request:
https://github.com/apache/spark/pull/9326#discussion_r43348679
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/LogicalRelation.scala ---
@@ -61,6 +64,7 @@ case class LogicalRelation(
case LogicalRelation(otherRelation, _) => relation == otherRelation
case _ => false
}
+ override lazy val cleanArgs: Seq[Any] = Seq(relation)
--- End diff --
Thanks! Will do.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-11246] [SQL] Table cache for Parquet br...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9326#issuecomment-152082460
Merged build triggered.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-11246] [SQL] Table cache for Parquet br...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9326#issuecomment-152096229
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-11246] [SQL] Table cache for Parquet br...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9326#issuecomment-151873727
**[Test build #44527 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44527/consoleFull)** for PR 9326 at commit [`402d8e4`](https://github.com/apache/spark/commit/402d8e495d0fec01c3b7bb7fc8dcdf4efa56d1d2).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-11246] [SQL] Table cache for Parquet br...
Posted by xwu0226 <gi...@git.apache.org>.
Github user xwu0226 commented on a diff in the pull request:
https://github.com/apache/spark/pull/9326#discussion_r43349383
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala ---
@@ -17,6 +17,7 @@
package org.apache.spark.sql.catalyst.plans.logical
+import org.apache.spark.sql.catalyst.analysis.EliminateSubQueries
--- End diff --
Yes. I will. Thanks!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-11246] [SQL] Table cache for Parquet br...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/9326
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-11246] [SQL] Table cache for Parquet br...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9326#issuecomment-152098188
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-11246] [SQL] Table cache for Parquet br...
Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:
https://github.com/apache/spark/pull/9326#issuecomment-151837034
ok to test
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-11246] [SQL] Table cache for Parquet br...
Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:
https://github.com/apache/spark/pull/9326#discussion_r43348563
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/LogicalRelation.scala ---
@@ -61,6 +64,7 @@ case class LogicalRelation(
case LogicalRelation(otherRelation, _) => relation == otherRelation
case _ => false
}
+ override lazy val cleanArgs: Seq[Any] = Seq(relation)
--- End diff --
Add a new line above this line.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-11246] [SQL] Table cache for Parquet br...
Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:
https://github.com/apache/spark/pull/9326#issuecomment-152201027
LGTM. Merging to master and branch 1.5.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-11246] [SQL] Table cache for Parquet br...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9326#issuecomment-152080921
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44564/
Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-11246] [SQL] Table cache for Parquet br...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9326#issuecomment-152080917
Merged build finished. Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-11246] [SQL] Table cache for Parquet br...
Posted by xwu0226 <gi...@git.apache.org>.
Github user xwu0226 commented on the pull request:
https://github.com/apache/spark/pull/9326#issuecomment-152073153
@yhuai Thanks so much for your comments and suggestions!
I have thought through your suggestions and override `cleanArgs` field in `LogicalRelation`, that returns the `BaseRelation` (in this case, `ParquetRelation`) without the `expectedOutputAttributes`.. I removed the original fix from `Subquery` and commited the change.
Let me know what you think. Thank you again!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-11246] [SQL] Table cache for Parquet br...
Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:
https://github.com/apache/spark/pull/9326#issuecomment-152074664
Just a minor comment about the format.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-11246] [SQL] Table cache for Parquet br...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9326#issuecomment-152072919
Merged build triggered.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-11246] [SQL] Table cache for Parquet br...
Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:
https://github.com/apache/spark/pull/9326#discussion_r43348628
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/LogicalRelation.scala ---
@@ -17,10 +17,13 @@
package org.apache.spark.sql.execution.datasources
import org.apache.spark.sql.catalyst.analysis.MultiInstanceRelation
-import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeMap, AttributeReference}
+import org.apache.spark.sql.catalyst.expressions._
import org.apache.spark.sql.catalyst.plans.logical.{LeafNode, LogicalPlan, Statistics}
+import org.apache.spark.sql.catalyst.trees.TreeNode
import org.apache.spark.sql.sources.BaseRelation
+import scala.collection.mutable.ArrayBuffer
--- End diff --
Import ordering (see https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide#SparkCodeStyleGuide-Imports).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-11246] [SQL] Table cache for Parquet br...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9326#issuecomment-152098100
**[Test build #44563 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44563/consoleFull)** for PR 9326 at commit [`fb70954`](https://github.com/apache/spark/commit/fb709542a42791fa1c617bb2c5e034459123ff2d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-11246] [SQL] Table cache for Parquet br...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9326#issuecomment-152079112
Merged build triggered.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-11246] [SQL] Table cache for Parquet br...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9326#issuecomment-152073802
**[Test build #44562 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44562/consoleFull)** for PR 9326 at commit [`5ed1429`](https://github.com/apache/spark/commit/5ed1429dc598bb929d98a7447db05d73c6356ffe).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-11246] [SQL] Table cache for Parquet br...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9326#issuecomment-152075503
Merged build triggered.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-11246] [SQL] Table cache for Parquet br...
Posted by xwu0226 <gi...@git.apache.org>.
Github user xwu0226 commented on the pull request:
https://github.com/apache/spark/pull/9326#issuecomment-152218255
@yhuai Thank you very much for merging it.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-11246] [SQL] Table cache for Parquet br...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9326#issuecomment-151839311
Merged build triggered.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-11246] [SQL] Table cache for Parquet br...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9326#issuecomment-152082468
Merged build started.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-11246] [SQL] Table cache for Parquet br...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9326#issuecomment-152080251
**[Test build #44564 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44564/consoleFull)** for PR 9326 at commit [`3f25d6a`](https://github.com/apache/spark/commit/3f25d6a1b330ea2c1d0db4c9e9aeb09ce3c61311).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-11246] [SQL] Table cache for Parquet br...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9326#issuecomment-152072928
Merged build started.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-11246] [SQL] Table cache for Parquet br...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9326#issuecomment-152075711
**[Test build #44563 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44563/consoleFull)** for PR 9326 at commit [`fb70954`](https://github.com/apache/spark/commit/fb709542a42791fa1c617bb2c5e034459123ff2d).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-11246] [SQL] Table cache for Parquet br...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9326#issuecomment-152096230
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44562/
Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-11246] [SQL] Table cache for Parquet br...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9326#issuecomment-152105146
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-11246] [SQL] Table cache for Parquet br...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9326#issuecomment-151873944
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44527/
Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org