Posted to reviews@spark.apache.org by wangyum <gi...@git.apache.org> on 2018/11/04 10:45:52 UTC
[GitHub] spark pull request #22941: [SPARK-25936][SQL] Fix InsertIntoDataSourceComman...
GitHub user wangyum opened a pull request:
https://github.com/apache/spark/pull/22941
[SPARK-25936][SQL] Fix InsertIntoDataSourceCommand does not use Cached Data
## What changes were proposed in this pull request?
```scala
// Register a JDBC-backed table as the insert target.
spark.sql("""
  CREATE TABLE jdbcTable
  USING org.apache.spark.sql.jdbc
  OPTIONS (
    url "jdbc:mysql://localhost:3306/test",
    dbtable "test.InsertIntoDataSourceCommand",
    user "hive",
    password "hive"
  )""")
// Cache the source view, then inspect the plan of an insert that reads from it.
spark.range(2).createTempView("test_view")
spark.catalog.cacheTable("test_view")
spark.sql("INSERT INTO TABLE jdbcTable SELECT * FROM test_view").explain
```
Before this PR:
```
== Physical Plan ==
Execute InsertIntoDataSourceCommand
   +- InsertIntoDataSourceCommand
      +- Project
         +- SubqueryAlias
            +- Range (0, 2, step=1, splits=Some(8))
```
After this PR:
```
== Physical Plan ==
Execute InsertIntoDataSourceCommand InsertIntoDataSourceCommand Relation[id#8L] JDBCRelation(test.InsertIntoDataSourceCommand) [numPartitions=1], false, [id]
+- *(1) InMemoryTableScan [id#0L]
      +- InMemoryRelation [id#0L], StorageLevel(disk, memory, deserialized, 1 replicas)
            +- *(1) Range (0, 2, step=1, splits=8)
```
## How was this patch tested?
unit tests
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/wangyum/spark SPARK-25936
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22941.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22941
----
commit 2968b2c34f42f6b0bcb5e373a400377abfd09e86
Author: Yuming Wang <yu...@...>
Date: 2018-11-04T10:36:20Z
Fix InsertIntoDataSourceCommand does not use Cached Data
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22941: [SPARK-25936][SQL] Fix InsertIntoDataSourceCommand does ...
Posted by gatorsmile <gi...@git.apache.org>.
GitHub user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/22941
I think this is not a bug, although the plan is confusing.
---
[GitHub] spark pull request #22941: [SPARK-25936][SQL] Fix InsertIntoDataSourceComman...
Posted by wangyum <gi...@git.apache.org>.
GitHub user wangyum commented on a diff in the pull request:
https://github.com/apache/spark/pull/22941#discussion_r230622708
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/InsertSuite.scala ---
@@ -589,4 +590,33 @@ class InsertSuite extends DataSourceTest with SharedSQLContext {
sql("INSERT INTO TABLE test_table SELECT 2, null")
}
}
+
+ test("SPARK-25936 InsertIntoDataSourceCommand does not use Cached Data") {
--- End diff --
It works. Do we need to fix this plan issue?
---
[GitHub] spark issue #22941: [SPARK-25936][SQL] Fix InsertIntoDataSourceCommand does ...
Posted by AmplabJenkins <gi...@git.apache.org>.
GitHub user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22941
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98447/
---
[GitHub] spark pull request #22941: [SPARK-25936][SQL] Fix InsertIntoDataSourceComman...
Posted by wangyum <gi...@git.apache.org>.
GitHub user wangyum closed the pull request at:
https://github.com/apache/spark/pull/22941
---
[GitHub] spark issue #22941: [SPARK-25936][SQL] Fix InsertIntoDataSourceCommand does ...
Posted by SparkQA <gi...@git.apache.org>.
GitHub user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22941
**[Test build #98447 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98447/testReport)** for PR 22941 at commit [`2968b2c`](https://github.com/apache/spark/commit/2968b2c34f42f6b0bcb5e373a400377abfd09e86).
---
[GitHub] spark pull request #22941: [SPARK-25936][SQL] Fix InsertIntoDataSourceComman...
Posted by gatorsmile <gi...@git.apache.org>.
GitHub user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/22941#discussion_r230608937
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoDataSourceCommand.scala ---
@@ -30,14 +30,13 @@ import org.apache.spark.sql.sources.InsertableRelation
case class InsertIntoDataSourceCommand(
logicalRelation: LogicalRelation,
query: LogicalPlan,
- overwrite: Boolean)
- extends RunnableCommand {
+ overwrite: Boolean,
+ outputColumnNames: Seq[String])
+ extends DataWritingCommand {
- override protected def innerChildren: Seq[QueryPlan[_]] = Seq(query)
-
- override def run(sparkSession: SparkSession): Seq[Row] = {
+ override def run(sparkSession: SparkSession, child: SparkPlan): Seq[Row] = {
val relation = logicalRelation.relation.asInstanceOf[InsertableRelation]
- val data = Dataset.ofRows(sparkSession, query)
--- End diff --
This line (`Dataset.ofRows(sparkSession, query)`) already uses the cached data, although the plan does not show that the cached data is used.
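The gap between what `explain` prints and what actually runs can be illustrated with a toy model. The sketch below is not Spark code: `Plan`, `RangeScan`, `CachedScan`, `InsertCommand`, and `useCachedData` are hypothetical stand-ins. It assumes the command keeps the original query only for display and swaps in cached data when it runs, which is the behavior described in the comment above.

```scala
// Toy model only: none of these types are Spark classes.
object PlanDisplayDemo {
  sealed trait Plan { def describe: String }

  case class RangeScan(start: Long, end: Long) extends Plan {
    def describe: String = s"Range ($start, $end)"
  }

  case class CachedScan(source: Plan) extends Plan {
    def describe: String = s"InMemoryTableScan <- ${source.describe}"
  }

  // Hypothetical cache lookup: swap a plan for its cached form if one is registered.
  def useCachedData(plan: Plan, cached: Set[Plan]): Plan =
    if (cached.contains(plan)) CachedScan(plan) else plan

  // The command keeps the original query for display, so explain shows no cache,
  // while run() resolves the cache just before execution.
  case class InsertCommand(query: Plan) {
    def explain: String = s"Insert\n+- ${query.describe}"
    def run(cached: Set[Plan]): String = useCachedData(query, cached).describe
  }

  val query: Plan = RangeScan(0, 2)
  val command = InsertCommand(query)
  val shown: String = command.explain            // mentions only the Range
  val executed: String = command.run(Set(query)) // reads through the cached scan
}
```

In this model the two strings differ even though the data path is correct, which matches the "confusing plan, not a bug" reading of the issue.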
---
[GitHub] spark issue #22941: [SPARK-25936][SQL] Fix InsertIntoDataSourceCommand does ...
Posted by AmplabJenkins <gi...@git.apache.org>.
GitHub user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22941
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22941: [SPARK-25936][SQL] Fix InsertIntoDataSourceCommand does ...
Posted by AmplabJenkins <gi...@git.apache.org>.
GitHub user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22941
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4753/
---
[GitHub] spark pull request #22941: [SPARK-25936][SQL] Fix InsertIntoDataSourceComman...
Posted by gatorsmile <gi...@git.apache.org>.
GitHub user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/22941#discussion_r230609046
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/InsertSuite.scala ---
@@ -589,4 +590,33 @@ class InsertSuite extends DataSourceTest with SharedSQLContext {
sql("INSERT INTO TABLE test_table SELECT 2, null")
}
}
+
+ test("SPARK-25936 InsertIntoDataSourceCommand does not use Cached Data") {
--- End diff --
You can move this test to CachedTableSuite.scala and use the helper functions there to verify whether the cache is used.
See the example:
```scala
spark.range(2).createTempView("test_view")
spark.catalog.cacheTable("test_view")
val rddId = rddIdOf("test_view")
assert(!isMaterialized(rddId))
sql("INSERT INTO TABLE test_table SELECT * FROM test_view")
assert(isMaterialized(rddId))
```
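The before/after assertions in this example rely on `cacheTable` being lazy: the cache is only filled when something first reads it. A minimal sketch of that reasoning in plain Scala follows. `LazyCache` and its members are hypothetical stand-ins, not the `rddIdOf` or `isMaterialized` helpers from CachedTableSuite.

```scala
// Toy model only: LazyCache is a hypothetical stand-in for a lazily cached table.
object LazyCacheDemo {
  final class LazyCache[A](compute: () => A) {
    private var stored: Option[A] = None

    // True once the cached value has been computed and kept.
    def isMaterialized: Boolean = stored.isDefined

    // The first read computes and stores the value; later reads reuse it.
    def read(): A = stored.getOrElse {
      val value = compute()
      stored = Some(value)
      value
    }
  }

  val cache = new LazyCache(() => Seq(0L, 1L))
  val before: Boolean = cache.isMaterialized // nothing has read the cache yet
  val inserted: Seq[Long] = cache.read()     // stands in for the INSERT reading the view
  val after: Boolean = cache.isMaterialized  // the read went through the cache
}
```

If the insert bypassed the cache, the second check would still be false, which is why the pair of assertions serves as a test of cache usage.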
---
[GitHub] spark issue #22941: [SPARK-25936][SQL] Fix InsertIntoDataSourceCommand does ...
Posted by AmplabJenkins <gi...@git.apache.org>.
GitHub user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22941
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22941: [SPARK-25936][SQL] Fix InsertIntoDataSourceCommand does ...
Posted by SparkQA <gi...@git.apache.org>.
GitHub user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22941
**[Test build #98447 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98447/testReport)** for PR 22941 at commit [`2968b2c`](https://github.com/apache/spark/commit/2968b2c34f42f6b0bcb5e373a400377abfd09e86).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---