Posted to reviews@spark.apache.org by wangyum <gi...@git.apache.org> on 2018/11/04 10:45:52 UTC

[GitHub] spark pull request #22941: [SPARK-25936][SQL] Fix InsertIntoDataSourceComman...

GitHub user wangyum opened a pull request:

    https://github.com/apache/spark/pull/22941

    [SPARK-25936][SQL] Fix InsertIntoDataSourceCommand does not use Cached Data

    ## What changes were proposed in this pull request?
    
    ```scala
    spark.sql("""
      CREATE TABLE jdbcTable
      USING org.apache.spark.sql.jdbc
      OPTIONS (
        url "jdbc:mysql://localhost:3306/test",
        dbtable "test.InsertIntoDataSourceCommand",
        user "hive",
        password "hive"
      )""")
    
    spark.range(2).createTempView("test_view")
    spark.catalog.cacheTable("test_view")
    spark.sql("INSERT INTO TABLE jdbcTable SELECT * FROM test_view").explain
    ```
    
    Before this PR:
    ```
    == Physical Plan ==
    Execute InsertIntoDataSourceCommand
       +- InsertIntoDataSourceCommand
             +- Project
                +- SubqueryAlias
                   +- Range (0, 2, step=1, splits=Some(8))
    ```
    
    After this PR:
    ```
    == Physical Plan ==
    Execute InsertIntoDataSourceCommand InsertIntoDataSourceCommand Relation[id#8L] JDBCRelation(test.InsertIntoDataSourceCommand) [numPartitions=1], false, [id]
    +- *(1) InMemoryTableScan [id#0L]
          +- InMemoryRelation [id#0L], StorageLevel(disk, memory, deserialized, 1 replicas)
                +- *(1) Range (0, 2, step=1, splits=8)
    ```
    
    ## How was this patch tested?
    
    Unit tests: a new test case in InsertSuite.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wangyum/spark SPARK-25936

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22941.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22941
    
----
commit 2968b2c34f42f6b0bcb5e373a400377abfd09e86
Author: Yuming Wang <yu...@...>
Date:   2018-11-04T10:36:20Z

    Fix InsertIntoDataSourceCommand does not use Cached Data

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22941: [SPARK-25936][SQL] Fix InsertIntoDataSourceCommand does ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/22941
  
    I think this is not a bug, although the plan is confusing.


---


[GitHub] spark pull request #22941: [SPARK-25936][SQL] Fix InsertIntoDataSourceComman...

Posted by wangyum <gi...@git.apache.org>.
Github user wangyum commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22941#discussion_r230622708
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/InsertSuite.scala ---
    @@ -589,4 +590,33 @@ class InsertSuite extends DataSourceTest with SharedSQLContext {
           sql("INSERT INTO TABLE test_table SELECT 2, null")
         }
       }
    +
    +  test("SPARK-25936 InsertIntoDataSourceCommand does not use Cached Data") {
    --- End diff --
    
    It works. Do we still need to fix the confusing plan output?


---


[GitHub] spark issue #22941: [SPARK-25936][SQL] Fix InsertIntoDataSourceCommand does ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22941
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98447/


---


[GitHub] spark pull request #22941: [SPARK-25936][SQL] Fix InsertIntoDataSourceComman...

Posted by wangyum <gi...@git.apache.org>.
Github user wangyum closed the pull request at:

    https://github.com/apache/spark/pull/22941


---


[GitHub] spark issue #22941: [SPARK-25936][SQL] Fix InsertIntoDataSourceCommand does ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22941
  
    **[Test build #98447 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98447/testReport)** for PR 22941 at commit [`2968b2c`](https://github.com/apache/spark/commit/2968b2c34f42f6b0bcb5e373a400377abfd09e86).


---


[GitHub] spark pull request #22941: [SPARK-25936][SQL] Fix InsertIntoDataSourceComman...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22941#discussion_r230608937
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoDataSourceCommand.scala ---
    @@ -30,14 +30,13 @@ import org.apache.spark.sql.sources.InsertableRelation
     case class InsertIntoDataSourceCommand(
         logicalRelation: LogicalRelation,
         query: LogicalPlan,
    -    overwrite: Boolean)
    -  extends RunnableCommand {
    +    overwrite: Boolean,
    +    outputColumnNames: Seq[String])
    +  extends DataWritingCommand {
     
    -  override protected def innerChildren: Seq[QueryPlan[_]] = Seq(query)
    -
    -  override def run(sparkSession: SparkSession): Seq[Row] = {
    +  override def run(sparkSession: SparkSession, child: SparkPlan): Seq[Row] = {
         val relation = logicalRelation.relation.asInstanceOf[InsertableRelation]
    -    val data = Dataset.ofRows(sparkSession, query)
    --- End diff --
    
    The removed line, `Dataset.ofRows(sparkSession, query)`, already uses the cached data, even though the plan does not show that the cache is used.
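
    One way to read the point above is with a toy model. The sketch below is plain Scala and does not use Spark's classes; `Plan`, `RangeNode`, `CachedScan`, `RunnableInsert`, `WritingInsert`, and `ToyPlanner` are all hypothetical names chosen for illustration. It assumes that explain prints both children and inner children, that cache substitution only walks real children, and that the old command plans its query a second time when it runs.

    ```scala
    // Toy model only: these are NOT Spark's classes. Names echo the thread for readability.
    sealed trait Plan {
      def label: String
      def children: Seq[Plan] = Nil       // rewritten by the toy planner
      def innerChildren: Seq[Plan] = Nil  // printed by explain, never rewritten
    }
    case class RangeNode(end: Long) extends Plan { def label = s"Range (0, $end)" }
    case class CachedScan(source: Plan) extends Plan { def label = "InMemoryTableScan" }

    // Old shape: the query is only an inner child of the command.
    case class RunnableInsert(query: Plan) extends Plan {
      def label = "InsertIntoDataSourceCommand"
      override def innerChildren: Seq[Plan] = Seq(query)
    }
    // Proposed shape: the query is a real child of the command.
    case class WritingInsert(child: Plan) extends Plan {
      def label = "InsertIntoDataSourceCommand"
      override def children: Seq[Plan] = Seq(child)
    }

    object ToyPlanner {
      // Replace a cached subtree with a scan over the cache, walking real children only.
      def useCachedData(plan: Plan, cached: Set[Plan]): Plan =
        if (cached.contains(plan)) CachedScan(plan)
        else plan match {
          case WritingInsert(c) => WritingInsert(useCachedData(c, cached))
          case other            => other
        }

      // What explain prints for the command after cache substitution.
      def explain(plan: Plan, cached: Set[Plan]): String = {
        def render(p: Plan, depth: Int): Seq[String] =
          (("  " * depth) + p.label) +: (p.children ++ p.innerChildren).flatMap(render(_, depth + 1))
        render(useCachedData(plan, cached), 0).mkString("\n")
      }

      // What actually runs: the query is planned (again) with cache substitution.
      def executedQuery(command: Plan, cached: Set[Plan]): Plan = command match {
        case RunnableInsert(q) => useCachedData(q, cached)
        case WritingInsert(c)  => useCachedData(c, cached)
        case other             => useCachedData(other, cached)
      }
    }

    object ToyDemo extends App {
      val view = RangeNode(2)
      val cached: Set[Plan] = Set(view)
      println(ToyPlanner.explain(RunnableInsert(view), cached)) // shows Range under the command
      println(ToyPlanner.explain(WritingInsert(view), cached))  // shows InMemoryTableScan
      // Both shapes execute against the cache in this model.
      assert(ToyPlanner.executedQuery(RunnableInsert(view), cached) == CachedScan(view))
    }
    ```

    In this model the two command shapes differ only in what explain prints, which matches the claim that the behaviour is correct and only the plan is confusing.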


---


[GitHub] spark issue #22941: [SPARK-25936][SQL] Fix InsertIntoDataSourceCommand does ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22941
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #22941: [SPARK-25936][SQL] Fix InsertIntoDataSourceCommand does ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22941
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4753/


---


[GitHub] spark pull request #22941: [SPARK-25936][SQL] Fix InsertIntoDataSourceComman...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22941#discussion_r230609046
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/InsertSuite.scala ---
    @@ -589,4 +590,33 @@ class InsertSuite extends DataSourceTest with SharedSQLContext {
           sql("INSERT INTO TABLE test_table SELECT 2, null")
         }
       }
    +
    +  test("SPARK-25936 InsertIntoDataSourceCommand does not use Cached Data") {
    --- End diff --
    
    You can move this test case to CachedTableSuite.scala and use its helper functions to verify whether the cache is actually used.
    
    See this example:
    ```
            spark.range(2).createTempView("test_view")
            spark.catalog.cacheTable("test_view")
            val rddId = rddIdOf("test_view")
            assert(!isMaterialized(rddId))
            sql("INSERT INTO TABLE test_table SELECT * FROM test_view")
            assert(isMaterialized(rddId))
    ```
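
    Outside Spark's own test suites, the `rddIdOf` and `isMaterialized` helpers are not on hand. A rougher check, sketched here with public APIs only, plans the SELECT side by itself and looks for `InMemoryTableScan` in the physical plan. It shows that the query resolves against the cache; it does not prove that the INSERT materialized it. The object name `CachedPlanCheck` and the local session settings are assumptions for illustration, and the sketch needs Spark SQL on the classpath.

    ```scala
    import org.apache.spark.sql.SparkSession

    // Hypothetical standalone check; not part of the pull request.
    object CachedPlanCheck {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .master("local[2]")             // assumption: a local session is enough for the check
          .appName("cached-plan-check")
          .getOrCreate()
        try {
          spark.range(2).createOrReplaceTempView("test_view")
          spark.catalog.cacheTable("test_view") // registers the view with the cache manager

          // Plan the SELECT on its own and inspect the physical plan text.
          val plan = spark.sql("SELECT * FROM test_view").queryExecution.executedPlan.toString
          assert(plan.contains("InMemoryTableScan"), s"cache not used in plan:\n$plan")
        } finally {
          spark.stop()
        }
      }
    }
    ```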
    



---


[GitHub] spark issue #22941: [SPARK-25936][SQL] Fix InsertIntoDataSourceCommand does ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22941
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #22941: [SPARK-25936][SQL] Fix InsertIntoDataSourceCommand does ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22941
  
    **[Test build #98447 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98447/testReport)** for PR 22941 at commit [`2968b2c`](https://github.com/apache/spark/commit/2968b2c34f42f6b0bcb5e373a400377abfd09e86).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
