You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by cloud-fan <gi...@git.apache.org> on 2018/02/07 08:54:25 UTC

[GitHub] spark pull request #20527: [SPARK-23348][SQL] append data using saveAsTable ...

GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/20527

    [SPARK-23348][SQL] append data using saveAsTable should adjust the data types

    ## What changes were proposed in this pull request?
    
    For inserting/appending data to an existing table, Spark should adjust the data types of the input query according to the table schema, or fail fast if it's uncastable.
    
    There are several ways to insert/append data: SQL API, `DataFrameWriter.insertInto`, `DataFrameWriter.saveAsTable`. The first 2 ways create `InsertIntoTable` plan, and the last way creates `CreateTable` plan. However, we only adjust input query data types for `InsertIntoTable`, and users may hit weird errors when appending data using `saveAsTable`. See the JIRA for the error case.
    
    This PR fixes this bug by adjusting data types for `CreateTable` too.
    
    ## How was this patch tested?
    
    new test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark saveAsTable

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20527.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20527
    
----
commit ad19125ab54439979eb4aa2bebf2cb8c9c85551e
Author: Wenchen Fan <we...@...>
Date:   2018-02-07T08:34:40Z

    append data using saveAsTable should adjust the data types

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20527: [SPARK-23348][SQL] append data using saveAsTable should ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20527
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87151/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20527: [SPARK-23348][SQL] append data using saveAsTable should ...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on the issue:

    https://github.com/apache/spark/pull/20527
  
    LGTM only some nits.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20527: [SPARK-23348][SQL] append data using saveAsTable ...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20527#discussion_r166770787
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
    @@ -132,6 +134,32 @@ class InMemoryCatalogedDDLSuite extends DDLSuite with SharedSQLContext with Befo
           checkAnswer(spark.table("t"), Row(Row("a", 1)) :: Nil)
         }
       }
    +
    +  // TODO: This test is copied from HiveDDLSuite, unify it later.
    +  test("SPARK-23348: append data to data source table with saveAsTable") {
    +    withTable("t", "t1") {
    +      Seq(1 -> "a").toDF("i", "j").write.saveAsTable("t")
    +      checkAnswer(spark.table("t"), Row(1, "a"))
    +
    +      sql("INSERT INTO t SELECT 2, 'b'")
    +      checkAnswer(spark.table("t"), Row(1, "a") :: Row(2, "b") :: Nil)
    +
    +      Seq(3 -> "c").toDF("i", "j").write.mode("append").saveAsTable("t")
    +      checkAnswer(spark.table("t"), Row(1, "a") :: Row(2, "b") :: Row(3, "c") :: Nil)
    +
    +      Seq("c" -> 3).toDF("i", "j").write.mode("append").saveAsTable("t")
    +      checkAnswer(spark.table("t"), Row(1, "a") :: Row(2, "b") :: Row(3, "c")
    +        :: Row(null, "3") :: Nil)
    --- End diff --
    
    Thank you for pining me, @cloud-fan . +1 for the patch, LGTM.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20527: [SPARK-23348][SQL] append data using saveAsTable should ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20527
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20527: [SPARK-23348][SQL] append data using saveAsTable ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20527#discussion_r166865065
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala ---
    @@ -346,37 +349,11 @@ case class PreprocessTableInsertion(conf: SQLConf) extends Rule[LogicalPlan] wit
                """.stripMargin)
           }
     
    -      castAndRenameChildOutput(insert.copy(partition = normalizedPartSpec), expectedColumns)
    +      insert.copy(query = newQuery, partition = normalizedPartSpec)
    --- End diff --
    
    it's also ok to always copy it and the code is neater.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20527: [SPARK-23348][SQL] append data using saveAsTable ...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20527#discussion_r166853858
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
    @@ -132,6 +134,32 @@ class InMemoryCatalogedDDLSuite extends DDLSuite with SharedSQLContext with Befo
           checkAnswer(spark.table("t"), Row(Row("a", 1)) :: Nil)
         }
       }
    +
    +  // TODO: This test is copied from HiveDDLSuite, unify it later.
    +  test("SPARK-23348: append data to data source table with saveAsTable") {
    --- End diff --
    
    Do we also want to cover the following case:
    ```
    2) Target tables have column metadata
    ```?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20527: [SPARK-23348][SQL] append data using saveAsTable should ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20527
  
    **[Test build #87151 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87151/testReport)** for PR 20527 at commit [`ad19125`](https://github.com/apache/spark/commit/ad19125ab54439979eb4aa2bebf2cb8c9c85551e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class PreprocessTableInsertion(conf: SQLConf) extends Rule[LogicalPlan] `


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20527: [SPARK-23348][SQL] append data using saveAsTable should ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20527
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/659/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20527: [SPARK-23348][SQL] append data using saveAsTable ...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20527#discussion_r166852743
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala ---
    @@ -346,37 +349,11 @@ case class PreprocessTableInsertion(conf: SQLConf) extends Rule[LogicalPlan] wit
                """.stripMargin)
           }
     
    -      castAndRenameChildOutput(insert.copy(partition = normalizedPartSpec), expectedColumns)
    +      insert.copy(query = newQuery, partition = normalizedPartSpec)
    --- End diff --
    
    nit: don't need to copy the `newQuery` if it is the same as `query`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20527: [SPARK-23348][SQL] append data using saveAsTable should ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20527
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20527: [SPARK-23348][SQL] append data using saveAsTable should ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20527
  
    cc @gatorsmile @dongjoon-hyun 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20527: [SPARK-23348][SQL] append data using saveAsTable ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/20527


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20527: [SPARK-23348][SQL] append data using saveAsTable should ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/20527
  
    LGTM
    
    Thanks! Merged to master/2.3


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20527: [SPARK-23348][SQL] append data using saveAsTable ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20527#discussion_r166865177
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
    @@ -132,6 +134,32 @@ class InMemoryCatalogedDDLSuite extends DDLSuite with SharedSQLContext with Befo
           checkAnswer(spark.table("t"), Row(Row("a", 1)) :: Nil)
         }
       }
    +
    +  // TODO: This test is copied from HiveDDLSuite, unify it later.
    +  test("SPARK-23348: append data to data source table with saveAsTable") {
    --- End diff --
    
    maybe we can add this when unifying the test cases?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20527: [SPARK-23348][SQL] append data using saveAsTable should ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20527
  
    **[Test build #87151 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87151/testReport)** for PR 20527 at commit [`ad19125`](https://github.com/apache/spark/commit/ad19125ab54439979eb4aa2bebf2cb8c9c85551e).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org