Posted to reviews@spark.apache.org by brandonJY <gi...@git.apache.org> on 2018/01/19 05:49:15 UTC

[GitHub] spark pull request #20325: [SPARK-22808][DOCS] add insertInto when save hive...

GitHub user brandonJY opened a pull request:

    https://github.com/apache/spark/pull/20325

    [SPARK-22808][DOCS] add insertInto when save hive built dataframe

    ## What changes were proposed in this pull request?
    
    Based on https://issues.apache.org/jira/browse/SPARK-22808 and
    https://issues.apache.org/jira/browse/SPARK-16803, `insertInto` should be
    used instead of `saveAsTable` when a DataFrame is built on a Hive table.
    The existing example code in this doc is not affected. Additional example
    code is not added for now, since `saveAsTable` may be patched later.
    So this change only edits the doc.
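
    For illustration only (not part of the doc change; the session setup and the
    table names "records" and "records_archive" are hypothetical), the pattern
    this note is about looks roughly like this in Scala:

        import org.apache.spark.sql.SparkSession

        // A Hive-enabled session is assumed.
        val spark = SparkSession.builder()
          .appName("InsertIntoSketch")
          .enableHiveSupport()
          .getOrCreate()

        // DataFrame built on an existing Hive table.
        val df = spark.table("records").filter("key > 10")

        // What the doc note recommends for such DataFrames: write into an
        // existing table definition with insertInto (columns are resolved
        // by position against that table's schema).
        df.write.mode("append").insertInto("records_archive")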
    
    ## How was this patch tested?
    
    manual tested


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/brandonJY/spark SPARK-22808

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20325.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20325
    
----
commit a40a8fae927eb63dabb8461f02f737d5b25ce5e6
Author: Brandon Jiang <br...@...>
Date:   2018-01-19T05:48:22Z

    [SPARK-22808][DOCS] add insertInto when save hive built dataframe
    
    Based on https://issues.apache.org/jira/browse/SPARK-22808 and
    https://issues.apache.org/jira/browse/SPARK-16803, `insertInto` should be
    used instead of `saveAsTable` when a DataFrame is built on a Hive table.
    The existing example code in this doc is not affected. Additional example
    code is not added for now, since `saveAsTable` may be patched later.
    So this change only edits the doc.

----


---



[GitHub] spark pull request #20325: [SPARK-22808][DOCS] add insertInto when save hive...

Posted by brandonJY <gi...@git.apache.org>.
Github user brandonJY commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20325#discussion_r162837134
  
    --- Diff: docs/sql-programming-guide.md ---
    @@ -580,6 +580,9 @@ default local Hive metastore (using Derby) for you. Unlike the `createOrReplaceT
     Hive metastore. Persistent tables will still exist even after your Spark program has restarted, as
     long as you maintain your connection to the same metastore. A DataFrame for a persistent table can
     be created by calling the `table` method on a `SparkSession` with the name of the table.
    +Notice that for `DataFrames` is built on Hive table, `insertInto` should be used instead of `saveAsTable`.
    --- End diff --
    
    Ah, I see. In that case, I don't see the need to mention `insertInto` in this doc any more. I am closing this PR. Feel free to reopen it if necessary.
    I would also suggest marking https://issues.apache.org/jira/browse/SPARK-22808 as resolved and linking it to https://issues.apache.org/jira/browse/SPARK-19152


---



[GitHub] spark issue #20325: [SPARK-22808][DOCS] add insertInto when save hive built ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20325
  
    Can one of the admins verify this patch?


---



[GitHub] spark pull request #20325: [SPARK-22808][DOCS] add insertInto when save hive...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20325#discussion_r162760318
  
    --- Diff: docs/sql-programming-guide.md ---
    @@ -580,6 +580,9 @@ default local Hive metastore (using Derby) for you. Unlike the `createOrReplaceT
     Hive metastore. Persistent tables will still exist even after your Spark program has restarted, as
     long as you maintain your connection to the same metastore. A DataFrame for a persistent table can
     be created by calling the `table` method on a `SparkSession` with the name of the table.
    +Notice that for `DataFrames` is built on Hive table, `insertInto` should be used instead of `saveAsTable`.
    --- End diff --
    
    Let us get rid of `Notice that for DataFrames is built on Hive table,`.  `insertInto` can work for any existing table. More importantly, `DataFrames` might be created from scratch. 
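
    A minimal sketch of that point (the table name "records_archive" is
    hypothetical and assumed to already exist with a compatible two-column
    schema):

        val spark = org.apache.spark.sql.SparkSession.builder()
          .enableHiveSupport()
          .getOrCreate()
        import spark.implicits._

        // DataFrame created from scratch, with no Hive table behind it.
        val fromScratch = Seq((1, "a"), (2, "b")).toDF("key", "value")

        // insertInto writes into any existing table; columns are matched by
        // position against the table definition, not by name.
        fromScratch.write.insertInto("records_archive")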


---



[GitHub] spark pull request #20325: [SPARK-22808][DOCS] add insertInto when save hive...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20325#discussion_r162845525
  
    --- Diff: docs/sql-programming-guide.md ---
    @@ -580,6 +580,9 @@ default local Hive metastore (using Derby) for you. Unlike the `createOrReplaceT
     Hive metastore. Persistent tables will still exist even after your Spark program has restarted, as
     long as you maintain your connection to the same metastore. A DataFrame for a persistent table can
     be created by calling the `table` method on a `SparkSession` with the name of the table.
    +Notice that for `DataFrames` is built on Hive table, `insertInto` should be used instead of `saveAsTable`.
    --- End diff --
    
    Done. Thanks!


---



[GitHub] spark pull request #20325: [SPARK-22808][DOCS] add insertInto when save hive...

Posted by brandonJY <gi...@git.apache.org>.
Github user brandonJY closed the pull request at:

    https://github.com/apache/spark/pull/20325


---



[GitHub] spark pull request #20325: [SPARK-22808][DOCS] add insertInto when save hive...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20325#discussion_r162790779
  
    --- Diff: docs/sql-programming-guide.md ---
    @@ -580,6 +580,9 @@ default local Hive metastore (using Derby) for you. Unlike the `createOrReplaceT
     Hive metastore. Persistent tables will still exist even after your Spark program has restarted, as
     long as you maintain your connection to the same metastore. A DataFrame for a persistent table can
     be created by calling the `table` method on a `SparkSession` with the name of the table.
    +Notice that for `DataFrames` is built on Hive table, `insertInto` should be used instead of `saveAsTable`.
    --- End diff --
    
    This limitation is lifted in Spark 2.2. See https://issues.apache.org/jira/browse/SPARK-19152


---



[GitHub] spark pull request #20325: [SPARK-22808][DOCS] add insertInto when save hive...

Posted by brandonJY <gi...@git.apache.org>.
Github user brandonJY commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20325#discussion_r162781167
  
    --- Diff: docs/sql-programming-guide.md ---
    @@ -580,6 +580,9 @@ default local Hive metastore (using Derby) for you. Unlike the `createOrReplaceT
     Hive metastore. Persistent tables will still exist even after your Spark program has restarted, as
     long as you maintain your connection to the same metastore. A DataFrame for a persistent table can
     be created by calling the `table` method on a `SparkSession` with the name of the table.
    +Notice that for `DataFrames` is built on Hive table, `insertInto` should be used instead of `saveAsTable`.
    --- End diff --
    
    @gatorsmile Could you elaborate on your comment? The purpose of this sentence was to warn users to use `insertInto` when they are dealing with DataFrames created from a Hive table, since, due to https://issues.apache.org/jira/browse/SPARK-16803, `saveAsTable` will not work in that special case. Or do you have any suggestions to make it clearer?
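
    For concreteness, a sketch of that special case, assuming a Hive-enabled
    session named `spark` as in the earlier sketch and hypothetical table names;
    which Spark versions are actually affected is governed by SPARK-16803 and
    SPARK-19152, not by this sketch:

        // DataFrame created from an existing Hive table.
        val hiveBacked = spark.table("records").where("value IS NOT NULL")

        // The call the proposed sentence warned against for this case:
        // hiveBacked.write.mode("append").saveAsTable("records_archive")

        // The alternative the proposed sentence recommended:
        hiveBacked.write.insertInto("records_archive")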


---



[GitHub] spark issue #20325: [SPARK-22808][DOCS] add insertInto when save hive built ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20325
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org