Posted to reviews@spark.apache.org by marmbrus <gi...@git.apache.org> on 2014/04/08 04:33:53 UTC

[GitHub] spark pull request: [SQL] SPARK-1424 Generalize insertIntoTable fu...

GitHub user marmbrus opened a pull request:

    https://github.com/apache/spark/pull/354

    [SQL] SPARK-1424 Generalize insertIntoTable functions on SchemaRDDs

    I don't think this is quite done yet, but want to hear what people think about this API.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/marmbrus/spark insertIntoTable

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/354.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #354
    
----
commit c60d2fce123fb2caa7f1547e6b978cd5b1756193
Author: Michael Armbrust <mi...@databricks.com>
Date:   2014-04-08T02:32:37Z

    Make insertInto available on JavaSchemaRDD as well.  Add createTableAs function.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SQL] SPARK-1424 Generalize insertIntoTable fu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/354#issuecomment-40558276
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14162/



[GitHub] spark pull request: [SQL] SPARK-1424 Generalize insertIntoTable fu...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/354#issuecomment-40241700
  
    @rxin @mateiz PTAL



[GitHub] spark pull request: [SQL] SPARK-1424 Generalize insertIntoTable fu...

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/354#discussion_r11554683
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SchemaRDDLike.scala ---
    @@ -62,4 +65,43 @@ trait SchemaRDDLike {
       def registerAsTable(tableName: String): Unit = {
         sqlContext.registerRDDAsTable(baseSchemaRDD, tableName)
       }
    +
    +  /**
    +   * :: Experimental ::
    --- End diff --
    
    Yeah make sure you do this in other files too. You can check in the ScalaDoc package view whether the first line is being kept.



[GitHub] spark pull request: [SQL] SPARK-1424 Generalize insertIntoTable fu...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/354#discussion_r11375843
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SchemaRDDLike.scala ---
    @@ -62,4 +64,41 @@ trait SchemaRDDLike {
       def registerAsTable(tableName: String): Unit = {
         sqlContext.registerRDDAsTable(baseSchemaRDD, tableName)
       }
    +
    +  /**
    +   * <span class="badge badge-red" style="float: right;">EXPERIMENTAL</span>
    +   *
    +   * Adds the rows from this RDD to the specified table, optionally overwriting the existing data.
    +   *
    +   * @group schema
    +   */
    +  def insertInto(tableName: String, overwrite: Boolean): Unit =
    +    sqlContext.executePlan(
    +      InsertIntoTable(UnresolvedRelation(None, tableName), Map.empty, logicalPlan, overwrite)).toRdd
    +
    +  /**
    +   * <span class="badge badge-red" style="float: right;">EXPERIMENTAL</span>
    +   *
    +   * Appends the rows from this RDD to the specified table.
    +   *
    +   * @group schema
    +   */
    +  def insertInto(tableName: String): Unit = insertInto(tableName, false)
    +
    +  /**
    +   * <span class="badge badge-red" style="float: right;">EXPERIMENTAL</span>
    +   *
    +   * Creates a table from the contents of this SchemaRDD.  This will fail if the table already
    +   * exists.
    +   *
    +   * Note that this currently only works with SchemaRDDs that are created from a HiveContext as
    +   * there is no notion of a persisted catalog in a standard SQL context.  Instead you can write
    +   * an RDD out to a parquet file, and then register that file as a table.  This "table" can then
    +   * be the target of an `insertInto`.
    +   *
    +   * @param tableName
    +   */
    +  def createTableAs(tableName: String) =
    --- End diff --
    
    Explicitly define the return type for public interfaces.
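
A minimal sketch of the concern, in plain Scala with no Spark dependency. `InternalRdd`, `execute`, `insertInferred` and `insertAnnotated` are hypothetical stand-ins, not names from the PR. Without an annotation, the public signature is whatever the body happens to evaluate to, so an internal type can leak out by accident.

```scala
object ReturnTypeSketch {
  // Hypothetical stand-in for an internal result type that callers should not depend on.
  final class InternalRdd(val rows: Seq[String])

  // Hypothetical internal helper that "runs" the operation and returns a handle.
  private def execute(rows: Seq[String]): InternalRdd = new InternalRdd(rows)

  // No annotation: the inferred return type is InternalRdd, so the
  // internal type silently becomes part of the public signature.
  def insertInferred(rows: Seq[String]) = execute(rows)

  // Annotated: the signature is pinned to Unit and the handle is discarded.
  def insertAnnotated(rows: Seq[String]): Unit = {
    execute(rows)
    ()
  }
}
```

With the annotation in place, a later change to the method body cannot alter the signature that callers compile against.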



[GitHub] spark pull request: [SQL] SPARK-1424 Generalize insertIntoTable fu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/354#issuecomment-39807993
  
    Merged build finished. All automated tests passed.



[GitHub] spark pull request: [SQL] SPARK-1424 Generalize insertIntoTable fu...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/354#discussion_r11548229
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SchemaRDDLike.scala ---
    @@ -62,4 +65,43 @@ trait SchemaRDDLike {
       def registerAsTable(tableName: String): Unit = {
         sqlContext.registerRDDAsTable(baseSchemaRDD, tableName)
       }
    +
    +  /**
    +   * :: Experimental ::
    --- End diff --
    
    Ah, good to know.  Thanks!



[GitHub] spark pull request: [SQL] SPARK-1424 Generalize insertIntoTable fu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/354#issuecomment-40547218
  
     Merged build triggered. 



[GitHub] spark pull request: [SQL] SPARK-1424 Generalize insertIntoTable fu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/354#issuecomment-39807995
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13875/



[GitHub] spark pull request: [SQL] SPARK-1424 Generalize insertIntoTable fu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/354#issuecomment-40558275
  
    Merged build finished. All automated tests passed.



[GitHub] spark pull request: [SQL] SPARK-1424 Generalize insertIntoTable fu...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/354#discussion_r11516306
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SchemaRDDLike.scala ---
    @@ -62,4 +64,41 @@ trait SchemaRDDLike {
       def registerAsTable(tableName: String): Unit = {
         sqlContext.registerRDDAsTable(baseSchemaRDD, tableName)
       }
    +
    +  /**
    +   * <span class="badge badge-red" style="float: right;">EXPERIMENTAL</span>
    +   *
    +   * Adds the rows from this RDD to the specified table, optionally overwriting the existing data.
    +   *
    +   * @group schema
    +   */
    +  def insertInto(tableName: String, overwrite: Boolean): Unit =
    --- End diff --
    
    At some point we probably want to clarify execute returning an RDD, execute returning an array (executeCollect), and execute DDL.



[GitHub] spark pull request: [SQL] SPARK-1424 Generalize insertIntoTable fu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/354#issuecomment-40440869
  
    Merged build finished. All automated tests passed.



[GitHub] spark pull request: [SQL] SPARK-1424 Generalize insertIntoTable fu...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/354#discussion_r11375848
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SchemaRDDLike.scala ---
    @@ -62,4 +64,41 @@ trait SchemaRDDLike {
       def registerAsTable(tableName: String): Unit = {
         sqlContext.registerRDDAsTable(baseSchemaRDD, tableName)
       }
    +
    +  /**
    +   * <span class="badge badge-red" style="float: right;">EXPERIMENTAL</span>
    +   *
    +   * Adds the rows from this RDD to the specified table, optionally overwriting the existing data.
    +   *
    +   * @group schema
    +   */
    +  def insertInto(tableName: String, overwrite: Boolean): Unit =
    +    sqlContext.executePlan(
    +      InsertIntoTable(UnresolvedRelation(None, tableName), Map.empty, logicalPlan, overwrite)).toRdd
    +
    +  /**
    +   * <span class="badge badge-red" style="float: right;">EXPERIMENTAL</span>
    +   *
    +   * Appends the rows from this RDD to the specified table.
    +   *
    +   * @group schema
    +   */
    +  def insertInto(tableName: String): Unit = insertInto(tableName, false)
    +
    +  /**
    +   * <span class="badge badge-red" style="float: right;">EXPERIMENTAL</span>
    +   *
    +   * Creates a table from the contents of this SchemaRDD.  This will fail if the table already
    +   * exists.
    +   *
    +   * Note that this currently only works with SchemaRDDs that are created from a HiveContext as
    +   * there is no notion of a persisted catalog in a standard SQL context.  Instead you can write
    +   * an RDD out to a parquet file, and then register that file as a table.  This "table" can then
    +   * be the target of an `insertInto`.
    +   *
    +   * @param tableName
    +   */
    +  def createTableAs(tableName: String) =
    --- End diff --
    
    and i think the next line doesn't need wrapping (should fit in 100 chars)



[GitHub] spark pull request: [SQL] SPARK-1424 Generalize insertIntoTable fu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/354#issuecomment-40554849
  
    Merged build started. 



[GitHub] spark pull request: [SQL] SPARK-1424 Generalize insertIntoTable fu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/354#issuecomment-40439899
  
    Merged build finished. All automated tests passed.



[GitHub] spark pull request: [SQL] SPARK-1424 Generalize insertIntoTable fu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/354#issuecomment-39806224
  
    Merged build started. 



[GitHub] spark pull request: [SQL] SPARK-1424 Generalize insertIntoTable fu...

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/354#discussion_r11555907
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SchemaRDDLike.scala ---
    @@ -62,4 +65,43 @@ trait SchemaRDDLike {
       def registerAsTable(tableName: String): Unit = {
         sqlContext.registerRDDAsTable(baseSchemaRDD, tableName)
       }
    +
    +  /**
    +   * :: Experimental ::
    +   *
    +   * Adds the rows from this RDD to the specified table, optionally overwriting the existing data.
    +   *
    +   * @group schema
    +   */
    +  @Experimental
    +  def insertInto(tableName: String, overwrite: Boolean): Unit =
    +    sqlContext.executePlan(
    +      InsertIntoTable(UnresolvedRelation(None, tableName), Map.empty, logicalPlan, overwrite)).toRdd
    +
    +  /**
    +   * :: Experimental ::
    +   *
    +   * Appends the rows from this RDD to the specified table.
    +   *
    +   * @group schema
    +   */
    +  @Experimental
    +  def insertInto(tableName: String): Unit = insertInto(tableName, false)
    +
    +  /**
    +   * :: Experimental ::
    +   *
    +   * Creates a table from the contents of this SchemaRDD.  This will fail if the table already
    +   * exists.
    +   *
    +   * Note that this currently only works with SchemaRDDs that are created from a HiveContext as
    +   * there is no notion of a persisted catalog in a standard SQL context.  Instead you can write
    +   * an RDD out to a parquet file, and then register that file as a table.  This "table" can then
    +   * be the target of an `insertInto`.
    +   *
    +   * @param tableName
    +   */
    +  @Experimental
    +  def createTableAs(tableName: String): Unit =
    --- End diff --
    
    Maybe this should be called saveAsTable, to be similar to the Parquet files. Actually in saveAsParquetFile you could also add an optional parameter to register the file as a table, and then you'd have no need for createParquetFile.



[GitHub] spark pull request: [SQL] SPARK-1424 Generalize insertIntoTable fu...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/354#discussion_r11375853
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SchemaRDDLike.scala ---
    @@ -62,4 +64,41 @@ trait SchemaRDDLike {
       def registerAsTable(tableName: String): Unit = {
         sqlContext.registerRDDAsTable(baseSchemaRDD, tableName)
       }
    +
    +  /**
    +   * <span class="badge badge-red" style="float: right;">EXPERIMENTAL</span>
    +   *
    +   * Adds the rows from this RDD to the specified table, optionally overwriting the existing data.
    +   *
    +   * @group schema
    +   */
    +  def insertInto(tableName: String, overwrite: Boolean): Unit =
    --- End diff --
    
    is this supposed to be "Unit" or RDD? (end of the function does a toRdd)



[GitHub] spark pull request: [SQL] SPARK-1424 Generalize insertIntoTable fu...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/354#issuecomment-40439248
  
    Okay, I updated the API based on a conversation with @mateiz.  I also added the relevant function to the Java API.  We can do Python in a follow-up PR once this one is merged.  Once Jenkins passes I think this is ready to merge.



[GitHub] spark pull request: [SQL] SPARK-1424 Generalize insertIntoTable fu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/354#issuecomment-40547232
  
    Merged build started. 



[GitHub] spark pull request: [SQL] SPARK-1424 Generalize insertIntoTable fu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/354#issuecomment-40437968
  
     Merged build triggered. 



[GitHub] spark pull request: [SQL] SPARK-1424 Generalize insertIntoTable fu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/354#issuecomment-40551904
  
    
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14158/



[GitHub] spark pull request: [SQL] SPARK-1424 Generalize insertIntoTable fu...

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/354#issuecomment-40263207
  
    So yeah, BTW, my high-level comment is whether we can get rid of createParquetFile and just add a "table name" argument to saveAsParquetFile. Then we should also rename createTableAs to saveAsTable.
    
    By the way, longer-term, we might generalize saveAsTable to take some kind of URI for a storage location, so that the same API can be used to store stuff in Hive, Cassandra, Parquet, etc. But I don't see that clashing with the current API.
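
To make the semantics under discussion concrete, here is a toy in-memory model in plain Scala. It is not Spark code: `Catalog` and `table` are hypothetical, `saveAsTable` follows the rename suggested above, and the append, overwrite and fail-if-exists behaviour is taken from the ScalaDoc quoted in this thread. Throwing when `insertInto` targets a missing table is an assumption of the sketch.

```scala
import scala.collection.mutable

object CatalogSketch {
  // Hypothetical in-memory catalog; rows are plain strings to keep the sketch small.
  final class Catalog {
    private val tables = mutable.Map.empty[String, Vector[String]]

    // Creates a new table from the given rows; fails if the name is already taken.
    def saveAsTable(name: String, rows: Seq[String]): Unit = {
      require(!tables.contains(name), s"table $name already exists")
      tables(name) = rows.toVector
    }

    // Appends to an existing table, or replaces its contents when overwrite is true.
    // Assumption: inserting into a missing table is an error.
    def insertInto(name: String, rows: Seq[String], overwrite: Boolean = false): Unit = {
      val existing = tables.getOrElse(name, throw new NoSuchElementException(s"no table $name"))
      tables(name) = if (overwrite) rows.toVector else existing ++ rows
    }

    // Read-only view used to inspect the result.
    def table(name: String): Vector[String] = tables(name)
  }
}
```

A caller would first `saveAsTable("t", rows)` and then `insertInto("t", more)` to append, or pass `overwrite = true` to replace the contents.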



[GitHub] spark pull request: [SQL] SPARK-1424 Generalize insertIntoTable fu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/354#issuecomment-40440870
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14129/



[GitHub] spark pull request: [SQL] SPARK-1424 Generalize insertIntoTable fu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/354#issuecomment-40439900
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14127/



[GitHub] spark pull request: [SQL] SPARK-1424 Generalize insertIntoTable fu...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/354#discussion_r11597602
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala ---
    @@ -88,6 +96,26 @@ class SQLContext(@transient val sparkContext: SparkContext)
       def parquetFile(path: String): SchemaRDD =
         new SchemaRDD(this, parquet.ParquetRelation(path))
     
    +  /**
    +   * :: Experimental ::
    +   *
    +   * Creates an empty parquet file with the schema of class `A`, which can be registered as a table.
    +   * This registered table can be used as the target of future `insertInto` operations.
    --- End diff --
    
    It'd be better to have an inline ScalaDoc example to illustrate how this can be used, e.g. a line that creates the Parquet file, followed by one or two INSERT INTO SQL statements.



[GitHub] spark pull request: [SQL] SPARK-1424 Generalize insertIntoTable fu...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/354#discussion_r11516290
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SchemaRDDLike.scala ---
    @@ -62,4 +64,41 @@ trait SchemaRDDLike {
       def registerAsTable(tableName: String): Unit = {
         sqlContext.registerRDDAsTable(baseSchemaRDD, tableName)
       }
    +
    +  /**
    +   * <span class="badge badge-red" style="float: right;">EXPERIMENTAL</span>
    +   *
    +   * Adds the rows from this RDD to the specified table, optionally overwriting the existing data.
    +   *
    +   * @group schema
    +   */
    +  def insertInto(tableName: String, overwrite: Boolean): Unit =
    --- End diff --
    
    Yeah, the toRdd is an internal hack to make sure the query runs.
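
The pattern described here can be sketched in plain Scala, with no Spark dependency. `QueryExecution` below is a toy class, `planOnly` and `runs` are hypothetical, and the lazy `toRdd` field only mimics the idea that building a plan does nothing until its result is demanded.

```scala
object ForcePlanSketch {
  // Counts how many times the "insert" side effect has actually run.
  var runs = 0

  // Toy stand-in for a query execution: the work happens only when toRdd is first touched.
  final class QueryExecution(rows: Seq[Int]) {
    lazy val toRdd: Seq[Int] = { runs += 1; rows }
  }

  // Builds the plan but does not run it.
  def planOnly(rows: Seq[Int]): QueryExecution = new QueryExecution(rows)

  // Touches toRdd purely to force the side effect, then returns Unit to the caller.
  def insertInto(rows: Seq[Int]): Unit = {
    val _ = planOnly(rows).toRdd
  }
}
```

This is why the method can end in a `toRdd` call and still be declared as returning `Unit`: the value is demanded only so that the work happens.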



[GitHub] spark pull request: [SQL] SPARK-1424 Generalize insertIntoTable fu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/354#issuecomment-40437974
  
    Merged build started. 



[GitHub] spark pull request: [SQL] SPARK-1424 Generalize insertIntoTable fu...

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/354#issuecomment-40262979
  
    Once the Scala ones are finalized, Michael, are you going to add these to the Java API in the same PR? We should make sure the API translates well into Java (e.g. for default arguments we create overloaded methods there, it shouldn't be too bad).
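
As a rough illustration of the overloading point: Scala default arguments are not directly usable from Java, so a Java-facing API typically spells out each arity as its own method. The `Table` class below is a hypothetical plain-Scala stand-in, not the actual JavaSchemaRDD code.

```scala
object OverloadSketch {
  // Hypothetical stand-in for a table-like target; rows are plain strings.
  final class Table {
    private var rows = Vector.empty[String]

    // Read-only view used to inspect the result.
    def contents: Vector[String] = rows

    // Two-argument form: append, or replace everything when overwrite is true.
    def insertInto(newRows: Seq[String], overwrite: Boolean): Unit = {
      rows = if (overwrite) newRows.toVector else rows ++ newRows
    }

    // One-argument overload standing in for a default of overwrite = false,
    // so Java callers can simply write table.insertInto(rows).
    def insertInto(newRows: Seq[String]): Unit = insertInto(newRows, false)
  }
}
```

Both forms are ordinary methods in the compiled class, so they are callable from Java and Scala alike.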



[GitHub] spark pull request: [SQL] SPARK-1424 Generalize insertIntoTable fu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/354#issuecomment-40439169
  
     Merged build triggered. 



[GitHub] spark pull request: [SQL] SPARK-1424 Generalize insertIntoTable fu...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/354



[GitHub] spark pull request: [SQL] SPARK-1424 Generalize insertIntoTable fu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/354#issuecomment-40166320
  
     Merged build triggered. 



[GitHub] spark pull request: [SQL] SPARK-1424 Generalize insertIntoTable fu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/354#issuecomment-40166326
  
    Merged build started. 



[GitHub] spark pull request: [SQL] SPARK-1424 Generalize insertIntoTable fu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/354#issuecomment-40439180
  
    Merged build started. 



[GitHub] spark pull request: [SQL] SPARK-1424 Generalize insertIntoTable fu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/354#issuecomment-40551902
  
    Merged build finished. 



[GitHub] spark pull request: [SQL] SPARK-1424 Generalize insertIntoTable fu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/354#issuecomment-40167722
  
    Merged build finished. All automated tests passed.



[GitHub] spark pull request: [SQL] SPARK-1424 Generalize insertIntoTable fu...

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/354#issuecomment-40782397
  
    @marmbrus, do we also need to add these new methods to the Python API now? It sounds like the Python side would just need to forward to the existing methods.



[GitHub] spark pull request: [SQL] SPARK-1424 Generalize insertIntoTable fu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/354#issuecomment-40554839
  
    Merged build triggered.



[GitHub] spark pull request: [SQL] SPARK-1424 Generalize insertIntoTable fu...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/354#issuecomment-40831481
  
    @mateiz, yes, we do. @ahirreddy was looking into that (but if you are too busy with other things, Ahir, let me know and I'll take care of it).



[GitHub] spark pull request: [SQL] SPARK-1424 Generalize insertIntoTable fu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/354#issuecomment-40167725
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14039/



[GitHub] spark pull request: [SQL] SPARK-1424 Generalize insertIntoTable fu...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/354#issuecomment-40166368
  
    Okay, you can now create both Hive tables and Parquet files using a schema defined by a case class, with or without raising an error if the table already exists. You can insert (append or overwrite) any SchemaRDD into either type of table using SQL, HQL, or the DSL.
    
    Once this looks good, I think we should merge and then add Python / Java support in a follow-up PR.
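
The semantics described in the comment above (create a table with or without an error when it already exists, then append to or overwrite it) can be sketched with a small in-memory model. This is only an illustration under stated assumptions: `ToyCatalog`, `Person`, `createTable`, `insertInto`, and `scan` are hypothetical names chosen for this sketch and are not the signatures introduced by the pull request.

```scala
import scala.collection.mutable

// Hypothetical row type; in the PR the schema comes from a user-defined case class.
final case class Person(name: String, age: Int)

// Hypothetical in-memory stand-in for a table catalog. This is NOT Spark's API.
final class ToyCatalog[A] {
  private val tables = mutable.Map.empty[String, Vector[A]]

  // Create an empty table. If the table already exists, either fail
  // (allowExisting = false) or leave the existing table untouched.
  def createTable(name: String, allowExisting: Boolean): Unit = {
    if (tables.contains(name)) {
      if (!allowExisting) {
        throw new IllegalStateException(s"Table $name already exists")
      }
    } else {
      tables(name) = Vector.empty
    }
  }

  // Insert rows, either replacing the current contents or appending to them.
  def insertInto(name: String, rows: Seq[A], overwrite: Boolean): Unit = {
    val existing =
      tables.getOrElse(name, throw new NoSuchElementException(s"No table $name"))
    tables(name) = if (overwrite) rows.toVector else existing ++ rows
  }

  // Return the current contents of a table.
  def scan(name: String): Vector[A] = tables(name)
}

object ToyCatalogDemo {
  def main(args: Array[String]): Unit = {
    val catalog = new ToyCatalog[Person]
    catalog.createTable("people", allowExisting = false)
    catalog.insertInto("people", Seq(Person("Ann", 30)), overwrite = false)
    catalog.insertInto("people", Seq(Person("Bob", 25)), overwrite = false) // append
    catalog.insertInto("people", Seq(Person("Cy", 41)), overwrite = true)   // replace
    catalog.createTable("people", allowExisting = true) // no error, data kept
    println(catalog.scan("people"))
  }
}
```

Modelling "error if exists" and "append or overwrite" as two independent boolean flags keeps the four combinations explicit; whether the real API exposes them this way is an assumption of the sketch.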



[GitHub] spark pull request: [SQL] SPARK-1424 Generalize insertIntoTable fu...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on a diff in the pull request:

    https://github.com/apache/spark/pull/354#discussion_r11547314
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SchemaRDDLike.scala ---
    @@ -62,4 +65,43 @@ trait SchemaRDDLike {
       def registerAsTable(tableName: String): Unit = {
         sqlContext.registerRDDAsTable(baseSchemaRDD, tableName)
       }
    +
    +  /**
    +   * :: Experimental ::
    --- End diff --
    
    You may have to put the first line of the actual comment on the line directly after this tag in order for it to show up in the Javadoc.
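
The layout suggested in the review comment above might look like the following sketch. `TableWriter` and the wording of the description are hypothetical and used only to show the comment placement; the point is that the descriptive sentence starts on the line immediately after the `:: Experimental ::` tag, with no blank comment line in between.

```scala
// Hypothetical trait for illustration; only the doc-comment layout matters here.
trait TableWriter {
  /**
   * :: Experimental ::
   * Adds the rows of this dataset to the named table.
   */
  def insertInto(tableName: String): Unit
}
```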



[GitHub] spark pull request: [SQL] SPARK-1424 Generalize insertIntoTable fu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/354#issuecomment-39806215
  
    Merged build triggered.



[GitHub] spark pull request: [SQL] SPARK-1424 Generalize insertIntoTable fu...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/354#issuecomment-40559997
  
    Thanks. Merged.

