Posted to reviews@spark.apache.org by JustinPihony <gi...@git.apache.org> on 2016/04/22 08:02:00 UTC

[GitHub] spark pull request: [SPARK-14525][SQL] Make DataFrameWrite.save wo...

GitHub user JustinPihony opened a pull request:

    https://github.com/apache/spark/pull/12601

    [SPARK-14525][SQL] Make DataFrameWrite.save work for jdbc

    ## What changes were proposed in this pull request?
    
    This change modifies the implementation of `DataFrameWriter.save` so that it works for jdbc, and the `jdbc` call merely delegates to `save`.
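    
    For illustration, a minimal sketch of the two write paths this makes equivalent (the URL and table name below are placeholders, not taken from this PR):
    
        import java.util.Properties
        
        // df is an existing DataFrame; both calls now share the same JDBC write path.
        df.write
          .format("jdbc")
          .option("url", "jdbc:h2:mem:testdb")
          .option("dbtable", "TEST.PEOPLE")
          .mode("append")
          .save()
        
        df.write.mode("append").jdbc("jdbc:h2:mem:testdb", "TEST.PEOPLE", new Properties())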
    
    ## How was this patch tested?
    
    This was tested via unit tests in JDBCWriteSuite, to which I added one new test covering this scenario.
    
    ## Additional details
    
    @rxin This seems to have been most recently touched by you and was also commented on in the JIRA. 
    
    This contribution is my original work and I license the work to the project under the project's open source license.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/JustinPihony/spark jdbc_reconciliation

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12601.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #12601
    
----
commit 6be5046119f1f9cffb78a08a9dbe614ed4f9d460
Author: Justin Pihony <ju...@gmail.com>
Date:   2016-04-22T05:55:34Z

    [SPARK-14525][SQL] Make DataFrameWrite.save work for jdbc

----




[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by JustinPihony <gi...@git.apache.org>.
Github user JustinPihony commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r77949090
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala ---
    @@ -17,39 +17,106 @@
     
     package org.apache.spark.sql.execution.datasources.jdbc
     
    +import java.sql.SQLException
     import java.util.Properties
     
    -import org.apache.spark.sql.SQLContext
    -import org.apache.spark.sql.sources.{BaseRelation, DataSourceRegister, RelationProvider}
    +import scala.collection.JavaConverters.mapAsJavaMapConverter
     
    -class JdbcRelationProvider extends RelationProvider with DataSourceRegister {
    +import org.apache.spark.sql.{DataFrame, SaveMode, SQLContext}
    +import org.apache.spark.sql.sources.{BaseRelation, CreatableRelationProvider, DataSourceRegister, RelationProvider}
    +import org.apache.spark.sql.types.StructType
    +
    +class JdbcRelationProvider extends CreatableRelationProvider
    +  with RelationProvider with DataSourceRegister {
     
       override def shortName(): String = "jdbc"
     
    -  /** Returns a new base relation with the given parameters. */
       override def createRelation(
           sqlContext: SQLContext,
           parameters: Map[String, String]): BaseRelation = {
         val jdbcOptions = new JDBCOptions(parameters)
    -    if (jdbcOptions.partitionColumn != null
    -      && (jdbcOptions.lowerBound == null
    -        || jdbcOptions.upperBound == null
    -        || jdbcOptions.numPartitions == null)) {
    -      sys.error("Partitioning incompletely specified")
    -    }
    +    val partitionColumn = jdbcOptions.partitionColumn
    +    val lowerBound = jdbcOptions.lowerBound
    +    val upperBound = jdbcOptions.upperBound
    +    val numPartitions = jdbcOptions.numPartitions
     
    -    val partitionInfo = if (jdbcOptions.partitionColumn == null) {
    -      null
    +    val partitionInfo = if (partitionColumn == null) { 
    +      null 
         } else {
           JDBCPartitioningInfo(
    -        jdbcOptions.partitionColumn,
    -        jdbcOptions.lowerBound.toLong,
    -        jdbcOptions.upperBound.toLong,
    -        jdbcOptions.numPartitions.toInt)
    +        partitionColumn, lowerBound.toLong, upperBound.toLong, numPartitions.toInt)
         }
         val parts = JDBCRelation.columnPartition(partitionInfo)
         val properties = new Properties() // Additional properties that we will pass to getConnection
         parameters.foreach(kv => properties.setProperty(kv._1, kv._2))
         JDBCRelation(jdbcOptions.url, jdbcOptions.table, parts, properties)(sqlContext.sparkSession)
       }
    +
    +  /*
    +   * The following structure applies to this code:
    +   *                 |    tableExists            |          !tableExists
    +   *------------------------------------------------------------------------------------
    +   * Ignore          | BaseRelation              | CreateTable, saveTable, BaseRelation
    +   * ErrorIfExists   | ERROR                     | CreateTable, saveTable, BaseRelation
    +   * Overwrite*      | (DropTable, CreateTable,) | CreateTable, saveTable, BaseRelation
    +   *                 | saveTable, BaseRelation   |
    +   * Append          | saveTable, BaseRelation   | CreateTable, saveTable, BaseRelation
    +   *
    +   * *Overwrite & tableExists with truncate, will not drop & create, but instead truncate
    +   */
    +  override def createRelation(
    +      sqlContext: SQLContext,
    +      mode: SaveMode,
    +      parameters: Map[String, String],
    +      data: DataFrame): BaseRelation = {
    +    val jdbcOptions = new JDBCOptions(parameters)
    +    val url = jdbcOptions.url
    +    val table = jdbcOptions.table
    +
    +    val props = new Properties()
    +    props.putAll(parameters.asJava)
    +    val conn = JdbcUtils.createConnectionFactory(url, props)()
    +
    +    try {
    +      val tableExists = JdbcUtils.tableExists(conn, url, table)
    +
    +      val (doCreate, doSave) = (mode, tableExists) match {
    +        case (SaveMode.Ignore, true) => (false, false)
    +        case (SaveMode.ErrorIfExists, true) => throw new TableAlreadyExistsException(
    +          s"Table $table already exists, and SaveMode is set to ErrorIfExists.")
    --- End diff --
    
    No, [it is required...](https://github.com/apache/spark/blob/5c6b0855787c080d3e233eb09c05c025395e7cb3/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AlreadyExistException.scala#L30) 
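    
    For reference, the dispatch that the matrix in the diff above describes can be sketched as follows (names follow the quoted diff; the truncate variant of Overwrite is omitted):
    
        val (doCreate, doSave) = (mode, tableExists) match {
          case (SaveMode.Ignore, true) => (false, false) // leave the existing table untouched
          case (SaveMode.ErrorIfExists, true) =>
            throw new TableAlreadyExistsException(
              s"Table $table already exists, and SaveMode is set to ErrorIfExists.")
          case (SaveMode.Overwrite, true) =>
            JdbcUtils.dropTable(conn, table) // drop, then recreate below
            (true, true)
          case (SaveMode.Append, true) => (false, true) // only insert rows
          case (_, false) => (true, true) // create the table, then insert rows
        }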




[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    Thank you for adding more test cases! It looks like there is a lot of duplicated code, though. Could you write a common helper function and have each test case call it, instead of copying the code?
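    
    For example, something along these lines would do (the helper name and exact assertions are hypothetical, not from this PR):
    
        // Writes df through the generic save path and verifies the round trip.
        private def runSaveAndVerify(df: DataFrame, table: String, mode: SaveMode): Unit = {
          df.write.format("jdbc")
            .option("url", url1)
            .option("dbtable", table)
            .mode(mode)
            .save()
          assert(df.count() === spark.read.jdbc(url1, table, properties).count())
        }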




[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by JustinPihony <gi...@git.apache.org>.
Github user JustinPihony commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r67539731
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala ---
    @@ -19,37 +19,105 @@ package org.apache.spark.sql.execution.datasources.jdbc
     
     import java.util.Properties
     
    -import org.apache.spark.sql.SQLContext
    -import org.apache.spark.sql.sources.{BaseRelation, DataSourceRegister, RelationProvider}
    +import org.apache.spark.sql.{DataFrame, SaveMode, SQLContext}
    +import org.apache.spark.sql.sources.{BaseRelation, CreatableRelationProvider, DataSourceRegister, RelationProvider, SchemaRelationProvider}
    +import org.apache.spark.sql.types.StructType
     
    -class JdbcRelationProvider extends RelationProvider with DataSourceRegister {
    +class JdbcRelationProvider extends CreatableRelationProvider
    +  with SchemaRelationProvider with RelationProvider with DataSourceRegister {
     
       override def shortName(): String = "jdbc"
     
    -  /** Returns a new base relation with the given parameters. */
       override def createRelation(
           sqlContext: SQLContext,
           parameters: Map[String, String]): BaseRelation = {
    -    val jdbcOptions = new JDBCOptions(parameters)
    -    if (jdbcOptions.partitionColumn != null
    -      && (jdbcOptions.lowerBound == null
    -        || jdbcOptions.upperBound == null
    -        || jdbcOptions.numPartitions == null)) {
    +    createRelation(sqlContext, parameters, null)
    +  }
    +
    +  /** Returns a new base relation with the given parameters. */
    +  override def createRelation(
    +      sqlContext: SQLContext,
    +      parameters: Map[String, String],
    +      schema: StructType): BaseRelation = {
    +    val url = parameters.getOrElse("url", sys.error("Option 'url' not specified"))
    +    val table = parameters.getOrElse("dbtable", sys.error("Option 'dbtable' not specified"))
    +    val partitionColumn = parameters.getOrElse("partitionColumn", null)
    +    val lowerBound = parameters.getOrElse("lowerBound", null)
    +    val upperBound = parameters.getOrElse("upperBound", null)
    +    val numPartitions = parameters.getOrElse("numPartitions", null)
    --- End diff --
    
    @HyukjinKwon Thanks, I did not know about this. Before I push code, I was curious: why does JDBCOptions not include the partitioning validation? That also seems like a point of duplication.
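    
    For instance, the validation could live on the options class itself (a hedged sketch, not code from this PR):
    
        class JDBCOptions(parameters: Map[String, String]) {
          // ... existing url/table/partition fields ...
          require(partitionColumn == null ||
            (lowerBound != null && upperBound != null && numPartitions != null),
            "Partitioning incompletely specified")
        }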





[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    **[Test build #65891 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65891/consoleFull)** for PR 12601 at commit [`724bbe2`](https://github.com/apache/spark/commit/724bbe22b23050f3bdbf6d1bf14d4dabc52113b2).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-14525][SQL] Make DataFrameWrite.save wo...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/12601#issuecomment-213298866
  
    I think the `Additional details` could be given in comments rather than in the PR description, because the PR description should describe what the PR is. The `Additional details` are perhaps not related to the PR itself.




[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by JustinPihony <gi...@git.apache.org>.
Github user JustinPihony commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    @HyukjinKwon @gatorsmile @srowen Can one of you review my documentation change so we can get this pushed out? :)





[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by JustinPihony <gi...@git.apache.org>.
Github user JustinPihony commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r77061367
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala ---
    @@ -19,37 +19,100 @@ package org.apache.spark.sql.execution.datasources.jdbc
     
     import java.util.Properties
     
    -import org.apache.spark.sql.SQLContext
    -import org.apache.spark.sql.sources.{BaseRelation, DataSourceRegister, RelationProvider}
    +import org.apache.spark.sql.{DataFrame, SaveMode, SQLContext}
    +import org.apache.spark.sql.sources.{BaseRelation, CreatableRelationProvider, DataSourceRegister, RelationProvider, SchemaRelationProvider}
    +import org.apache.spark.sql.types.StructType
     
    -class JdbcRelationProvider extends RelationProvider with DataSourceRegister {
    +class JdbcRelationProvider extends CreatableRelationProvider
    +  with SchemaRelationProvider with RelationProvider with DataSourceRegister {
     
       override def shortName(): String = "jdbc"
     
    -  /** Returns a new base relation with the given parameters. */
       override def createRelation(
           sqlContext: SQLContext,
           parameters: Map[String, String]): BaseRelation = {
    +    createRelation(sqlContext, parameters, null)
    +  }
    +
    +  /** Returns a new base relation with the given parameters. */
    +  override def createRelation(
    +      sqlContext: SQLContext,
    +      parameters: Map[String, String],
    +      schema: StructType): BaseRelation = {
         val jdbcOptions = new JDBCOptions(parameters)
    -    if (jdbcOptions.partitionColumn != null
    -      && (jdbcOptions.lowerBound == null
    -        || jdbcOptions.upperBound == null
    -        || jdbcOptions.numPartitions == null)) {
    -      sys.error("Partitioning incompletely specified")
    -    }
    +    val partitionColumn = jdbcOptions.partitionColumn
    +    val lowerBound = jdbcOptions.lowerBound
    +    val upperBound = jdbcOptions.upperBound
    +    val numPartitions = jdbcOptions.numPartitions
     
    -    val partitionInfo = if (jdbcOptions.partitionColumn == null) {
    -      null
    -    } else {
    +    val partitionInfo = if (partitionColumn == null) null
    +    else {
           JDBCPartitioningInfo(
    -        jdbcOptions.partitionColumn,
    -        jdbcOptions.lowerBound.toLong,
    -        jdbcOptions.upperBound.toLong,
    -        jdbcOptions.numPartitions.toInt)
    +        partitionColumn, lowerBound.toLong, upperBound.toLong, numPartitions.toInt)
         }
         val parts = JDBCRelation.columnPartition(partitionInfo)
         val properties = new Properties() // Additional properties that we will pass to getConnection
         parameters.foreach(kv => properties.setProperty(kv._1, kv._2))
    -    JDBCRelation(jdbcOptions.url, jdbcOptions.table, parts, properties)(sqlContext.sparkSession)
    +    JDBCRelation(jdbcOptions.url, jdbcOptions.table, parts, properties,
    +      Option(schema))(sqlContext.sparkSession)
    +  }
    +
    +  /*
    +   * The following structure applies to this code:
    +   *                 |    tableExists            |          !tableExists
    +   *------------------------------------------------------------------------------------
    +   * Ignore          | BaseRelation              | CreateTable, saveTable, BaseRelation
    +   * ErrorIfExists   | ERROR                     | CreateTable, saveTable, BaseRelation
    +   * Overwrite       | DropTable, CreateTable,   | CreateTable, saveTable, BaseRelation
    +   *                 | saveTable, BaseRelation   |
    +   * Append          | saveTable, BaseRelation   | CreateTable, saveTable, BaseRelation
    +   */
    +  override def createRelation(
    +      sqlContext: SQLContext,
    +      mode: SaveMode,
    +      parameters: Map[String, String],
    +      data: DataFrame): BaseRelation = {
    +    val url = parameters.getOrElse("url",
    +      sys.error("Saving jdbc source requires url to be set." +
    +        " (ie. df.option(\"url\", \"ACTUAL_URL\")"))
    +    val table = parameters.getOrElse("dbtable", parameters.getOrElse("table",
    +      sys.error("Saving jdbc source requires dbtable to be set." +
    +        " (ie. df.option(\"dbtable\", \"ACTUAL_DB_TABLE\")")))
    +
    +    import collection.JavaConverters._
    +    val props = new Properties()
    +    props.putAll(parameters.asJava)
    +    val conn = JdbcUtils.createConnectionFactory(url, props)()
    +
    +    try {
    +      val tableExists = JdbcUtils.tableExists(conn, url, table)
    +
    +      val (doCreate, doSave) = (mode, tableExists) match {
    +        case (SaveMode.Ignore, true) => (false, false)
    +        case (SaveMode.ErrorIfExists, true) => sys.error(s"Table $table already exists.")
    +        case (SaveMode.Overwrite, true) =>
    +          JdbcUtils.dropTable(conn, table)
    --- End diff --
    
    If you want, I can make the change. I was trying not to change behavior, as this is [the current behavior](https://github.com/apache/spark/blob/29952ed096fd2a0a19079933ff691671d6f00835/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala#L448).




[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r80628570
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala ---
    @@ -19,37 +19,102 @@ package org.apache.spark.sql.execution.datasources.jdbc
     
     import java.util.Properties
     
    -import org.apache.spark.sql.SQLContext
    -import org.apache.spark.sql.sources.{BaseRelation, DataSourceRegister, RelationProvider}
    +import scala.collection.JavaConverters.mapAsJavaMapConverter
     
    -class JdbcRelationProvider extends RelationProvider with DataSourceRegister {
    +import org.apache.spark.sql.{AnalysisException, DataFrame, SaveMode, SQLContext}
    +import org.apache.spark.sql.sources.{BaseRelation, CreatableRelationProvider, DataSourceRegister, RelationProvider}
    +
    +class JdbcRelationProvider extends CreatableRelationProvider
    +  with RelationProvider with DataSourceRegister {
     
       override def shortName(): String = "jdbc"
     
    -  /** Returns a new base relation with the given parameters. */
       override def createRelation(
           sqlContext: SQLContext,
           parameters: Map[String, String]): BaseRelation = {
         val jdbcOptions = new JDBCOptions(parameters)
    -    if (jdbcOptions.partitionColumn != null
    -      && (jdbcOptions.lowerBound == null
    -        || jdbcOptions.upperBound == null
    -        || jdbcOptions.numPartitions == null)) {
    -      sys.error("Partitioning incompletely specified")
    -    }
    +    val partitionColumn = jdbcOptions.partitionColumn
    +    val lowerBound = jdbcOptions.lowerBound
    +    val upperBound = jdbcOptions.upperBound
    +    val numPartitions = jdbcOptions.numPartitions
     
    -    val partitionInfo = if (jdbcOptions.partitionColumn == null) {
    +    val partitionInfo = if (partitionColumn == null) {
           null
         } else {
           JDBCPartitioningInfo(
    -        jdbcOptions.partitionColumn,
    -        jdbcOptions.lowerBound.toLong,
    -        jdbcOptions.upperBound.toLong,
    -        jdbcOptions.numPartitions.toInt)
    +        partitionColumn, lowerBound.toLong, upperBound.toLong, numPartitions.toInt)
         }
         val parts = JDBCRelation.columnPartition(partitionInfo)
         val properties = new Properties() // Additional properties that we will pass to getConnection
         parameters.foreach(kv => properties.setProperty(kv._1, kv._2))
         JDBCRelation(jdbcOptions.url, jdbcOptions.table, parts, properties)(sqlContext.sparkSession)
       }
    +
    +  /*
    +   * The following structure applies to this code:
    --- End diff --
    
    If the table does not exist and the mode is `OVERWRITE`, we create a table, then insert rows into the table, and finally return a BaseRelation. 
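    
    In user-facing terms, a hedged sketch of that path (the table name is a placeholder):
    
        // TEST.NEWTABLE does not exist yet: Overwrite creates it, inserts the rows,
        // and createRelation returns a BaseRelation over the new table.
        df.write.format("jdbc")
          .option("url", url)
          .option("dbtable", "TEST.NEWTABLE")
          .mode(SaveMode.Overwrite)
          .save()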




[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by JustinPihony <gi...@git.apache.org>.
Github user JustinPihony commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r77948196
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala ---
    @@ -17,39 +17,105 @@
     
     package org.apache.spark.sql.execution.datasources.jdbc
     
    +import java.sql.SQLException
     import java.util.Properties
     
    -import org.apache.spark.sql.SQLContext
    -import org.apache.spark.sql.sources.{BaseRelation, DataSourceRegister, RelationProvider}
    +import org.apache.spark.sql.{DataFrame, SaveMode, SQLContext}
    +import org.apache.spark.sql.sources.{BaseRelation, CreatableRelationProvider, DataSourceRegister, RelationProvider}
    +import org.apache.spark.sql.types.StructType
     
    -class JdbcRelationProvider extends RelationProvider with DataSourceRegister {
    +class JdbcRelationProvider extends CreatableRelationProvider
    +  with RelationProvider with DataSourceRegister {
     
       override def shortName(): String = "jdbc"
     
    -  /** Returns a new base relation with the given parameters. */
       override def createRelation(
           sqlContext: SQLContext,
           parameters: Map[String, String]): BaseRelation = {
         val jdbcOptions = new JDBCOptions(parameters)
    -    if (jdbcOptions.partitionColumn != null
    -      && (jdbcOptions.lowerBound == null
    -        || jdbcOptions.upperBound == null
    -        || jdbcOptions.numPartitions == null)) {
    -      sys.error("Partitioning incompletely specified")
    -    }
    +    val partitionColumn = jdbcOptions.partitionColumn
    +    val lowerBound = jdbcOptions.lowerBound
    +    val upperBound = jdbcOptions.upperBound
    +    val numPartitions = jdbcOptions.numPartitions
     
    -    val partitionInfo = if (jdbcOptions.partitionColumn == null) {
    -      null
    -    } else {
    +    val partitionInfo = if (partitionColumn == null) { null }
    --- End diff --
    
    Yes, but the guidelines do not cover this scenario. The expression is not returning Unit but a value, and it looks ridiculous in comparison. I have made the change to fit your needs and "_speed_" things up, though.




[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    **[Test build #65087 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65087/consoleFull)** for PR 12601 at commit [`c686b0e`](https://github.com/apache/spark/commit/c686b0e694406771e24cc3a0f93a3327eac3123e).




[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r77946715
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala ---
    @@ -17,39 +17,113 @@
     
     package org.apache.spark.sql.execution.datasources.jdbc
     
    +import java.sql.SQLException
     import java.util.Properties
     
    -import org.apache.spark.sql.SQLContext
    -import org.apache.spark.sql.sources.{BaseRelation, DataSourceRegister, RelationProvider}
    +import org.apache.spark.sql.{DataFrame, SaveMode, SQLContext}
    +import org.apache.spark.sql.sources.{BaseRelation, CreatableRelationProvider, DataSourceRegister, RelationProvider, SchemaRelationProvider}
    +import org.apache.spark.sql.types.StructType
     
    -class JdbcRelationProvider extends RelationProvider with DataSourceRegister {
    +class JdbcRelationProvider extends CreatableRelationProvider
    +  with SchemaRelationProvider with RelationProvider with DataSourceRegister {
     
       override def shortName(): String = "jdbc"
     
    -  /** Returns a new base relation with the given parameters. */
       override def createRelation(
           sqlContext: SQLContext,
           parameters: Map[String, String]): BaseRelation = {
    +    createRelation(sqlContext, parameters, null)
    +  }
    +
    +  /** Returns a new base relation with the given parameters. */
    +  override def createRelation(
    +      sqlContext: SQLContext,
    +      parameters: Map[String, String],
    +      schema: StructType): BaseRelation = {
         val jdbcOptions = new JDBCOptions(parameters)
    -    if (jdbcOptions.partitionColumn != null
    -      && (jdbcOptions.lowerBound == null
    -        || jdbcOptions.upperBound == null
    -        || jdbcOptions.numPartitions == null)) {
    -      sys.error("Partitioning incompletely specified")
    -    }
    +    val partitionColumn = jdbcOptions.partitionColumn
    +    val lowerBound = jdbcOptions.lowerBound
    +    val upperBound = jdbcOptions.upperBound
    +    val numPartitions = jdbcOptions.numPartitions
     
    -    val partitionInfo = if (jdbcOptions.partitionColumn == null) {
    -      null
    -    } else {
    +    val partitionInfo = if (partitionColumn == null) null
    +    else {
           JDBCPartitioningInfo(
    -        jdbcOptions.partitionColumn,
    -        jdbcOptions.lowerBound.toLong,
    -        jdbcOptions.upperBound.toLong,
    -        jdbcOptions.numPartitions.toInt)
    +        partitionColumn, lowerBound.toLong, upperBound.toLong, numPartitions.toInt)
         }
         val parts = JDBCRelation.columnPartition(partitionInfo)
         val properties = new Properties() // Additional properties that we will pass to getConnection
         parameters.foreach(kv => properties.setProperty(kv._1, kv._2))
    -    JDBCRelation(jdbcOptions.url, jdbcOptions.table, parts, properties)(sqlContext.sparkSession)
    +    JDBCRelation(jdbcOptions.url, jdbcOptions.table, parts, properties,
    +      Option(schema))(sqlContext.sparkSession)
    +  }
    +
    +  /*
    +   * The following structure applies to this code:
    +   *                 |    tableExists            |          !tableExists
    +   *------------------------------------------------------------------------------------
    +   * Ignore          | BaseRelation              | CreateTable, saveTable, BaseRelation
    +   * ErrorIfExists   | ERROR                     | CreateTable, saveTable, BaseRelation
    +   * Overwrite*      | (DropTable, CreateTable,) | CreateTable, saveTable, BaseRelation
    +   *                 | saveTable, BaseRelation   |
    +   * Append          | saveTable, BaseRelation   | CreateTable, saveTable, BaseRelation
    +   *
    +   * *Overwrite & tableExists with truncate, will not drop & create, but instead truncate
    +   */
    +  override def createRelation(
    +      sqlContext: SQLContext,
    +      mode: SaveMode,
    +      parameters: Map[String, String],
    +      data: DataFrame): BaseRelation = {
    +    val jdbcOptions = new JDBCOptions(parameters)
    +    val url = jdbcOptions.url
    +    val table = jdbcOptions.table
    +
    +    import collection.JavaConverters._
    +    val props = new Properties()
    +    props.putAll(parameters.asJava)
    +    val conn = JdbcUtils.createConnectionFactory(url, props)()
    +
    +    try {
    +      val tableExists = JdbcUtils.tableExists(conn, url, table)
    +
    +      val (doCreate, doSave) = (mode, tableExists) match {
    --- End diff --
    
    OK, I am fine with it if the others are OK with it. Let me review your version.




[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    @srowen @cloud-fan @rxin Could you trigger the tests? The current version looks good to me. We need to check whether all the test cases pass.
    
    Thanks for your work, @JustinPihony!




[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by JustinPihony <gi...@git.apache.org>.
Github user JustinPihony commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    Bump @HyukjinKwon: I have replied to your comments. Could you please review them so that I can push my changes?




[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by JustinPihony <gi...@git.apache.org>.
Github user JustinPihony commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r77238783
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala ---
    @@ -17,39 +17,105 @@
     
     package org.apache.spark.sql.execution.datasources.jdbc
     
    +import java.sql.SQLException
     import java.util.Properties
     
    -import org.apache.spark.sql.SQLContext
    -import org.apache.spark.sql.sources.{BaseRelation, DataSourceRegister, RelationProvider}
    +import org.apache.spark.sql.{DataFrame, SaveMode, SQLContext}
    +import org.apache.spark.sql.sources.{BaseRelation, CreatableRelationProvider, DataSourceRegister, RelationProvider, SchemaRelationProvider}
    +import org.apache.spark.sql.types.StructType
     
    -class JdbcRelationProvider extends RelationProvider with DataSourceRegister {
    +class JdbcRelationProvider extends CreatableRelationProvider
    +  with SchemaRelationProvider with RelationProvider with DataSourceRegister {
     
       override def shortName(): String = "jdbc"
     
    -  /** Returns a new base relation with the given parameters. */
       override def createRelation(
           sqlContext: SQLContext,
           parameters: Map[String, String]): BaseRelation = {
    +    createRelation(sqlContext, parameters, null)
    +  }
    +
    +  /** Returns a new base relation with the given parameters. */
    +  override def createRelation(
    +      sqlContext: SQLContext,
    +      parameters: Map[String, String],
    +      schema: StructType): BaseRelation = {
         val jdbcOptions = new JDBCOptions(parameters)
    -    if (jdbcOptions.partitionColumn != null
    -      && (jdbcOptions.lowerBound == null
    -        || jdbcOptions.upperBound == null
    -        || jdbcOptions.numPartitions == null)) {
    -      sys.error("Partitioning incompletely specified")
    -    }
    +    val partitionColumn = jdbcOptions.partitionColumn
    +    val lowerBound = jdbcOptions.lowerBound
    +    val upperBound = jdbcOptions.upperBound
    +    val numPartitions = jdbcOptions.numPartitions
     
    -    val partitionInfo = if (jdbcOptions.partitionColumn == null) {
    -      null
    -    } else {
    +    val partitionInfo = if (partitionColumn == null) null
    +    else {
           JDBCPartitioningInfo(
    -        jdbcOptions.partitionColumn,
    -        jdbcOptions.lowerBound.toLong,
    -        jdbcOptions.upperBound.toLong,
    -        jdbcOptions.numPartitions.toInt)
    +        partitionColumn, lowerBound.toLong, upperBound.toLong, numPartitions.toInt)
         }
         val parts = JDBCRelation.columnPartition(partitionInfo)
         val properties = new Properties() // Additional properties that we will pass to getConnection
         parameters.foreach(kv => properties.setProperty(kv._1, kv._2))
    -    JDBCRelation(jdbcOptions.url, jdbcOptions.table, parts, properties)(sqlContext.sparkSession)
    +    JDBCRelation(jdbcOptions.url, jdbcOptions.table, parts, properties,
    +      Option(schema))(sqlContext.sparkSession)
    +  }
    +
    +  /*
    +   * The following structure applies to this code:
    +   *                 |    tableExists            |          !tableExists
    +   *------------------------------------------------------------------------------------
    +   * Ignore          | BaseRelation              | CreateTable, saveTable, BaseRelation
    +   * ErrorIfExists   | ERROR                     | CreateTable, saveTable, BaseRelation
    +   * Overwrite       | DropTable, CreateTable,   | CreateTable, saveTable, BaseRelation
    +   *                 | saveTable, BaseRelation   |
    +   * Append          | saveTable, BaseRelation   | CreateTable, saveTable, BaseRelation
    +   */
    +  override def createRelation(
    +      sqlContext: SQLContext,
    +      mode: SaveMode,
    +      parameters: Map[String, String],
    +      data: DataFrame): BaseRelation = {
    +    require(parameters.isDefinedAt("url"), "Saving jdbc source requires 'url' to be set." +
    +        " (ie. df.option(\"url\", \"ACTUAL_URL\")")
    +    require(parameters.isDefinedAt("dbtable"), "Saving jdbc source requires 'dbtable' to be set." +
    +        " (ie. df.option(\"dbtable\", \"ACTUAL_DB_TABLE\")")
    +    val url = parameters("url")
    +    val table = parameters("dbtable")
    --- End diff --
    
    @HyukjinKwon I thought about that, but this code ends up not needing the extra checks. So, it seemed unnecessary at this point. I can go either way on this one, though.




[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r66716158
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala ---
    @@ -19,37 +19,105 @@ package org.apache.spark.sql.execution.datasources.jdbc
     
     import java.util.Properties
     
    -import org.apache.spark.sql.SQLContext
    -import org.apache.spark.sql.sources.{BaseRelation, DataSourceRegister, RelationProvider}
    +import org.apache.spark.sql.{DataFrame, SaveMode, SQLContext}
    +import org.apache.spark.sql.sources.{BaseRelation, CreatableRelationProvider, DataSourceRegister, RelationProvider, SchemaRelationProvider}
    +import org.apache.spark.sql.types.StructType
     
    -class JdbcRelationProvider extends RelationProvider with DataSourceRegister {
    +class JdbcRelationProvider extends CreatableRelationProvider
    +  with SchemaRelationProvider with RelationProvider with DataSourceRegister {
     
       override def shortName(): String = "jdbc"
     
    -  /** Returns a new base relation with the given parameters. */
       override def createRelation(
           sqlContext: SQLContext,
           parameters: Map[String, String]): BaseRelation = {
    -    val jdbcOptions = new JDBCOptions(parameters)
    -    if (jdbcOptions.partitionColumn != null
    -      && (jdbcOptions.lowerBound == null
    -        || jdbcOptions.upperBound == null
    -        || jdbcOptions.numPartitions == null)) {
    +    createRelation(sqlContext, parameters, null)
    +  }
    +
    +  /** Returns a new base relation with the given parameters. */
    +  override def createRelation(
    +      sqlContext: SQLContext,
    +      parameters: Map[String, String],
    +      schema: StructType): BaseRelation = {
    +    val url = parameters.getOrElse("url", sys.error("Option 'url' not specified"))
    +    val table = parameters.getOrElse("dbtable", sys.error("Option 'dbtable' not specified"))
    +    val partitionColumn = parameters.getOrElse("partitionColumn", null)
    +    val lowerBound = parameters.getOrElse("lowerBound", null)
    +    val upperBound = parameters.getOrElse("upperBound", null)
    +    val numPartitions = parameters.getOrElse("numPartitions", null)
    --- End diff --
    
    There is a class for those options, `JDBCOptions`. It would be nicer if those options were managed in a single place.
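    
    Concretely, the lookups above could be replaced with the existing options class (as the later revisions of this diff do):
    
        val jdbcOptions = new JDBCOptions(parameters)
        val partitionColumn = jdbcOptions.partitionColumn
        val lowerBound = jdbcOptions.lowerBound
        val upperBound = jdbcOptions.upperBound
        val numPartitions = jdbcOptions.numPartitions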




[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r66716161
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala ---
    @@ -19,37 +19,105 @@ package org.apache.spark.sql.execution.datasources.jdbc
     
     import java.util.Properties
     
    -import org.apache.spark.sql.SQLContext
    -import org.apache.spark.sql.sources.{BaseRelation, DataSourceRegister, RelationProvider}
    +import org.apache.spark.sql.{DataFrame, SaveMode, SQLContext}
    +import org.apache.spark.sql.sources.{BaseRelation, CreatableRelationProvider, DataSourceRegister, RelationProvider, SchemaRelationProvider}
    +import org.apache.spark.sql.types.StructType
     
    -class JdbcRelationProvider extends RelationProvider with DataSourceRegister {
    +class JdbcRelationProvider extends CreatableRelationProvider
    +  with SchemaRelationProvider with RelationProvider with DataSourceRegister {
     
       override def shortName(): String = "jdbc"
     
    -  /** Returns a new base relation with the given parameters. */
       override def createRelation(
           sqlContext: SQLContext,
           parameters: Map[String, String]): BaseRelation = {
    -    val jdbcOptions = new JDBCOptions(parameters)
    -    if (jdbcOptions.partitionColumn != null
    -      && (jdbcOptions.lowerBound == null
    -        || jdbcOptions.upperBound == null
    -        || jdbcOptions.numPartitions == null)) {
    +    createRelation(sqlContext, parameters, null)
    +  }
    +
    +  /** Returns a new base relation with the given parameters. */
    +  override def createRelation(
    +      sqlContext: SQLContext,
    +      parameters: Map[String, String],
    +      schema: StructType): BaseRelation = {
    +    val url = parameters.getOrElse("url", sys.error("Option 'url' not specified"))
    +    val table = parameters.getOrElse("dbtable", sys.error("Option 'dbtable' not specified"))
    +    val partitionColumn = parameters.getOrElse("partitionColumn", null)
    +    val lowerBound = parameters.getOrElse("lowerBound", null)
    +    val upperBound = parameters.getOrElse("upperBound", null)
    +    val numPartitions = parameters.getOrElse("numPartitions", null)
    +
    +    if (partitionColumn != null
    +      && (lowerBound == null || upperBound == null || numPartitions == null)) {
           sys.error("Partitioning incompletely specified")
         }
     
    -    val partitionInfo = if (jdbcOptions.partitionColumn == null) {
    -      null
    -    } else {
    +    val partitionInfo = if (partitionColumn == null) null
    +    else {
           JDBCPartitioningInfo(
    -        jdbcOptions.partitionColumn,
    -        jdbcOptions.lowerBound.toLong,
    -        jdbcOptions.upperBound.toLong,
    -        jdbcOptions.numPartitions.toInt)
    +        partitionColumn, lowerBound.toLong, upperBound.toLong, numPartitions.toInt)
         }
         val parts = JDBCRelation.columnPartition(partitionInfo)
         val properties = new Properties() // Additional properties that we will pass to getConnection
         parameters.foreach(kv => properties.setProperty(kv._1, kv._2))
    -    JDBCRelation(jdbcOptions.url, jdbcOptions.table, parts, properties)(sqlContext.sparkSession)
    +    JDBCRelation(url, table, parts, properties, Option(schema))(sqlContext.sparkSession)
    +  }
    +
    +  /*
    +   * The following structure applies to this code:
    +   *                 |    tableExists            |          !tableExists
    +   *------------------------------------------------------------------------------------
    +   * Ignore          | BaseRelation              | CreateTable, saveTable, BaseRelation
    +   * ErrorIfExists   | ERROR                     | CreateTable, saveTable, BaseRelation
    +   * Overwrite       | DropTable, CreateTable,   | CreateTable, saveTable, BaseRelation
    +   *                 | saveTable, BaseRelation   |
    +   * Append          | saveTable, BaseRelation   | CreateTable, saveTable, BaseRelation
    +   */
    +  override def createRelation(
    +      sqlContext: SQLContext,
    +      mode: SaveMode,
    +      parameters: Map[String, String],
    +      data: DataFrame): BaseRelation = {
    +    val url = parameters.getOrElse("url",
    +      sys.error("Saving jdbc source requires url to be set." +
    +        " (ie. df.option(\"url\", \"ACTUAL_URL\")"))
    +    val table = parameters.getOrElse("dbtable", parameters.getOrElse("table",
    +      sys.error("Saving jdbc source requires dbtable to be set." +
    +        " (ie. df.option(\"dbtable\", \"ACTUAL_DB_TABLE\")")))
    +
    +    import collection.JavaConverters._
    --- End diff --
    
    I think this can just be imported at the class level rather than being imported every time a relation is created.




[GitHub] spark pull request: [SPARK-14525][SQL] Make DataFrameWrite.save wo...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the pull request:

    https://github.com/apache/spark/pull/12601#issuecomment-214992526
  
    Even so, I think we need some kind of wrapper to safely convert `Properties` to `Map<String, String>`, because new users could easily (and wrongly) put non-string values into `Properties`.
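    
    A minimal sketch of such a wrapper (the function name is hypothetical):
    
        import java.util.Properties
        import scala.collection.JavaConverters._
        
        // stringPropertyNames only yields keys whose key and value are both Strings,
        // so entries with non-string values are skipped rather than failing later.
        def propertiesToMap(props: Properties): Map[String, String] =
          props.stringPropertyNames.asScala.map(k => k -> props.getProperty(k)).toMap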




[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by JustinPihony <gi...@git.apache.org>.
Github user JustinPihony commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    @HyukjinKwon @gatorsmile Could you please review the documentation that I added so that we can put this to bed  :)




[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    Can you simplify the PR based on our previous discussion?




[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r77946237
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCWriteSuite.scala ---
    @@ -208,4 +208,75 @@ class JDBCWriteSuite extends SharedSQLContext with BeforeAndAfter {
         assert(2 === spark.read.jdbc(url1, "TEST.PEOPLE1", properties).count())
         assert(2 === spark.read.jdbc(url1, "TEST.PEOPLE1", properties).collect()(0).length)
       }
    +
    +  test("save works for format(\"jdbc\") if url and dbtable are set") {
    +    val df = sqlContext.createDataFrame(sparkContext.parallelize(arr2x2), schema2)
    +
    +    df.write.format("jdbc")
    +    .options(Map("url" -> url, "dbtable" -> "TEST.BASICCREATETEST"))
    +    .save
    +
    +    assert(2 === sqlContext.read.jdbc(url, "TEST.BASICCREATETEST", new Properties).count)
    +    assert(
    +      2 === sqlContext.read.jdbc(url, "TEST.BASICCREATETEST", new Properties).collect()(0).length)
    +  }
    +
    +  test("save API with SaveMode.Overwrite") {
    +    import scala.collection.JavaConverters._
    +
    +    val df = spark.createDataFrame(sparkContext.parallelize(arr2x2), schema2)
    +    val df2 = spark.createDataFrame(sparkContext.parallelize(arr1x2), schema2)
    +
    +    df.write.format("jdbc")
    +      .option("url", url1)
    +      .option("dbtable", "TEST.TRUNCATETEST")
    --- End diff --
    
    Could you update the table names in all the test cases? 




[GitHub] spark pull request: [SPARK-14525][SQL] Make DataFrameWrite.save wo...

Posted by JustinPihony <gi...@git.apache.org>.
Github user JustinPihony commented on the pull request:

    https://github.com/apache/spark/pull/12601#issuecomment-213463779
  
    @HyukjinKwon You will notice that I opted not to deprecate jdbc, as I don't think that would be the correct path anyway (unless all format methods were to be deprecated). I'm not sure what gave the impression that I wanted to deprecate it, but I think having multiple methods to accomplish the same goal is perfectly fine. This change merely alters the underlying implementation so that both methods work: `write.format("jdbc")` and `write.jdbc`. So I think this realistically addresses all of your other comments surrounding the deprecation concern.
    
    As to the additional details, I was simply [following the contributor directions.](https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-PullRequest) I only put it under additional details because that is what it was to me.




[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r76512092
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala ---
    @@ -19,37 +19,100 @@ package org.apache.spark.sql.execution.datasources.jdbc
     
     import java.util.Properties
     
    -import org.apache.spark.sql.SQLContext
    -import org.apache.spark.sql.sources.{BaseRelation, DataSourceRegister, RelationProvider}
    +import org.apache.spark.sql.{DataFrame, SaveMode, SQLContext}
    +import org.apache.spark.sql.sources.{BaseRelation, CreatableRelationProvider, DataSourceRegister, RelationProvider, SchemaRelationProvider}
    +import org.apache.spark.sql.types.StructType
     
    -class JdbcRelationProvider extends RelationProvider with DataSourceRegister {
    +class JdbcRelationProvider extends CreatableRelationProvider
    +  with SchemaRelationProvider with RelationProvider with DataSourceRegister {
     
       override def shortName(): String = "jdbc"
     
    -  /** Returns a new base relation with the given parameters. */
       override def createRelation(
           sqlContext: SQLContext,
           parameters: Map[String, String]): BaseRelation = {
    +    createRelation(sqlContext, parameters, null)
    +  }
    +
    +  /** Returns a new base relation with the given parameters. */
    +  override def createRelation(
    +      sqlContext: SQLContext,
    +      parameters: Map[String, String],
    +      schema: StructType): BaseRelation = {
         val jdbcOptions = new JDBCOptions(parameters)
    -    if (jdbcOptions.partitionColumn != null
    -      && (jdbcOptions.lowerBound == null
    -        || jdbcOptions.upperBound == null
    -        || jdbcOptions.numPartitions == null)) {
    -      sys.error("Partitioning incompletely specified")
    -    }
    +    val partitionColumn = jdbcOptions.partitionColumn
    +    val lowerBound = jdbcOptions.lowerBound
    +    val upperBound = jdbcOptions.upperBound
    +    val numPartitions = jdbcOptions.numPartitions
     
    -    val partitionInfo = if (jdbcOptions.partitionColumn == null) {
    -      null
    -    } else {
    +    val partitionInfo = if (partitionColumn == null) null
    +    else {
           JDBCPartitioningInfo(
    -        jdbcOptions.partitionColumn,
    -        jdbcOptions.lowerBound.toLong,
    -        jdbcOptions.upperBound.toLong,
    -        jdbcOptions.numPartitions.toInt)
    +        partitionColumn, lowerBound.toLong, upperBound.toLong, numPartitions.toInt)
         }
         val parts = JDBCRelation.columnPartition(partitionInfo)
         val properties = new Properties() // Additional properties that we will pass to getConnection
         parameters.foreach(kv => properties.setProperty(kv._1, kv._2))
    -    JDBCRelation(jdbcOptions.url, jdbcOptions.table, parts, properties)(sqlContext.sparkSession)
    +    JDBCRelation(jdbcOptions.url, jdbcOptions.table, parts, properties,
    +      Option(schema))(sqlContext.sparkSession)
    +  }
    +
    +  /*
    +   * The following structure applies to this code:
    +   *                 |    tableExists            |          !tableExists
    +   *------------------------------------------------------------------------------------
    +   * Ignore          | BaseRelation              | CreateTable, saveTable, BaseRelation
    +   * ErrorIfExists   | ERROR                     | CreateTable, saveTable, BaseRelation
    +   * Overwrite       | DropTable, CreateTable,   | CreateTable, saveTable, BaseRelation
    +   *                 | saveTable, BaseRelation   |
    +   * Append          | saveTable, BaseRelation   | CreateTable, saveTable, BaseRelation
    +   */
    +  override def createRelation(
    +      sqlContext: SQLContext,
    +      mode: SaveMode,
    +      parameters: Map[String, String],
    +      data: DataFrame): BaseRelation = {
    +    val url = parameters.getOrElse("url",
    +      sys.error("Saving jdbc source requires url to be set." +
    +        " (ie. df.option(\"url\", \"ACTUAL_URL\")"))
    +    val table = parameters.getOrElse("dbtable", parameters.getOrElse("table",
    +      sys.error("Saving jdbc source requires dbtable to be set." +
    +        " (ie. df.option(\"dbtable\", \"ACTUAL_DB_TABLE\")")))
    +
    +    import collection.JavaConverters._
    +    val props = new Properties()
    +    props.putAll(parameters.asJava)
    +    val conn = JdbcUtils.createConnectionFactory(url, props)()
    +
    +    try {
    +      val tableExists = JdbcUtils.tableExists(conn, url, table)
    +
    +      val (doCreate, doSave) = (mode, tableExists) match {
    +        case (SaveMode.Ignore, true) => (false, false)
    +        case (SaveMode.ErrorIfExists, true) => sys.error(s"Table $table already exists.")
    +        case (SaveMode.Overwrite, true) =>
    +          JdbcUtils.dropTable(conn, table)
    --- End diff --
    
    Hm, I don't think these are the semantics we have elsewhere. Elsewhere the table is truncated unless truncating won't work (i.e. the schema differs).
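    For reference, here is a self-contained sketch of those semantics; `truncateTable` and `createTable` are assumed helper names here (only `dropTable` and `tableExists` appear in the diff above):

    ```scala
    import java.sql.Connection

    // Sketch only: on Overwrite, truncate when requested (keeping the existing
    // table definition); otherwise drop and recreate, since the schema may differ.
    def overwrite(conn: Connection, table: String,
                  tableExists: Boolean, truncateRequested: Boolean,
                  truncateTable: (Connection, String) => Unit,
                  dropTable: (Connection, String) => Unit,
                  createTable: () => Unit): Unit = {
      if (tableExists) {
        if (truncateRequested) {
          truncateTable(conn, table)
        } else {
          dropTable(conn, table)
          createTable()
        }
      } else {
        createTable()
      }
    }
    ```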




[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by JustinPihony <gi...@git.apache.org>.
Github user JustinPihony commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    @gatorsmile I addressed the rollback in [this comment](https://github.com/apache/spark/pull/12601#issuecomment-233076856) and haven't heard back one way or the other. @srowen?




[GitHub] spark pull request: [SPARK-14525][SQL] Make DataFrameWrite.save wo...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/12601#issuecomment-213297906
  
    @JustinPihony I think we haven't reached a conclusion yet, and we haven't received any feedback from committers on whether we should deprecate `read.jdbc()` or support `write.format("jdbc")` in [SPARK-14525](https://issues.apache.org/jira/browse/SPARK-14525).

    Usually, we should discuss the problem and the proper way to fix it in the JIRA first, and then open a PR.
    
    As I said in the JIRA, if we go for deprecating `read.jdbc()`, we might need to close this JIRA and create another one. 





[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed):
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65275/




[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r77945298
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala ---
    @@ -17,39 +17,104 @@
     
     package org.apache.spark.sql.execution.datasources.jdbc
     
    +import java.sql.SQLException
     import java.util.Properties
     
    -import org.apache.spark.sql.SQLContext
    -import org.apache.spark.sql.sources.{BaseRelation, DataSourceRegister, RelationProvider}
    +import org.apache.spark.sql.{DataFrame, SaveMode, SQLContext}
    +import org.apache.spark.sql.sources.{BaseRelation, CreatableRelationProvider, DataSourceRegister, RelationProvider}
    +import org.apache.spark.sql.types.StructType
     
    -class JdbcRelationProvider extends RelationProvider with DataSourceRegister {
    +class JdbcRelationProvider extends CreatableRelationProvider
    +  with RelationProvider with DataSourceRegister {
     
       override def shortName(): String = "jdbc"
     
    -  /** Returns a new base relation with the given parameters. */
       override def createRelation(
           sqlContext: SQLContext,
           parameters: Map[String, String]): BaseRelation = {
         val jdbcOptions = new JDBCOptions(parameters)
    -    if (jdbcOptions.partitionColumn != null
    -      && (jdbcOptions.lowerBound == null
    -        || jdbcOptions.upperBound == null
    -        || jdbcOptions.numPartitions == null)) {
    -      sys.error("Partitioning incompletely specified")
    -    }
    +    val partitionColumn = jdbcOptions.partitionColumn
    +    val lowerBound = jdbcOptions.lowerBound
    +    val upperBound = jdbcOptions.upperBound
    +    val numPartitions = jdbcOptions.numPartitions
     
    -    val partitionInfo = if (jdbcOptions.partitionColumn == null) {
    -      null
    -    } else {
    +    val partitionInfo = if (partitionColumn == null) null
    +    else {
           JDBCPartitioningInfo(
    -        jdbcOptions.partitionColumn,
    -        jdbcOptions.lowerBound.toLong,
    -        jdbcOptions.upperBound.toLong,
    -        jdbcOptions.numPartitions.toInt)
    +        partitionColumn, lowerBound.toLong, upperBound.toLong, numPartitions.toInt)
         }
         val parts = JDBCRelation.columnPartition(partitionInfo)
         val properties = new Properties() // Additional properties that we will pass to getConnection
         parameters.foreach(kv => properties.setProperty(kv._1, kv._2))
         JDBCRelation(jdbcOptions.url, jdbcOptions.table, parts, properties)(sqlContext.sparkSession)
       }
    +
    +  /*
    +   * The following structure applies to this code:
    +   *                 |    tableExists            |          !tableExists
    +   *------------------------------------------------------------------------------------
    +   * Ignore          | BaseRelation              | CreateTable, saveTable, BaseRelation
    +   * ErrorIfExists   | ERROR                     | CreateTable, saveTable, BaseRelation
    +   * Overwrite*      | (DropTable, CreateTable,) | CreateTable, saveTable, BaseRelation
    +   *                 | saveTable, BaseRelation   |
    +   * Append          | saveTable, BaseRelation   | CreateTable, saveTable, BaseRelation
    +   *
    +   * *Overwrite & tableExists with truncate, will not drop & create, but instead truncate
    +   */
    +  override def createRelation(
    +      sqlContext: SQLContext,
    +      mode: SaveMode,
    +      parameters: Map[String, String],
    +      data: DataFrame): BaseRelation = {
    +    val jdbcOptions = new JDBCOptions(parameters)
    +    val url = jdbcOptions.url
    +    val table = jdbcOptions.table
    +
    +    import collection.JavaConverters._
    --- End diff --
    
    Can we maybe move this up too, if either placement is okay?




[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r77946277
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCWriteSuite.scala ---
    @@ -208,4 +208,75 @@ class JDBCWriteSuite extends SharedSQLContext with BeforeAndAfter {
         assert(2 === spark.read.jdbc(url1, "TEST.PEOPLE1", properties).count())
         assert(2 === spark.read.jdbc(url1, "TEST.PEOPLE1", properties).collect()(0).length)
       }
    +
    +  test("save works for format(\"jdbc\") if url and dbtable are set") {
    +    val df = sqlContext.createDataFrame(sparkContext.parallelize(arr2x2), schema2)
    +
    +    df.write.format("jdbc")
    +    .options(Map("url" -> url, "dbtable" -> "TEST.BASICCREATETEST"))
    +    .save
    +
    +    assert(2 === sqlContext.read.jdbc(url, "TEST.BASICCREATETEST", new Properties).count)
    +    assert(
    +      2 === sqlContext.read.jdbc(url, "TEST.BASICCREATETEST", new Properties).collect()(0).length)
    +  }
    +
    +  test("save API with SaveMode.Overwrite") {
    +    import scala.collection.JavaConverters._
    +
    +    val df = spark.createDataFrame(sparkContext.parallelize(arr2x2), schema2)
    +    val df2 = spark.createDataFrame(sparkContext.parallelize(arr1x2), schema2)
    +
    +    df.write.format("jdbc")
    +      .option("url", url1)
    +      .option("dbtable", "TEST.TRUNCATETEST")
    +      .options(properties.asScala)
    +      .save()
    +    df2.write.mode(SaveMode.Overwrite).format("jdbc")
    +      .option("url", url1)
    +      .option("dbtable", "TEST.TRUNCATETEST")
    +      .options(properties.asScala)
    +      .save()
    +    assert(1 === spark.read.jdbc(url1, "TEST.TRUNCATETEST", properties).count())
    +    assert(2 === spark.read.jdbc(url1, "TEST.TRUNCATETEST", properties).collect()(0).length)
    +  }
    +
    +  test("save errors if url is not specified") {
    +    import scala.collection.JavaConverters._
    +    val df = spark.createDataFrame(sparkContext.parallelize(arr2x2), schema2)
    +
    +    var e = intercept[RuntimeException] {
    +      df.write.format("jdbc")
    +        .option("dbtable", "TEST.TRUNCATETEST")
    +        .options(properties.asScala)
    +        .save()
    +    }.getMessage
    +    assert(e.contains("Option 'url' is required"))
    +  }
    +
    +  test("save errors if dbtable is not specified") {
    +    import scala.collection.JavaConverters._
    +    val df = spark.createDataFrame(sparkContext.parallelize(arr2x2), schema2)
    +
    +    var e = intercept[RuntimeException] {
    +      df.write.format("jdbc")
    +        .option("url", url1)
    +        .options(properties.asScala)
    +        .save()
    +    }.getMessage
    +    assert(e.contains("Option 'dbtable' is required"))
    +  }
    +
    +  test("save errors if wrong user/password combination") {
    +    import scala.collection.JavaConverters._
    +    val df = spark.createDataFrame(sparkContext.parallelize(arr2x2), schema2)
    +
    +    var e = intercept[org.h2.jdbc.JdbcSQLException] {
    --- End diff --
    
    `var` -> `val`. The same issue appears in the test cases above.




[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    **[Test build #65455 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65455/consoleFull)** for PR 12601 at commit [`4a02c82`](https://github.com/apache/spark/commit/4a02c8249ff224f4cbae7050e99f77d31601f4af).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by JustinPihony <gi...@git.apache.org>.
Github user JustinPihony commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    @srowen I have addressed all of your comments. The only changes made were to remove `sys.error` and use `require` where appropriate.




[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by JustinPihony <gi...@git.apache.org>.
Github user JustinPihony commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r77058751
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -396,49 +396,14 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
        * @since 1.4.0
        */
       def jdbc(url: String, table: String, connectionProperties: Properties): Unit = {
    +    import scala.collection.JavaConverters._
    --- End diff --
    
    I opted to import them only here because this is the only place they are required, so there is no need to drag the import into the whole class.




[GitHub] spark pull request: [SPARK-14525][SQL] Make DataFrameWrite.save wo...

Posted by JustinPihony <gi...@git.apache.org>.
Github user JustinPihony commented on the pull request:

    https://github.com/apache/spark/pull/12601#issuecomment-213662908
  
    @HyukjinKwon I just posted on the JIRA the background of `Properties` and how reasonable it is to assume its contents can be converted to `String`s.
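    The conversion itself is mechanical, since `Properties` is documented to hold `String` keys and values. Under that assumption, a round trip looks like this sketch:

    ```scala
    import java.util.Properties
    import scala.collection.JavaConverters._

    // Map[String, String] -> Properties (the direction save() needs).
    val parameters = Map("user" -> "testUser", "password" -> "testPass")
    val props = new Properties()
    props.putAll(parameters.asJava)

    // Properties -> Map[String, String] (the direction jdbc() needs when
    // delegating to save()); stringPropertyNames also picks up defaults.
    val asMap: Map[String, String] =
      props.stringPropertyNames.asScala.map(k => k -> props.getProperty(k)).toMap
    ```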




[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r77948300
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala ---
    @@ -17,39 +17,105 @@
     
     package org.apache.spark.sql.execution.datasources.jdbc
     
    +import java.sql.SQLException
     import java.util.Properties
     
    -import org.apache.spark.sql.SQLContext
    -import org.apache.spark.sql.sources.{BaseRelation, DataSourceRegister, RelationProvider}
    +import org.apache.spark.sql.{DataFrame, SaveMode, SQLContext}
    +import org.apache.spark.sql.sources.{BaseRelation, CreatableRelationProvider, DataSourceRegister, RelationProvider}
    +import org.apache.spark.sql.types.StructType
    --- End diff --
    
    Not used?




[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r77946806
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala ---
    @@ -17,39 +17,105 @@
     
     package org.apache.spark.sql.execution.datasources.jdbc
     
    +import java.sql.SQLException
     import java.util.Properties
     
    -import org.apache.spark.sql.SQLContext
    -import org.apache.spark.sql.sources.{BaseRelation, DataSourceRegister, RelationProvider}
    +import org.apache.spark.sql.{DataFrame, SaveMode, SQLContext}
    +import org.apache.spark.sql.sources.{BaseRelation, CreatableRelationProvider, DataSourceRegister, RelationProvider}
    +import org.apache.spark.sql.types.StructType
     
    -class JdbcRelationProvider extends RelationProvider with DataSourceRegister {
    +class JdbcRelationProvider extends CreatableRelationProvider
    +  with RelationProvider with DataSourceRegister {
     
       override def shortName(): String = "jdbc"
     
    -  /** Returns a new base relation with the given parameters. */
       override def createRelation(
           sqlContext: SQLContext,
           parameters: Map[String, String]): BaseRelation = {
         val jdbcOptions = new JDBCOptions(parameters)
    -    if (jdbcOptions.partitionColumn != null
    -      && (jdbcOptions.lowerBound == null
    -        || jdbcOptions.upperBound == null
    -        || jdbcOptions.numPartitions == null)) {
    -      sys.error("Partitioning incompletely specified")
    -    }
    +    val partitionColumn = jdbcOptions.partitionColumn
    +    val lowerBound = jdbcOptions.lowerBound
    +    val upperBound = jdbcOptions.upperBound
    +    val numPartitions = jdbcOptions.numPartitions
     
    -    val partitionInfo = if (jdbcOptions.partitionColumn == null) {
    -      null
    -    } else {
    +    val partitionInfo = if (partitionColumn == null) { null }
    +    else {
           JDBCPartitioningInfo(
    -        jdbcOptions.partitionColumn,
    -        jdbcOptions.lowerBound.toLong,
    -        jdbcOptions.upperBound.toLong,
    -        jdbcOptions.numPartitions.toInt)
    +        partitionColumn, lowerBound.toLong, upperBound.toLong, numPartitions.toInt)
         }
         val parts = JDBCRelation.columnPartition(partitionInfo)
         val properties = new Properties() // Additional properties that we will pass to getConnection
         parameters.foreach(kv => properties.setProperty(kv._1, kv._2))
         JDBCRelation(jdbcOptions.url, jdbcOptions.table, parts, properties)(sqlContext.sparkSession)
       }
    +
    +  /*
    +   * The following structure applies to this code:
    +   *                 |    tableExists            |          !tableExists
    +   *------------------------------------------------------------------------------------
    +   * Ignore          | BaseRelation              | CreateTable, saveTable, BaseRelation
    +   * ErrorIfExists   | ERROR                     | CreateTable, saveTable, BaseRelation
    +   * Overwrite*      | (DropTable, CreateTable,) | CreateTable, saveTable, BaseRelation
    +   *                 | saveTable, BaseRelation   |
    +   * Append          | saveTable, BaseRelation   | CreateTable, saveTable, BaseRelation
    +   *
    +   * *Overwrite & tableExists with truncate, will not drop & create, but instead truncate
    +   */
    +  override def createRelation(
    +      sqlContext: SQLContext,
    +      mode: SaveMode,
    +      parameters: Map[String, String],
    +      data: DataFrame): BaseRelation = {
    +    import collection.JavaConverters._
    +    
    +    val jdbcOptions = new JDBCOptions(parameters)
    +    val url = jdbcOptions.url
    +    val table = jdbcOptions.table
    +
    +    val props = new Properties()
    +    props.putAll(parameters.asJava)
    +    val conn = JdbcUtils.createConnectionFactory(url, props)()
    +
    +    try {
    +      val tableExists = JdbcUtils.tableExists(conn, url, table)
    +
    +      val (doCreate, doSave) = (mode, tableExists) match {
    +        case (SaveMode.Ignore, true) => (false, false)
    +        case (SaveMode.ErrorIfExists, true) => throw new SQLException(
    --- End diff --
    
    Please throw the exception `TableAlreadyExistsException`. The target could be a `View`. 
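    A sketch of what that could look like; the import path and constructor shape of the exception are assumptions here, so adjust to the actual definition:

    ```scala
    import org.apache.spark.sql.SaveMode
    import org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException

    // Assumed: TableAlreadyExistsException(db: String, table: String), an
    // AnalysisException whose message covers both tables and views.
    def checkSaveMode(mode: SaveMode, tableExists: Boolean, table: String): Unit = {
      if (mode == SaveMode.ErrorIfExists && tableExists) {
        throw new TableAlreadyExistsException(db = "default", table = table)
      }
    }
    ```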




[GitHub] spark pull request: [SPARK-14525][SQL] Make DataFrameWrite.save wo...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/12601#issuecomment-216445374
  
    @rxin I also realised the Python API supports properties as a dict of "arbitrary string tag/value" pairs, [here](https://github.com/apache/spark/blob/d37c7f7f042f7943b5b684e53cf4284c601fb347/python/pyspark/sql/readwriter.py#L847-L849). Could we go ahead with this?




[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    **[Test build #65454 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65454/consoleFull)** for PR 12601 at commit [`447ab82`](https://github.com/apache/spark/commit/447ab821deeb0500a78080c7d7597705664e3240).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r80628287
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala ---
    @@ -19,37 +19,102 @@ package org.apache.spark.sql.execution.datasources.jdbc
     
     import java.util.Properties
     
    -import org.apache.spark.sql.SQLContext
    -import org.apache.spark.sql.sources.{BaseRelation, DataSourceRegister, RelationProvider}
    +import scala.collection.JavaConverters.mapAsJavaMapConverter
     
    -class JdbcRelationProvider extends RelationProvider with DataSourceRegister {
    +import org.apache.spark.sql.{AnalysisException, DataFrame, SaveMode, SQLContext}
    +import org.apache.spark.sql.sources.{BaseRelation, CreatableRelationProvider, DataSourceRegister, RelationProvider}
    +
    +class JdbcRelationProvider extends CreatableRelationProvider
    +  with RelationProvider with DataSourceRegister {
     
       override def shortName(): String = "jdbc"
     
    -  /** Returns a new base relation with the given parameters. */
       override def createRelation(
           sqlContext: SQLContext,
           parameters: Map[String, String]): BaseRelation = {
         val jdbcOptions = new JDBCOptions(parameters)
    -    if (jdbcOptions.partitionColumn != null
    -      && (jdbcOptions.lowerBound == null
    -        || jdbcOptions.upperBound == null
    -        || jdbcOptions.numPartitions == null)) {
    -      sys.error("Partitioning incompletely specified")
    -    }
    +    val partitionColumn = jdbcOptions.partitionColumn
    +    val lowerBound = jdbcOptions.lowerBound
    +    val upperBound = jdbcOptions.upperBound
    +    val numPartitions = jdbcOptions.numPartitions
     
    -    val partitionInfo = if (jdbcOptions.partitionColumn == null) {
    +    val partitionInfo = if (partitionColumn == null) {
           null
         } else {
           JDBCPartitioningInfo(
    -        jdbcOptions.partitionColumn,
    -        jdbcOptions.lowerBound.toLong,
    -        jdbcOptions.upperBound.toLong,
    -        jdbcOptions.numPartitions.toInt)
    +        partitionColumn, lowerBound.toLong, upperBound.toLong, numPartitions.toInt)
         }
         val parts = JDBCRelation.columnPartition(partitionInfo)
         val properties = new Properties() // Additional properties that we will pass to getConnection
         parameters.foreach(kv => properties.setProperty(kv._1, kv._2))
         JDBCRelation(jdbcOptions.url, jdbcOptions.table, parts, properties)(sqlContext.sparkSession)
       }
    +
    +  /*
    +   * The following structure applies to this code:
    --- End diff --
    
    By now, at least three reviewers have been confused by this bit. Do you mind if I submit a PR to clean up this part?
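    For what it's worth, here is a self-contained sketch of the matrix, with the three actions passed in as stand-ins for the drop/create/save helpers:

    ```scala
    import org.apache.spark.sql.SaveMode

    // Sketch: one exhaustive match instead of a (doCreate, doSave) pair.
    def dispatch(mode: SaveMode, tableExists: Boolean,
                 drop: () => Unit, create: () => Unit, save: () => Unit): Unit = {
      (mode, tableExists) match {
        case (SaveMode.Ignore, true)        => () // leave the table untouched
        case (SaveMode.ErrorIfExists, true) => sys.error("Table already exists.")
        case (SaveMode.Overwrite, true)     => drop(); create(); save()
        case (SaveMode.Append, true)        => save()
        case (_, false)                     => create(); save() // no table yet
      }
    }
    ```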




[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by JustinPihony <gi...@git.apache.org>.
Github user JustinPihony commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r77060790
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala ---
    @@ -19,37 +19,100 @@ package org.apache.spark.sql.execution.datasources.jdbc
     
     import java.util.Properties
     
    -import org.apache.spark.sql.SQLContext
    -import org.apache.spark.sql.sources.{BaseRelation, DataSourceRegister, RelationProvider}
    +import org.apache.spark.sql.{DataFrame, SaveMode, SQLContext}
    +import org.apache.spark.sql.sources.{BaseRelation, CreatableRelationProvider, DataSourceRegister, RelationProvider, SchemaRelationProvider}
    +import org.apache.spark.sql.types.StructType
     
    -class JdbcRelationProvider extends RelationProvider with DataSourceRegister {
    +class JdbcRelationProvider extends CreatableRelationProvider
    +  with SchemaRelationProvider with RelationProvider with DataSourceRegister {
     
       override def shortName(): String = "jdbc"
     
    -  /** Returns a new base relation with the given parameters. */
       override def createRelation(
           sqlContext: SQLContext,
           parameters: Map[String, String]): BaseRelation = {
    +    createRelation(sqlContext, parameters, null)
    +  }
    +
    +  /** Returns a new base relation with the given parameters. */
    +  override def createRelation(
    --- End diff --
    
    No, this is to meet the requirements of trait `RelationProvider`.




[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by JustinPihony <gi...@git.apache.org>.
Github user JustinPihony commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    @gatorsmile I did just review it and still prefer mine... a simpler PR is not necessarily more correct.




[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r76512050
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala ---
    @@ -19,37 +19,100 @@ package org.apache.spark.sql.execution.datasources.jdbc
     
     import java.util.Properties
     
    -import org.apache.spark.sql.SQLContext
    -import org.apache.spark.sql.sources.{BaseRelation, DataSourceRegister, RelationProvider}
    +import org.apache.spark.sql.{DataFrame, SaveMode, SQLContext}
    +import org.apache.spark.sql.sources.{BaseRelation, CreatableRelationProvider, DataSourceRegister, RelationProvider, SchemaRelationProvider}
    +import org.apache.spark.sql.types.StructType
     
    -class JdbcRelationProvider extends RelationProvider with DataSourceRegister {
    +class JdbcRelationProvider extends CreatableRelationProvider
    +  with SchemaRelationProvider with RelationProvider with DataSourceRegister {
     
       override def shortName(): String = "jdbc"
     
    -  /** Returns a new base relation with the given parameters. */
       override def createRelation(
           sqlContext: SQLContext,
           parameters: Map[String, String]): BaseRelation = {
    +    createRelation(sqlContext, parameters, null)
    +  }
    +
    +  /** Returns a new base relation with the given parameters. */
    +  override def createRelation(
    --- End diff --
    
    Is this a separate method instead of using an optional arg to try to retain binary compatibility?




[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed):
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65860/




[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by JustinPihony <gi...@git.apache.org>.
Github user JustinPihony commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    @srowen I had to fix something on my local machine to get proper test results, but this should be good to go now.




[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    **[Test build #65275 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65275/consoleFull)** for PR 12601 at commit [`7ef7a48`](https://github.com/apache/spark/commit/7ef7a489b27fa6bd5d79ee4d428874162fd813de).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    I don't see that any docs were added? I mean in the user-facing documentation under docs/.




[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    **[Test build #65460 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65460/consoleFull)** for PR 12601 at commit [`a238156`](https://github.com/apache/spark/commit/a2381567ebdf9ea8c15824da8e2c57561a6f4a66).




[GitHub] spark pull request: [SPARK-14525][SQL] Make DataFrameWrite.save wo...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/12601#issuecomment-213382156
  
    I think I can rework this based on the existing PR, since it is already open anyway. Excuse my ping, @rxin.




[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r77946660
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCWriteSuite.scala ---
    @@ -208,4 +208,75 @@ class JDBCWriteSuite extends SharedSQLContext with BeforeAndAfter {
         assert(2 === spark.read.jdbc(url1, "TEST.PEOPLE1", properties).count())
         assert(2 === spark.read.jdbc(url1, "TEST.PEOPLE1", properties).collect()(0).length)
       }
    +
    +  test("save works for format(\"jdbc\") if url and dbtable are set") {
    +    val df = sqlContext.createDataFrame(sparkContext.parallelize(arr2x2), schema2)
    +
    +    df.write.format("jdbc")
    +    .options(Map("url" -> url, "dbtable" -> "TEST.BASICCREATETEST"))
    +    .save
    +
    +    assert(2 === sqlContext.read.jdbc(url, "TEST.BASICCREATETEST", new Properties).count)
    +    assert(
    +      2 === sqlContext.read.jdbc(url, "TEST.BASICCREATETEST", new Properties).collect()(0).length)
    +  }
    +
    +  test("save API with SaveMode.Overwrite") {
    +    import scala.collection.JavaConverters._
    +
    +    val df = spark.createDataFrame(sparkContext.parallelize(arr2x2), schema2)
    +    val df2 = spark.createDataFrame(sparkContext.parallelize(arr1x2), schema2)
    +
    +    df.write.format("jdbc")
    +      .option("url", url1)
    +      .option("dbtable", "TEST.TRUNCATETEST")
    +      .options(properties.asScala)
    +      .save()
    +    df2.write.mode(SaveMode.Overwrite).format("jdbc")
    +      .option("url", url1)
    +      .option("dbtable", "TEST.TRUNCATETEST")
    +      .options(properties.asScala)
    +      .save()
    +    assert(1 === spark.read.jdbc(url1, "TEST.TRUNCATETEST", properties).count())
    +    assert(2 === spark.read.jdbc(url1, "TEST.TRUNCATETEST", properties).collect()(0).length)
    +  }
    +
    +  test("save errors if url is not specified") {
    +    import scala.collection.JavaConverters._
    +    val df = spark.createDataFrame(sparkContext.parallelize(arr2x2), schema2)
    +
    +    var e = intercept[RuntimeException] {
    +      df.write.format("jdbc")
    +        .option("dbtable", "TEST.TRUNCATETEST")
    +        .options(properties.asScala)
    +        .save()
    +    }.getMessage
    +    assert(e.contains("Option 'url' is required"))
    +  }
    +
    +  test("save errors if dbtable is not specified") {
    +    import scala.collection.JavaConverters._
    +    val df = spark.createDataFrame(sparkContext.parallelize(arr2x2), schema2)
    +
    +    var e = intercept[RuntimeException] {
    +      df.write.format("jdbc")
    +        .option("url", url1)
    +        .options(properties.asScala)
    +        .save()
    +    }.getMessage
    +    assert(e.contains("Option 'dbtable' is required"))
    +  }
    +
    +  test("save errors if wrong user/password combination") {
    +    import scala.collection.JavaConverters._
    --- End diff --
    
    How about putting this import at the top with the other imports, if either placement is okay? I understand your point about keeping it local, but let's just follow what the majority of the codebase does, to speed things up.




[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    +1 for triggering the test.




[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    **[Test build #65452 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65452/consoleFull)** for PR 12601 at commit [`c9dcdc4`](https://github.com/apache/spark/commit/c9dcdc4058c47a82c3dfa99b6e10aa034981c700).




[GitHub] spark pull request: [SPARK-14525][SQL] Make DataFrameWrite.save wo...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/12601#issuecomment-213397222
  
    Question: I found a [PPT](http://www.slideshare.net/SparkSummit/structuring-spark-dataframes-datasets-and-streaming-by-michael-armbrust) which I think was used at Spark Summit by @marmbrus. On page 30, `write.format("jdbc")` is used as an example. Is there any way to support this?




[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r77944399
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala ---
    @@ -17,39 +17,113 @@
     
     package org.apache.spark.sql.execution.datasources.jdbc
     
    +import java.sql.SQLException
     import java.util.Properties
     
    -import org.apache.spark.sql.SQLContext
    -import org.apache.spark.sql.sources.{BaseRelation, DataSourceRegister, RelationProvider}
    +import org.apache.spark.sql.{DataFrame, SaveMode, SQLContext}
    +import org.apache.spark.sql.sources.{BaseRelation, CreatableRelationProvider, DataSourceRegister, RelationProvider, SchemaRelationProvider}
    +import org.apache.spark.sql.types.StructType
     
    -class JdbcRelationProvider extends RelationProvider with DataSourceRegister {
    +class JdbcRelationProvider extends CreatableRelationProvider
    +  with SchemaRelationProvider with RelationProvider with DataSourceRegister {
     
       override def shortName(): String = "jdbc"
     
    -  /** Returns a new base relation with the given parameters. */
       override def createRelation(
           sqlContext: SQLContext,
           parameters: Map[String, String]): BaseRelation = {
    +    createRelation(sqlContext, parameters, null)
    +  }
    +
    +  /** Returns a new base relation with the given parameters. */
    +  override def createRelation(
    +      sqlContext: SQLContext,
    +      parameters: Map[String, String],
    +      schema: StructType): BaseRelation = {
         val jdbcOptions = new JDBCOptions(parameters)
    -    if (jdbcOptions.partitionColumn != null
    -      && (jdbcOptions.lowerBound == null
    -        || jdbcOptions.upperBound == null
    -        || jdbcOptions.numPartitions == null)) {
    -      sys.error("Partitioning incompletely specified")
    -    }
    +    val partitionColumn = jdbcOptions.partitionColumn
    +    val lowerBound = jdbcOptions.lowerBound
    +    val upperBound = jdbcOptions.upperBound
    +    val numPartitions = jdbcOptions.numPartitions
     
    -    val partitionInfo = if (jdbcOptions.partitionColumn == null) {
    -      null
    -    } else {
    +    val partitionInfo = if (partitionColumn == null) null
    +    else {
           JDBCPartitioningInfo(
    -        jdbcOptions.partitionColumn,
    -        jdbcOptions.lowerBound.toLong,
    -        jdbcOptions.upperBound.toLong,
    -        jdbcOptions.numPartitions.toInt)
    +        partitionColumn, lowerBound.toLong, upperBound.toLong, numPartitions.toInt)
         }
         val parts = JDBCRelation.columnPartition(partitionInfo)
         val properties = new Properties() // Additional properties that we will pass to getConnection
         parameters.foreach(kv => properties.setProperty(kv._1, kv._2))
    -    JDBCRelation(jdbcOptions.url, jdbcOptions.table, parts, properties)(sqlContext.sparkSession)
    +    JDBCRelation(jdbcOptions.url, jdbcOptions.table, parts, properties,
    +      Option(schema))(sqlContext.sparkSession)
    +  }
    +
    +  /*
    +   * The following structure applies to this code:
    +   *                 |    tableExists            |          !tableExists
    +   *------------------------------------------------------------------------------------
    +   * Ignore          | BaseRelation              | CreateTable, saveTable, BaseRelation
    +   * ErrorIfExists   | ERROR                     | CreateTable, saveTable, BaseRelation
    +   * Overwrite*      | (DropTable, CreateTable,) | CreateTable, saveTable, BaseRelation
    +   *                 | saveTable, BaseRelation   |
    +   * Append          | saveTable, BaseRelation   | CreateTable, saveTable, BaseRelation
    +   *
    +   * *Overwrite & tableExists with truncate, will not drop & create, but instead truncate
    +   */
    +  override def createRelation(
    +      sqlContext: SQLContext,
    +      mode: SaveMode,
    +      parameters: Map[String, String],
    +      data: DataFrame): BaseRelation = {
    +    val jdbcOptions = new JDBCOptions(parameters)
    +    val url = jdbcOptions.url
    +    val table = jdbcOptions.table
    +
    +    import collection.JavaConverters._
    +    val props = new Properties()
    +    props.putAll(parameters.asJava)
    +    val conn = JdbcUtils.createConnectionFactory(url, props)()
    +
    +    try {
    +      val tableExists = JdbcUtils.tableExists(conn, url, table)
    +
    +      val (doCreate, doSave) = (mode, tableExists) match {
    --- End diff --
    
    Initially, I meant to correct this as @gatorsmile did [here](https://github.com/gatorsmile/spark/blob/07e316823ed17e89c3df0aaccf3fbb958afcfe3a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L390-L423). I am not saying this is wrong or inappropriate; just personally, I'd prefer that way.




[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r76512084
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala ---
    @@ -36,4 +36,9 @@ private[jdbc] class JDBCOptions(
       val upperBound = parameters.getOrElse("upperBound", null)
    --- End diff --
    
    I see some of the existing code has a `getOrElse(..., sys.error...)`. I think we should switch to either
    
    ```
    require(foo.isDefined, ...)
    ... foo.get(...)
    ```
    or
    ```
    foo.getOrElse(..., throw new IllegalArgumentException(...))
    ```
    One problem is that `sys.error` just generates a bald `RuntimeException`.
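
    For illustration, the two variants on a hypothetical `upperBound` option (sample lines, not the PR's actual code):
    
    ```
    // Sample input; in the real code this is the options map passed to the source.
    val parameters: Map[String, String] = Map("upperBound" -> "100")
    
    // Variant 1: validate up front; require throws IllegalArgumentException.
    val upperBoundOpt: Option[String] = parameters.get("upperBound")
    require(upperBoundOpt.isDefined, "Option 'upperBound' is required.")
    val upperBound: Long = upperBoundOpt.get.toLong
    
    // Variant 2: fail at the point of use with an explicit exception type.
    val upperBound2: Long = parameters.getOrElse("upperBound",
      throw new IllegalArgumentException("Option 'upperBound' is required.")).toLong
    ```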





[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    **[Test build #65460 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65460/consoleFull)** for PR 12601 at commit [`a238156`](https://github.com/apache/spark/commit/a2381567ebdf9ea8c15824da8e2c57561a6f4a66).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r69267347
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRelation.scala ---
    @@ -96,7 +97,16 @@ private[sql] case class JDBCRelation(
     
       override val needConversion: Boolean = false
     
    -  override val schema: StructType = JDBCRDD.resolveTable(url, table, properties)
    +  override val schema: StructType = {
    +    val resolvedSchema = JDBCRDD.resolveTable(url, table, properties)
    +    providedSchemaOption match {
    +      case Some(providedSchema) =>
    +        if (providedSchema.sql.toLowerCase == resolvedSchema.sql.toLowerCase) resolvedSchema
    --- End diff --
    
    I think `JDBCRDD.resolveTable` needs another query execution. Although it would be less expensive than inferring schemas in CSV or JSON, it would still be a bit of overhead. I am not 100% sure about this either, so I think it might be better to be consistent with the other sources in this case.
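
    To make the trade-off concrete, a minimal sketch of skipping the extra round trip when a schema was supplied (`providedSchemaOption` and `JDBCRDD.resolveTable` are the PR's own names; this is illustrative, not the PR's final code):
    
    ```
    import java.util.Properties
    import org.apache.spark.sql.types.StructType
    
    // Only query the database for the schema when the caller did not supply one.
    def relationSchema(
        providedSchemaOption: Option[StructType],
        url: String,
        table: String,
        properties: Properties): StructType = {
      providedSchemaOption.getOrElse(JDBCRDD.resolveTable(url, table, properties))
    }
    ```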





[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by JustinPihony <gi...@git.apache.org>.
Github user JustinPihony commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    @gatorsmile wrt tests, I actually had these but stashed them to make a merge easier and forgot to re-add them. I have since pushed them :)




[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    @JustinPihony  Sorry, I did not realize you submitted a PR for the same issue. Could you please review my PR? https://github.com/apache/spark/pull/14077 I think my solution might be cleaner and simpler. Thanks!




[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    Sure, I will build the documentation on my local machine and review it soon. Thanks!




[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r77946422
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala ---
    @@ -17,39 +17,105 @@
     
     package org.apache.spark.sql.execution.datasources.jdbc
     
    +import java.sql.SQLException
     import java.util.Properties
     
    -import org.apache.spark.sql.SQLContext
    -import org.apache.spark.sql.sources.{BaseRelation, DataSourceRegister, RelationProvider}
    +import org.apache.spark.sql.{DataFrame, SaveMode, SQLContext}
    +import org.apache.spark.sql.sources.{BaseRelation, CreatableRelationProvider, DataSourceRegister, RelationProvider}
    +import org.apache.spark.sql.types.StructType
     
    -class JdbcRelationProvider extends RelationProvider with DataSourceRegister {
    +class JdbcRelationProvider extends CreatableRelationProvider
    +  with RelationProvider with DataSourceRegister {
     
       override def shortName(): String = "jdbc"
     
    -  /** Returns a new base relation with the given parameters. */
       override def createRelation(
           sqlContext: SQLContext,
           parameters: Map[String, String]): BaseRelation = {
         val jdbcOptions = new JDBCOptions(parameters)
    -    if (jdbcOptions.partitionColumn != null
    -      && (jdbcOptions.lowerBound == null
    -        || jdbcOptions.upperBound == null
    -        || jdbcOptions.numPartitions == null)) {
    -      sys.error("Partitioning incompletely specified")
    -    }
    +    val partitionColumn = jdbcOptions.partitionColumn
    +    val lowerBound = jdbcOptions.lowerBound
    +    val upperBound = jdbcOptions.upperBound
    +    val numPartitions = jdbcOptions.numPartitions
     
    -    val partitionInfo = if (jdbcOptions.partitionColumn == null) {
    -      null
    -    } else {
    +    val partitionInfo = if (partitionColumn == null) { null }
    +    else {
           JDBCPartitioningInfo(
    -        jdbcOptions.partitionColumn,
    -        jdbcOptions.lowerBound.toLong,
    -        jdbcOptions.upperBound.toLong,
    -        jdbcOptions.numPartitions.toInt)
    +        partitionColumn, lowerBound.toLong, upperBound.toLong, numPartitions.toInt)
         }
         val parts = JDBCRelation.columnPartition(partitionInfo)
         val properties = new Properties() // Additional properties that we will pass to getConnection
         parameters.foreach(kv => properties.setProperty(kv._1, kv._2))
         JDBCRelation(jdbcOptions.url, jdbcOptions.table, parts, properties)(sqlContext.sparkSession)
       }
    +
    +  /*
    +   * The following structure applies to this code:
    +   *                 |    tableExists            |          !tableExists
    +   *------------------------------------------------------------------------------------
    +   * Ignore          | BaseRelation              | CreateTable, saveTable, BaseRelation
    +   * ErrorIfExists   | ERROR                     | CreateTable, saveTable, BaseRelation
    +   * Overwrite*      | (DropTable, CreateTable,) | CreateTable, saveTable, BaseRelation
    +   *                 | saveTable, BaseRelation   |
    +   * Append          | saveTable, BaseRelation   | CreateTable, saveTable, BaseRelation
    +   *
    +   * *Overwrite & tableExists with truncate, will not drop & create, but instead truncate
    +   */
    +  override def createRelation(
    +      sqlContext: SQLContext,
    +      mode: SaveMode,
    +      parameters: Map[String, String],
    +      data: DataFrame): BaseRelation = {
    +    import collection.JavaConverters._
    --- End diff --
    
    I meant moving it up with the other imports. Also, I remember this should be `scala.collection.JavaConverters`; scalastyle will check this IIRC.
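
    For reference, the grouping the style checker expects (java, then scala, then third-party, then spark), sketched with this file's own imports:
    
    ```
    import java.sql.SQLException
    import java.util.Properties
    
    import scala.collection.JavaConverters._
    
    import org.apache.spark.sql.{DataFrame, SaveMode, SQLContext}
    import org.apache.spark.sql.sources.{BaseRelation, CreatableRelationProvider, DataSourceRegister, RelationProvider}
    import org.apache.spark.sql.types.StructType
    ```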




[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r80350458
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java ---
    @@ -23,6 +23,8 @@
     import java.util.List;
     // $example off:schema_merging$
     
    +import java.util.Properties;
    +
    --- End diff --
    
    I think we should put the `java.util` imports together, without the additional newline.




[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    Mostly LGTM, except three minor comments. 
    
    Thank you for your hard work, @JustinPihony !




[GitHub] spark pull request: [SPARK-14525][SQL] Make DataFrameWrite.save wo...

Posted by JustinPihony <gi...@git.apache.org>.
Github user JustinPihony commented on the pull request:

    https://github.com/apache/spark/pull/12601#issuecomment-220431230
  
    I just updated the branch to have no conflicts. Again, either the code looks good to merge, or I can make JDBC a `CreatableRelationProvider` (but that comes with additional baggage, as already discussed... it might be better in a separate change, if at all?)




[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    **[Test build #65860 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65860/consoleFull)** for PR 12601 at commit [`8fb86b4`](https://github.com/apache/spark/commit/8fb86b482929e321f4ec8865124b8661f1a29bbf).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-14525][SQL] Make DataFrameWrite.save wo...

Posted by JustinPihony <gi...@git.apache.org>.
Github user JustinPihony commented on the pull request:

    https://github.com/apache/spark/pull/12601#issuecomment-217359217
  
    I can finish this next Monday (fixing the conflicts that now exist), and will actually do that given the above comments. I'd still like to get an opinion on whether I should change the code to be a `CreatableRelationProvider` or not. I like the idea, but believe there is much more room for error with that approach.




[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    @JustinPihony Based on my understanding, we need to minimize the code changes as much as we can. Any extra code changes might introduce unnecessary bugs.
    
    At the same time, we need to improve the test coverage as much as we can. Please check my PR https://github.com/apache/spark/pull/14077 and add the test cases here, if possible.
    
    In the future, if we want to support user-defined schemas, the work in this PR is not enough. Again, test coverage is very, very important when we initially add the code.




[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    By extending the trait `CreatableRelationProvider`, we can support the `save` API now. The actual code changes are very small in the PR: https://github.com/apache/spark/pull/14077




[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by JustinPihony <gi...@git.apache.org>.
Github user JustinPihony commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r77949211
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala ---
    @@ -17,39 +17,106 @@
     
     package org.apache.spark.sql.execution.datasources.jdbc
     
    +import java.sql.SQLException
    --- End diff --
    
    It depends on whether the table exception is used or not.




[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by JustinPihony <gi...@git.apache.org>.
Github user JustinPihony commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r77947902
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala ---
    @@ -17,39 +17,113 @@
     
     package org.apache.spark.sql.execution.datasources.jdbc
     
    +import java.sql.SQLException
     import java.util.Properties
     
    -import org.apache.spark.sql.SQLContext
    -import org.apache.spark.sql.sources.{BaseRelation, DataSourceRegister, RelationProvider}
    +import org.apache.spark.sql.{DataFrame, SaveMode, SQLContext}
    +import org.apache.spark.sql.sources.{BaseRelation, CreatableRelationProvider, DataSourceRegister, RelationProvider, SchemaRelationProvider}
    +import org.apache.spark.sql.types.StructType
     
    -class JdbcRelationProvider extends RelationProvider with DataSourceRegister {
    +class JdbcRelationProvider extends CreatableRelationProvider
    +  with SchemaRelationProvider with RelationProvider with DataSourceRegister {
     
       override def shortName(): String = "jdbc"
     
    -  /** Returns a new base relation with the given parameters. */
       override def createRelation(
           sqlContext: SQLContext,
           parameters: Map[String, String]): BaseRelation = {
    +    createRelation(sqlContext, parameters, null)
    +  }
    +
    +  /** Returns a new base relation with the given parameters. */
    +  override def createRelation(
    +      sqlContext: SQLContext,
    +      parameters: Map[String, String],
    +      schema: StructType): BaseRelation = {
         val jdbcOptions = new JDBCOptions(parameters)
    -    if (jdbcOptions.partitionColumn != null
    -      && (jdbcOptions.lowerBound == null
    -        || jdbcOptions.upperBound == null
    -        || jdbcOptions.numPartitions == null)) {
    -      sys.error("Partitioning incompletely specified")
    -    }
    +    val partitionColumn = jdbcOptions.partitionColumn
    +    val lowerBound = jdbcOptions.lowerBound
    +    val upperBound = jdbcOptions.upperBound
    +    val numPartitions = jdbcOptions.numPartitions
     
    -    val partitionInfo = if (jdbcOptions.partitionColumn == null) {
    -      null
    -    } else {
    +    val partitionInfo = if (partitionColumn == null) null
    +    else {
           JDBCPartitioningInfo(
    -        jdbcOptions.partitionColumn,
    -        jdbcOptions.lowerBound.toLong,
    -        jdbcOptions.upperBound.toLong,
    -        jdbcOptions.numPartitions.toInt)
    +        partitionColumn, lowerBound.toLong, upperBound.toLong, numPartitions.toInt)
         }
         val parts = JDBCRelation.columnPartition(partitionInfo)
         val properties = new Properties() // Additional properties that we will pass to getConnection
         parameters.foreach(kv => properties.setProperty(kv._1, kv._2))
    -    JDBCRelation(jdbcOptions.url, jdbcOptions.table, parts, properties)(sqlContext.sparkSession)
    +    JDBCRelation(jdbcOptions.url, jdbcOptions.table, parts, properties,
    +      Option(schema))(sqlContext.sparkSession)
    +  }
    +
    +  /*
    +   * The following structure applies to this code:
    +   *                 |    tableExists            |          !tableExists
    +   *------------------------------------------------------------------------------------
    +   * Ignore          | BaseRelation              | CreateTable, saveTable, BaseRelation
    +   * ErrorIfExists   | ERROR                     | CreateTable, saveTable, BaseRelation
    +   * Overwrite*      | (DropTable, CreateTable,) | CreateTable, saveTable, BaseRelation
    +   *                 | saveTable, BaseRelation   |
    +   * Append          | saveTable, BaseRelation   | CreateTable, saveTable, BaseRelation
    +   *
    +   * *Overwrite & tableExists with truncate, will not drop & create, but instead truncate
    +   */
    +  override def createRelation(
    +      sqlContext: SQLContext,
    +      mode: SaveMode,
    +      parameters: Map[String, String],
    +      data: DataFrame): BaseRelation = {
    +    val jdbcOptions = new JDBCOptions(parameters)
    +    val url = jdbcOptions.url
    +    val table = jdbcOptions.table
    +
    +    import collection.JavaConverters._
    +    val props = new Properties()
    +    props.putAll(parameters.asJava)
    +    val conn = JdbcUtils.createConnectionFactory(url, props)()
    +
    +    try {
    +      val tableExists = JdbcUtils.tableExists(conn, url, table)
    +
    +      val (doCreate, doSave) = (mode, tableExists) match {
    --- End diff --
    
    I did add a comment above the method signature. That, plus the variable naming conventions, should cover this.




[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by JustinPihony <gi...@git.apache.org>.
Github user JustinPihony commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    @srowen Documentation added.




[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r77948082
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala ---
    @@ -17,39 +17,105 @@
     
     package org.apache.spark.sql.execution.datasources.jdbc
     
    +import java.sql.SQLException
     import java.util.Properties
     
    -import org.apache.spark.sql.SQLContext
    -import org.apache.spark.sql.sources.{BaseRelation, DataSourceRegister, RelationProvider}
    +import org.apache.spark.sql.{DataFrame, SaveMode, SQLContext}
    +import org.apache.spark.sql.sources.{BaseRelation, CreatableRelationProvider, DataSourceRegister, RelationProvider}
    +import org.apache.spark.sql.types.StructType
     
    -class JdbcRelationProvider extends RelationProvider with DataSourceRegister {
    +class JdbcRelationProvider extends CreatableRelationProvider
    +  with RelationProvider with DataSourceRegister {
     
       override def shortName(): String = "jdbc"
     
    -  /** Returns a new base relation with the given parameters. */
       override def createRelation(
           sqlContext: SQLContext,
           parameters: Map[String, String]): BaseRelation = {
         val jdbcOptions = new JDBCOptions(parameters)
    -    if (jdbcOptions.partitionColumn != null
    -      && (jdbcOptions.lowerBound == null
    -        || jdbcOptions.upperBound == null
    -        || jdbcOptions.numPartitions == null)) {
    -      sys.error("Partitioning incompletely specified")
    --- End diff --
    
    uh, we do not have a test case to cover that. Since you made a change, could you add such a test case? Thanks!
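
    Such a test might look roughly like this in the JDBC test suites (a sketch assuming a suite with a `spark` session and ScalaTest's `intercept`; the H2 URL and table are placeholders):
    
    ```
    test("partitioning options must be specified together") {
      val e = intercept[RuntimeException] {
        spark.read.format("jdbc")
          .option("url", "jdbc:h2:mem:testdb0")  // placeholder in-memory database
          .option("dbtable", "TEST.PEOPLE")       // placeholder table
          .option("partitionColumn", "THEID")     // bounds and numPartitions omitted
          .load()
      }
      assert(e.getMessage.contains("Partitioning incompletely specified"))
    }
    ```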




[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by JustinPihony <gi...@git.apache.org>.
Github user JustinPihony commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r66009900
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRelation.scala ---
    @@ -96,7 +97,16 @@ private[sql] case class JDBCRelation(
     
       override val needConversion: Boolean = false
     
    -  override val schema: StructType = JDBCRDD.resolveTable(url, table, properties)
    +  override val schema: StructType = {
    +    val resolvedSchema = JDBCRDD.resolveTable(url, table, properties)
    +    providedSchemaOption match {
    +      case Some(providedSchema) =>
    +        if (providedSchema.sql.toLowerCase == resolvedSchema.sql.toLowerCase) resolvedSchema
    --- End diff --
    
    This is the only area I'm unsure about. I'd like a second opinion on whether this seems ok, or if I need to build something more custom for schema comparison.
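
    A second opinion in sketch form: comparing field-by-field (case-insensitive names, structural datatypes) avoids relying on the generated SQL strings matching exactly. This only illustrates the alternative, not a drop-in for the PR:
    
    ```
    import org.apache.spark.sql.types.StructType
    
    // Schemas "match" when every field agrees on name (ignoring case) and type.
    def schemasMatch(a: StructType, b: StructType): Boolean = {
      a.fields.length == b.fields.length &&
        a.fields.zip(b.fields).forall { case (x, y) =>
          x.name.equalsIgnoreCase(y.name) && x.dataType == y.dataType
        }
    }
    ```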




[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65454/
    Test FAILed.




[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by JustinPihony <gi...@git.apache.org>.
Github user JustinPihony commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r80350755
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java ---
    @@ -23,6 +23,8 @@
     import java.util.List;
     // $example off:schema_merging$
     
    +import java.util.Properties;
    +
    --- End diff --
    
    Should this really be added to the example, though?




[GitHub] spark pull request: [SPARK-14525][SQL] Make DataFrameWrite.save wo...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/12601#issuecomment-213568166
  
    A possible problem was noticed in the JIRA before this PR (is `Properties` guaranteed to be converted to `String`?), but no evidence was provided or discussed before the PR was opened, which I did instead in this PR.
    
    Also, I think it is minor, but it still does not look sensible that the `Additional details` section seems unrelated to the PR description.
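
    On the `Properties` point, since the source's options arrive as a `Map[String, String]`, a conversion along these lines is already type-safe (a minimal sketch with a placeholder map, not the PR's exact lines):
    
    ```
    import java.util.Properties
    
    val parameters: Map[String, String] = Map("url" -> "jdbc:h2:mem:testdb0")
    
    val props = new Properties()
    // setProperty takes (String, String), so no raw Hashtable puts are involved.
    parameters.foreach { case (k, v) => props.setProperty(k, v) }
    ```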




[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r77945537
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala ---
    @@ -17,39 +17,104 @@
     
     package org.apache.spark.sql.execution.datasources.jdbc
     
    +import java.sql.SQLException
     import java.util.Properties
     
    -import org.apache.spark.sql.SQLContext
    -import org.apache.spark.sql.sources.{BaseRelation, DataSourceRegister, RelationProvider}
    +import org.apache.spark.sql.{DataFrame, SaveMode, SQLContext}
    +import org.apache.spark.sql.sources.{BaseRelation, CreatableRelationProvider, DataSourceRegister, RelationProvider}
    +import org.apache.spark.sql.types.StructType
     
    -class JdbcRelationProvider extends RelationProvider with DataSourceRegister {
    +class JdbcRelationProvider extends CreatableRelationProvider
    +  with RelationProvider with DataSourceRegister {
     
       override def shortName(): String = "jdbc"
     
    -  /** Returns a new base relation with the given parameters. */
       override def createRelation(
           sqlContext: SQLContext,
           parameters: Map[String, String]): BaseRelation = {
         val jdbcOptions = new JDBCOptions(parameters)
    -    if (jdbcOptions.partitionColumn != null
    -      && (jdbcOptions.lowerBound == null
    -        || jdbcOptions.upperBound == null
    -        || jdbcOptions.numPartitions == null)) {
    -      sys.error("Partitioning incompletely specified")
    -    }
    +    val partitionColumn = jdbcOptions.partitionColumn
    +    val lowerBound = jdbcOptions.lowerBound
    +    val upperBound = jdbcOptions.upperBound
    +    val numPartitions = jdbcOptions.numPartitions
     
    -    val partitionInfo = if (jdbcOptions.partitionColumn == null) {
    -      null
    -    } else {
    +    val partitionInfo = if (partitionColumn == null) null
    --- End diff --
    
    Please correct the style here. See https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide




[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by JustinPihony <gi...@git.apache.org>.
Github user JustinPihony commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r77946637
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala ---
    @@ -17,39 +17,105 @@
     
     package org.apache.spark.sql.execution.datasources.jdbc
     
    +import java.sql.SQLException
     import java.util.Properties
     
    -import org.apache.spark.sql.SQLContext
    -import org.apache.spark.sql.sources.{BaseRelation, DataSourceRegister, RelationProvider}
    +import org.apache.spark.sql.{DataFrame, SaveMode, SQLContext}
    +import org.apache.spark.sql.sources.{BaseRelation, CreatableRelationProvider, DataSourceRegister, RelationProvider}
    +import org.apache.spark.sql.types.StructType
     
    -class JdbcRelationProvider extends RelationProvider with DataSourceRegister {
    +class JdbcRelationProvider extends CreatableRelationProvider
    +  with RelationProvider with DataSourceRegister {
     
       override def shortName(): String = "jdbc"
     
    -  /** Returns a new base relation with the given parameters. */
       override def createRelation(
           sqlContext: SQLContext,
           parameters: Map[String, String]): BaseRelation = {
         val jdbcOptions = new JDBCOptions(parameters)
    -    if (jdbcOptions.partitionColumn != null
    -      && (jdbcOptions.lowerBound == null
    -        || jdbcOptions.upperBound == null
    -        || jdbcOptions.numPartitions == null)) {
    -      sys.error("Partitioning incompletely specified")
    -    }
    +    val partitionColumn = jdbcOptions.partitionColumn
    +    val lowerBound = jdbcOptions.lowerBound
    +    val upperBound = jdbcOptions.upperBound
    +    val numPartitions = jdbcOptions.numPartitions
     
    -    val partitionInfo = if (jdbcOptions.partitionColumn == null) {
    -      null
    -    } else {
    +    val partitionInfo = if (partitionColumn == null) { null }
    --- End diff --
    
    I had put that in when adding the brackets, but it actually hurts the code flow in this case. And there is nothing in the style guide saying otherwise.
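
    For comparison, the fully bracketed form under discussion would read as follows (same behavior, only the bracing changed; the names are the PR's own):
    
    ```
    val partitionInfo = if (partitionColumn == null) {
      null
    } else {
      JDBCPartitioningInfo(
        partitionColumn, lowerBound.toLong, upperBound.toLong, numPartitions.toInt)
    }
    ```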




[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    **[Test build #65891 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65891/consoleFull)** for PR 12601 at commit [`724bbe2`](https://github.com/apache/spark/commit/724bbe22b23050f3bdbf6d1bf14d4dabc52113b2).




[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65452/
    Test FAILed.




[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by JustinPihony <gi...@git.apache.org>.
Github user JustinPihony commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    @srowen The doc changes have been reviewed, so this should be good to go.




[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by JustinPihony <gi...@git.apache.org>.
Github user JustinPihony commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    @srowen I have addressed your concerns and also merged the branch to handle the past few changes to jdbc. I think this should be good to go now.




[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    @JustinPihony et al., this looks OK to me too given the reviews, but do we need a doc update as well, now that `write("jdbc")` works where it didn't before?
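
    If a doc example is wanted, the newly working path presumably looks something like this (a sketch with placeholder connection values, assuming an existing DataFrame `df`):
    
    ```
    // Previously this failed because the jdbc source could not handle save().
    df.write
      .format("jdbc")
      .option("url", "jdbc:postgresql:dbserver")  // placeholder URL
      .option("dbtable", "schema.tablename")       // placeholder table
      .option("user", "username")
      .option("password", "password")
      .save()
    ```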




[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r77947343
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala ---
    @@ -17,39 +17,113 @@
     
     package org.apache.spark.sql.execution.datasources.jdbc
     
    +import java.sql.SQLException
     import java.util.Properties
     
    -import org.apache.spark.sql.SQLContext
    -import org.apache.spark.sql.sources.{BaseRelation, DataSourceRegister, RelationProvider}
    +import org.apache.spark.sql.{DataFrame, SaveMode, SQLContext}
    +import org.apache.spark.sql.sources.{BaseRelation, CreatableRelationProvider, DataSourceRegister, RelationProvider, SchemaRelationProvider}
    +import org.apache.spark.sql.types.StructType
     
    -class JdbcRelationProvider extends RelationProvider with DataSourceRegister {
    +class JdbcRelationProvider extends CreatableRelationProvider
    +  with SchemaRelationProvider with RelationProvider with DataSourceRegister {
     
       override def shortName(): String = "jdbc"
     
    -  /** Returns a new base relation with the given parameters. */
       override def createRelation(
           sqlContext: SQLContext,
           parameters: Map[String, String]): BaseRelation = {
    +    createRelation(sqlContext, parameters, null)
    +  }
    +
    +  /** Returns a new base relation with the given parameters. */
    +  override def createRelation(
    +      sqlContext: SQLContext,
    +      parameters: Map[String, String],
    +      schema: StructType): BaseRelation = {
         val jdbcOptions = new JDBCOptions(parameters)
    -    if (jdbcOptions.partitionColumn != null
    -      && (jdbcOptions.lowerBound == null
    -        || jdbcOptions.upperBound == null
    -        || jdbcOptions.numPartitions == null)) {
    -      sys.error("Partitioning incompletely specified")
    -    }
    +    val partitionColumn = jdbcOptions.partitionColumn
    +    val lowerBound = jdbcOptions.lowerBound
    +    val upperBound = jdbcOptions.upperBound
    +    val numPartitions = jdbcOptions.numPartitions
     
    -    val partitionInfo = if (jdbcOptions.partitionColumn == null) {
    -      null
    -    } else {
    +    val partitionInfo = if (partitionColumn == null) null
    +    else {
           JDBCPartitioningInfo(
    -        jdbcOptions.partitionColumn,
    -        jdbcOptions.lowerBound.toLong,
    -        jdbcOptions.upperBound.toLong,
    -        jdbcOptions.numPartitions.toInt)
    +        partitionColumn, lowerBound.toLong, upperBound.toLong, numPartitions.toInt)
         }
         val parts = JDBCRelation.columnPartition(partitionInfo)
         val properties = new Properties() // Additional properties that we will pass to getConnection
         parameters.foreach(kv => properties.setProperty(kv._1, kv._2))
    -    JDBCRelation(jdbcOptions.url, jdbcOptions.table, parts, properties)(sqlContext.sparkSession)
    +    JDBCRelation(jdbcOptions.url, jdbcOptions.table, parts, properties,
    +      Option(schema))(sqlContext.sparkSession)
    +  }
    +
    +  /*
    +   * The following structure applies to this code:
    +   *                 |    tableExists            |          !tableExists
    +   *------------------------------------------------------------------------------------
    +   * Ignore          | BaseRelation              | CreateTable, saveTable, BaseRelation
    +   * ErrorIfExists   | ERROR                     | CreateTable, saveTable, BaseRelation
    +   * Overwrite*      | (DropTable, CreateTable,) | CreateTable, saveTable, BaseRelation
    +   *                 | saveTable, BaseRelation   |
    +   * Append          | saveTable, BaseRelation   | CreateTable, saveTable, BaseRelation
    +   *
    +   * *Overwrite & tableExists with truncate, will not drop & create, but instead truncate
    +   */
    +  override def createRelation(
    +      sqlContext: SQLContext,
    +      mode: SaveMode,
    +      parameters: Map[String, String],
    +      data: DataFrame): BaseRelation = {
    +    val jdbcOptions = new JDBCOptions(parameters)
    +    val url = jdbcOptions.url
    +    val table = jdbcOptions.table
    +
    +    import collection.JavaConverters._
    +    val props = new Properties()
    +    props.putAll(parameters.asJava)
    +    val conn = JdbcUtils.createConnectionFactory(url, props)()
    +
    +    try {
    +      val tableExists = JdbcUtils.tableExists(conn, url, table)
    +
    +      val (doCreate, doSave) = (mode, tableExists) match {
    --- End diff --
    
    Then would it make sense to add some comments for each case? At a quick look, it is really confusing to me what each case means.
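
    For instance, each arm could carry a one-line note (a sketch with the same semantics as the diff above; `conn`, `table`, and `JdbcUtils` are the PR's own names):
    
    ```
    val (doCreate, doSave) = (mode, tableExists) match {
      case (SaveMode.Ignore, true) => (false, false)  // table exists: leave it untouched
      case (SaveMode.ErrorIfExists, true) =>          // table exists: caller asked us to fail
        sys.error(s"Table $table already exists.")
      case (SaveMode.Overwrite, true) =>              // table exists: replace its contents
        JdbcUtils.dropTable(conn, table)
        (true, true)
      case (SaveMode.Append, true) => (false, true)   // table exists: keep it, add rows
      case (_, true) =>                               // defensive: future SaveMode values
        sys.error(s"Unexpected SaveMode, '$mode', for handling existing tables.")
      case (_, false) => (true, true)                 // no table yet: create, then save
    }
    ```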




[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    **[Test build #65858 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65858/consoleFull)** for PR 12601 at commit [`06c1cba`](https://github.com/apache/spark/commit/06c1cba1da5ab140d71c29f41afd608e863bfe1b).




[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    It sounds like you are still keeping `SchemaRelationProvider`? What is the benefit?




[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by JustinPihony <gi...@git.apache.org>.
Github user JustinPihony commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r67582844
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala ---
    @@ -19,37 +19,105 @@ package org.apache.spark.sql.execution.datasources.jdbc
     
     import java.util.Properties
     
    -import org.apache.spark.sql.SQLContext
    -import org.apache.spark.sql.sources.{BaseRelation, DataSourceRegister, RelationProvider}
    +import org.apache.spark.sql.{DataFrame, SaveMode, SQLContext}
    +import org.apache.spark.sql.sources.{BaseRelation, CreatableRelationProvider, DataSourceRegister, RelationProvider, SchemaRelationProvider}
    +import org.apache.spark.sql.types.StructType
     
    -class JdbcRelationProvider extends RelationProvider with DataSourceRegister {
    +class JdbcRelationProvider extends CreatableRelationProvider
    +  with SchemaRelationProvider with RelationProvider with DataSourceRegister {
     
       override def shortName(): String = "jdbc"
     
    -  /** Returns a new base relation with the given parameters. */
       override def createRelation(
           sqlContext: SQLContext,
           parameters: Map[String, String]): BaseRelation = {
    -    val jdbcOptions = new JDBCOptions(parameters)
    -    if (jdbcOptions.partitionColumn != null
    -      && (jdbcOptions.lowerBound == null
    -        || jdbcOptions.upperBound == null
    -        || jdbcOptions.numPartitions == null)) {
    +    createRelation(sqlContext, parameters, null)
    +  }
    +
    +  /** Returns a new base relation with the given parameters. */
    +  override def createRelation(
    +      sqlContext: SQLContext,
    +      parameters: Map[String, String],
    +      schema: StructType): BaseRelation = {
    +    val url = parameters.getOrElse("url", sys.error("Option 'url' not specified"))
    +    val table = parameters.getOrElse("dbtable", sys.error("Option 'dbtable' not specified"))
    +    val partitionColumn = parameters.getOrElse("partitionColumn", null)
    +    val lowerBound = parameters.getOrElse("lowerBound", null)
    +    val upperBound = parameters.getOrElse("upperBound", null)
    +    val numPartitions = parameters.getOrElse("numPartitions", null)
    +
    +    if (partitionColumn != null
    +      && (lowerBound == null || upperBound == null || numPartitions == null)) {
           sys.error("Partitioning incompletely specified")
         }
     
    -    val partitionInfo = if (jdbcOptions.partitionColumn == null) {
    -      null
    -    } else {
    +    val partitionInfo = if (partitionColumn == null) null
    +    else {
           JDBCPartitioningInfo(
    -        jdbcOptions.partitionColumn,
    -        jdbcOptions.lowerBound.toLong,
    -        jdbcOptions.upperBound.toLong,
    -        jdbcOptions.numPartitions.toInt)
    +        partitionColumn, lowerBound.toLong, upperBound.toLong, numPartitions.toInt)
         }
         val parts = JDBCRelation.columnPartition(partitionInfo)
         val properties = new Properties() // Additional properties that we will pass to getConnection
         parameters.foreach(kv => properties.setProperty(kv._1, kv._2))
    -    JDBCRelation(jdbcOptions.url, jdbcOptions.table, parts, properties)(sqlContext.sparkSession)
    +    JDBCRelation(url, table, parts, properties, Option(schema))(sqlContext.sparkSession)
    +  }
    +
    +  /*
    +   * The following structure applies to this code:
    +   *                 |    tableExists            |          !tableExists
    +   *------------------------------------------------------------------------------------
    +   * Ignore          | BaseRelation              | CreateTable, saveTable, BaseRelation
    +   * ErrorIfExists   | ERROR                     | CreateTable, saveTable, BaseRelation
    +   * Overwrite       | DropTable, CreateTable,   | CreateTable, saveTable, BaseRelation
    +   *                 | saveTable, BaseRelation   |
    +   * Append          | saveTable, BaseRelation   | CreateTable, saveTable, BaseRelation
    +   */
    +  override def createRelation(
    +      sqlContext: SQLContext,
    +      mode: SaveMode,
    +      parameters: Map[String, String],
    +      data: DataFrame): BaseRelation = {
    +    val url = parameters.getOrElse("url",
    +      sys.error("Saving jdbc source requires url to be set." +
    +        " (ie. df.option(\"url\", \"ACTUAL_URL\")"))
    +    val table = parameters.getOrElse("dbtable", parameters.getOrElse("table",
    +      sys.error("Saving jdbc source requires dbtable to be set." +
    +        " (ie. df.option(\"dbtable\", \"ACTUAL_DB_TABLE\")")))
    +
    +    import collection.JavaConverters._
    +    val props = new Properties()
    +    props.putAll(parameters.asJava)
    +    val conn = JdbcUtils.createConnectionFactory(url, props)()
    +
    +    try {
    +      val tableExists = JdbcUtils.tableExists(conn, url, table)
    +
    +      val (doCreate, doSave) = (mode, tableExists) match {
    +        case (SaveMode.Ignore, true) => (false, false)
    +        case (SaveMode.ErrorIfExists, true) => sys.error(s"Table $table already exists.")
    +        case (SaveMode.Overwrite, true) =>
    +          JdbcUtils.dropTable(conn, table)
    +          (true, true)
    +        case (SaveMode.Append, true) => (false, true)
    +        case (_, true) => sys.error(s"Unexpected SaveMode, '$mode', for handling existing tables.")
    +        case (_, false) => (true, true)
    --- End diff --
    
    I'm not 100% sure I follow, as the booleans map directly to named variables (`doCreate` and `doSave`). The only alternative I can think of, beyond using vars and setting them directly (ugly), is to create a `JDBCActionDecider` to wrap the values and return that, but that adds another level of indirection that seems unnecessary in this case.
    I also don't think inline comments should be necessary, since the behavior is already broken down case by case in the method's comment table.
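    For comparison, a rough sketch of the wrapper alternative mentioned above (`JdbcWriteAction` and `decide` are hypothetical names, not code from this PR):
    
    ```scala
    import org.apache.spark.sql.SaveMode
    
    // Hypothetical wrapper: names the two decisions instead of a (Boolean, Boolean) tuple,
    // at the cost of one more type to maintain.
    case class JdbcWriteAction(createTable: Boolean, saveRows: Boolean)
    
    def decide(mode: SaveMode, tableExists: Boolean): JdbcWriteAction = (mode, tableExists) match {
      case (SaveMode.Ignore, true)        => JdbcWriteAction(createTable = false, saveRows = false)
      case (SaveMode.Append, true)        => JdbcWriteAction(createTable = false, saveRows = true)
      case (SaveMode.ErrorIfExists, true) => sys.error("Table already exists.")
      case (SaveMode.Overwrite, true)     => JdbcWriteAction(createTable = true, saveRows = true) // after dropping the table
      case (_, false)                     => JdbcWriteAction(createTable = true, saveRows = true)
    }
    ```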


[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    **[Test build #65452 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65452/consoleFull)** for PR 12601 at commit [`c9dcdc4`](https://github.com/apache/spark/commit/c9dcdc4058c47a82c3dfa99b6e10aa034981c700).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    Merged to master


[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    Merged build finished. Test FAILed.


[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by JustinPihony <gi...@git.apache.org>.
Github user JustinPihony commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    Can this be merged now?


[GitHub] spark pull request: [SPARK-14525][SQL] Make DataFrameWrite.save wo...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/12601#issuecomment-213382065
  
    BTW, it looks like `Properties` is, in most cases, used just like a `HashMap[String, String]`.
    
    Firstly, I just checked [java.sql.Driver API](https://docs.oracle.com/javase/7/docs/api/java/sql/Driver.html) and it describes the argument for `Properties` as below:
    
    >info - a list of arbitrary string tag/value pairs as connection arguments. Normally at least a "user" and "password" property should be included.
    
    Secondly, Spark apparently uses the methods below on `Properties`.
    
    ```java
    public Set<String> stringPropertyNames()
    public String getProperty(String key, String defaultValue)
    public String getProperty(String key, String defaultValue)
    public synchronized Object setProperty(String key, String value)
    ```
    
    It looks like they all use `String` for keys and values.
    
    
    So, I think it might be OK to support `write.format("jdbc")`. I believe `read.format("jdbc")` is already supported, and I could not find any JIRA issues reporting problems with passing options to `read.format("jdbc")`.
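    For a concrete illustration (a minimal standalone sketch, not Spark code), treating `Properties` as a String-to-String map looks like this:
    
    ```scala
    import java.util.Properties
    import scala.collection.JavaConverters.mapAsJavaMapConverter
    
    // Options as they arrive from the reader/writer API: all Strings.
    val options = Map("user" -> "sa", "password" -> "", "fetchsize" -> "100")
    
    // Properties extends Hashtable[Object, Object], but only String keys/values go in.
    val props = new Properties()
    props.putAll(options.asJava)
    
    // And only the String-typed accessors are used to read them back.
    assert(props.getProperty("fetchsize", "1000") == "100")
    ```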


[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r66716162
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala ---
    @@ -19,37 +19,105 @@ package org.apache.spark.sql.execution.datasources.jdbc
     
     import java.util.Properties
     
    -import org.apache.spark.sql.SQLContext
    -import org.apache.spark.sql.sources.{BaseRelation, DataSourceRegister, RelationProvider}
    +import org.apache.spark.sql.{DataFrame, SaveMode, SQLContext}
    +import org.apache.spark.sql.sources.{BaseRelation, CreatableRelationProvider, DataSourceRegister, RelationProvider, SchemaRelationProvider}
    +import org.apache.spark.sql.types.StructType
     
    -class JdbcRelationProvider extends RelationProvider with DataSourceRegister {
    +class JdbcRelationProvider extends CreatableRelationProvider
    +  with SchemaRelationProvider with RelationProvider with DataSourceRegister {
     
       override def shortName(): String = "jdbc"
     
    -  /** Returns a new base relation with the given parameters. */
       override def createRelation(
           sqlContext: SQLContext,
           parameters: Map[String, String]): BaseRelation = {
    -    val jdbcOptions = new JDBCOptions(parameters)
    -    if (jdbcOptions.partitionColumn != null
    -      && (jdbcOptions.lowerBound == null
    -        || jdbcOptions.upperBound == null
    -        || jdbcOptions.numPartitions == null)) {
    +    createRelation(sqlContext, parameters, null)
    +  }
    +
    +  /** Returns a new base relation with the given parameters. */
    +  override def createRelation(
    +      sqlContext: SQLContext,
    +      parameters: Map[String, String],
    +      schema: StructType): BaseRelation = {
    +    val url = parameters.getOrElse("url", sys.error("Option 'url' not specified"))
    +    val table = parameters.getOrElse("dbtable", sys.error("Option 'dbtable' not specified"))
    +    val partitionColumn = parameters.getOrElse("partitionColumn", null)
    +    val lowerBound = parameters.getOrElse("lowerBound", null)
    +    val upperBound = parameters.getOrElse("upperBound", null)
    +    val numPartitions = parameters.getOrElse("numPartitions", null)
    +
    +    if (partitionColumn != null
    +      && (lowerBound == null || upperBound == null || numPartitions == null)) {
           sys.error("Partitioning incompletely specified")
         }
     
    -    val partitionInfo = if (jdbcOptions.partitionColumn == null) {
    -      null
    -    } else {
    +    val partitionInfo = if (partitionColumn == null) null
    +    else {
           JDBCPartitioningInfo(
    -        jdbcOptions.partitionColumn,
    -        jdbcOptions.lowerBound.toLong,
    -        jdbcOptions.upperBound.toLong,
    -        jdbcOptions.numPartitions.toInt)
    +        partitionColumn, lowerBound.toLong, upperBound.toLong, numPartitions.toInt)
         }
         val parts = JDBCRelation.columnPartition(partitionInfo)
         val properties = new Properties() // Additional properties that we will pass to getConnection
         parameters.foreach(kv => properties.setProperty(kv._1, kv._2))
    -    JDBCRelation(jdbcOptions.url, jdbcOptions.table, parts, properties)(sqlContext.sparkSession)
    +    JDBCRelation(url, table, parts, properties, Option(schema))(sqlContext.sparkSession)
    +  }
    +
    +  /*
    +   * The following structure applies to this code:
    +   *                 |    tableExists            |          !tableExists
    +   *------------------------------------------------------------------------------------
    +   * Ignore          | BaseRelation              | CreateTable, saveTable, BaseRelation
    +   * ErrorIfExists   | ERROR                     | CreateTable, saveTable, BaseRelation
    +   * Overwrite       | DropTable, CreateTable,   | CreateTable, saveTable, BaseRelation
    +   *                 | saveTable, BaseRelation   |
    +   * Append          | saveTable, BaseRelation   | CreateTable, saveTable, BaseRelation
    +   */
    +  override def createRelation(
    +      sqlContext: SQLContext,
    +      mode: SaveMode,
    +      parameters: Map[String, String],
    +      data: DataFrame): BaseRelation = {
    +    val url = parameters.getOrElse("url",
    +      sys.error("Saving jdbc source requires url to be set." +
    +        " (ie. df.option(\"url\", \"ACTUAL_URL\")"))
    +    val table = parameters.getOrElse("dbtable", parameters.getOrElse("table",
    +      sys.error("Saving jdbc source requires dbtable to be set." +
    +        " (ie. df.option(\"dbtable\", \"ACTUAL_DB_TABLE\")")))
    +
    +    import collection.JavaConverters._
    +    val props = new Properties()
    +    props.putAll(parameters.asJava)
    +    val conn = JdbcUtils.createConnectionFactory(url, props)()
    +
    +    try {
    +      val tableExists = JdbcUtils.tableExists(conn, url, table)
    +
    +      val (doCreate, doSave) = (mode, tableExists) match {
    +        case (SaveMode.Ignore, true) => (false, false)
    +        case (SaveMode.ErrorIfExists, true) => sys.error(s"Table $table already exists.")
    +        case (SaveMode.Overwrite, true) =>
    +          JdbcUtils.dropTable(conn, table)
    +          (true, true)
    +        case (SaveMode.Append, true) => (false, true)
    +        case (_, true) => sys.error(s"Unexpected SaveMode, '$mode', for handling existing tables.")
    +        case (_, false) => (true, true)
    --- End diff --
    
    Personally, I think these combinations of booleans are confusing. It might be better to bind them to named variables so that we can understand what each `case` means.
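    Just to sketch the suggestion (the names below are illustrative only, not proposed code):
    
    ```scala
    import org.apache.spark.sql.SaveMode
    
    // Name each (create, save) outcome so the cases read as intent.
    val doNothing   = (false, false)
    val appendOnly  = (false, true)
    val createFirst = (true, true)
    
    def plan(mode: SaveMode, tableExists: Boolean): (Boolean, Boolean) = (mode, tableExists) match {
      case (SaveMode.Ignore, true)        => doNothing
      case (SaveMode.Append, true)        => appendOnly
      case (SaveMode.ErrorIfExists, true) => sys.error("Table already exists.")
      case (_, _)                         => createFirst // Overwrite drops the table first in the real code
    }
    ```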


[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    **[Test build #65454 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65454/consoleFull)** for PR 12601 at commit [`447ab82`](https://github.com/apache/spark/commit/447ab821deeb0500a78080c7d7597705664e3240).


[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by JustinPihony <gi...@git.apache.org>.
Github user JustinPihony commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r77946299
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala ---
    @@ -17,39 +17,113 @@
     
     package org.apache.spark.sql.execution.datasources.jdbc
     
    +import java.sql.SQLException
     import java.util.Properties
     
    -import org.apache.spark.sql.SQLContext
    -import org.apache.spark.sql.sources.{BaseRelation, DataSourceRegister, RelationProvider}
    +import org.apache.spark.sql.{DataFrame, SaveMode, SQLContext}
    +import org.apache.spark.sql.sources.{BaseRelation, CreatableRelationProvider, DataSourceRegister, RelationProvider, SchemaRelationProvider}
    +import org.apache.spark.sql.types.StructType
     
    -class JdbcRelationProvider extends RelationProvider with DataSourceRegister {
    +class JdbcRelationProvider extends CreatableRelationProvider
    +  with SchemaRelationProvider with RelationProvider with DataSourceRegister {
     
       override def shortName(): String = "jdbc"
     
    -  /** Returns a new base relation with the given parameters. */
       override def createRelation(
           sqlContext: SQLContext,
           parameters: Map[String, String]): BaseRelation = {
    +    createRelation(sqlContext, parameters, null)
    +  }
    +
    +  /** Returns a new base relation with the given parameters. */
    +  override def createRelation(
    +      sqlContext: SQLContext,
    +      parameters: Map[String, String],
    +      schema: StructType): BaseRelation = {
         val jdbcOptions = new JDBCOptions(parameters)
    -    if (jdbcOptions.partitionColumn != null
    -      && (jdbcOptions.lowerBound == null
    -        || jdbcOptions.upperBound == null
    -        || jdbcOptions.numPartitions == null)) {
    -      sys.error("Partitioning incompletely specified")
    -    }
    +    val partitionColumn = jdbcOptions.partitionColumn
    +    val lowerBound = jdbcOptions.lowerBound
    +    val upperBound = jdbcOptions.upperBound
    +    val numPartitions = jdbcOptions.numPartitions
     
    -    val partitionInfo = if (jdbcOptions.partitionColumn == null) {
    -      null
    -    } else {
    +    val partitionInfo = if (partitionColumn == null) null
    +    else {
           JDBCPartitioningInfo(
    -        jdbcOptions.partitionColumn,
    -        jdbcOptions.lowerBound.toLong,
    -        jdbcOptions.upperBound.toLong,
    -        jdbcOptions.numPartitions.toInt)
    +        partitionColumn, lowerBound.toLong, upperBound.toLong, numPartitions.toInt)
         }
         val parts = JDBCRelation.columnPartition(partitionInfo)
         val properties = new Properties() // Additional properties that we will pass to getConnection
         parameters.foreach(kv => properties.setProperty(kv._1, kv._2))
    -    JDBCRelation(jdbcOptions.url, jdbcOptions.table, parts, properties)(sqlContext.sparkSession)
    +    JDBCRelation(jdbcOptions.url, jdbcOptions.table, parts, properties,
    +      Option(schema))(sqlContext.sparkSession)
    +  }
    +
    +  /*
    +   * The following structure applies to this code:
    +   *                 |    tableExists            |          !tableExists
    +   *------------------------------------------------------------------------------------
    +   * Ignore          | BaseRelation              | CreateTable, saveTable, BaseRelation
    +   * ErrorIfExists   | ERROR                     | CreateTable, saveTable, BaseRelation
    +   * Overwrite*      | (DropTable, CreateTable,) | CreateTable, saveTable, BaseRelation
    +   *                 | saveTable, BaseRelation   |
    +   * Append          | saveTable, BaseRelation   | CreateTable, saveTable, BaseRelation
    +   *
    +   * *Overwrite & tableExists with truncate, will not drop & create, but instead truncate
    +   */
    +  override def createRelation(
    +      sqlContext: SQLContext,
    +      mode: SaveMode,
    +      parameters: Map[String, String],
    +      data: DataFrame): BaseRelation = {
    +    val jdbcOptions = new JDBCOptions(parameters)
    +    val url = jdbcOptions.url
    +    val table = jdbcOptions.table
    +
    +    import collection.JavaConverters._
    +    val props = new Properties()
    +    props.putAll(parameters.asJava)
    +    val conn = JdbcUtils.createConnectionFactory(url, props)()
    +
    +    try {
    +      val tableExists = JdbcUtils.tableExists(conn, url, table)
    +
    +      val (doCreate, doSave) = (mode, tableExists) match {
    --- End diff --
    
    Your way results in the need for a `return`, which can lead to problems and is [generally discouraged](https://tpolecat.github.io/2014/05/09/return.html) in Scala. In the current implementation it could just do nothing and the next `if` block would be skipped anyway, but that leaves a lot of room for error in future code changes. This way is explicit about the rules and about what each combination yields.
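    A minimal standalone illustration of the `return` hazard (not code from this diff): inside a closure, `return` is compiled into a thrown `NonLocalReturnControl`, so a catch-all can silently swallow it:
    
    ```scala
    def firstEven(xs: List[Int]): Option[Int] = {
      try {
        xs.foreach { x =>
          // Non-local return: exits firstEven by throwing a control exception.
          if (x % 2 == 0) return Some(x)
        }
        None
      } catch {
        // The catch-all intercepts that control exception too, so
        // firstEven(List(1, 2, 3)) yields None instead of Some(2).
        case _: Throwable => None
      }
    }
    ```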


[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by JustinPihony <gi...@git.apache.org>.
Github user JustinPihony commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r77946434
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala ---
    @@ -17,39 +17,105 @@
     
     package org.apache.spark.sql.execution.datasources.jdbc
     
    +import java.sql.SQLException
     import java.util.Properties
     
    -import org.apache.spark.sql.SQLContext
    -import org.apache.spark.sql.sources.{BaseRelation, DataSourceRegister, RelationProvider}
    +import org.apache.spark.sql.{DataFrame, SaveMode, SQLContext}
    +import org.apache.spark.sql.sources.{BaseRelation, CreatableRelationProvider, DataSourceRegister, RelationProvider}
    +import org.apache.spark.sql.types.StructType
     
    -class JdbcRelationProvider extends RelationProvider with DataSourceRegister {
    +class JdbcRelationProvider extends CreatableRelationProvider
    +  with RelationProvider with DataSourceRegister {
     
       override def shortName(): String = "jdbc"
     
    -  /** Returns a new base relation with the given parameters. */
       override def createRelation(
           sqlContext: SQLContext,
           parameters: Map[String, String]): BaseRelation = {
         val jdbcOptions = new JDBCOptions(parameters)
    -    if (jdbcOptions.partitionColumn != null
    -      && (jdbcOptions.lowerBound == null
    -        || jdbcOptions.upperBound == null
    -        || jdbcOptions.numPartitions == null)) {
    -      sys.error("Partitioning incompletely specified")
    --- End diff --
    
    It was moved into `JDBCOptions`, as previously discussed.


[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r80350919
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java ---
    @@ -23,6 +23,8 @@
     import java.util.List;
     // $example off:schema_merging$
     
    +import java.util.Properties;
    +
    --- End diff --
    
    Is there a reason not to follow the guideline?


[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r80353253
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -420,62 +420,11 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
       def jdbc(url: String, table: String, connectionProperties: Properties): Unit = {
         assertNotPartitioned("jdbc")
         assertNotBucketed("jdbc")
    -
    -    // to add required options like URL and dbtable
    -    val params = extraOptions.toMap ++ Map("url" -> url, "dbtable" -> table)
    -    val jdbcOptions = new JDBCOptions(params)
    -    val jdbcUrl = jdbcOptions.url
    -    val jdbcTable = jdbcOptions.table
    -
    -    val props = new Properties()
    -    extraOptions.foreach { case (key, value) =>
    -      props.put(key, value)
    -    }
         // connectionProperties should override settings in extraOptions
    -    props.putAll(connectionProperties)
    -    val conn = JdbcUtils.createConnectionFactory(jdbcUrl, props)()
    -
    -    try {
    -      var tableExists = JdbcUtils.tableExists(conn, jdbcUrl, jdbcTable)
    -
    -      if (mode == SaveMode.Ignore && tableExists) {
    -        return
    -      }
    -
    -      if (mode == SaveMode.ErrorIfExists && tableExists) {
    -        sys.error(s"Table $jdbcTable already exists.")
    -      }
    -
    -      if (mode == SaveMode.Overwrite && tableExists) {
    -        if (jdbcOptions.isTruncate &&
    -            JdbcUtils.isCascadingTruncateTable(jdbcUrl) == Some(false)) {
    -          JdbcUtils.truncateTable(conn, jdbcTable)
    -        } else {
    -          JdbcUtils.dropTable(conn, jdbcTable)
    -          tableExists = false
    -        }
    -      }
    -
    -      // Create the table if the table didn't exist.
    -      if (!tableExists) {
    -        val schema = JdbcUtils.schemaString(df, jdbcUrl)
    -        // To allow certain options to append when create a new table, which can be
    -        // table_options or partition_options.
    -        // E.g., "CREATE TABLE t (name string) ENGINE=InnoDB DEFAULT CHARSET=utf8"
    -        val createtblOptions = jdbcOptions.createTableOptions
    -        val sql = s"CREATE TABLE $jdbcTable ($schema) $createtblOptions"
    -        val statement = conn.createStatement
    -        try {
    -          statement.executeUpdate(sql)
    -        } finally {
    -          statement.close()
    -        }
    -      }
    -    } finally {
    -      conn.close()
    -    }
    -
    -    JdbcUtils.saveTable(df, jdbcUrl, jdbcTable, props)
    +    this.extraOptions = this.extraOptions ++ (connectionProperties.asScala)
    +    // explicit url and dbtable should override all
    +    this.extraOptions += ("url" -> url, "dbtable" -> table)
    +    format("jdbc").save
    --- End diff --
    
    Parentheses should be omitted only when a method has no side effects.
    
    Thus, please change it to `save()`
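    For example (assuming a `DataFrame` named `df` is in scope):
    
    ```scala
    df.write.format("jdbc").save()   // side-effecting: keep the parentheses
    val schema = df.schema           // pure accessor: parentheses conventionally omitted
    ```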


[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by JustinPihony <gi...@git.apache.org>.
Github user JustinPihony commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    Bump :) Anybody able to review this one for me please?


[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    You know, https://github.com/apache/spark/pull/14075 has been merged. It is unnecessary to extend `SchemaRelationProvider` in your PR, right? 


[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65455/
    Test FAILed.


[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r76512038
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -396,49 +396,14 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
        * @since 1.4.0
        */
       def jdbc(url: String, table: String, connectionProperties: Properties): Unit = {
    +    import scala.collection.JavaConverters._
    --- End diff --
    
    Nit: I'd import these with other imports


[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by JustinPihony <gi...@git.apache.org>.
Github user JustinPihony commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r77067046
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala ---
    @@ -36,4 +38,9 @@ private[jdbc] class JDBCOptions(
       val upperBound = parameters.getOrElse("upperBound", null)
       // the number of partitions
       val numPartitions = parameters.getOrElse("numPartitions", null)
    +
    +  require(partitionColumn == null || 
    --- End diff --
    
    Please double-check my boolean conversion, as the prior `if` condition had to be negated for the `require`.
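    For reference, a standalone sketch of the equivalence (just De Morgan; `oldFires`/`requireHolds` are throwaway names):
    
    ```scala
    // The old `if` fired on: p != null && (lo == null || hi == null || n == null).
    def oldFires(p: String, lo: String, hi: String, n: String): Boolean =
      p != null && (lo == null || hi == null || n == null)
    
    // The new `require` asserts the negation of that condition.
    def requireHolds(p: String, lo: String, hi: String, n: String): Boolean =
      p == null || (lo != null && hi != null && n != null)
    
    // Exhaustive check over null/non-null combinations:
    // the require holds exactly when the old if did not fire.
    val ok = for {
      p  <- Seq(null, "col"); lo <- Seq(null, "0")
      hi <- Seq(null, "10");  n  <- Seq(null, "4")
    } yield requireHolds(p, lo, hi, n) == !oldFires(p, lo, hi, n)
    assert(ok.forall(identity))
    ```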


[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by JustinPihony <gi...@git.apache.org>.
Github user JustinPihony commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r77949176
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala ---
    @@ -17,39 +17,106 @@
     
     package org.apache.spark.sql.execution.datasources.jdbc
     
    +import java.sql.SQLException
     import java.util.Properties
     
    -import org.apache.spark.sql.SQLContext
    -import org.apache.spark.sql.sources.{BaseRelation, DataSourceRegister, RelationProvider}
    +import scala.collection.JavaConverters.mapAsJavaMapConverter
     
    -class JdbcRelationProvider extends RelationProvider with DataSourceRegister {
    +import org.apache.spark.sql.{DataFrame, SaveMode, SQLContext}
    +import org.apache.spark.sql.sources.{BaseRelation, CreatableRelationProvider, DataSourceRegister, RelationProvider}
    +import org.apache.spark.sql.types.StructType
    --- End diff --
    
    Correct, this was left over from `SchemaRelationProvider`.


[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by JustinPihony <gi...@git.apache.org>.
Github user JustinPihony commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    @gatorsmile I added the R and SQL documentation. I took the SQL portion from https://github.com/apache/spark/pull/6121/files


[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r77946944
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala ---
    @@ -17,39 +17,105 @@
     
     package org.apache.spark.sql.execution.datasources.jdbc
     
    +import java.sql.SQLException
     import java.util.Properties
     
    -import org.apache.spark.sql.SQLContext
    -import org.apache.spark.sql.sources.{BaseRelation, DataSourceRegister, RelationProvider}
    +import org.apache.spark.sql.{DataFrame, SaveMode, SQLContext}
    +import org.apache.spark.sql.sources.{BaseRelation, CreatableRelationProvider, DataSourceRegister, RelationProvider}
    +import org.apache.spark.sql.types.StructType
     
    -class JdbcRelationProvider extends RelationProvider with DataSourceRegister {
    +class JdbcRelationProvider extends CreatableRelationProvider
    +  with RelationProvider with DataSourceRegister {
     
       override def shortName(): String = "jdbc"
     
    -  /** Returns a new base relation with the given parameters. */
       override def createRelation(
           sqlContext: SQLContext,
           parameters: Map[String, String]): BaseRelation = {
         val jdbcOptions = new JDBCOptions(parameters)
    -    if (jdbcOptions.partitionColumn != null
    -      && (jdbcOptions.lowerBound == null
    -        || jdbcOptions.upperBound == null
    -        || jdbcOptions.numPartitions == null)) {
    -      sys.error("Partitioning incompletely specified")
    -    }
    +    val partitionColumn = jdbcOptions.partitionColumn
    +    val lowerBound = jdbcOptions.lowerBound
    +    val upperBound = jdbcOptions.upperBound
    +    val numPartitions = jdbcOptions.numPartitions
     
    -    val partitionInfo = if (jdbcOptions.partitionColumn == null) {
    -      null
    -    } else {
    +    val partitionInfo = if (partitionColumn == null) { null }
    +    else {
           JDBCPartitioningInfo(
    -        jdbcOptions.partitionColumn,
    -        jdbcOptions.lowerBound.toLong,
    -        jdbcOptions.upperBound.toLong,
    -        jdbcOptions.numPartitions.toInt)
    +        partitionColumn, lowerBound.toLong, upperBound.toLong, numPartitions.toInt)
         }
         val parts = JDBCRelation.columnPartition(partitionInfo)
         val properties = new Properties() // Additional properties that we will pass to getConnection
         parameters.foreach(kv => properties.setProperty(kv._1, kv._2))
         JDBCRelation(jdbcOptions.url, jdbcOptions.table, parts, properties)(sqlContext.sparkSession)
       }
    +
    +  /*
    +   * The following structure applies to this code:
    +   *                 |    tableExists            |          !tableExists
    +   *------------------------------------------------------------------------------------
    +   * Ignore          | BaseRelation              | CreateTable, saveTable, BaseRelation
    +   * ErrorIfExists   | ERROR                     | CreateTable, saveTable, BaseRelation
    +   * Overwrite*      | (DropTable, CreateTable,) | CreateTable, saveTable, BaseRelation
    +   *                 | saveTable, BaseRelation   |
    +   * Append          | saveTable, BaseRelation   | CreateTable, saveTable, BaseRelation
    +   *
    +   * *Overwrite & tableExists with truncate, will not drop & create, but instead truncate
    +   */
    +  override def createRelation(
    +      sqlContext: SQLContext,
    +      mode: SaveMode,
    +      parameters: Map[String, String],
    +      data: DataFrame): BaseRelation = {
    +    import collection.JavaConverters._
    +    
    +    val jdbcOptions = new JDBCOptions(parameters)
    +    val url = jdbcOptions.url
    +    val table = jdbcOptions.table
    +
    +    val props = new Properties()
    +    props.putAll(parameters.asJava)
    +    val conn = JdbcUtils.createConnectionFactory(url, props)()
    +
    +    try {
    +      val tableExists = JdbcUtils.tableExists(conn, url, table)
    +
    +      val (doCreate, doSave) = (mode, tableExists) match {
    +        case (SaveMode.Ignore, true) => (false, false)
    +        case (SaveMode.ErrorIfExists, true) => throw new SQLException(
    +          s"Table $table already exists, and SaveMode is set to ErrorIfExists.")
    +        case (SaveMode.Overwrite, true) =>
    +          if (jdbcOptions.isTruncate && JdbcUtils.isCascadingTruncateTable(url) == Some(false)) {
    +            JdbcUtils.truncateTable(conn, table)
    +            (false, true)
    +          } else {
    +            JdbcUtils.dropTable(conn, table)
    +            (true, true)
    +          }
    +        case (SaveMode.Append, true) => (false, true)
    +        case (_, true) => throw new IllegalArgumentException(s"Unexpected SaveMode, '$mode'," +
    +          " for handling existing tables.")
    +        case (_, false) => (true, true)
    +      }
    +
    +      if(doCreate) {
    --- End diff --
    
    add a space after `if`


[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    ok to test


[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r77948654
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala ---
    @@ -17,39 +17,106 @@
     
     package org.apache.spark.sql.execution.datasources.jdbc
     
    +import java.sql.SQLException
     import java.util.Properties
     
    -import org.apache.spark.sql.SQLContext
    -import org.apache.spark.sql.sources.{BaseRelation, DataSourceRegister, RelationProvider}
    +import scala.collection.JavaConverters.mapAsJavaMapConverter
     
    -class JdbcRelationProvider extends RelationProvider with DataSourceRegister {
    +import org.apache.spark.sql.{DataFrame, SaveMode, SQLContext}
    +import org.apache.spark.sql.sources.{BaseRelation, CreatableRelationProvider, DataSourceRegister, RelationProvider}
    +import org.apache.spark.sql.types.StructType
    +
    +class JdbcRelationProvider extends CreatableRelationProvider
    +  with RelationProvider with DataSourceRegister {
     
       override def shortName(): String = "jdbc"
     
    -  /** Returns a new base relation with the given parameters. */
       override def createRelation(
           sqlContext: SQLContext,
           parameters: Map[String, String]): BaseRelation = {
         val jdbcOptions = new JDBCOptions(parameters)
    -    if (jdbcOptions.partitionColumn != null
    -      && (jdbcOptions.lowerBound == null
    -        || jdbcOptions.upperBound == null
    -        || jdbcOptions.numPartitions == null)) {
    -      sys.error("Partitioning incompletely specified")
    -    }
    +    val partitionColumn = jdbcOptions.partitionColumn
    +    val lowerBound = jdbcOptions.lowerBound
    +    val upperBound = jdbcOptions.upperBound
    +    val numPartitions = jdbcOptions.numPartitions
     
    -    val partitionInfo = if (jdbcOptions.partitionColumn == null) {
    -      null
    +    val partitionInfo = if (partitionColumn == null) { 
    +      null 
    --- End diff --
    
    It seems trailing whitespace was mistakenly added here. scalastyle will complain about this.


[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    **[Test build #65860 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65860/consoleFull)** for PR 12601 at commit [`8fb86b4`](https://github.com/apache/spark/commit/8fb86b482929e321f4ec8865124b8661f1a29bbf).


[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by JustinPihony <gi...@git.apache.org>.
Github user JustinPihony commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    @srowen Ping. I don't think there is anything left on my plate, so this should be mergeable.


[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by JustinPihony <gi...@git.apache.org>.
Github user JustinPihony commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    This seems good now. Could I please get this to at least run through the automated tests so we can finally push this through?



[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r80352317
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java ---
    @@ -21,6 +21,7 @@
     import java.util.ArrayList;
     import java.util.Arrays;
     import java.util.List;
    +import java.util.Properties;
     // $example off:schema_merging$
     
    --- End diff --
    
    Oh, maybe my previous comment was not clear. I meant
    
    ```java
    import java.util.List;
    // $example off:schema_merging$
    import java.util.Properties;
    ```
    
    I haven't tried to build the docs against the current state, but I guess we won't need this import for Parquet's schema merging example.


[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by JustinPihony <gi...@git.apache.org>.
Github user JustinPihony commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r80404639
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCWriteSuite.scala ---
    @@ -208,4 +210,84 @@ class JDBCWriteSuite extends SharedSQLContext with BeforeAndAfter {
         assert(2 === spark.read.jdbc(url1, "TEST.PEOPLE1", properties).count())
         assert(2 === spark.read.jdbc(url1, "TEST.PEOPLE1", properties).collect()(0).length)
       }
    +
    +  test("save works for format(\"jdbc\") if url and dbtable are set") {
    +    val df = sqlContext.createDataFrame(sparkContext.parallelize(arr2x2), schema2)
    +
    +    df.write.format("jdbc")
    +    .options(Map("url" -> url, "dbtable" -> "TEST.SAVETEST"))
    +    .save
    --- End diff --
    
    Done


[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by JustinPihony <gi...@git.apache.org>.
Github user JustinPihony commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r80404577
  
    --- Diff: docs/sql-programming-guide.md ---
    @@ -1096,13 +1096,17 @@ the Data Sources API. The following options are supported:
     
     {% highlight sql %}
     
    -CREATE TEMPORARY VIEW jdbcTable
    +CREATE TEMPORARY TABLE jdbcTable
    --- End diff --
    
    Done, thanks. I had been going off of the tests.


[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65891/
    Test PASSed.


[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by JustinPihony <gi...@git.apache.org>.
Github user JustinPihony commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r80352586
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java ---
    @@ -21,6 +21,7 @@
     import java.util.ArrayList;
     import java.util.Arrays;
     import java.util.List;
    +import java.util.Properties;
     // $example off:schema_merging$
     
    --- End diff --
    
    @HyukjinKwon Yes, that is what I was talking about... just fixed it back.


[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r80627940
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala ---
    @@ -19,37 +19,102 @@ package org.apache.spark.sql.execution.datasources.jdbc
     
     import java.util.Properties
     
    -import org.apache.spark.sql.SQLContext
    -import org.apache.spark.sql.sources.{BaseRelation, DataSourceRegister, RelationProvider}
    +import scala.collection.JavaConverters.mapAsJavaMapConverter
     
    -class JdbcRelationProvider extends RelationProvider with DataSourceRegister {
    +import org.apache.spark.sql.{AnalysisException, DataFrame, SaveMode, SQLContext}
    +import org.apache.spark.sql.sources.{BaseRelation, CreatableRelationProvider, DataSourceRegister, RelationProvider}
    +
    +class JdbcRelationProvider extends CreatableRelationProvider
    +  with RelationProvider with DataSourceRegister {
     
       override def shortName(): String = "jdbc"
     
    -  /** Returns a new base relation with the given parameters. */
       override def createRelation(
           sqlContext: SQLContext,
           parameters: Map[String, String]): BaseRelation = {
         val jdbcOptions = new JDBCOptions(parameters)
    -    if (jdbcOptions.partitionColumn != null
    -      && (jdbcOptions.lowerBound == null
    -        || jdbcOptions.upperBound == null
    -        || jdbcOptions.numPartitions == null)) {
    -      sys.error("Partitioning incompletely specified")
    -    }
    +    val partitionColumn = jdbcOptions.partitionColumn
    +    val lowerBound = jdbcOptions.lowerBound
    +    val upperBound = jdbcOptions.upperBound
    +    val numPartitions = jdbcOptions.numPartitions
     
    -    val partitionInfo = if (jdbcOptions.partitionColumn == null) {
    +    val partitionInfo = if (partitionColumn == null) {
           null
         } else {
           JDBCPartitioningInfo(
    -        jdbcOptions.partitionColumn,
    -        jdbcOptions.lowerBound.toLong,
    -        jdbcOptions.upperBound.toLong,
    -        jdbcOptions.numPartitions.toInt)
    +        partitionColumn, lowerBound.toLong, upperBound.toLong, numPartitions.toInt)
         }
         val parts = JDBCRelation.columnPartition(partitionInfo)
         val properties = new Properties() // Additional properties that we will pass to getConnection
         parameters.foreach(kv => properties.setProperty(kv._1, kv._2))
         JDBCRelation(jdbcOptions.url, jdbcOptions.table, parts, properties)(sqlContext.sparkSession)
       }
    +
    +  /*
    +   * The following structure applies to this code:
    --- End diff --
    
    What does this table mean? What are `CreateTable`, `saveTable`, and `BaseRelation`?
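    
    (Reading it as a decision matrix: rows are `SaveMode`s, columns are whether the
    JDBC table already exists, and each cell lists the steps that run before the
    `BaseRelation` is returned. A condensed sketch of the same logic, simplified
    from the diff above:)
    
    ```scala
    // Which actions run for each (mode, tableExists) pair.
    // doCreate => issue CREATE TABLE; doSave => JdbcUtils.saveTable.
    val (doCreate, doSave) = (mode, tableExists) match {
      case (SaveMode.Ignore, true)        => (false, false) // leave the table untouched
      case (SaveMode.ErrorIfExists, true) =>
        throw new SQLException(s"Table $table already exists.")
      case (SaveMode.Overwrite, true)     =>
        JdbcUtils.dropTable(conn, table); (true, true)      // drop, recreate, save
      case (SaveMode.Append, true)        => (false, true)  // just save into it
      case (_, false)                     => (true, true)   // create, then save
    }
    ```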



[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by JustinPihony <gi...@git.apache.org>.
Github user JustinPihony commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r77060202
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala ---
    @@ -36,4 +36,9 @@ private[jdbc] class JDBCOptions(
       val upperBound = parameters.getOrElse("upperBound", null)
       // the number of partitions
       val numPartitions = parameters.getOrElse("numPartitions", null)
    +
    +  if (partitionColumn != null
    +      && (lowerBound == null || upperBound == null || numPartitions == null)) {
    +      sys.error("Partitioning incompletely specified")
    --- End diff --
    
    Agreed, as already stated.



[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r77946854
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala ---
    @@ -17,39 +17,105 @@
     
     package org.apache.spark.sql.execution.datasources.jdbc
     
    +import java.sql.SQLException
     import java.util.Properties
     
    -import org.apache.spark.sql.SQLContext
    -import org.apache.spark.sql.sources.{BaseRelation, DataSourceRegister, RelationProvider}
    +import org.apache.spark.sql.{DataFrame, SaveMode, SQLContext}
    +import org.apache.spark.sql.sources.{BaseRelation, CreatableRelationProvider, DataSourceRegister, RelationProvider}
    +import org.apache.spark.sql.types.StructType
     
    -class JdbcRelationProvider extends RelationProvider with DataSourceRegister {
    +class JdbcRelationProvider extends CreatableRelationProvider
    +  with RelationProvider with DataSourceRegister {
     
       override def shortName(): String = "jdbc"
     
    -  /** Returns a new base relation with the given parameters. */
       override def createRelation(
           sqlContext: SQLContext,
           parameters: Map[String, String]): BaseRelation = {
         val jdbcOptions = new JDBCOptions(parameters)
    -    if (jdbcOptions.partitionColumn != null
    -      && (jdbcOptions.lowerBound == null
    -        || jdbcOptions.upperBound == null
    -        || jdbcOptions.numPartitions == null)) {
    -      sys.error("Partitioning incompletely specified")
    -    }
    +    val partitionColumn = jdbcOptions.partitionColumn
    +    val lowerBound = jdbcOptions.lowerBound
    +    val upperBound = jdbcOptions.upperBound
    +    val numPartitions = jdbcOptions.numPartitions
     
    -    val partitionInfo = if (jdbcOptions.partitionColumn == null) {
    -      null
    -    } else {
    +    val partitionInfo = if (partitionColumn == null) { null }
    --- End diff --
    
    Let's follow this style, just to speed things up and match the majority of the existing code.
    There is a correct example in the guidelines, as below:
    
    ```scala
    // Correct:
    if (true) {
      println("Wow!")
    }
    ```
    
    not
    
    ```scala
    // Incorrect:
    if (true) { println("Wow!") }
    ```



[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r77949563
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala ---
    @@ -82,7 +81,7 @@ class JdbcRelationProvider extends CreatableRelationProvider
     
           val (doCreate, doSave) = (mode, tableExists) match {
             case (SaveMode.Ignore, true) => (false, false)
    -        case (SaveMode.ErrorIfExists, true) => throw new TableAlreadyExistsException(
    +        case (SaveMode.ErrorIfExists, true) => throw new SQLException(
               s"Table $table already exists, and SaveMode is set to ErrorIfExists.")
    --- End diff --
    
    Normally, we issue an `AnalysisException` with a message like `Table or view $table already exists`.
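    
    A minimal sketch of that convention (message wording illustrative; note that
    `AnalysisException`'s constructor is `protected[sql]`, so this only compiles
    inside the `org.apache.spark.sql` package tree):
    
    ```scala
    import org.apache.spark.sql.AnalysisException
    
    // Replace the SQLException with the usual analysis-time error.
    if (tableExists) {
      throw new AnalysisException(s"Table or view '$table' already exists.")
    }
    ```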



[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r69267676
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala ---
    @@ -19,37 +19,105 @@ package org.apache.spark.sql.execution.datasources.jdbc
     
     import java.util.Properties
     
    -import org.apache.spark.sql.SQLContext
    -import org.apache.spark.sql.sources.{BaseRelation, DataSourceRegister, RelationProvider}
    +import org.apache.spark.sql.{DataFrame, SaveMode, SQLContext}
    +import org.apache.spark.sql.sources.{BaseRelation, CreatableRelationProvider, DataSourceRegister, RelationProvider, SchemaRelationProvider}
    +import org.apache.spark.sql.types.StructType
     
    -class JdbcRelationProvider extends RelationProvider with DataSourceRegister {
    +class JdbcRelationProvider extends CreatableRelationProvider
    +  with SchemaRelationProvider with RelationProvider with DataSourceRegister {
     
       override def shortName(): String = "jdbc"
     
    -  /** Returns a new base relation with the given parameters. */
       override def createRelation(
           sqlContext: SQLContext,
           parameters: Map[String, String]): BaseRelation = {
    -    val jdbcOptions = new JDBCOptions(parameters)
    -    if (jdbcOptions.partitionColumn != null
    -      && (jdbcOptions.lowerBound == null
    -        || jdbcOptions.upperBound == null
    -        || jdbcOptions.numPartitions == null)) {
    +    createRelation(sqlContext, parameters, null)
    +  }
    +
    +  /** Returns a new base relation with the given parameters. */
    +  override def createRelation(
    +      sqlContext: SQLContext,
    +      parameters: Map[String, String],
    +      schema: StructType): BaseRelation = {
    +    val url = parameters.getOrElse("url", sys.error("Option 'url' not specified"))
    +    val table = parameters.getOrElse("dbtable", sys.error("Option 'dbtable' not specified"))
    +    val partitionColumn = parameters.getOrElse("partitionColumn", null)
    +    val lowerBound = parameters.getOrElse("lowerBound", null)
    +    val upperBound = parameters.getOrElse("upperBound", null)
    +    val numPartitions = parameters.getOrElse("numPartitions", null)
    --- End diff --
    
    I think the validation can be done together in `JDBCOptions`.
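    
    Something along these lines inside `JDBCOptions` (a sketch; the exact message
    is illustrative):
    
    ```scala
    // Fail fast if partitioning is only partially specified.
    require(
      partitionColumn == null ||
        (lowerBound != null && upperBound != null && numPartitions != null),
      "If 'partitionColumn' is specified then 'lowerBound', 'upperBound'," +
        " and 'numPartitions' are required.")
    ```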



[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r77950064
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala ---
    @@ -44,6 +46,11 @@ class JDBCOptions(
       // the number of partitions
       val numPartitions = parameters.getOrElse("numPartitions", null)
     
    +  require(partitionColumn == null ||
    +    (partitionColumn != null && lowerBound != null && upperBound != null && numPartitions != null),
    --- End diff --
    
    You can simplify it to the following, since the second disjunct is only evaluated when `partitionColumn != null`:
    ```Scala
    partitionColumn == null || (lowerBound != null && upperBound != null && numPartitions != null)
    ```




[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by JustinPihony <gi...@git.apache.org>.
Github user JustinPihony commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r77949242
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala ---
    @@ -17,39 +17,105 @@
     
     package org.apache.spark.sql.execution.datasources.jdbc
     
    +import java.sql.SQLException
     import java.util.Properties
     
    -import org.apache.spark.sql.SQLContext
    -import org.apache.spark.sql.sources.{BaseRelation, DataSourceRegister, RelationProvider}
    +import org.apache.spark.sql.{DataFrame, SaveMode, SQLContext}
    +import org.apache.spark.sql.sources.{BaseRelation, CreatableRelationProvider, DataSourceRegister, RelationProvider}
    +import org.apache.spark.sql.types.StructType
     
    -class JdbcRelationProvider extends RelationProvider with DataSourceRegister {
    +class JdbcRelationProvider extends CreatableRelationProvider
    +  with RelationProvider with DataSourceRegister {
     
       override def shortName(): String = "jdbc"
     
    -  /** Returns a new base relation with the given parameters. */
       override def createRelation(
           sqlContext: SQLContext,
           parameters: Map[String, String]): BaseRelation = {
         val jdbcOptions = new JDBCOptions(parameters)
    -    if (jdbcOptions.partitionColumn != null
    -      && (jdbcOptions.lowerBound == null
    -        || jdbcOptions.upperBound == null
    -        || jdbcOptions.numPartitions == null)) {
    -      sys.error("Partitioning incompletely specified")
    --- End diff --
    
    Added.



[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by JustinPihony <gi...@git.apache.org>.
Github user JustinPihony commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r77948104
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala ---
    @@ -17,39 +17,105 @@
     
     package org.apache.spark.sql.execution.datasources.jdbc
     
    +import java.sql.SQLException
     import java.util.Properties
     
    -import org.apache.spark.sql.SQLContext
    -import org.apache.spark.sql.sources.{BaseRelation, DataSourceRegister, RelationProvider}
    +import org.apache.spark.sql.{DataFrame, SaveMode, SQLContext}
    +import org.apache.spark.sql.sources.{BaseRelation, CreatableRelationProvider, DataSourceRegister, RelationProvider}
    +import org.apache.spark.sql.types.StructType
     
    -class JdbcRelationProvider extends RelationProvider with DataSourceRegister {
    +class JdbcRelationProvider extends CreatableRelationProvider
    +  with RelationProvider with DataSourceRegister {
     
       override def shortName(): String = "jdbc"
     
    -  /** Returns a new base relation with the given parameters. */
       override def createRelation(
           sqlContext: SQLContext,
           parameters: Map[String, String]): BaseRelation = {
         val jdbcOptions = new JDBCOptions(parameters)
    -    if (jdbcOptions.partitionColumn != null
    -      && (jdbcOptions.lowerBound == null
    -        || jdbcOptions.upperBound == null
    -        || jdbcOptions.numPartitions == null)) {
    -      sys.error("Partitioning incompletely specified")
    -    }
    +    val partitionColumn = jdbcOptions.partitionColumn
    +    val lowerBound = jdbcOptions.lowerBound
    +    val upperBound = jdbcOptions.upperBound
    +    val numPartitions = jdbcOptions.numPartitions
     
    -    val partitionInfo = if (jdbcOptions.partitionColumn == null) {
    -      null
    -    } else {
    +    val partitionInfo = if (partitionColumn == null) { null }
    +    else {
           JDBCPartitioningInfo(
    -        jdbcOptions.partitionColumn,
    -        jdbcOptions.lowerBound.toLong,
    -        jdbcOptions.upperBound.toLong,
    -        jdbcOptions.numPartitions.toInt)
    +        partitionColumn, lowerBound.toLong, upperBound.toLong, numPartitions.toInt)
         }
         val parts = JDBCRelation.columnPartition(partitionInfo)
         val properties = new Properties() // Additional properties that we will pass to getConnection
         parameters.foreach(kv => properties.setProperty(kv._1, kv._2))
         JDBCRelation(jdbcOptions.url, jdbcOptions.table, parts, properties)(sqlContext.sparkSession)
       }
    +
    +  /*
    +   * The following structure applies to this code:
    +   *                 |    tableExists            |          !tableExists
    +   *------------------------------------------------------------------------------------
    +   * Ignore          | BaseRelation              | CreateTable, saveTable, BaseRelation
    +   * ErrorIfExists   | ERROR                     | CreateTable, saveTable, BaseRelation
    +   * Overwrite*      | (DropTable, CreateTable,) | CreateTable, saveTable, BaseRelation
    +   *                 | saveTable, BaseRelation   |
    +   * Append          | saveTable, BaseRelation   | CreateTable, saveTable, BaseRelation
    +   *
    +   * *Overwrite & tableExists with truncate, will not drop & create, but instead truncate
    +   */
    +  override def createRelation(
    +      sqlContext: SQLContext,
    +      mode: SaveMode,
    +      parameters: Map[String, String],
    +      data: DataFrame): BaseRelation = {
    +    import collection.JavaConverters._
    +    
    +    val jdbcOptions = new JDBCOptions(parameters)
    +    val url = jdbcOptions.url
    +    val table = jdbcOptions.table
    +
    +    val props = new Properties()
    +    props.putAll(parameters.asJava)
    +    val conn = JdbcUtils.createConnectionFactory(url, props)()
    +
    +    try {
    +      val tableExists = JdbcUtils.tableExists(conn, url, table)
    +
    +      val (doCreate, doSave) = (mode, tableExists) match {
    +        case (SaveMode.Ignore, true) => (false, false)
    +        case (SaveMode.ErrorIfExists, true) => throw new SQLException(
    --- End diff --
    
    I didn't know that exception existed in the Spark code. Any suggestions for an easy way to pull out the db name, though?
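    
    One possibility, assuming the JDBC driver reports a catalog at all (not
    guaranteed for every driver), is `java.sql.Connection#getCatalog`:
    
    ```scala
    // Sketch: derive a database name for TableAlreadyExistsException.
    // getCatalog may return null, so fall back to a placeholder.
    val db = Option(conn.getCatalog).getOrElse("default")
    throw new TableAlreadyExistsException(db = db, table = table)
    ```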




[GitHub] spark pull request: [SPARK-14525][SQL] Make DataFrameWrite.save wo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12601#issuecomment-213274569
  
    Can one of the admins verify this patch?



[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    It sounds like you understand my points. Please update your PR based on my latest changes, and we can then review it. Thanks!



[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by JustinPihony <gi...@git.apache.org>.
Github user JustinPihony commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    Yes, I realize that `SchemaRelationProvider` is not necessary, but the legwork has already been done, so why not take advantage of it? Even so, this original PR is actually not that far from your duplicate, apart from a few lines in `JDBCRelation.scala`, so the code changes are close to the same size. All I need is a final review, as I have made the changes requested in the original review.



[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    **[Test build #65455 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65455/consoleFull)** for PR 12601 at commit [`4a02c82`](https://github.com/apache/spark/commit/4a02c8249ff224f4cbae7050e99f77d31601f4af).



[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    **[Test build #65275 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65275/consoleFull)** for PR 12601 at commit [`7ef7a48`](https://github.com/apache/spark/commit/7ef7a489b27fa6bd5d79ee4d428874162fd813de).



[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by JustinPihony <gi...@git.apache.org>.
Github user JustinPihony commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    @gatorsmile Again, because the work is already done and it allows broader flexibility for the user. It is only ~20 lines of simple code. What is the reason not to have it?



[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65087/
    Test PASSed.



[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r77946981
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala ---
    @@ -17,39 +17,105 @@
     
     package org.apache.spark.sql.execution.datasources.jdbc
     
    +import java.sql.SQLException
     import java.util.Properties
     
    -import org.apache.spark.sql.SQLContext
    -import org.apache.spark.sql.sources.{BaseRelation, DataSourceRegister, RelationProvider}
    +import org.apache.spark.sql.{DataFrame, SaveMode, SQLContext}
    +import org.apache.spark.sql.sources.{BaseRelation, CreatableRelationProvider, DataSourceRegister, RelationProvider}
    +import org.apache.spark.sql.types.StructType
     
    -class JdbcRelationProvider extends RelationProvider with DataSourceRegister {
    +class JdbcRelationProvider extends CreatableRelationProvider
    +  with RelationProvider with DataSourceRegister {
     
       override def shortName(): String = "jdbc"
     
    -  /** Returns a new base relation with the given parameters. */
       override def createRelation(
           sqlContext: SQLContext,
           parameters: Map[String, String]): BaseRelation = {
         val jdbcOptions = new JDBCOptions(parameters)
    -    if (jdbcOptions.partitionColumn != null
    -      && (jdbcOptions.lowerBound == null
    -        || jdbcOptions.upperBound == null
    -        || jdbcOptions.numPartitions == null)) {
    -      sys.error("Partitioning incompletely specified")
    -    }
    +    val partitionColumn = jdbcOptions.partitionColumn
    +    val lowerBound = jdbcOptions.lowerBound
    +    val upperBound = jdbcOptions.upperBound
    +    val numPartitions = jdbcOptions.numPartitions
     
    -    val partitionInfo = if (jdbcOptions.partitionColumn == null) {
    -      null
    -    } else {
    +    val partitionInfo = if (partitionColumn == null) { null }
    +    else {
           JDBCPartitioningInfo(
    -        jdbcOptions.partitionColumn,
    -        jdbcOptions.lowerBound.toLong,
    -        jdbcOptions.upperBound.toLong,
    -        jdbcOptions.numPartitions.toInt)
    +        partitionColumn, lowerBound.toLong, upperBound.toLong, numPartitions.toInt)
         }
         val parts = JDBCRelation.columnPartition(partitionInfo)
         val properties = new Properties() // Additional properties that we will pass to getConnection
         parameters.foreach(kv => properties.setProperty(kv._1, kv._2))
         JDBCRelation(jdbcOptions.url, jdbcOptions.table, parts, properties)(sqlContext.sparkSession)
       }
    +
    +  /*
    +   * The following structure applies to this code:
    +   *                 |    tableExists            |          !tableExists
    +   *------------------------------------------------------------------------------------
    +   * Ignore          | BaseRelation              | CreateTable, saveTable, BaseRelation
    +   * ErrorIfExists   | ERROR                     | CreateTable, saveTable, BaseRelation
    +   * Overwrite*      | (DropTable, CreateTable,) | CreateTable, saveTable, BaseRelation
    +   *                 | saveTable, BaseRelation   |
    +   * Append          | saveTable, BaseRelation   | CreateTable, saveTable, BaseRelation
    +   *
    +   * *Overwrite & tableExists with truncate, will not drop & create, but instead truncate
    +   */
    +  override def createRelation(
    +      sqlContext: SQLContext,
    +      mode: SaveMode,
    +      parameters: Map[String, String],
    +      data: DataFrame): BaseRelation = {
    +    import collection.JavaConverters._
    +    
    +    val jdbcOptions = new JDBCOptions(parameters)
    +    val url = jdbcOptions.url
    +    val table = jdbcOptions.table
    +
    +    val props = new Properties()
    +    props.putAll(parameters.asJava)
    +    val conn = JdbcUtils.createConnectionFactory(url, props)()
    +
    +    try {
    +      val tableExists = JdbcUtils.tableExists(conn, url, table)
    +
    +      val (doCreate, doSave) = (mode, tableExists) match {
    +        case (SaveMode.Ignore, true) => (false, false)
    +        case (SaveMode.ErrorIfExists, true) => throw new SQLException(
    +          s"Table $table already exists, and SaveMode is set to ErrorIfExists.")
    +        case (SaveMode.Overwrite, true) =>
    +          if (jdbcOptions.isTruncate && JdbcUtils.isCascadingTruncateTable(url) == Some(false)) {
    +            JdbcUtils.truncateTable(conn, table)
    +            (false, true)
    +          } else {
    +            JdbcUtils.dropTable(conn, table)
    +            (true, true)
    +          }
    +        case (SaveMode.Append, true) => (false, true)
    +        case (_, true) => throw new IllegalArgumentException(s"Unexpected SaveMode, '$mode'," +
    +          " for handling existing tables.")
    +        case (_, false) => (true, true)
    +      }
    +
    +      if(doCreate) {
    +        val schema = JdbcUtils.schemaString(data, url)
    +        // To allow certain options to append when create a new table, which can be
    +        // table_options or partition_options.
    +        // E.g., "CREATE TABLE t (name string) ENGINE=InnoDB DEFAULT CHARSET=utf8"
    +        val createtblOptions = jdbcOptions.createTableOptions
    +        val sql = s"CREATE TABLE $table ($schema) $createtblOptions"
    +        val statement = conn.createStatement
    +        try {
    +          statement.executeUpdate(sql)
    +        } finally {
    +          statement.close()
    +        }
    +      }
    +      if(doSave) JdbcUtils.saveTable(data, url, table, props)
    --- End diff --
    
    The same here; we need one extra space after `if`.



[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65460/
    Test PASSed.



[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r77948436
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala ---
    @@ -17,39 +17,106 @@
     
     package org.apache.spark.sql.execution.datasources.jdbc
     
    +import java.sql.SQLException
     import java.util.Properties
     
    -import org.apache.spark.sql.SQLContext
    -import org.apache.spark.sql.sources.{BaseRelation, DataSourceRegister, RelationProvider}
    +import scala.collection.JavaConverters.mapAsJavaMapConverter
     
    -class JdbcRelationProvider extends RelationProvider with DataSourceRegister {
    +import org.apache.spark.sql.{DataFrame, SaveMode, SQLContext}
    +import org.apache.spark.sql.sources.{BaseRelation, CreatableRelationProvider, DataSourceRegister, RelationProvider}
    +import org.apache.spark.sql.types.StructType
    --- End diff --
    
    Not used, right?



[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by JustinPihony <gi...@git.apache.org>.
Github user JustinPihony commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r77948243
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCWriteSuite.scala ---
    @@ -208,4 +208,75 @@ class JDBCWriteSuite extends SharedSQLContext with BeforeAndAfter {
         assert(2 === spark.read.jdbc(url1, "TEST.PEOPLE1", properties).count())
         assert(2 === spark.read.jdbc(url1, "TEST.PEOPLE1", properties).collect()(0).length)
       }
    +
    +  test("save works for format(\"jdbc\") if url and dbtable are set") {
    +    val df = sqlContext.createDataFrame(sparkContext.parallelize(arr2x2), schema2)
    +
    +    df.write.format("jdbc")
    +    .options(Map("url" -> url, "dbtable" -> "TEST.BASICCREATETEST"))
    +    .save
    +
    +    assert(2 === sqlContext.read.jdbc(url, "TEST.BASICCREATETEST", new Properties).count)
    +    assert(
    +      2 === sqlContext.read.jdbc(url, "TEST.BASICCREATETEST", new Properties).collect()(0).length)
    +  }
    +
    +  test("save API with SaveMode.Overwrite") {
    +    import scala.collection.JavaConverters._
    +
    +    val df = spark.createDataFrame(sparkContext.parallelize(arr2x2), schema2)
    +    val df2 = spark.createDataFrame(sparkContext.parallelize(arr1x2), schema2)
    +
    +    df.write.format("jdbc")
    +      .option("url", url1)
    +      .option("dbtable", "TEST.TRUNCATETEST")
    +      .options(properties.asScala)
    +      .save()
    +    df2.write.mode(SaveMode.Overwrite).format("jdbc")
    +      .option("url", url1)
    +      .option("dbtable", "TEST.TRUNCATETEST")
    +      .options(properties.asScala)
    +      .save()
    +    assert(1 === spark.read.jdbc(url1, "TEST.TRUNCATETEST", properties).count())
    +    assert(2 === spark.read.jdbc(url1, "TEST.TRUNCATETEST", properties).collect()(0).length)
    +  }
    +
    +  test("save errors if url is not specified") {
    +    import scala.collection.JavaConverters._
    +    val df = spark.createDataFrame(sparkContext.parallelize(arr2x2), schema2)
    +
    +    var e = intercept[RuntimeException] {
    +      df.write.format("jdbc")
    +        .option("dbtable", "TEST.TRUNCATETEST")
    +        .options(properties.asScala)
    +        .save()
    +    }.getMessage
    +    assert(e.contains("Option 'url' is required"))
    +  }
    +
    +  test("save errors if dbtable is not specified") {
    +    import scala.collection.JavaConverters._
    +    val df = spark.createDataFrame(sparkContext.parallelize(arr2x2), schema2)
    +
    +    var e = intercept[RuntimeException] {
    +      df.write.format("jdbc")
    +        .option("url", url1)
    +        .options(properties.asScala)
    +        .save()
    +    }.getMessage
    +    assert(e.contains("Option 'dbtable' is required"))
    +  }
    +
    +  test("save errors if wrong user/password combination") {
    +    import scala.collection.JavaConverters._
    +    val df = spark.createDataFrame(sparkContext.parallelize(arr2x2), schema2)
    +
    +    var e = intercept[org.h2.jdbc.JdbcSQLException] {
    --- End diff --
    
    I must have copy/pasted this from somewhere... no clue why I would use a `var` otherwise. Shame on me :p
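    
    (The fix is a one-word change; a sketch, with the write body elided as in the
    diff above:)
    
    ```scala
    val e = intercept[org.h2.jdbc.JdbcSQLException] {
      // ... same write with a wrong user/password combination ...
    }.getMessage
    ```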



[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r80353010
  
    --- Diff: docs/sql-programming-guide.md ---
    @@ -1096,13 +1096,17 @@ the Data Sources API. The following options are supported:
     
     {% highlight sql %}
     
    -CREATE TEMPORARY VIEW jdbcTable
    +CREATE TEMPORARY TABLE jdbcTable
    --- End diff --
    
    Please change it back. `CREATE TEMPORARY TABLE` is deprecated; you will get a parser error:
    ```
    CREATE TEMPORARY TABLE is not supported yet. Please use CREATE TEMPORARY VIEW as an alternative.(line 1, pos 0)
    ```
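    
    For reference, a sketch of the surviving form, wrapped in the Scala SQL API
    (option values are the guide's placeholders):
    
    ```scala
    spark.sql(
      """CREATE TEMPORARY VIEW jdbcTable
        |USING org.apache.spark.sql.jdbc
        |OPTIONS (
        |  url "jdbc:postgresql:dbserver",
        |  dbtable "schema.tablename"
        |)""".stripMargin)
    ```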



[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by JustinPihony <gi...@git.apache.org>.
Github user JustinPihony commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r77239532
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRelation.scala ---
    @@ -114,7 +115,9 @@ private[sql] case class JDBCRelation(
     
       override val needConversion: Boolean = false
     
    -  override val schema: StructType = JDBCRDD.resolveTable(url, table, properties)
    +  override val schema: StructType = {
    +    providedSchemaOption.getOrElse(JDBCRDD.resolveTable(url, table, properties))
    +  }
    --- End diff --
    
    @HyukjinKwon I thought that, since this was over the 100-character limit, it would be more readable/maintainable in the long run to have braces. Again, I have no preference, and if you feel strongly I will make the change.



[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    @JustinPihony The document changes in Scala, Java, and Python look good to me, but could you please also add examples for both SQL and R? 
    
    - The R API example for `write.jdbc` can be found in the PR: https://github.com/apache/spark/pull/10480
    - The SQL example for the write path would look like: `INSERT INTO TABLE schema.tablename SELECT * FROM schema1.resultset1`
    
    




[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r77947885
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCWriteSuite.scala ---
    @@ -208,4 +208,75 @@ class JDBCWriteSuite extends SharedSQLContext with BeforeAndAfter {
         assert(2 === spark.read.jdbc(url1, "TEST.PEOPLE1", properties).count())
         assert(2 === spark.read.jdbc(url1, "TEST.PEOPLE1", properties).collect()(0).length)
       }
    +
    +  test("save works for format(\"jdbc\") if url and dbtable are set") {
    +    val df = sqlContext.createDataFrame(sparkContext.parallelize(arr2x2), schema2)
    +
    +    df.write.format("jdbc")
    +    .options(Map("url" -> url, "dbtable" -> "TEST.BASICCREATETEST"))
    +    .save
    +
    +    assert(2 === sqlContext.read.jdbc(url, "TEST.BASICCREATETEST", new Properties).count)
    +    assert(
    +      2 === sqlContext.read.jdbc(url, "TEST.BASICCREATETEST", new Properties).collect()(0).length)
    +  }
    +
    +  test("save API with SaveMode.Overwrite") {
    +    import scala.collection.JavaConverters._
    +
    +    val df = spark.createDataFrame(sparkContext.parallelize(arr2x2), schema2)
    +    val df2 = spark.createDataFrame(sparkContext.parallelize(arr1x2), schema2)
    +
    +    df.write.format("jdbc")
    +      .option("url", url1)
    +      .option("dbtable", "TEST.TRUNCATETEST")
    --- End diff --
    
    How about `TRUNCATETEST` -> `SAVETEST`?



[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    Thanks for pinging. Will take a look soon!




[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    To speed things up, let me change my PR https://github.com/apache/spark/pull/14077 based on your feedback; you can judge which one is better. Thanks!



[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-14525][SQL] Make DataFrameWrite.save wo...

Posted by JustinPihony <gi...@git.apache.org>.
Github user JustinPihony commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r60752881
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -244,7 +244,11 @@ final class DataFrameWriter private[sql](df: DataFrame) {
           bucketSpec = getBucketSpec,
           options = extraOptions.toMap)
     
    -    dataSource.write(mode, df)
    +    dataSource.providingClass.newInstance() match {
    +      case jdbc: execution.datasources.jdbc.DefaultSource =>
    --- End diff --
    
    I agree, and admit I was being lazy in not trying to figure out how to make the current implementation return a `BaseRelation`. I'll take another look today at just turning the `DefaultSource` into a `CreatableRelationProvider`.
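    
    For reference, the contract to implement is small; a sketch of the trait as it
    appears in `org.apache.spark.sql.sources`:
    
    ```scala
    trait CreatableRelationProvider {
      // Saves the DataFrame according to the SaveMode and returns the
      // relation representing the saved data.
      def createRelation(
          sqlContext: SQLContext,
          mode: SaveMode,
          parameters: Map[String, String],
          data: DataFrame): BaseRelation
    }
    ```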



[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by JustinPihony <gi...@git.apache.org>.
Github user JustinPihony commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    @rxin I am sure you are busy, so if there is anybody else I can ping to have this reviewed, please let me know.



[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r77948661
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala ---
    @@ -17,39 +17,106 @@
     
     package org.apache.spark.sql.execution.datasources.jdbc
     
    +import java.sql.SQLException
     import java.util.Properties
     
    -import org.apache.spark.sql.SQLContext
    -import org.apache.spark.sql.sources.{BaseRelation, DataSourceRegister, RelationProvider}
    +import scala.collection.JavaConverters.mapAsJavaMapConverter
     
    -class JdbcRelationProvider extends RelationProvider with DataSourceRegister {
    +import org.apache.spark.sql.{DataFrame, SaveMode, SQLContext}
    +import org.apache.spark.sql.sources.{BaseRelation, CreatableRelationProvider, DataSourceRegister, RelationProvider}
    +import org.apache.spark.sql.types.StructType
    +
    +class JdbcRelationProvider extends CreatableRelationProvider
    +  with RelationProvider with DataSourceRegister {
     
       override def shortName(): String = "jdbc"
     
    -  /** Returns a new base relation with the given parameters. */
       override def createRelation(
           sqlContext: SQLContext,
           parameters: Map[String, String]): BaseRelation = {
         val jdbcOptions = new JDBCOptions(parameters)
    -    if (jdbcOptions.partitionColumn != null
    -      && (jdbcOptions.lowerBound == null
    -        || jdbcOptions.upperBound == null
    -        || jdbcOptions.numPartitions == null)) {
    -      sys.error("Partitioning incompletely specified")
    -    }
    +    val partitionColumn = jdbcOptions.partitionColumn
    +    val lowerBound = jdbcOptions.lowerBound
    +    val upperBound = jdbcOptions.upperBound
    +    val numPartitions = jdbcOptions.numPartitions
     
    -    val partitionInfo = if (jdbcOptions.partitionColumn == null) {
    -      null
    +    val partitionInfo = if (partitionColumn == null) { 
    +      null 
         } else {
           JDBCPartitioningInfo(
    -        jdbcOptions.partitionColumn,
    -        jdbcOptions.lowerBound.toLong,
    -        jdbcOptions.upperBound.toLong,
    -        jdbcOptions.numPartitions.toInt)
    +        partitionColumn, lowerBound.toLong, upperBound.toLong, numPartitions.toInt)
         }
         val parts = JDBCRelation.columnPartition(partitionInfo)
         val properties = new Properties() // Additional properties that we will pass to getConnection
         parameters.foreach(kv => properties.setProperty(kv._1, kv._2))
         JDBCRelation(jdbcOptions.url, jdbcOptions.table, parts, properties)(sqlContext.sparkSession)
       }
    +
    +  /*
    +   * The following structure applies to this code:
    +   *                 |    tableExists            |          !tableExists
    +   *------------------------------------------------------------------------------------
    +   * Ignore          | BaseRelation              | CreateTable, saveTable, BaseRelation
    +   * ErrorIfExists   | ERROR                     | CreateTable, saveTable, BaseRelation
    +   * Overwrite*      | (DropTable, CreateTable,) | CreateTable, saveTable, BaseRelation
    +   *                 | saveTable, BaseRelation   |
    +   * Append          | saveTable, BaseRelation   | CreateTable, saveTable, BaseRelation
    +   *
    +   * *Overwrite & tableExists with truncate, will not drop & create, but instead truncate
    +   */
    +  override def createRelation(
    +      sqlContext: SQLContext,
    +      mode: SaveMode,
    +      parameters: Map[String, String],
    +      data: DataFrame): BaseRelation = {
    +    val jdbcOptions = new JDBCOptions(parameters)
    +    val url = jdbcOptions.url
    +    val table = jdbcOptions.table
    +
    +    val props = new Properties()
    +    props.putAll(parameters.asJava)
    +    val conn = JdbcUtils.createConnectionFactory(url, props)()
    +
    +    try {
    +      val tableExists = JdbcUtils.tableExists(conn, url, table)
    +
    +      val (doCreate, doSave) = (mode, tableExists) match {
    +        case (SaveMode.Ignore, true) => (false, false)
    +        case (SaveMode.ErrorIfExists, true) => throw new TableAlreadyExistsException(
    --- End diff --
    
    You need to import it. 



[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    Not sure if you already knew this, but I just want to share the commands for building the docs (run them from the `docs/` directory):
    ```sh
    SKIP_API=1 jekyll build
    SKIP_API=1 jekyll serve
    ```
    
    After the second command, you can visit the generated document:
    ```
        Server address: http://127.0.0.1:4000/
    ```



[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r77946123
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala ---
    @@ -17,39 +17,105 @@
     
     package org.apache.spark.sql.execution.datasources.jdbc
     
    +import java.sql.SQLException
     import java.util.Properties
     
    -import org.apache.spark.sql.SQLContext
    -import org.apache.spark.sql.sources.{BaseRelation, DataSourceRegister, RelationProvider}
    +import org.apache.spark.sql.{DataFrame, SaveMode, SQLContext}
    +import org.apache.spark.sql.sources.{BaseRelation, CreatableRelationProvider, DataSourceRegister, RelationProvider}
    +import org.apache.spark.sql.types.StructType
     
    -class JdbcRelationProvider extends RelationProvider with DataSourceRegister {
    +class JdbcRelationProvider extends CreatableRelationProvider
    +  with RelationProvider with DataSourceRegister {
     
       override def shortName(): String = "jdbc"
     
    -  /** Returns a new base relation with the given parameters. */
       override def createRelation(
           sqlContext: SQLContext,
           parameters: Map[String, String]): BaseRelation = {
         val jdbcOptions = new JDBCOptions(parameters)
    -    if (jdbcOptions.partitionColumn != null
    -      && (jdbcOptions.lowerBound == null
    -        || jdbcOptions.upperBound == null
    -        || jdbcOptions.numPartitions == null)) {
    -      sys.error("Partitioning incompletely specified")
    -    }
    +    val partitionColumn = jdbcOptions.partitionColumn
    +    val lowerBound = jdbcOptions.lowerBound
    +    val upperBound = jdbcOptions.upperBound
    +    val numPartitions = jdbcOptions.numPartitions
     
    -    val partitionInfo = if (jdbcOptions.partitionColumn == null) {
    -      null
    -    } else {
    +    val partitionInfo = if (partitionColumn == null) { null }
    --- End diff --
    
    ```Scala
    if (partitionColumn == null) {
      null
    } else {
      ...
    }
    ```




[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r77945599
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala ---
    @@ -17,39 +17,113 @@
     
     package org.apache.spark.sql.execution.datasources.jdbc
     
    +import java.sql.SQLException
     import java.util.Properties
     
    -import org.apache.spark.sql.SQLContext
    -import org.apache.spark.sql.sources.{BaseRelation, DataSourceRegister, RelationProvider}
    +import org.apache.spark.sql.{DataFrame, SaveMode, SQLContext}
    +import org.apache.spark.sql.sources.{BaseRelation, CreatableRelationProvider, DataSourceRegister, RelationProvider, SchemaRelationProvider}
    +import org.apache.spark.sql.types.StructType
     
    -class JdbcRelationProvider extends RelationProvider with DataSourceRegister {
    +class JdbcRelationProvider extends CreatableRelationProvider
    +  with SchemaRelationProvider with RelationProvider with DataSourceRegister {
     
       override def shortName(): String = "jdbc"
     
    -  /** Returns a new base relation with the given parameters. */
       override def createRelation(
           sqlContext: SQLContext,
           parameters: Map[String, String]): BaseRelation = {
    +    createRelation(sqlContext, parameters, null)
    +  }
    +
    +  /** Returns a new base relation with the given parameters. */
    +  override def createRelation(
    +      sqlContext: SQLContext,
    +      parameters: Map[String, String],
    +      schema: StructType): BaseRelation = {
         val jdbcOptions = new JDBCOptions(parameters)
    -    if (jdbcOptions.partitionColumn != null
    -      && (jdbcOptions.lowerBound == null
    -        || jdbcOptions.upperBound == null
    -        || jdbcOptions.numPartitions == null)) {
    -      sys.error("Partitioning incompletely specified")
    -    }
    +    val partitionColumn = jdbcOptions.partitionColumn
    +    val lowerBound = jdbcOptions.lowerBound
    +    val upperBound = jdbcOptions.upperBound
    +    val numPartitions = jdbcOptions.numPartitions
     
    -    val partitionInfo = if (jdbcOptions.partitionColumn == null) {
    -      null
    -    } else {
    +    val partitionInfo = if (partitionColumn == null) null
    +    else {
           JDBCPartitioningInfo(
    -        jdbcOptions.partitionColumn,
    -        jdbcOptions.lowerBound.toLong,
    -        jdbcOptions.upperBound.toLong,
    -        jdbcOptions.numPartitions.toInt)
    +        partitionColumn, lowerBound.toLong, upperBound.toLong, numPartitions.toInt)
         }
         val parts = JDBCRelation.columnPartition(partitionInfo)
         val properties = new Properties() // Additional properties that we will pass to getConnection
         parameters.foreach(kv => properties.setProperty(kv._1, kv._2))
    -    JDBCRelation(jdbcOptions.url, jdbcOptions.table, parts, properties)(sqlContext.sparkSession)
    +    JDBCRelation(jdbcOptions.url, jdbcOptions.table, parts, properties,
    +      Option(schema))(sqlContext.sparkSession)
    +  }
    +
    +  /*
    +   * The following structure applies to this code:
    +   *                 |    tableExists            |          !tableExists
    +   *------------------------------------------------------------------------------------
    +   * Ignore          | BaseRelation              | CreateTable, saveTable, BaseRelation
    +   * ErrorIfExists   | ERROR                     | CreateTable, saveTable, BaseRelation
    +   * Overwrite*      | (DropTable, CreateTable,) | CreateTable, saveTable, BaseRelation
    +   *                 | saveTable, BaseRelation   |
    +   * Append          | saveTable, BaseRelation   | CreateTable, saveTable, BaseRelation
    +   *
    +   * *Overwrite & tableExists with truncate, will not drop & create, but instead truncate
    +   */
    +  override def createRelation(
    +      sqlContext: SQLContext,
    +      mode: SaveMode,
    +      parameters: Map[String, String],
    +      data: DataFrame): BaseRelation = {
    +    val jdbcOptions = new JDBCOptions(parameters)
    +    val url = jdbcOptions.url
    +    val table = jdbcOptions.table
    +
    +    import collection.JavaConverters._
    +    val props = new Properties()
    +    props.putAll(parameters.asJava)
    +    val conn = JdbcUtils.createConnectionFactory(url, props)()
    +
    +    try {
    +      val tableExists = JdbcUtils.tableExists(conn, url, table)
    +
    +      val (doCreate, doSave) = (mode, tableExists) match {
    --- End diff --
    
    I also prefer my way, which looks cleaner and is easier to understand.




[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by JustinPihony <gi...@git.apache.org>.
Github user JustinPihony commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    @HyukjinKwon You are correct that it is a different issue, so I went ahead and removed the code, which showed that it was actually closer to ~5 lines of code... So can we please move past that and get this pushed through :)




[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r77122262
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRelation.scala ---
    @@ -114,7 +115,9 @@ private[sql] case class JDBCRelation(
     
       override val needConversion: Boolean = false
     
    -  override val schema: StructType = JDBCRDD.resolveTable(url, table, properties)
    +  override val schema: StructType = {
    +    providedSchemaOption.getOrElse(JDBCRDD.resolveTable(url, table, properties))
    +  }
    --- End diff --
    
    It seems we might not need the extra brackets here.




[GitHub] spark pull request: [SPARK-14525][SQL] Make DataFrameWrite.save wo...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r60721164
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -244,7 +244,11 @@ final class DataFrameWriter private[sql](df: DataFrame) {
           bucketSpec = getBucketSpec,
           options = extraOptions.toMap)
     
    -    dataSource.write(mode, df)
    +    dataSource.providingClass.newInstance() match {
    +      case jdbc: execution.datasources.jdbc.DefaultSource =>
    --- End diff --
    
    It looks like a new method is being introduced. I don't think we need this new function; we could instead use the existing `CreatableRelationProvider` interface in `interfaces`.
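    
    For reference, that existing interface looks roughly like this:
    
    ```Scala
    trait CreatableRelationProvider {
      // Saves `data` using the data-source-specific `parameters` and the given
      // SaveMode, returning a relation over the written data.
      def createRelation(
          sqlContext: SQLContext,
          mode: SaveMode,
          parameters: Map[String, String],
          data: DataFrame): BaseRelation
    }
    ```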




[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r66716164
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRelation.scala ---
    @@ -96,7 +97,16 @@ private[sql] case class JDBCRelation(
     
       override val needConversion: Boolean = false
     
    -  override val schema: StructType = JDBCRDD.resolveTable(url, table, properties)
    +  override val schema: StructType = {
    +    val resolvedSchema = JDBCRDD.resolveTable(url, table, properties)
    +    providedSchemaOption match {
    +      case Some(providedSchema) =>
    +        if (providedSchema.sql.toLowerCase == resolvedSchema.sql.toLowerCase) resolvedSchema
    --- End diff --
    
    I guess it would make sense not to apply the resolved schema when a schema is explicitly set, as the other data sources do.




[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r77948807
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala ---
    @@ -17,39 +17,106 @@
     
     package org.apache.spark.sql.execution.datasources.jdbc
     
    +import java.sql.SQLException
     import java.util.Properties
     
    -import org.apache.spark.sql.SQLContext
    -import org.apache.spark.sql.sources.{BaseRelation, DataSourceRegister, RelationProvider}
    +import scala.collection.JavaConverters.mapAsJavaMapConverter
     
    -class JdbcRelationProvider extends RelationProvider with DataSourceRegister {
    +import org.apache.spark.sql.{DataFrame, SaveMode, SQLContext}
    +import org.apache.spark.sql.sources.{BaseRelation, CreatableRelationProvider, DataSourceRegister, RelationProvider}
    +import org.apache.spark.sql.types.StructType
    +
    +class JdbcRelationProvider extends CreatableRelationProvider
    +  with RelationProvider with DataSourceRegister {
     
       override def shortName(): String = "jdbc"
     
    -  /** Returns a new base relation with the given parameters. */
       override def createRelation(
           sqlContext: SQLContext,
           parameters: Map[String, String]): BaseRelation = {
         val jdbcOptions = new JDBCOptions(parameters)
    -    if (jdbcOptions.partitionColumn != null
    -      && (jdbcOptions.lowerBound == null
    -        || jdbcOptions.upperBound == null
    -        || jdbcOptions.numPartitions == null)) {
    -      sys.error("Partitioning incompletely specified")
    -    }
    +    val partitionColumn = jdbcOptions.partitionColumn
    +    val lowerBound = jdbcOptions.lowerBound
    +    val upperBound = jdbcOptions.upperBound
    +    val numPartitions = jdbcOptions.numPartitions
     
    -    val partitionInfo = if (jdbcOptions.partitionColumn == null) {
    -      null
    +    val partitionInfo = if (partitionColumn == null) { 
    +      null 
         } else {
           JDBCPartitioningInfo(
    -        jdbcOptions.partitionColumn,
    -        jdbcOptions.lowerBound.toLong,
    -        jdbcOptions.upperBound.toLong,
    -        jdbcOptions.numPartitions.toInt)
    +        partitionColumn, lowerBound.toLong, upperBound.toLong, numPartitions.toInt)
         }
         val parts = JDBCRelation.columnPartition(partitionInfo)
         val properties = new Properties() // Additional properties that we will pass to getConnection
         parameters.foreach(kv => properties.setProperty(kv._1, kv._2))
         JDBCRelation(jdbcOptions.url, jdbcOptions.table, parts, properties)(sqlContext.sparkSession)
       }
    +
    +  /*
    +   * The following structure applies to this code:
    +   *                 |    tableExists            |          !tableExists
    +   *------------------------------------------------------------------------------------
    +   * Ignore          | BaseRelation              | CreateTable, saveTable, BaseRelation
    +   * ErrorIfExists   | ERROR                     | CreateTable, saveTable, BaseRelation
    +   * Overwrite*      | (DropTable, CreateTable,) | CreateTable, saveTable, BaseRelation
    +   *                 | saveTable, BaseRelation   |
    +   * Append          | saveTable, BaseRelation   | CreateTable, saveTable, BaseRelation
    +   *
    +   * *Overwrite & tableExists with truncate, will not drop & create, but instead truncate
    +   */
    +  override def createRelation(
    +      sqlContext: SQLContext,
    +      mode: SaveMode,
    +      parameters: Map[String, String],
    +      data: DataFrame): BaseRelation = {
    +    val jdbcOptions = new JDBCOptions(parameters)
    +    val url = jdbcOptions.url
    +    val table = jdbcOptions.table
    +
    +    val props = new Properties()
    +    props.putAll(parameters.asJava)
    +    val conn = JdbcUtils.createConnectionFactory(url, props)()
    +
    +    try {
    +      val tableExists = JdbcUtils.tableExists(conn, url, table)
    +
    +      val (doCreate, doSave) = (mode, tableExists) match {
    +        case (SaveMode.Ignore, true) => (false, false)
    +        case (SaveMode.ErrorIfExists, true) => throw new TableAlreadyExistsException(
    +          s"Table $table already exists, and SaveMode is set to ErrorIfExists.")
    --- End diff --
    
    Just pass the table name. That is enough.




[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r77948487
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala ---
    @@ -17,39 +17,106 @@
     
     package org.apache.spark.sql.execution.datasources.jdbc
     
    +import java.sql.SQLException
    --- End diff --
    
    Not used, right?




[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r80353203
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCWriteSuite.scala ---
    @@ -208,4 +210,84 @@ class JDBCWriteSuite extends SharedSQLContext with BeforeAndAfter {
         assert(2 === spark.read.jdbc(url1, "TEST.PEOPLE1", properties).count())
         assert(2 === spark.read.jdbc(url1, "TEST.PEOPLE1", properties).collect()(0).length)
       }
    +
    +  test("save works for format(\"jdbc\") if url and dbtable are set") {
    +    val df = sqlContext.createDataFrame(sparkContext.parallelize(arr2x2), schema2)
    +
    +    df.write.format("jdbc")
    +    .options(Map("url" -> url, "dbtable" -> "TEST.SAVETEST"))
    +    .save
    --- End diff --
    
    Nit: `save` -> `save()`




[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by JustinPihony <gi...@git.apache.org>.
Github user JustinPihony commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    Bumping my JIRA comment here for @rxin to respond, please:
    
    @rxin Given the bug found in SPARK-16401, the CreatableRelationProvider is not necessary. However, it might be nice to have now that I've already implemented it. I can shrink the change by removing the CreatableRelationProvider aspect, so I would love your feedback on this PR, even if only to say whether the code should be reduced.




[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    IMHO, supporting a user-given schema would be another issue. As the original issue is to support `write.format(..).save()`, it'd be a different one.
    
    Also, I think it is arguable whether to support a user-given schema in JDBC at all. I remember being told that fetching the metadata to resolve the schema is not expensive (see the sketch below). So, it'd make sense to always use the resolved schema rather than adding the broader flexibility.
    
    On the other hand, we might need this if the schema from metadata is wrong in many cases (as suggested in [SPARK-17195](https://issues.apache.org/jira/browse/SPARK-17195)). 
    
    So, I think it'd better not extend `SchemaRelationProvider` here but make another JIRA or PR to discuss further.
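    
    (For context, resolving the schema amounts to a zero-row query against the table. `JDBCRDD.resolveTable` may differ in its details, but the idea is roughly the following, given a plain `java.sql.Connection`:)
    
    ```Scala
    val stmt = conn.prepareStatement(s"SELECT * FROM $table WHERE 1=0")
    try {
      // no rows are fetched; only the result set's metadata is read
      val rsmd = stmt.executeQuery().getMetaData
      (1 to rsmd.getColumnCount).foreach { i =>
        println(s"${rsmd.getColumnName(i)}: ${rsmd.getColumnTypeName(i)}")
      }
    } finally {
      stmt.close()
    }
    ```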




[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65858/
    Test PASSed.




[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by JustinPihony <gi...@git.apache.org>.
Github user JustinPihony commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r77060138
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala ---
    @@ -36,4 +36,9 @@ private[jdbc] class JDBCOptions(
       val upperBound = parameters.getOrElse("upperBound", null)
    --- End diff --
    
    I am fine with changing this to `require`, as the short-circuit should occur at the very top rather than partway through processing. `sys.error` was how the old code was written, so I had just kept it as is.




[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r77122510
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala ---
    @@ -17,39 +17,105 @@
     
     package org.apache.spark.sql.execution.datasources.jdbc
     
    +import java.sql.SQLException
     import java.util.Properties
     
    -import org.apache.spark.sql.SQLContext
    -import org.apache.spark.sql.sources.{BaseRelation, DataSourceRegister, RelationProvider}
    +import org.apache.spark.sql.{DataFrame, SaveMode, SQLContext}
    +import org.apache.spark.sql.sources.{BaseRelation, CreatableRelationProvider, DataSourceRegister, RelationProvider, SchemaRelationProvider}
    +import org.apache.spark.sql.types.StructType
     
    -class JdbcRelationProvider extends RelationProvider with DataSourceRegister {
    +class JdbcRelationProvider extends CreatableRelationProvider
    +  with SchemaRelationProvider with RelationProvider with DataSourceRegister {
     
       override def shortName(): String = "jdbc"
     
    -  /** Returns a new base relation with the given parameters. */
       override def createRelation(
           sqlContext: SQLContext,
           parameters: Map[String, String]): BaseRelation = {
    +    createRelation(sqlContext, parameters, null)
    +  }
    +
    +  /** Returns a new base relation with the given parameters. */
    +  override def createRelation(
    +      sqlContext: SQLContext,
    +      parameters: Map[String, String],
    +      schema: StructType): BaseRelation = {
         val jdbcOptions = new JDBCOptions(parameters)
    -    if (jdbcOptions.partitionColumn != null
    -      && (jdbcOptions.lowerBound == null
    -        || jdbcOptions.upperBound == null
    -        || jdbcOptions.numPartitions == null)) {
    -      sys.error("Partitioning incompletely specified")
    -    }
    +    val partitionColumn = jdbcOptions.partitionColumn
    +    val lowerBound = jdbcOptions.lowerBound
    +    val upperBound = jdbcOptions.upperBound
    +    val numPartitions = jdbcOptions.numPartitions
     
    -    val partitionInfo = if (jdbcOptions.partitionColumn == null) {
    -      null
    -    } else {
    +    val partitionInfo = if (partitionColumn == null) null
    +    else {
           JDBCPartitioningInfo(
    -        jdbcOptions.partitionColumn,
    -        jdbcOptions.lowerBound.toLong,
    -        jdbcOptions.upperBound.toLong,
    -        jdbcOptions.numPartitions.toInt)
    +        partitionColumn, lowerBound.toLong, upperBound.toLong, numPartitions.toInt)
         }
         val parts = JDBCRelation.columnPartition(partitionInfo)
         val properties = new Properties() // Additional properties that we will pass to getConnection
         parameters.foreach(kv => properties.setProperty(kv._1, kv._2))
    -    JDBCRelation(jdbcOptions.url, jdbcOptions.table, parts, properties)(sqlContext.sparkSession)
    +    JDBCRelation(jdbcOptions.url, jdbcOptions.table, parts, properties,
    +      Option(schema))(sqlContext.sparkSession)
    +  }
    +
    +  /*
    +   * The following structure applies to this code:
    +   *                 |    tableExists            |          !tableExists
    +   *------------------------------------------------------------------------------------
    +   * Ignore          | BaseRelation              | CreateTable, saveTable, BaseRelation
    +   * ErrorIfExists   | ERROR                     | CreateTable, saveTable, BaseRelation
    +   * Overwrite       | DropTable, CreateTable,   | CreateTable, saveTable, BaseRelation
    +   *                 | saveTable, BaseRelation   |
    +   * Append          | saveTable, BaseRelation   | CreateTable, saveTable, BaseRelation
    +   */
    +  override def createRelation(
    +      sqlContext: SQLContext,
    +      mode: SaveMode,
    +      parameters: Map[String, String],
    +      data: DataFrame): BaseRelation = {
    +    require(parameters.isDefinedAt("url"), "Saving jdbc source requires 'url' to be set." +
    +        " (ie. df.option(\"url\", \"ACTUAL_URL\")")
    +    require(parameters.isDefinedAt("dbtable"), "Saving jdbc source requires 'dbtable' to be set." +
    +        " (ie. df.option(\"dbtable\", \"ACTUAL_DB_TABLE\")")
    +    val url = parameters("url")
    +    val table = parameters("dbtable")
    --- End diff --
    
    Would it make sense to use `JDBCOptions` here too, rather than duplicating the `require` checks?
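    
    Something like this inside `JDBCOptions`, so every entry point gets the same validation (a sketch, not the final patch):
    
    ```Scala
    private[jdbc] class JDBCOptions(parameters: Map[String, String]) {
      // required options, checked once here instead of at each call site
      require(parameters.isDefinedAt("url"),
        "Option 'url' is required. (e.g. df.option(\"url\", \"jdbc:...\"))")
      require(parameters.isDefinedAt("dbtable"), "Option 'dbtable' is required.")
    
      val url = parameters("url")
      val table = parameters("dbtable")
    }
    ```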




[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/12601




[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    OK, this isn't really my area. It looks reasonable to me. @gatorsmile ?
    I think https://github.com/apache/spark/pull/12601#issuecomment-246165894 was suggesting that some short notes should be added to the docs at http://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases, though I see the examples do, in a way, document the new behavior.




[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r77949636
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala ---
    @@ -17,39 +17,105 @@
     
     package org.apache.spark.sql.execution.datasources.jdbc
     
    +import java.sql.SQLException
     import java.util.Properties
     
    -import org.apache.spark.sql.SQLContext
    -import org.apache.spark.sql.sources.{BaseRelation, DataSourceRegister, RelationProvider}
    +import org.apache.spark.sql.{DataFrame, SaveMode, SQLContext}
    +import org.apache.spark.sql.sources.{BaseRelation, CreatableRelationProvider, DataSourceRegister, RelationProvider}
    +import org.apache.spark.sql.types.StructType
     
    -class JdbcRelationProvider extends RelationProvider with DataSourceRegister {
    +class JdbcRelationProvider extends CreatableRelationProvider
    +  with RelationProvider with DataSourceRegister {
     
       override def shortName(): String = "jdbc"
     
    -  /** Returns a new base relation with the given parameters. */
       override def createRelation(
           sqlContext: SQLContext,
           parameters: Map[String, String]): BaseRelation = {
         val jdbcOptions = new JDBCOptions(parameters)
    -    if (jdbcOptions.partitionColumn != null
    -      && (jdbcOptions.lowerBound == null
    -        || jdbcOptions.upperBound == null
    -        || jdbcOptions.numPartitions == null)) {
    -      sys.error("Partitioning incompletely specified")
    -    }
    +    val partitionColumn = jdbcOptions.partitionColumn
    +    val lowerBound = jdbcOptions.lowerBound
    +    val upperBound = jdbcOptions.upperBound
    +    val numPartitions = jdbcOptions.numPartitions
     
    -    val partitionInfo = if (jdbcOptions.partitionColumn == null) {
    -      null
    -    } else {
    +    val partitionInfo = if (partitionColumn == null) { null }
    +    else {
           JDBCPartitioningInfo(
    -        jdbcOptions.partitionColumn,
    -        jdbcOptions.lowerBound.toLong,
    -        jdbcOptions.upperBound.toLong,
    -        jdbcOptions.numPartitions.toInt)
    +        partitionColumn, lowerBound.toLong, upperBound.toLong, numPartitions.toInt)
         }
         val parts = JDBCRelation.columnPartition(partitionInfo)
         val properties = new Properties() // Additional properties that we will pass to getConnection
         parameters.foreach(kv => properties.setProperty(kv._1, kv._2))
         JDBCRelation(jdbcOptions.url, jdbcOptions.table, parts, properties)(sqlContext.sparkSession)
       }
    +
    +  /*
    +   * The following structure applies to this code:
    +   *                 |    tableExists            |          !tableExists
    +   *------------------------------------------------------------------------------------
    +   * Ignore          | BaseRelation              | CreateTable, saveTable, BaseRelation
    +   * ErrorIfExists   | ERROR                     | CreateTable, saveTable, BaseRelation
    +   * Overwrite*      | (DropTable, CreateTable,) | CreateTable, saveTable, BaseRelation
    +   *                 | saveTable, BaseRelation   |
    +   * Append          | saveTable, BaseRelation   | CreateTable, saveTable, BaseRelation
    +   *
    +   * *Overwrite & tableExists with truncate, will not drop & create, but instead truncate
    +   */
    +  override def createRelation(
    +      sqlContext: SQLContext,
    +      mode: SaveMode,
    +      parameters: Map[String, String],
    +      data: DataFrame): BaseRelation = {
    +    import collection.JavaConverters._
    +    
    +    val jdbcOptions = new JDBCOptions(parameters)
    +    val url = jdbcOptions.url
    +    val table = jdbcOptions.table
    +
    +    val props = new Properties()
    +    props.putAll(parameters.asJava)
    +    val conn = JdbcUtils.createConnectionFactory(url, props)()
    +
    +    try {
    +      val tableExists = JdbcUtils.tableExists(conn, url, table)
    +
    +      val (doCreate, doSave) = (mode, tableExists) match {
    +        case (SaveMode.Ignore, true) => (false, false)
    +        case (SaveMode.ErrorIfExists, true) => throw new SQLException(
    --- End diff --
    
    Yeah, then, just issue an `AnalysisException`, like what we did in the other places.
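    
    i.e., roughly:
    
    ```Scala
    throw new AnalysisException(
      s"Table or view '$table' already exists. SaveMode: ErrorIfExists.")
    ```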




[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r77946201
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala ---
    @@ -17,39 +17,105 @@
     
     package org.apache.spark.sql.execution.datasources.jdbc
     
    +import java.sql.SQLException
     import java.util.Properties
     
    -import org.apache.spark.sql.SQLContext
    -import org.apache.spark.sql.sources.{BaseRelation, DataSourceRegister, RelationProvider}
    +import org.apache.spark.sql.{DataFrame, SaveMode, SQLContext}
    +import org.apache.spark.sql.sources.{BaseRelation, CreatableRelationProvider, DataSourceRegister, RelationProvider}
    +import org.apache.spark.sql.types.StructType
     
    -class JdbcRelationProvider extends RelationProvider with DataSourceRegister {
    +class JdbcRelationProvider extends CreatableRelationProvider
    +  with RelationProvider with DataSourceRegister {
     
       override def shortName(): String = "jdbc"
     
    -  /** Returns a new base relation with the given parameters. */
       override def createRelation(
           sqlContext: SQLContext,
           parameters: Map[String, String]): BaseRelation = {
         val jdbcOptions = new JDBCOptions(parameters)
    -    if (jdbcOptions.partitionColumn != null
    -      && (jdbcOptions.lowerBound == null
    -        || jdbcOptions.upperBound == null
    -        || jdbcOptions.numPartitions == null)) {
    -      sys.error("Partitioning incompletely specified")
    --- End diff --
    
    Any reason why this is removed?




[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r77122073
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -396,49 +396,14 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
        * @since 1.4.0
        */
       def jdbc(url: String, table: String, connectionProperties: Properties): Unit = {
    +    import scala.collection.JavaConverters._
    --- End diff --
    
    +1 for moving the import up. It just looks ugly where it is.




[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r77945639
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCWriteSuite.scala ---
    @@ -208,4 +208,16 @@ class JDBCWriteSuite extends SharedSQLContext with BeforeAndAfter {
         assert(2 === spark.read.jdbc(url1, "TEST.PEOPLE1", properties).count())
         assert(2 === spark.read.jdbc(url1, "TEST.PEOPLE1", properties).collect()(0).length)
       }
    +
    +  test("save works for format(\"jdbc\") if url and dbtable are set") {
    --- End diff --
    
    Negative test cases are needed. 
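    
    For example, something along these lines (a sketch; the exact exception type depends on how the option validation ends up being implemented, with `IllegalArgumentException` assuming `require`-based checks):
    
    ```Scala
    test("save API fails if dbtable is not set") {
      val df = spark.createDataFrame(sparkContext.parallelize(arr2x2), schema2)
      val e = intercept[IllegalArgumentException] {
        df.write.format("jdbc").option("url", url1).save()
      }
      assert(e.getMessage.contains("dbtable"))
    }
    ```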




[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    Thanks for mentioning me. It looks good to me in my personal view.




[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r77948624
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala ---
    @@ -17,39 +17,106 @@
     
     package org.apache.spark.sql.execution.datasources.jdbc
     
    +import java.sql.SQLException
     import java.util.Properties
     
    -import org.apache.spark.sql.SQLContext
    -import org.apache.spark.sql.sources.{BaseRelation, DataSourceRegister, RelationProvider}
    +import scala.collection.JavaConverters.mapAsJavaMapConverter
     
    -class JdbcRelationProvider extends RelationProvider with DataSourceRegister {
    +import org.apache.spark.sql.{DataFrame, SaveMode, SQLContext}
    +import org.apache.spark.sql.sources.{BaseRelation, CreatableRelationProvider, DataSourceRegister, RelationProvider}
    +import org.apache.spark.sql.types.StructType
    +
    +class JdbcRelationProvider extends CreatableRelationProvider
    +  with RelationProvider with DataSourceRegister {
     
       override def shortName(): String = "jdbc"
     
    -  /** Returns a new base relation with the given parameters. */
       override def createRelation(
           sqlContext: SQLContext,
           parameters: Map[String, String]): BaseRelation = {
         val jdbcOptions = new JDBCOptions(parameters)
    -    if (jdbcOptions.partitionColumn != null
    -      && (jdbcOptions.lowerBound == null
    -        || jdbcOptions.upperBound == null
    -        || jdbcOptions.numPartitions == null)) {
    -      sys.error("Partitioning incompletely specified")
    -    }
    +    val partitionColumn = jdbcOptions.partitionColumn
    +    val lowerBound = jdbcOptions.lowerBound
    +    val upperBound = jdbcOptions.upperBound
    +    val numPartitions = jdbcOptions.numPartitions
     
    -    val partitionInfo = if (jdbcOptions.partitionColumn == null) {
    -      null
    +    val partitionInfo = if (partitionColumn == null) { 
    +      null 
    --- End diff --
    
    It looks like you added trailing whitespace.
    
    Could you run the style checker to verify?
    `dev/lint-scala`




[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by JustinPihony <gi...@git.apache.org>.
Github user JustinPihony commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r67582037
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRelation.scala ---
    @@ -96,7 +97,16 @@ private[sql] case class JDBCRelation(
     
       override val needConversion: Boolean = false
     
    -  override val schema: StructType = JDBCRDD.resolveTable(url, table, properties)
    +  override val schema: StructType = {
    +    val resolvedSchema = JDBCRDD.resolveTable(url, table, properties)
    +    providedSchemaOption match {
    +      case Some(providedSchema) =>
    +        if (providedSchema.sql.toLowerCase == resolvedSchema.sql.toLowerCase) resolvedSchema
    --- End diff --
    
    I can easily do a simpler `getOrElse`, as is done in [spark-xml](https://github.com/databricks/spark-xml/blob/9f681939d16508abf4a12a129469ffebf87a2fa4/src/main/scala/com/databricks/spark/xml/XmlRelation.scala), which has the added benefit of being lazier. But if an error does occur due to a mismatch, it then surfaces further from its original cause. I'm fine with either approach, but wanted to give the other side of the argument here. Thoughts?
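    
    For concreteness, the eager-validation branch under discussion would read roughly like this (a sketch; `providedSchemaOption` is the name used in this PR's diff, and the comparison mirrors it):
    
    ```Scala
    override val schema: StructType = providedSchemaOption match {
      case Some(providedSchema) =>
        val resolvedSchema = JDBCRDD.resolveTable(url, table, properties)
        if (providedSchema.sql.toLowerCase == resolvedSchema.sql.toLowerCase) {
          resolvedSchema
        } else {
          throw new AnalysisException(
            s"The provided schema does not match the table's actual schema: $resolvedSchema")
        }
      case None => JDBCRDD.resolveTable(url, table, properties)
    }
    ```
    
    whereas the `getOrElse` form simply skips the `resolveTable` round trip whenever a schema is provided.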




[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r76512043
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala ---
    @@ -36,4 +36,9 @@ private[jdbc] class JDBCOptions(
       val upperBound = parameters.getOrElse("upperBound", null)
       // the number of partitions
       val numPartitions = parameters.getOrElse("numPartitions", null)
    +
    +  if (partitionColumn != null
    +      && (lowerBound == null || upperBound == null || numPartitions == null)) {
    +      sys.error("Partitioning incompletely specified")
    --- End diff --
    
    We should be using `require` not `sys.error` everywhere
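    
    e.g., as a precondition (sketch):
    
    ```Scala
    require(partitionColumn == null ||
        (lowerBound != null && upperBound != null && numPartitions != null),
      "Partitioning incompletely specified")
    ```
    
    `require` throws an `IllegalArgumentException` with the given message, which is more conventional for argument validation than the bare `RuntimeException` from `sys.error`.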




[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by JustinPihony <gi...@git.apache.org>.
Github user JustinPihony commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    @srowen [This documentation is pulled directly from the code examples](https://github.com/apache/spark/blame/master/docs/sql-programming-guide.md#L1080). That said, while double-checking this I noticed that I had forgotten to make the changes in all pertinent languages; that is now done except for R and SQL. SQL doesn't seem to need a change, and for R I am not sure whether it can work in the default way. If I'm wrong about R, I'd appreciate a nudge.




[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by JustinPihony <gi...@git.apache.org>.
Github user JustinPihony commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r77946835
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCWriteSuite.scala ---
    @@ -208,4 +208,75 @@ class JDBCWriteSuite extends SharedSQLContext with BeforeAndAfter {
         assert(2 === spark.read.jdbc(url1, "TEST.PEOPLE1", properties).count())
         assert(2 === spark.read.jdbc(url1, "TEST.PEOPLE1", properties).collect()(0).length)
       }
    +
    +  test("save works for format(\"jdbc\") if url and dbtable are set") {
    +    val df = sqlContext.createDataFrame(sparkContext.parallelize(arr2x2), schema2)
    +
    +    df.write.format("jdbc")
    +    .options(Map("url" -> url, "dbtable" -> "TEST.BASICCREATETEST"))
    +    .save
    +
    +    assert(2 === sqlContext.read.jdbc(url, "TEST.BASICCREATETEST", new Properties).count)
    +    assert(
    +      2 === sqlContext.read.jdbc(url, "TEST.BASICCREATETEST", new Properties).collect()(0).length)
    +  }
    +
    +  test("save API with SaveMode.Overwrite") {
    +    import scala.collection.JavaConverters._
    +
    +    val df = spark.createDataFrame(sparkContext.parallelize(arr2x2), schema2)
    +    val df2 = spark.createDataFrame(sparkContext.parallelize(arr1x2), schema2)
    +
    +    df.write.format("jdbc")
    +      .option("url", url1)
    +      .option("dbtable", "TEST.TRUNCATETEST")
    --- End diff --
    
    To what?




[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by JustinPihony <gi...@git.apache.org>.
Github user JustinPihony commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    @rxin Another ping for this to be reviewed please :)




[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    Yeah! In the [document](http://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases), we never documented the write path for JDBC. Now is a good chance to do it for both `df.write.jdbc(...)` and `df.write.format("jdbc").options(...).save()`.
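    
    For the guide, the save-based path could be shown alongside the existing read examples, roughly like this (connection details are placeholders):
    
    ```Scala
    // Generic save API
    jdbcDF.write
      .format("jdbc")
      .option("url", "jdbc:postgresql:dbserver")
      .option("dbtable", "schema.tablename")
      .option("user", "username")
      .option("password", "password")
      .save()
    
    // Equivalent dedicated method
    jdbcDF.write.jdbc("jdbc:postgresql:dbserver", "schema.tablename", connectionProperties)
    ```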




[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/12601
  
    **[Test build #65087 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65087/consoleFull)** for PR 12601 at commit [`c686b0e`](https://github.com/apache/spark/commit/c686b0e694406771e24cc3a0f93a3327eac3123e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12601#discussion_r80641407
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala ---
    @@ -19,37 +19,102 @@ package org.apache.spark.sql.execution.datasources.jdbc
     
     import java.util.Properties
     
    -import org.apache.spark.sql.SQLContext
    -import org.apache.spark.sql.sources.{BaseRelation, DataSourceRegister, RelationProvider}
    +import scala.collection.JavaConverters.mapAsJavaMapConverter
     
    -class JdbcRelationProvider extends RelationProvider with DataSourceRegister {
    +import org.apache.spark.sql.{AnalysisException, DataFrame, SaveMode, SQLContext}
    +import org.apache.spark.sql.sources.{BaseRelation, CreatableRelationProvider, DataSourceRegister, RelationProvider}
    +
    +class JdbcRelationProvider extends CreatableRelationProvider
    +  with RelationProvider with DataSourceRegister {
     
       override def shortName(): String = "jdbc"
     
    -  /** Returns a new base relation with the given parameters. */
       override def createRelation(
           sqlContext: SQLContext,
           parameters: Map[String, String]): BaseRelation = {
         val jdbcOptions = new JDBCOptions(parameters)
    -    if (jdbcOptions.partitionColumn != null
    -      && (jdbcOptions.lowerBound == null
    -        || jdbcOptions.upperBound == null
    -        || jdbcOptions.numPartitions == null)) {
    -      sys.error("Partitioning incompletely specified")
    -    }
    +    val partitionColumn = jdbcOptions.partitionColumn
    +    val lowerBound = jdbcOptions.lowerBound
    +    val upperBound = jdbcOptions.upperBound
    +    val numPartitions = jdbcOptions.numPartitions
     
    -    val partitionInfo = if (jdbcOptions.partitionColumn == null) {
    +    val partitionInfo = if (partitionColumn == null) {
           null
         } else {
           JDBCPartitioningInfo(
    -        jdbcOptions.partitionColumn,
    -        jdbcOptions.lowerBound.toLong,
    -        jdbcOptions.upperBound.toLong,
    -        jdbcOptions.numPartitions.toInt)
    +        partitionColumn, lowerBound.toLong, upperBound.toLong, numPartitions.toInt)
         }
         val parts = JDBCRelation.columnPartition(partitionInfo)
         val properties = new Properties() // Additional properties that we will pass to getConnection
         parameters.foreach(kv => properties.setProperty(kv._1, kv._2))
         JDBCRelation(jdbcOptions.url, jdbcOptions.table, parts, properties)(sqlContext.sparkSession)
       }
    +
    +  /*
    +   * The following structure applies to this code:
    --- End diff --
    
    I also took a look at @gatorsmile's approach; I think it's easier to understand. Why was it rejected? We can also get rid of the `return`:
    ```
    if (tableExists) {
      mode match {
        case SaveMode.Ignore =>
        ......
      }
    } else {
      ......
    }
    ```
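    
    Filled out per the mode table discussed above, that shape would read roughly as follows (a sketch; `createTable` stands in for whatever helper the patch ends up with):
    
    ```Scala
    if (tableExists) {
      mode match {
        case SaveMode.Ignore =>
          // leave the existing table and its contents untouched
        case SaveMode.ErrorIfExists =>
          throw new AnalysisException(s"Table or view '$table' already exists.")
        case SaveMode.Overwrite =>
          JdbcUtils.dropTable(conn, table)
          createTable(data, url, table, conn)   // hypothetical helper
          JdbcUtils.saveTable(data, url, table, props)
        case SaveMode.Append =>
          JdbcUtils.saveTable(data, url, table, props)
      }
    } else {
      createTable(data, url, table, conn)       // hypothetical helper
      JdbcUtils.saveTable(data, url, table, props)
    }
    ```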

