Posted to reviews@spark.apache.org by felixcheung <gi...@git.apache.org> on 2015/12/26 05:11:32 UTC

[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

GitHub user felixcheung opened a pull request:

    https://github.com/apache/spark/pull/10480

    [SPARK-12224][SPARKR] R support for JDBC source

    Add R API for `read.jdbc`, `write.jdbc`.
    
    Tested this quite a bit manually with different combinations of parameters. It's not clear whether we could have automated tests in R for this - the Scala `JDBCSuite` depends on the Java H2 in-memory database.
    
    Refactored some code into util functions so that they can be tested.
    
    Core's R SerDe code needs to be updated to allow access to java.util.Properties as a `jobj` handle, which is required by the `jdbc` method of DataFrameReader/DataFrameWriter. Adding a `sql/r/SQLUtils` helper function instead would require more code.
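
    As a rough sketch of the SerDe-based approach (illustrative only: it assumes SparkR's internal `newJObject`/`callJMethod` helpers and only approximates the util added here), a `java.util.Properties` handle can be built from an R environment of connection properties:
    ```
    # Sketch: turn an R environment of connection properties (e.g. user,
    # password) into a java.util.Properties jobj on the JVM side.
    envToJProperties <- function(env) {
      props <- newJObject("java.util.Properties")
      for (name in ls(env)) {
        callJMethod(props, "setProperty", name, as.character(env[[name]]))
      }
      props
    }
    ```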
    
    Tested:
    ```
    # with postgresql
    ../bin/sparkR --driver-class-path /usr/share/java/postgresql-9.4.1207.jre7.jar
    
    # read.jdbc
    df <- read.jdbc(sqlContext, "jdbc:postgresql://localhost/db", "films2", user = "user", password = "12345")
    df <- read.jdbc(sqlContext, "jdbc:postgresql://localhost/db", "films2", user = "user", password = 12345)
    
    # partitionColumn and numPartitions test
    df <- read.jdbc(sqlContext, "jdbc:postgresql://localhost/db", "films2", partitionColumn = "did", lowerBound = 0, upperBound = 200, numPartitions = 4, user = "user", password = 12345)
    a <- SparkR:::toRDD(df)
    SparkR:::getNumPartitions(a)
    [1] 4
    SparkR:::collectPartition(a, 2L)
    
    # defaultParallelism test
    df <- read.jdbc(sqlContext, "jdbc:postgresql://localhost/db", "films2", partitionColumn = "did", lowerBound = 0, upperBound = 200, user = "user", password = 12345)
    SparkR:::getNumPartitions(a)
    [1] 2
    
    # predicates test
    df <- read.jdbc(sqlContext, "jdbc:postgresql://localhost/db", "films2", predicates = list("did<=105"), user = "user", password = 12345)
    count(df) == 1
    
    # write.jdbc, default save mode "error"
    irisDf <- as.DataFrame(sqlContext, iris)
    write.jdbc(irisDf, "jdbc:postgresql://localhost/db", "films2", user = "user", password = "12345")
    "error, already exists"
    
    write.jdbc(irisDf, "jdbc:postgresql://localhost/db", "iris", user = "user", password = "12345")
    ```

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/felixcheung/spark rreadjdbc

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10480.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #10480
    
----
commit b0f28523f17d7733d4e1fe2b7e040db127b7188b
Author: felixcheung <fe...@hotmail.com>
Date:   2015-12-24T03:51:10Z

    read.jdbc support

commit c3e7bec0de7658c954539b421168e966bea17843
Author: felixcheung <fe...@hotmail.com>
Date:   2015-12-24T03:52:56Z

    update comment

commit 5a0f6d2a3be14931126ac420b31f94a276eb5c02
Author: felixcheung <fe...@hotmail.com>
Date:   2015-12-24T04:38:14Z

    write.jdbc, doc update

commit 5b15f38d151b5c90bc9e41ef02ad25712eec31f4
Author: felixcheung <fe...@hotmail.com>
Date:   2015-12-24T04:48:45Z

    more doc update

commit 98ec05e8b9ec79ea8ea424dfbfc22f15ab0f7429
Author: felixcheung <fe...@hotmail.com>
Date:   2015-12-24T16:02:42Z

    update doc

commit 12b36130fe1bd6c526c6e7ba89deb77b0cfb0ee3
Author: felixcheung <fe...@hotmail.com>
Date:   2015-12-24T16:12:44Z

    code fix

commit 3f90db6686daea182161de3b4c668b6901be89c9
Author: felixcheung <fe...@hotmail.com>
Date:   2015-12-26T03:50:51Z

    fix serialization of java.util.Properties, add tests for util functions, add generic, fix bugs

commit de635b18147e42bb931e8d81e51522330873498e
Author: felixcheung <fe...@hotmail.com>
Date:   2015-12-26T03:53:22Z

    update doc

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by sun-rui <gi...@git.apache.org>.
Github user sun-rui commented on the pull request:

    https://github.com/apache/spark/pull/10480#issuecomment-173134843
  
    LGTM




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10480#issuecomment-212160561
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56265/
    Test PASSed.




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10480#issuecomment-173041946
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10480#discussion_r50218806
  
    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -2401,4 +2401,41 @@ setMethod("str",
                     cat(paste0("\nDisplaying first ", ncol(localDF), " columns only."))
                   }
                 }
    -          })
    \ No newline at end of file
    +          })
    +
    +#' Saves the content of the DataFrame to an external database table via JDBC
    +#'
    +#' Additional JDBC database connection properties can be set (...)
    +#'
    +#' Also, mode is used to specify the behavior of the save operation when
    +#' data already exists in the data source. There are four modes: \cr
    --- End diff --
    
    yes, it forces a new line in the generated doc; otherwise roxygen2 removes the newline




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by sun-rui <gi...@git.apache.org>.
Github user sun-rui commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10480#discussion_r59658092
  
    --- Diff: core/src/main/scala/org/apache/spark/api/r/SerDe.scala ---
    @@ -355,6 +355,13 @@ private[spark] object SerDe {
               writeInt(dos, v.length)
               v.foreach(elem => writeObject(dos, elem))
     
    +        // Handle Properties
    --- End diff --
    
    thanks @frreiss. I agree with supporting serde of Properties.




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10480#issuecomment-167276610
  
    **[Test build #48335 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48335/consoleFull)** for PR 10480 at commit [`de635b1`](https://github.com/apache/spark/commit/de635b18147e42bb931e8d81e51522330873498e).




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by sun-rui <gi...@git.apache.org>.
Github user sun-rui commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10480#discussion_r50353509
  
    --- Diff: core/src/main/scala/org/apache/spark/api/r/SerDe.scala ---
    @@ -355,6 +355,13 @@ private[spark] object SerDe {
               writeInt(dos, v.length)
               v.foreach(elem => writeObject(dos, elem))
     
    +        // Handle Properties
    --- End diff --
    
    I think option 2 is acceptable to me.




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by shivaram <gi...@git.apache.org>.
Github user shivaram commented on the pull request:

    https://github.com/apache/spark/pull/10480#issuecomment-212160685
  
    Merging this to master




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10480#issuecomment-170436168
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49087/
    Test PASSed.




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the pull request:

    https://github.com/apache/spark/pull/10480#issuecomment-170424371
  
    jenkins, retest this please




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10480#discussion_r48515382
  
    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -2272,3 +2260,40 @@ setMethod("with",
                 newEnv <- assignNewEnv(data)
                 eval(substitute(expr), envir = newEnv, enclos = newEnv)
               })
    +
    +#' Saves the content of the DataFrame to an external database table via JDBC
    +#'
    +#' Additional JDBC database connection properties can be set (...)
    +#'
    +#' Also, mode is used to specify the behavior of the save operation when
    +#' data already exists in the data source. There are four modes: \cr
    +#'  append: Contents of this DataFrame are expected to be appended to existing data. \cr
    +#'  overwrite: Existing data is expected to be overwritten by the contents of this DataFrame. \cr
    +#'  error: An exception is expected to be thrown. \cr
    +#'  ignore: The save operation is expected to not save the contents of the DataFrame
    +#'     and to not change the existing data. \cr
    +#'
    +#' @param x A SparkSQL DataFrame
    +#' @param url JDBC database url of the form `jdbc:subprotocol:subname`
    +#' @param tableName The name of the table in the external database
    +#' @param mode One of 'append', 'overwrite', 'error', 'ignore' save mode (it is 'error' by default)
    +#' @family DataFrame functions
    +#' @rdname write.jdbc
    +#' @name write.jdbc
    +#' @export
    +#' @examples
    +#'\dontrun{
    +#' sc <- sparkR.init()
    +#' sqlContext <- sparkRSQL.init(sc)
    +#' jdbcUrl <- "jdbc:mysql://localhost:3306/databasename"
    +#' write.jdbc(df, jdbcUrl, "table", user = "username", password = "password")
    +#' }
    +setMethod("write.jdbc",
    +          signature(x = "DataFrame", url = "character", tableName = "character"),
    +          function(x, url, tableName, mode = "error", ...){
    +            jmode <- convertToJSaveMode(mode)
    +            jprops <- envToJProperties(varargsToEnv(...))
    --- End diff --
    
    I thought `env` was more generic




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by sun-rui <gi...@git.apache.org>.
Github user sun-rui commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10480#discussion_r48472173
  
    --- Diff: R/pkg/R/SQLContext.R ---
    @@ -556,3 +556,61 @@ createExternalTable <- function(sqlContext, tableName, path = NULL, source = NUL
       sdf <- callJMethod(sqlContext, "createExternalTable", tableName, source, options)
       dataFrame(sdf)
     }
    +
    +#' Create a DataFrame representing the database table accessible via JDBC URL
    +#'
    +#' Additional JDBC database connection properties can be set (...)
    +#'
    +#' Only one of partitionColumn or predicates should be set. Partitions of the table will be
    +#' retrieved in parallel based on the `numPartitions` or by the predicates.
    +#'
    +#' Don't create too many partitions in parallel on a large cluster; otherwise Spark might crash
    +#' your external database systems.
    +#'
    +#' @param sqlContext SQLContext to use
    +#' @param url JDBC database url of the form `jdbc:subprotocol:subname`
    +#' @param tableName the name of the table in the external database
    +#' @param partitionColumn the name of a column of integral type that will be used for partitioning
    +#' @param lowerBound the minimum value of `partitionColumn` used to decide partition stride
    +#' @param upperBound the maximum value of `partitionColumn` used to decide partition stride
    +#' @param numPartitions the number of partitions, This, along with `lowerBound` (inclusive),
    +#'                      `upperBound` (exclusive), form partition strides for generated WHERE
    +#'                      clause expressions used to split the column `partitionColumn` evenly.
    +#'                      This defaults to SparkContext.defaultParallelism when unset.
    +#' @param predicates a list of conditions in the where clause; each one defines one partition
    +#' @return DataFrame
    +#' @rdname read.jdbc
    +#' @name read.jdbc
    +#' @export
    +#' @examples
    +#'\dontrun{
    +#' sc <- sparkR.init()
    +#' sqlContext <- sparkRSQL.init(sc)
    +#' jdbcUrl <- "jdbc:mysql://localhost:3306/databasename"
    +#' df <- read.jdbc(sqlContext, jdbcUrl, "table", predicates = list("field<=123"), user = "username")
    +#' df2 <- read.jdbc(sqlContext, jdbcUrl, "table2", partitionColumn = "index", lowerBound = 0,
    +#'                  upperBound = 10000, user = "username", password = "password")
    +#' }
    +
    +read.jdbc <- function(sqlContext, url, tableName,
    +                      partitionColumn = NULL, lowerBound = NULL, upperBound = NULL,
    +                      numPartitions = 0L, predicates = list(), ...) {
    +  jprops <- envToJProperties(varargsToEnv(...))
    +
    +  read <- callJMethod(sqlContext, "read")
    +  if (!is.null(partitionColumn)) {
    --- End diff --
    
    add a mutual-exclusion check for predicates?
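
    For example, a minimal check could look like this (a sketch using the parameter names from the diff above, not code from this PR):
    ```
    # Sketch: reject ambiguous calls that set both partitioning schemes.
    if (!is.null(partitionColumn) && length(predicates) > 0) {
      stop("Only one of partitionColumn or predicates should be specified")
    }
    ```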




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by sun-rui <gi...@git.apache.org>.
Github user sun-rui commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10480#discussion_r48472149
  
    --- Diff: R/pkg/R/SQLContext.R ---
    @@ -556,3 +556,61 @@ createExternalTable <- function(sqlContext, tableName, path = NULL, source = NUL
       sdf <- callJMethod(sqlContext, "createExternalTable", tableName, source, options)
       dataFrame(sdf)
     }
    +
    +#' Create a DataFrame representing the database table accessible via JDBC URL
    +#'
    +#' Additional JDBC database connection properties can be set (...)
    +#'
    +#' Only one of partitionColumn or predicates should be set. Partitions of the table will be
    +#' retrieved in parallel based on the `numPartitions` or by the predicates.
    +#'
    +#' Don't create too many partitions in parallel on a large cluster; otherwise Spark might crash
    +#' your external database systems.
    +#'
    +#' @param sqlContext SQLContext to use
    +#' @param url JDBC database url of the form `jdbc:subprotocol:subname`
    +#' @param tableName the name of the table in the external database
    +#' @param partitionColumn the name of a column of integral type that will be used for partitioning
    +#' @param lowerBound the minimum value of `partitionColumn` used to decide partition stride
    +#' @param upperBound the maximum value of `partitionColumn` used to decide partition stride
    +#' @param numPartitions the number of partitions, This, along with `lowerBound` (inclusive),
    +#'                      `upperBound` (exclusive), form partition strides for generated WHERE
    +#'                      clause expressions used to split the column `partitionColumn` evenly.
    +#'                      This defaults to SparkContext.defaultParallelism when unset.
    +#' @param predicates a list of conditions in the where clause; each one defines one partition
    --- End diff --
    
    State that the predicates parameter is mutually exclusive with partitionColumn/lowerBound/upperBound/numPartitions.
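
    One possible wording (illustrative only):
    ```
    #' @param predicates a list of conditions in the where clause; each one defines one partition.
    #'                   Mutually exclusive with partitionColumn, lowerBound, upperBound and
    #'                   numPartitions.
    ```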




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the pull request:

    https://github.com/apache/spark/pull/10480#issuecomment-172279111
  
    @shivaram this is ready, thanks!




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10480#issuecomment-173073601
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10480#issuecomment-170416396
  
    **[Test build #49082 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49082/consoleFull)** for PR 10480 at commit [`991a9b7`](https://github.com/apache/spark/commit/991a9b7679c7afd1f9367ff1a8d6a9432a77ac51).




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10480#issuecomment-167283791
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by sun-rui <gi...@git.apache.org>.
Github user sun-rui commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10480#discussion_r48517532
  
    --- Diff: R/pkg/R/generics.R ---
    @@ -537,6 +537,12 @@ setGeneric("write.df", function(df, path, ...) { standardGeneric("write.df") })
     #' @export
     setGeneric("saveDF", function(df, path, ...) { standardGeneric("saveDF") })
     
    --- End diff --
    
    yeah, correct




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by sun-rui <gi...@git.apache.org>.
Github user sun-rui commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10480#discussion_r48466540
  
    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -2272,3 +2260,40 @@ setMethod("with",
                 newEnv <- assignNewEnv(data)
                 eval(substitute(expr), envir = newEnv, enclos = newEnv)
               })
    +
    +#' Saves the content of the DataFrame to an external database table via JDBC
    +#'
    +#' Additional JDBC database connection properties can be set (...)
    +#'
    +#' Also, mode is used to specify the behavior of the save operation when
    +#' data already exists in the data source. There are four modes: \cr
    +#'  append: Contents of this DataFrame are expected to be appended to existing data. \cr
    +#'  overwrite: Existing data is expected to be overwritten by the contents of this DataFrame. \cr
    +#'  error: An exception is expected to be thrown. \cr
    +#'  ignore: The save operation is expected to not save the contents of the DataFrame
    +#'     and to not change the existing data. \cr
    +#'
    +#' @param x A SparkSQL DataFrame
    +#' @param url JDBC database url of the form `jdbc:subprotocol:subname`
    +#' @param tableName The name of the table in the external database
    +#' @param mode One of 'append', 'overwrite', 'error', 'ignore' save mode (it is 'error' by default)
    +#' @family DataFrame functions
    +#' @rdname write.jdbc
    +#' @name write.jdbc
    +#' @export
    +#' @examples
    +#'\dontrun{
    +#' sc <- sparkR.init()
    +#' sqlContext <- sparkRSQL.init(sc)
    +#' jdbcUrl <- "jdbc:mysql://localhost:3306/databasename"
    +#' write.jdbc(df, jdbcUrl, "table", user = "username", password = "password")
    +#' }
    +setMethod("write.jdbc",
    +          signature(x = "DataFrame", url = "character", tableName = "character"),
    +          function(x, url, tableName, mode = "error", ...){
    +            jmode <- convertToJSaveMode(mode)
    +            jprops <- envToJProperties(varargsToEnv(...))
    +            write <- callJMethod(x@sdf, "write")
    +            callJMethod(write, "mode", jmode)
    --- End diff --
    
    write <- callJMethod(write, "mode", jmode)




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10480#issuecomment-172148329
  
    **[Test build #49514 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49514/consoleFull)** for PR 10480 at commit [`8c64ac7`](https://github.com/apache/spark/commit/8c64ac7d07bdd1e39b5d0a6ab510a374edd8c6e0).




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10480#issuecomment-170418048
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49082/
    Test FAILed.




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by shivaram <gi...@git.apache.org>.
Github user shivaram commented on the pull request:

    https://github.com/apache/spark/pull/10480#issuecomment-195425756
  
    Sorry for the delay @felixcheung -- I'll get back on this today




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10480#issuecomment-167283619
  
    **[Test build #48335 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48335/consoleFull)** for PR 10480 at commit [`de635b1`](https://github.com/apache/spark/commit/de635b18147e42bb931e8d81e51522330873498e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10480#issuecomment-212103783
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by shivaram <gi...@git.apache.org>.
Github user shivaram commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10480#discussion_r60273460
  
    --- Diff: core/src/main/scala/org/apache/spark/api/r/SerDe.scala ---
    @@ -355,6 +355,13 @@ private[spark] object SerDe {
               writeInt(dos, v.length)
               v.foreach(elem => writeObject(dos, elem))
     
    +        // Handle Properties
    --- End diff --
    
    Thanks @felixcheung for summarizing the options. I was trying to judge how frequently we use `java.util.Properties` in the Spark DataFrame codebase, and it looks like JDBC support is the only use case using it. That said, if having support in SerDe makes the integration much easier, I think we can go down this route. As @frreiss said, java.util.Properties is a pretty common data structure, so this could be useful in the future.
    
    Overall I think the current option is fine by me.




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the pull request:

    https://github.com/apache/spark/pull/10480#issuecomment-210758860
  
    @shivaram please check on this [question](https://github.com/apache/spark/pull/10480#discussion_r50348037) when you have a chance?
    Users are running into issues with the JDBC source in R and we've discovered there is no simple workaround - I think it'd be great if we can get this in before the Spark 2.0 code freeze in a week or two.





[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by sun-rui <gi...@git.apache.org>.
Github user sun-rui commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10480#discussion_r48517634
  
    --- Diff: R/pkg/R/SQLContext.R ---
    @@ -556,3 +556,61 @@ createExternalTable <- function(sqlContext, tableName, path = NULL, source = NUL
       sdf <- callJMethod(sqlContext, "createExternalTable", tableName, source, options)
       dataFrame(sdf)
     }
    +
    +#' Create a DataFrame representing the database table accessible via JDBC URL
    +#'
    +#' Additional JDBC database connection properties can be set (...)
    +#'
    +#' Only one of partitionColumn or predicates should be set. Partitions of the table will be
    +#' retrieved in parallel based on the `numPartitions` or by the predicates.
    +#'
    +#' Don't create too many partitions in parallel on a large cluster; otherwise Spark might crash
    +#' your external database systems.
    +#'
    +#' @param sqlContext SQLContext to use
    +#' @param url JDBC database url of the form `jdbc:subprotocol:subname`
    +#' @param tableName the name of the table in the external database
    +#' @param partitionColumn the name of a column of integral type that will be used for partitioning
    +#' @param lowerBound the minimum value of `partitionColumn` used to decide partition stride
    +#' @param upperBound the maximum value of `partitionColumn` used to decide partition stride
    +#' @param numPartitions the number of partitions, This, along with `lowerBound` (inclusive),
    +#'                      `upperBound` (exclusive), form partition strides for generated WHERE
    +#'                      clause expressions used to split the column `partitionColumn` evenly.
    +#'                      This defaults to SparkContext.defaultParallelism when unset.
    +#' @param predicates a list of conditions in the where clause; each one defines one partition
    +#' @return DataFrame
    +#' @rdname read.jdbc
    +#' @name read.jdbc
    +#' @export
    +#' @examples
    +#'\dontrun{
    +#' sc <- sparkR.init()
    +#' sqlContext <- sparkRSQL.init(sc)
    +#' jdbcUrl <- "jdbc:mysql://localhost:3306/databasename"
    +#' df <- read.jdbc(sqlContext, jdbcUrl, "table", predicates = list("field<=123"), user = "username")
    +#' df2 <- read.jdbc(sqlContext, jdbcUrl, "table2", partitionColumn = "index", lowerBound = 0,
    +#'                  upperBound = 10000, user = "username", password = "password")
    +#' }
    +
    +read.jdbc <- function(sqlContext, url, tableName,
    +                      partitionColumn = NULL, lowerBound = NULL, upperBound = NULL,
    +                      numPartitions = 0L, predicates = list(), ...) {
    --- End diff --
    
    default parameter for predicates can be NULL?




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the pull request:

    https://github.com/apache/spark/pull/10480#issuecomment-170415749
  
    rebased and updated. thanks




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10480#issuecomment-172169926
  
    **[Test build #49523 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49523/consoleFull)** for PR 10480 at commit [`7cb5121`](https://github.com/apache/spark/commit/7cb51217140de4386b45da0619551a38a68be25a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by sun-rui <gi...@git.apache.org>.
Github user sun-rui commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10480#discussion_r48467198
  
    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -2272,3 +2260,40 @@ setMethod("with",
                 newEnv <- assignNewEnv(data)
                 eval(substitute(expr), envir = newEnv, enclos = newEnv)
               })
    +
    +#' Saves the content of the DataFrame to an external database table via JDBC
    +#'
    +#' Additional JDBC database connection properties can be set (...)
    +#'
    +#' Also, mode is used to specify the behavior of the save operation when
    +#' data already exists in the data source. There are four modes: \cr
    +#'  append: Contents of this DataFrame are expected to be appended to existing data. \cr
    +#'  overwrite: Existing data is expected to be overwritten by the contents of this DataFrame. \cr
    +#'  error: An exception is expected to be thrown. \cr
    +#'  ignore: The save operation is expected to not save the contents of the DataFrame
    +#'     and to not change the existing data. \cr
    +#'
    +#' @param x A SparkSQL DataFrame
    +#' @param url JDBC database url of the form `jdbc:subprotocol:subname`
    +#' @param tableName The name of the table in the external database
    +#' @param mode One of 'append', 'overwrite', 'error', 'ignore' save mode (it is 'error' by default)
    +#' @family DataFrame functions
    +#' @rdname write.jdbc
    +#' @name write.jdbc
    +#' @export
    +#' @examples
    +#'\dontrun{
    +#' sc <- sparkR.init()
    +#' sqlContext <- sparkRSQL.init(sc)
    +#' jdbcUrl <- "jdbc:mysql://localhost:3306/databasename"
    +#' write.jdbc(df, jdbcUrl, "table", user = "username", password = "password")
    +#' }
    +setMethod("write.jdbc",
    +          signature(x = "DataFrame", url = "character", tableName = "character"),
    +          function(x, url, tableName, mode = "error", ...){
    +            jmode <- convertToJSaveMode(mode)
    +            jprops <- envToJProperties(varargsToEnv(...))
    --- End diff --
    
    why not add a new util function that directly converts varargs to properties?
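
    Such a helper might look like this (a sketch: `varargsToJProperties` is a hypothetical name, and the body assumes the internal `newJObject`/`callJMethod` helpers):
    ```
    # Sketch: convert named varargs directly into a java.util.Properties jobj,
    # skipping the intermediate environment.
    varargsToJProperties <- function(...) {
      props <- newJObject("java.util.Properties")
      args <- list(...)
      for (name in names(args)) {
        callJMethod(props, "setProperty", name, as.character(args[[name]]))
      }
      props
    }
    ```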




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the pull request:

    https://github.com/apache/spark/pull/10480#issuecomment-212120117
  
    hmm
    ```
    [error] (docker-integration-tests/test:test) sbt.TestsFailedException: Tests unsuccessful
    [error] (streaming/test:test) sbt.TestsFailedException: Tests unsuccessful
    [error] Total time: 6116 s, completed Apr 19, 2016 1:03:29 PM
    ```





[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by sun-rui <gi...@git.apache.org>.
Github user sun-rui commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10480#discussion_r48518467
  
    --- Diff: core/src/main/scala/org/apache/spark/api/r/SerDe.scala ---
    @@ -355,6 +355,13 @@ private[spark] object SerDe {
               writeInt(dos, v.length)
               v.foreach(elem => writeObject(dos, elem))
     
    +        // Handle Properties
    --- End diff --
    
    I got it; java.util.Properties implements the Map interface.




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10480#discussion_r48516227
  
    --- Diff: R/pkg/R/SQLContext.R ---
    @@ -556,3 +556,61 @@ createExternalTable <- function(sqlContext, tableName, path = NULL, source = NUL
       sdf <- callJMethod(sqlContext, "createExternalTable", tableName, source, options)
       dataFrame(sdf)
     }
    +
    +#' Create a DataFrame representing the database table accessible via JDBC URL
    +#'
    +#' Additional JDBC database connection properties can be set (...)
    +#'
    +#' Only one of partitionColumn or predicates should be set. Partitions of the table will be
    +#' retrieved in parallel based on the `numPartitions` or by the predicates.
    +#'
    +#' Don't create too many partitions in parallel on a large cluster; otherwise Spark might crash
    +#' your external database systems.
    +#'
    +#' @param sqlContext SQLContext to use
    +#' @param url JDBC database url of the form `jdbc:subprotocol:subname`
    +#' @param tableName the name of the table in the external database
    +#' @param partitionColumn the name of a column of integral type that will be used for partitioning
    +#' @param lowerBound the minimum value of `partitionColumn` used to decide partition stride
    +#' @param upperBound the maximum value of `partitionColumn` used to decide partition stride
    +#' @param numPartitions the number of partitions, This, along with `lowerBound` (inclusive),
    +#'                      `upperBound` (exclusive), form partition strides for generated WHERE
    +#'                      clause expressions used to split the column `partitionColumn` evenly.
    +#'                      This defaults to SparkContext.defaultParallelism when unset.
    +#' @param predicates a list of conditions in the where clause; each one defines one partition
    +#' @return DataFrame
    +#' @rdname read.jdbc
    +#' @name read.jdbc
    +#' @export
    +#' @examples
    +#'\dontrun{
    +#' sc <- sparkR.init()
    +#' sqlContext <- sparkRSQL.init(sc)
    +#' jdbcUrl <- "jdbc:mysql://localhost:3306/databasename"
    +#' df <- read.jdbc(sqlContext, jdbcUrl, "table", predicates = list("field<=123"), user = "username")
    +#' df2 <- read.jdbc(sqlContext, jdbcUrl, "table2", partitionColumn = "index", lowerBound = 0,
    +#'                  upperBound = 10000, user = "username", password = "password")
    +#' }
    +
    +read.jdbc <- function(sqlContext, url, tableName,
    +                      partitionColumn = NULL, lowerBound = NULL, upperBound = NULL,
    +                      numPartitions = 0L, predicates = list(), ...) {
    +  jprops <- envToJProperties(varargsToEnv(...))
    +
    +  read <- callJMethod(sqlContext, "read")
    +  if (!is.null(partitionColumn)) {
    --- End diff --
    
    `partitionColumn` takes precedence.
    This is consistent with the Python API.
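
    A sketch of the resulting dispatch (illustrative; the argument coercions must line up with the Scala `read().jdbc()` overloads):
    ```
    # Sketch of the precedence: partitionColumn first, then predicates,
    # then the plain overload.
    if (!is.null(partitionColumn)) {
      sdf <- callJMethod(read, "jdbc", url, tableName, as.character(partitionColumn),
                         numToInt(lowerBound), numToInt(upperBound), numToInt(numPartitions),
                         jprops)
    } else if (length(predicates) > 0) {
      sdf <- callJMethod(read, "jdbc", url, tableName, as.list(as.character(predicates)), jprops)
    } else {
      sdf <- callJMethod(read, "jdbc", url, tableName, jprops)
    }
    dataFrame(sdf)
    ```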




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10480#issuecomment-212103489
  
    **[Test build #56244 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56244/consoleFull)** for PR 10480 at commit [`26cd5f1`](https://github.com/apache/spark/commit/26cd5f15a572a6cc6d20eb7fb3cb63d08bfb3bff).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by shivaram <gi...@git.apache.org>.
Github user shivaram commented on the pull request:

    https://github.com/apache/spark/pull/10480#issuecomment-212026558
  
    @felixcheung Could you bring this PR up to date? I think the code changes look fine to me and we can merge after this goes through Jenkins.




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10480#discussion_r49286045
  
    --- Diff: R/pkg/R/SQLContext.R ---
    @@ -556,3 +556,61 @@ createExternalTable <- function(sqlContext, tableName, path = NULL, source = NUL
       sdf <- callJMethod(sqlContext, "createExternalTable", tableName, source, options)
       dataFrame(sdf)
     }
    +
    +#' Create a DataFrame representing the database table accessible via JDBC URL
    +#'
    +#' Additional JDBC database connection properties can be set (...)
    +#'
    +#' Only one of partitionColumn or predicates should be set. Partitions of the table will be
    +#' retrieved in parallel based on the `numPartitions` or by the predicates.
    +#'
    +#' Don't create too many partitions in parallel on a large cluster; otherwise Spark might crash
    +#' your external database systems.
    +#'
    +#' @param sqlContext SQLContext to use
    +#' @param url JDBC database url of the form `jdbc:subprotocol:subname`
    +#' @param tableName the name of the table in the external database
    +#' @param partitionColumn the name of a column of integral type that will be used for partitioning
    +#' @param lowerBound the minimum value of `partitionColumn` used to decide partition stride
    +#' @param upperBound the maximum value of `partitionColumn` used to decide partition stride
    +#' @param numPartitions the number of partitions, This, along with `lowerBound` (inclusive),
    +#'                      `upperBound` (exclusive), form partition strides for generated WHERE
    +#'                      clause expressions used to split the column `partitionColumn` evenly.
    +#'                      This defaults to SparkContext.defaultParallelism when unset.
    +#' @param predicates a list of conditions in the where clause; each one defines one partition
    +#' @return DataFrame
    +#' @rdname read.jdbc
    +#' @name read.jdbc
    +#' @export
    +#' @examples
    +#'\dontrun{
    +#' sc <- sparkR.init()
    +#' sqlContext <- sparkRSQL.init(sc)
    +#' jdbcUrl <- "jdbc:mysql://localhost:3306/databasename"
    +#' df <- read.jdbc(sqlContext, jdbcUrl, "table", predicates = list("field<=123"), user = "username")
    +#' df2 <- read.jdbc(sqlContext, jdbcUrl, "table2", partitionColumn = "index", lowerBound = 0,
    +#'                  upperBound = 10000, user = "username", password = "password")
    +#' }
    +
    +read.jdbc <- function(sqlContext, url, tableName,
    +                      partitionColumn = NULL, lowerBound = NULL, upperBound = NULL,
    +                      numPartitions = 0L, predicates = list(), ...) {
    --- End diff --
    
    we want to say it should be a list of predicates, so I think it's better for the default to be an empty list




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by shivaram <gi...@git.apache.org>.
Github user shivaram commented on the pull request:

    https://github.com/apache/spark/pull/10480#issuecomment-212127912
  
    Let's give it one more shot, I guess.




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10480#discussion_r50348037
  
    --- Diff: core/src/main/scala/org/apache/spark/api/r/SerDe.scala ---
    @@ -355,6 +355,13 @@ private[spark] object SerDe {
               writeInt(dos, v.length)
               v.foreach(elem => writeObject(dos, elem))
     
    +        // Handle Properties
    --- End diff --
    
    @shivaram as you can see, we are calling 3 different overloads of `read().jdbc()` in Scala, or 4 if counting `write().jdbc()`. I think there are 4 approaches to handling `read().jdbc()`:
      
    1. Have 3 JVM helper functions
    2. Have 1 helper function and figure out on the JVM side which overload to route to
    3. Have 1 helper function and do the parameter processing (eg. checking numPartitions/defaultParallelism etc.) and overload checks all within the JVM - leaving R as a thin shim
    4. Serialize Properties as a jobj and work with it on the R side
    
    I feel #4 gives us the least overhead (less code) and more flexibility (since logic like the default value for numPartitions exists only on the R/Python side and not on the Scala side).
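
    For concreteness, a minimal sketch of what #4 looks like from the R side, using SparkR's internal backend calls (`jdbcUrl`, the table name, and the credentials are placeholders):
    
    ```
    # Properties comes back from newJObject as a jobj handle, so R can
    # populate it directly and hand it to DataFrameReader.jdbc.
    jprops <- newJObject("java.util.Properties")
    invisible(callJMethod(jprops, "setProperty", "user", "username"))
    invisible(callJMethod(jprops, "setProperty", "password", "password"))
    
    read <- callJMethod(sqlContext, "read")
    sdf <- callJMethod(read, "jdbc", jdbcUrl, "table", jprops)
    df <- dataFrame(sdf)
    ```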





[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10480#discussion_r49286033
  
    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -2272,3 +2260,40 @@ setMethod("with",
                 newEnv <- assignNewEnv(data)
                 eval(substitute(expr), envir = newEnv, enclos = newEnv)
               })
    +
    +#' Saves the content of the DataFrame to an external database table via JDBC
    +#'
    +#' Additional JDBC database connection properties can be set (...)
    +#'
    +#' Also, mode is used to specify the behavior of the save operation when
    +#' data already exists in the data source. There are four modes: \cr
    +#'  append: Contents of this DataFrame are expected to be appended to existing data. \cr
    +#'  overwrite: Existing data is expected to be overwritten by the contents of this DataFrame. \cr
    +#'  error: An exception is expected to be thrown. \cr
    +#'  ignore: The save operation is expected to not save the contents of the DataFrame
    +#'     and to not change the existing data. \cr
    +#'
    +#' @param x A SparkSQL DataFrame
    +#' @param url JDBC database url of the form `jdbc:subprotocol:subname`
    +#' @param tableName The name of the table in the external database
    +#' @param mode One of 'append', 'overwrite', 'error', 'ignore' save mode (it is 'error' by default)
    +#' @family DataFrame functions
    +#' @rdname write.jdbc
    +#' @name write.jdbc
    +#' @export
    +#' @examples
    +#'\dontrun{
    +#' sc <- sparkR.init()
    +#' sqlContext <- sparkRSQL.init(sc)
    +#' jdbcUrl <- "jdbc:mysql://localhost:3306/databasename"
    +#' write.jdbc(df, jdbcUrl, "table", user = "username", password = "password")
    +#' }
    +setMethod("write.jdbc",
    +          signature(x = "DataFrame", url = "character", tableName = "character"),
    +          function(x, url, tableName, mode = "error", ...){
    +            jmode <- convertToJSaveMode(mode)
    +            jprops <- envToJProperties(varargsToEnv(...))
    +            write <- callJMethod(x@sdf, "write")
    +            callJMethod(write, "mode", jmode)
    --- End diff --
    
    fixed, thx.




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the pull request:

    https://github.com/apache/spark/pull/10480#issuecomment-195205927
  
    @shivaram could you check on this [question](https://github.com/apache/spark/pull/10480#discussion_r50348037) when you have a chance?
    Users are running into issues with the JDBC source in R, and we've discovered there is no simple workaround.




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the pull request:

    https://github.com/apache/spark/pull/10480#issuecomment-167276041
  
    @shivaram @sun-rui 




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by sun-rui <gi...@git.apache.org>.
Github user sun-rui commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10480#discussion_r48466526
  
    --- Diff: R/pkg/R/generics.R ---
    @@ -537,6 +537,12 @@ setGeneric("write.df", function(df, path, ...) { standardGeneric("write.df") })
     #' @export
     setGeneric("saveDF", function(df, path, ...) { standardGeneric("saveDF") })
     
    --- End diff --
    
    missing read.jdbc?




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by sun-rui <gi...@git.apache.org>.
Github user sun-rui commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10480#discussion_r48517433
  
    --- Diff: R/pkg/R/SQLContext.R ---
    @@ -556,3 +556,61 @@ createExternalTable <- function(sqlContext, tableName, path = NULL, source = NUL
       sdf <- callJMethod(sqlContext, "createExternalTable", tableName, source, options)
       dataFrame(sdf)
     }
    +
    +#' Create a DataFrame representing the database table accessible via JDBC URL
    +#'
    +#' Additional JDBC database connection properties can be set (...)
    +#'
    +#' Only one of partitionColumn or predicates should be set. Partitions of the table will be
    +#' retrieved in parallel based on the `numPartitions` or by the predicates.
    +#'
    +#' Don't create too many partitions in parallel on a large cluster; otherwise Spark might crash
    +#' your external database systems.
    +#'
    +#' @param sqlContext SQLContext to use
    +#' @param url JDBC database url of the form `jdbc:subprotocol:subname`
    +#' @param tableName the name of the table in the external database
    +#' @param partitionColumn the name of a column of integral type that will be used for partitioning
    +#' @param lowerBound the minimum value of `partitionColumn` used to decide partition stride
    +#' @param upperBound the maximum value of `partitionColumn` used to decide partition stride
    +#' @param numPartitions the number of partitions. This, along with `lowerBound` (inclusive),
    +#'                      `upperBound` (exclusive), form partition strides for generated WHERE
    +#'                      clause expressions used to split the column `partitionColumn` evenly.
    +#'                      This defaults to SparkContext.defaultParallelism when unset.
    +#' @param predicates a list of conditions in the where clause; each one defines one partition
    --- End diff --
    
    OK




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10480#issuecomment-172161857
  
    **[Test build #49523 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49523/consoleFull)** for PR 10480 at commit [`7cb5121`](https://github.com/apache/spark/commit/7cb51217140de4386b45da0619551a38a68be25a).




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10480#issuecomment-173052052
  
    **[Test build #49739 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49739/consoleFull)** for PR 10480 at commit [`fccc761`](https://github.com/apache/spark/commit/fccc761ea42fefa2ad6efe31d90f325957a08e27).




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10480#issuecomment-212160352
  
    **[Test build #56265 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56265/consoleFull)** for PR 10480 at commit [`26cd5f1`](https://github.com/apache/spark/commit/26cd5f15a572a6cc6d20eb7fb3cb63d08bfb3bff).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10480#issuecomment-170436166
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/10480




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by frreiss <gi...@git.apache.org>.
Github user frreiss commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10480#discussion_r59598252
  
    --- Diff: core/src/main/scala/org/apache/spark/api/r/SerDe.scala ---
    @@ -355,6 +355,13 @@ private[spark] object SerDe {
               writeInt(dos, v.length)
               v.foreach(elem => writeObject(dos, elem))
     
    +        // Handle Properties
    --- End diff --
    
    Personally I don't think that special-casing the Properties object here is a major problem -- java.util.Properties is a very commonly used class, and it would make sense for the RPC layer of SparkR to handle Properties alongside other common types like Map and String. But it makes sense to defer to Shivaram on this point. I would vote for option (2) above.
    
    Note that, as far as I can see, the code here to pass a Properties object back to R is only triggered by the test cases in this PR. The actual code for invoking `read.jdbc()` only _writes_ to Properties objects.




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10480#discussion_r48515246
  
    --- Diff: R/pkg/R/SQLContext.R ---
    @@ -556,3 +556,61 @@ createExternalTable <- function(sqlContext, tableName, path = NULL, source = NUL
       sdf <- callJMethod(sqlContext, "createExternalTable", tableName, source, options)
       dataFrame(sdf)
     }
    +
    +#' Create a DataFrame representing the database table accessible via JDBC URL
    +#'
    +#' Additional JDBC database connection properties can be set (...)
    +#'
    +#' Only one of partitionColumn or predicates should be set. Partitions of the table will be
    +#' retrieved in parallel based on the `numPartitions` or by the predicates.
    +#'
    +#' Don't create too many partitions in parallel on a large cluster; otherwise Spark might crash
    +#' your external database systems.
    +#'
    +#' @param sqlContext SQLContext to use
    +#' @param url JDBC database url of the form `jdbc:subprotocol:subname`
    +#' @param tableName the name of the table in the external database
    +#' @param partitionColumn the name of a column of integral type that will be used for partitioning
    +#' @param lowerBound the minimum value of `partitionColumn` used to decide partition stride
    +#' @param upperBound the maximum value of `partitionColumn` used to decide partition stride
    +#' @param numPartitions the number of partitions. This, along with `lowerBound` (inclusive),
    +#'                      `upperBound` (exclusive), form partition strides for generated WHERE
    +#'                      clause expressions used to split the column `partitionColumn` evenly.
    +#'                      This defaults to SparkContext.defaultParallelism when unset.
    +#' @param predicates a list of conditions in the where clause; each one defines one partition
    +#' @return DataFrame
    +#' @rdname read.jdbc
    +#' @name read.jdbc
    +#' @export
    +#' @examples
    +#'\dontrun{
    +#' sc <- sparkR.init()
    +#' sqlContext <- sparkRSQL.init(sc)
    +#' jdbcUrl <- "jdbc:mysql://localhost:3306/databasename"
    +#' df <- read.jdbc(sqlContext, jdbcUrl, "table", predicates = list("field<=123"), user = "username")
    +#' df2 <- read.jdbc(sqlContext, jdbcUrl, "table2", partitionColumn = "index", lowerBound = 0,
    +#'                  upperBound = 10000, user = "username", password = "password")
    +#' }
    +
    +read.jdbc <- function(sqlContext, url, tableName,
    +                      partitionColumn = NULL, lowerBound = NULL, upperBound = NULL,
    +                      numPartitions = 0L, predicates = list(), ...) {
    +  jprops <- envToJProperties(varargsToEnv(...))
    +
    +  read <- callJMethod(sqlContext, "read")
    +  if (!is.null(partitionColumn)) {
    +    if (is.null(numPartitions) || numPartitions == 0) {
    --- End diff --
    
    this matches the behavior of the Python API.
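
    Roughly, the fallback looks like this (a sketch; reaching the SparkContext through `SQLUtils` is an assumption about the available helper):
    
    ```
    # Sketch of the numPartitions fallback to defaultParallelism; the
    # SQLUtils accessor used to reach the SparkContext is an assumption.
    if (is.null(numPartitions) || numPartitions == 0) {
      jsc <- callJStatic("org.apache.spark.sql.api.r.SQLUtils",
                         "getJavaSparkContext", sqlContext)
      numPartitions <- callJMethod(jsc, "defaultParallelism")
    }
    ```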




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by sun-rui <gi...@git.apache.org>.
Github user sun-rui commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10480#discussion_r48517423
  
    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -2272,3 +2260,40 @@ setMethod("with",
                 newEnv <- assignNewEnv(data)
                 eval(substitute(expr), envir = newEnv, enclos = newEnv)
               })
    +
    +#' Saves the content of the DataFrame to an external database table via JDBC
    +#'
    +#' Additional JDBC database connection properties can be set (...)
    +#'
    +#' Also, mode is used to specify the behavior of the save operation when
    +#' data already exists in the data source. There are four modes: \cr
    +#'  append: Contents of this DataFrame are expected to be appended to existing data. \cr
    +#'  overwrite: Existing data is expected to be overwritten by the contents of this DataFrame. \cr
    +#'  error: An exception is expected to be thrown. \cr
    +#'  ignore: The save operation is expected to not save the contents of the DataFrame
    +#'     and to not change the existing data. \cr
    +#'
    +#' @param x A SparkSQL DataFrame
    +#' @param url JDBC database url of the form `jdbc:subprotocol:subname`
    +#' @param tableName The name of the table in the external database
    +#' @param mode One of 'append', 'overwrite', 'error', 'ignore' save mode (it is 'error' by default)
    +#' @family DataFrame functions
    +#' @rdname write.jdbc
    +#' @name write.jdbc
    +#' @export
    +#' @examples
    +#'\dontrun{
    +#' sc <- sparkR.init()
    +#' sqlContext <- sparkRSQL.init(sc)
    +#' jdbcUrl <- "jdbc:mysql://localhost:3306/databasename"
    +#' write.jdbc(df, jdbcUrl, "table", user = "username", password = "password")
    +#' }
    +setMethod("write.jdbc",
    +          signature(x = "DataFrame", url = "character", tableName = "character"),
    +          function(x, url, tableName, mode = "error", ...){
    +            jmode <- convertToJSaveMode(mode)
    +            jprops <- envToJProperties(varargsToEnv(...))
    --- End diff --
    
    vararg -> env -> properties seems a little redundant. I would prefer vararg -> properties.
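
    Something along these lines, for example (a sketch of the direct helper, using SparkR's internal backend calls):
    
    ```
    # Sketch: build java.util.Properties straight from the varargs,
    # skipping the intermediate environment.
    varargsToJProperties <- function(...) {
      pairs <- list(...)
      props <- newJObject("java.util.Properties")
      if (length(pairs) > 0) {
        lapply(names(pairs), function(k) {
          callJMethod(props, "setProperty", k, as.character(pairs[[k]]))
        })
      }
      props
    }
    ```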




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by shivaram <gi...@git.apache.org>.
Github user shivaram commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10480#discussion_r50198797
  
    --- Diff: core/src/main/scala/org/apache/spark/api/r/SerDe.scala ---
    @@ -355,6 +355,13 @@ private[spark] object SerDe {
               writeInt(dos, v.length)
               v.foreach(elem => writeObject(dos, elem))
     
    +        // Handle Properties
    --- End diff --
    
    Yeah, it still feels awkward to special-case the Properties object like this.
    
    @felixcheung Do you have an idea of what part of the code would move to Scala if we were to do it on the Scala side? We typically deal with such conversions on the Scala side, so that's the main reason I'm asking. Is it just the `varargsToJProperties` function?




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the pull request:

    https://github.com/apache/spark/pull/10480#issuecomment-170418270
  
    ```
    spark-mllib: found 0 potential binary incompatibilities (filtered 10)
    sbt.ResolveException: unresolved dependency: org.eclipse.paho#org.eclipse.paho.client.mqttv3;1.0.1: not found
    [error] (streaming-mqtt/*:mimaPreviousClassfiles) sbt.ResolveException: unresolved dependency: org.eclipse.paho#org.eclipse.paho.client.mqttv3;1.0.1: not found
    ```
    jenkins, retest this please




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10480#discussion_r49286036
  
    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -2272,3 +2260,40 @@ setMethod("with",
                 newEnv <- assignNewEnv(data)
                 eval(substitute(expr), envir = newEnv, enclos = newEnv)
               })
    +
    +#' Saves the content of the DataFrame to an external database table via JDBC
    +#'
    +#' Additional JDBC database connection properties can be set (...)
    +#'
    +#' Also, mode is used to specify the behavior of the save operation when
    +#' data already exists in the data source. There are four modes: \cr
    +#'  append: Contents of this DataFrame are expected to be appended to existing data. \cr
    +#'  overwrite: Existing data is expected to be overwritten by the contents of this DataFrame. \cr
    +#'  error: An exception is expected to be thrown. \cr
    +#'  ignore: The save operation is expected to not save the contents of the DataFrame
    +#'     and to not change the existing data. \cr
    +#'
    +#' @param x A SparkSQL DataFrame
    +#' @param url JDBC database url of the form `jdbc:subprotocol:subname`
    +#' @param tableName The name of the table in the external database
    +#' @param mode One of 'append', 'overwrite', 'error', 'ignore' save mode (it is 'error' by default)
    +#' @family DataFrame functions
    +#' @rdname write.jdbc
    +#' @name write.jdbc
    +#' @export
    +#' @examples
    +#'\dontrun{
    +#' sc <- sparkR.init()
    +#' sqlContext <- sparkRSQL.init(sc)
    +#' jdbcUrl <- "jdbc:mysql://localhost:3306/databasename"
    +#' write.jdbc(df, jdbcUrl, "table", user = "username", password = "password")
    +#' }
    +setMethod("write.jdbc",
    +          signature(x = "DataFrame", url = "character", tableName = "character"),
    +          function(x, url, tableName, mode = "error", ...){
    +            jmode <- convertToJSaveMode(mode)
    +            jprops <- envToJProperties(varargsToEnv(...))
    --- End diff --
    
    changed.




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10480#issuecomment-212048976
  
    **[Test build #56244 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56244/consoleFull)** for PR 10480 at commit [`26cd5f1`](https://github.com/apache/spark/commit/26cd5f15a572a6cc6d20eb7fb3cb63d08bfb3bff).




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10480#discussion_r48515677
  
    --- Diff: core/src/main/scala/org/apache/spark/api/r/SerDe.scala ---
    @@ -355,6 +355,13 @@ private[spark] object SerDe {
               writeInt(dos, v.length)
               v.foreach(elem => writeObject(dos, elem))
     
    +        // Handle Properties
    --- End diff --
    
    As explained above, this is needed to properly handle java.util.Properties. It is useful in 2 ways:
    1. R code can set or get values on a java.util.Properties directly
    2. It lets callJMethod match parameter types properly
    
    As above, we could instead have a Scala helper that takes in all the parameters - in that case it would be better to have all the logic in Scala, and it would perhaps be easier to test.
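
    A quick sketch of both uses (backend internals assumed):
    
    ```
    # (1) with this change, a Properties instance stays a jobj handle in R,
    #     so it can be read or mutated directly:
    jprops <- newJObject("java.util.Properties")
    invisible(callJMethod(jprops, "setProperty", "user", "username"))
    callJMethod(jprops, "getProperty", "user")  # "username"
    
    # (2) the same handle type-matches the java.util.Properties parameter
    #     of read().jdbc() / write().jdbc() when passed via callJMethod.
    ```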




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10480#issuecomment-172169964
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10480#issuecomment-212103788
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56244/
    Test FAILed.




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10480#issuecomment-173073602
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49739/
    Test PASSed.




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10480#issuecomment-170418047
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10480#discussion_r48515179
  
    --- Diff: R/pkg/R/SQLContext.R ---
    @@ -556,3 +556,61 @@ createExternalTable <- function(sqlContext, tableName, path = NULL, source = NUL
       sdf <- callJMethod(sqlContext, "createExternalTable", tableName, source, options)
       dataFrame(sdf)
     }
    +
    +#' Create a DataFrame representing the database table accessible via JDBC URL
    +#'
    +#' Additional JDBC database connection properties can be set (...)
    +#'
    +#' Only one of partitionColumn or predicates should be set. Partitions of the table will be
    +#' retrieved in parallel based on the `numPartitions` or by the predicates.
    +#'
    +#' Don't create too many partitions in parallel on a large cluster; otherwise Spark might crash
    +#' your external database systems.
    +#'
    +#' @param sqlContext SQLContext to use
    +#' @param url JDBC database url of the form `jdbc:subprotocol:subname`
    +#' @param tableName the name of the table in the external database
    +#' @param partitionColumn the name of a column of integral type that will be used for partitioning
    +#' @param lowerBound the minimum value of `partitionColumn` used to decide partition stride
    +#' @param upperBound the maximum value of `partitionColumn` used to decide partition stride
    +#' @param numPartitions the number of partitions. This, along with `lowerBound` (inclusive),
    +#'                      `upperBound` (exclusive), form partition strides for generated WHERE
    +#'                      clause expressions used to split the column `partitionColumn` evenly.
    +#'                      This defaults to SparkContext.defaultParallelism when unset.
    +#' @param predicates a list of conditions in the where clause; each one defines one partition
    --- End diff --
    
    It's in line 564 above




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10480#issuecomment-173041949
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49737/
    Test FAILed.




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10480#issuecomment-173073487
  
    **[Test build #49739 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49739/consoleFull)** for PR 10480 at commit [`fccc761`](https://github.com/apache/spark/commit/fccc761ea42fefa2ad6efe31d90f325957a08e27).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10480#issuecomment-172148495
  
    **[Test build #49514 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49514/consoleFull)** for PR 10480 at commit [`8c64ac7`](https://github.com/apache/spark/commit/8c64ac7d07bdd1e39b5d0a6ab510a374edd8c6e0).
     * This patch **fails R style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the pull request:

    https://github.com/apache/spark/pull/10480#issuecomment-176510321
  
    @shivaram any suggestion on how to proceed?




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10480#issuecomment-212160558
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10480#issuecomment-212128129
  
    **[Test build #56265 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56265/consoleFull)** for PR 10480 at commit [`26cd5f1`](https://github.com/apache/spark/commit/26cd5f15a572a6cc6d20eb7fb3cb63d08bfb3bff).




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10480#issuecomment-170436106
  
    **[Test build #49087 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49087/consoleFull)** for PR 10480 at commit [`991a9b7`](https://github.com/apache/spark/commit/991a9b7679c7afd1f9367ff1a8d6a9432a77ac51).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by sun-rui <gi...@git.apache.org>.
Github user sun-rui commented on the pull request:

    https://github.com/apache/spark/pull/10480#issuecomment-167517583
  
    For testing JDBC, could we add a helper function on the Scala side that reuses code from JDBCSuite to start an in-memory JDBC server?




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by sun-rui <gi...@git.apache.org>.
Github user sun-rui commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10480#discussion_r50216049
  
    --- Diff: core/src/main/scala/org/apache/spark/api/r/SerDe.scala ---
    @@ -355,6 +355,13 @@ private[spark] object SerDe {
               writeInt(dos, v.length)
               v.foreach(elem => writeObject(dos, elem))
     
    +        // Handle Properties
    --- End diff --
    
    It is possible to construct a map for the properties and pass it to a helper function on the JVM side; that helper function can convert the map into java.util.Properties. The problem is that if there are multiple Scala API functions taking java.util.Properties as a parameter, we would have to create as many helper functions as there are such API functions. So I think we can keep the current implementation as-is for now.
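
    For example, such a helper would be called from R roughly like this (`mapToJProperties` is hypothetical, just to illustrate the shape):
    
    ```
    # Hypothetical JVM helper: an R environment serializes as a
    # java.util.Map, which the helper would convert to Properties.
    propsEnv <- varargsToEnv(user = "username", password = "password")
    jprops <- callJStatic("org.apache.spark.sql.api.r.SQLUtils",
                          "mapToJProperties",  # hypothetical
                          propsEnv)
    ```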




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10480#discussion_r48515129
  
    --- Diff: R/pkg/R/generics.R ---
    @@ -537,6 +537,12 @@ setGeneric("write.df", function(df, path, ...) { standardGeneric("write.df") })
     #' @export
     setGeneric("saveDF", function(df, path, ...) { standardGeneric("saveDF") })
     
    --- End diff --
    
    not an S4 method, so we are not setting a generic




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by shivaram <gi...@git.apache.org>.
Github user shivaram commented on the pull request:

    https://github.com/apache/spark/pull/10480#issuecomment-170202498
  
    @sun-rui Are there any more comments on this PR  ?
    @felixcheung Could you bring this up to date with `master`?




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10480#issuecomment-172148499
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49514/
    Test FAILed.




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10480#issuecomment-172148498
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by sun-rui <gi...@git.apache.org>.
Github user sun-rui commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10480#discussion_r50215657
  
    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -2401,4 +2401,41 @@ setMethod("str",
                     cat(paste0("\nDisplaying first ", ncol(localDF), " columns only."))
                   }
                 }
    -          })
    \ No newline at end of file
    +          })
    +
    +#' Saves the content of the DataFrame to an external database table via JDBC
    +#'
    +#' Additional JDBC database connection properties can be set (...)
    +#'
    +#' Also, mode is used to specify the behavior of the save operation when
    +#' data already exists in the data source. There are four modes: \cr
    --- End diff --
    
    is "\cr" intended for Roxygen format?




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by shivaram <gi...@git.apache.org>.
Github user shivaram commented on the pull request:

    https://github.com/apache/spark/pull/10480#issuecomment-212127807
  
    Jenkins, retest this please




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10480#discussion_r48518375
  
    --- Diff: core/src/main/scala/org/apache/spark/api/r/SerDe.scala ---
    @@ -355,6 +355,13 @@ private[spark] object SerDe {
               writeInt(dos, v.length)
               v.foreach(elem => writeObject(dos, elem))
     
    +        // Handle Properties
    --- End diff --
    
    My preference is to do more in R. If you feel strongly about having a helper in Scala instead of handling Properties, then we could move most of the code into a Scala helper.




[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by sun-rui <gi...@git.apache.org>.
Github user sun-rui commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10480#discussion_r48472220
  
    --- Diff: R/pkg/R/SQLContext.R ---
    @@ -556,3 +556,61 @@ createExternalTable <- function(sqlContext, tableName, path = NULL, source = NUL
       sdf <- callJMethod(sqlContext, "createExternalTable", tableName, source, options)
       dataFrame(sdf)
     }
    +
    +#' Create a DataFrame representing the database table accessible via JDBC URL
    +#'
    +#' Additional JDBC database connection properties can be set (...)
    +#'
    +#' Only one of partitionColumn or predicates should be set. Partitions of the table will be
    +#' retrieved in parallel based on the `numPartitions` or by the predicates.
    +#'
    +#' Don't create too many partitions in parallel on a large cluster; otherwise Spark might crash
    +#' your external database systems.
    +#'
    +#' @param sqlContext SQLContext to use
    +#' @param url JDBC database url of the form `jdbc:subprotocol:subname`
    +#' @param tableName the name of the table in the external database
    +#' @param partitionColumn the name of a column of integral type that will be used for partitioning
    +#' @param lowerBound the minimum value of `partitionColumn` used to decide partition stride
    +#' @param upperBound the maximum value of `partitionColumn` used to decide partition stride
    +#' @param numPartitions the number of partitions. This, along with `lowerBound` (inclusive),
    +#'                      `upperBound` (exclusive), form partition strides for generated WHERE
    +#'                      clause expressions used to split the column `partitionColumn` evenly.
    +#'                      This defaults to SparkContext.defaultParallelism when unset.
    +#' @param predicates a list of conditions in the where clause; each one defines one partition
    +#' @return DataFrame
    +#' @rdname read.jdbc
    +#' @name read.jdbc
    +#' @export
    +#' @examples
    +#'\dontrun{
    +#' sc <- sparkR.init()
    +#' sqlContext <- sparkRSQL.init(sc)
    +#' jdbcUrl <- "jdbc:mysql://localhost:3306/databasename"
    +#' df <- read.jdbc(sqlContext, jdbcUrl, "table", predicates = list("field<=123"), user = "username")
    +#' df2 <- read.jdbc(sqlContext, jdbcUrl, "table2", partitionColumn = "index", lowerBound = 0,
    +#'                  upperBound = 10000, user = "username", password = "password")
    +#' }
    +
    +read.jdbc <- function(sqlContext, url, tableName,
    +                      partitionColumn = NULL, lowerBound = NULL, upperBound = NULL,
    +                      numPartitions = 0L, predicates = list(), ...) {
    +  jprops <- envToJProperties(varargsToEnv(...))
    +
    +  read <- callJMethod(sqlContext, "read")
    +  if (!is.null(partitionColumn)) {
    +    if (is.null(numPartitions) || numPartitions == 0) {
    --- End diff --
    
    Maybe we can simply pass numPartitions through, instead of substituting a default value when it is null or 0. This behavior is not required by the Scala API.
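
    For reference, a minimal sketch of the two alternatives being discussed, assuming SparkR's internal `callJMethod` and `numToInt` helpers:
    
    ```r
    # Alternative A (as in the diff above): substitute a default in R when
    # numPartitions is NULL or 0, falling back to SparkContext.defaultParallelism
    # as the roxygen doc describes.
    if (is.null(numPartitions) || numPartitions == 0) {
      sc <- callJMethod(sqlContext, "sparkContext")
      numPartitions <- callJMethod(sc, "defaultParallelism")
    } else {
      numPartitions <- numToInt(numPartitions)
    }
    
    # Alternative B (this suggestion): pass numPartitions straight through to
    # the JVM side without substituting a default, since the Scala API does
    # not require this fallback.
    ```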


[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by sun-rui <gi...@git.apache.org>.
Github user sun-rui commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10480#discussion_r48466513
  
    --- Diff: core/src/main/scala/org/apache/spark/api/r/SerDe.scala ---
    @@ -355,6 +355,13 @@ private[spark] object SerDe {
               writeInt(dos, v.length)
               v.foreach(elem => writeObject(dos, elem))
     
    +        // Handle Properties
    --- End diff --
    
    This is not needed?


[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10480#issuecomment-170418031
  
    **[Test build #49082 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49082/consoleFull)** for PR 10480 at commit [`991a9b7`](https://github.com/apache/spark/commit/991a9b7679c7afd1f9367ff1a8d6a9432a77ac51).
     * This patch **fails MiMa tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10480#issuecomment-167283792
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48335/
    Test PASSed.


[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the pull request:

    https://github.com/apache/spark/pull/10480#issuecomment-173043756
  
    jenkins, retest this please


[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10480#issuecomment-170425634
  
    **[Test build #49087 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49087/consoleFull)** for PR 10480 at commit [`991a9b7`](https://github.com/apache/spark/commit/991a9b7679c7afd1f9367ff1a8d6a9432a77ac51).

