Posted to reviews@spark.apache.org by dongjoon-hyun <gi...@git.apache.org> on 2016/06/18 21:12:27 UTC

[GitHub] spark pull request #13763: [SPARK-16051][R] Add `read.orc/write.orc` to Spar...

GitHub user dongjoon-hyun opened a pull request:

    https://github.com/apache/spark/pull/13763

    [SPARK-16051][R] Add `read.orc/write.orc` to SparkR

    ## What changes were proposed in this pull request?
    
    This issue adds `read.orc/write.orc` to SparkR for API parity.
    
    ## How was this patch tested?
    
    Pass the Jenkins tests (with new testcases).
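
For readers unfamiliar with the new functions, the round trip they enable might look roughly like the sketch below. It assumes a local Spark 2.0+ installation built with Hive support (ORC depended on it at the time) and uses the `faithful` data set and a temporary path purely for illustration; neither comes from the patch itself.

```r
library(SparkR)

# Assumption: a local Spark 2.0+ build with Hive support is available.
sparkR.session()

# `faithful` is a data set that ships with base R.
df <- createDataFrame(faithful)

# Hypothetical output location; Spark writes a directory of part files here.
orcPath <- tempfile(pattern = "faithful", fileext = ".orc")

write.orc(df, orcPath)
orcDF <- read.orc(orcPath)

# The round trip should preserve the row count.
n <- count(orcDF)
stopifnot(n == nrow(faithful))

# Remove the output directory, then stop the session.
unlink(orcPath, recursive = TRUE)
sparkR.session.stop()
```

Because the writer produces a directory rather than a single file, the cleanup uses `recursive = TRUE`; base R's `unlink` does not delete directories without it.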

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dongjoon-hyun/spark SPARK-16051

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13763.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #13763
    
----
commit 82b95772b14761279aae05e0ec9325b6114c3d6e
Author: Dongjoon Hyun <do...@apache.org>
Date:   2016-06-18T21:08:35Z

    [SPARK-16051][R] Add `read.orc/write.orc` to SparkR

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #13763: [SPARK-16051][R] Add `read.orc/write.orc` to Spar...

Posted by sun-rui <gi...@git.apache.org>.

Github user sun-rui commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13763#discussion_r67610931
  
    --- Diff: R/pkg/R/SQLContext.R ---
    @@ -330,6 +330,30 @@ jsonRDD <- function(sqlContext, rdd, schema = NULL, samplingRatio = 1.0) {
       }
     }
     
    +#' Create a SparkDataFrame from a ORC file.
    --- End diff --
    
    an ORC



[GitHub] spark issue #13763: [SPARK-16051][R] Add `read.orc/write.orc` to SparkR

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13763
  
    **[Test build #60853 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60853/consoleFull)** for PR 13763 at commit [`197eaaa`](https://github.com/apache/spark/commit/197eaaad456cae6a0bd7b0ced914b5f0b0750741).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #13763: [SPARK-16051][R] Add `read.orc/write.orc` to SparkR

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13763
  
    **[Test build #60848 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60848/consoleFull)** for PR 13763 at commit [`a035425`](https://github.com/apache/spark/commit/a035425b4cf344da69fdfd9633cb0192d262549a).



[GitHub] spark issue #13763: [SPARK-16051][R] Add `read.orc/write.orc` to SparkR

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13763
  
    **[Test build #60785 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60785/consoleFull)** for PR 13763 at commit [`82b9577`](https://github.com/apache/spark/commit/82b95772b14761279aae05e0ec9325b6114c3d6e).



[GitHub] spark pull request #13763: [SPARK-16051][R] Add `read.orc/write.orc` to Spar...

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13763#discussion_r67725053
  
    --- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
    @@ -1667,6 +1668,25 @@ test_that("mutate(), transform(), rename() and names()", {
       detach(airquality)
     })
     
    +test_that("read/write ORC files", {
    +  df <- read.df(jsonPath, "json")
    +
    +  # Test write.df and read.df
    +  write.df(df, orcPath, "orc", mode = "overwrite")
    +  df2 <- read.df(orcPath, "orc")
    +  expect_is(df2, "SparkDataFrame")
    +  expect_equal(count(df), count(df2))
    +
    +  # Test write.orc and read.orc
    +  orcPath2 <- tempfile(pattern = "orcPath2", fileext = ".orc")
    +  write.orc(df, orcPath2)
    +  orcDF <- read.orc(orcPath2)
    +  expect_is(orcDF, "SparkDataFrame")
    +  expect_equal(count(orcDF), count(df))
    +
    +  unlink(orcPath2)
    +})
    --- End diff --
    
    Oh, right. Thank you again.
    If possible, I want to keep them consistent with the `Parquet` case.
    For ORC files, we can add more test cases in the future.
    I will add `unlink(orcPath1)` at the bottom of this R file, too.



[GitHub] spark pull request #13763: [SPARK-16051][R] Add `read.orc/write.orc` to Spar...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13763#discussion_r67722671
  
    --- Diff: R/pkg/R/SQLContext.R ---
    @@ -330,6 +330,25 @@ jsonRDD <- function(sqlContext, rdd, schema = NULL, samplingRatio = 1.0) {
       }
     }
     
    +#' Create a SparkDataFrame from an ORC file.
    +#'
    +#' Loads an ORC file, returning the result as a SparkDataFrame.
    +#'
    +#' @param path Path of file to read.
    +#' @return SparkDataFrame
    +#' @rdname read.orc
    +#' @export
    +#' @name read.orc
    +#' @note read.orc since 2.0.0
    +read.orc <- function(path) {
    +  sparkSession <- getSparkSession()
    +  # Allow the user to have a more flexible definiton of the text file path
    --- End diff --
    
    so instead.. "the ORC file path"?



[GitHub] spark pull request #13763: [SPARK-16051][R] Add `read.orc/write.orc` to Spar...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13763#discussion_r67659800
  
    --- Diff: R/pkg/R/SQLContext.R ---
    @@ -330,6 +330,30 @@ jsonRDD <- function(sqlContext, rdd, schema = NULL, samplingRatio = 1.0) {
       }
     }
     
    +#' Create a SparkDataFrame from an ORC file.
    +#'
    +#' Loads an ORC file, returning the result as a SparkDataFrame.
    +#'
    +#' @param path Path of file to read.
    +#' @return SparkDataFrame
    +#' @rdname read.orc
    +#' @export
    +#' @name read.orc
    +#' @method read.orc default
    +#' @note read.orc since 2.0.0
    +read.orc.default <- function(path) {
    +  sparkSession <- getSparkSession()
    +  # Allow the user to have a more flexible definiton of the text file path
    --- End diff --
    
    "ORC file paths"?



[GitHub] spark issue #13763: [SPARK-16051][R] Add `read.orc/write.orc` to SparkR

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13763
  
    **[Test build #60785 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60785/consoleFull)** for PR 13763 at commit [`82b9577`](https://github.com/apache/spark/commit/82b95772b14761279aae05e0ec9325b6114c3d6e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #13763: [SPARK-16051][R] Add `read.orc/write.orc` to SparkR

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13763
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request #13763: [SPARK-16051][R] Add `read.orc/write.orc` to Spar...

Posted by sun-rui <gi...@git.apache.org>.

Github user sun-rui commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13763#discussion_r67610942
  
    --- Diff: R/pkg/R/SQLContext.R ---
    @@ -330,6 +330,30 @@ jsonRDD <- function(sqlContext, rdd, schema = NULL, samplingRatio = 1.0) {
       }
     }
     
    +#' Create a SparkDataFrame from a ORC file.
    +#'
    +#' Loads a ORC file, returning the result as a SparkDataFrame.
    +#'
    +#' @param path Path of file to read.
    +#' @return SparkDataFrame
    +#' @rdname read.orc
    +#' @export
    +#' @name read.orc
    +#' @method read.orc default
    +#' @note read.orc since 2.0.0
    +read.orc.default <- function(path) {
    --- End diff --
    
    Since read.orc is a new API method, this is not needed for backward compatibility?



[GitHub] spark pull request #13763: [SPARK-16051][R] Add `read.orc/write.orc` to Spar...

Posted by sun-rui <gi...@git.apache.org>.

Github user sun-rui commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13763#discussion_r67610926
  
    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -701,6 +701,33 @@ setMethod("write.json",
                 invisible(callJMethod(write, "json", path))
               })
     
    +#' Save the contents of SparkDataFrame as a ORC file, preserving the schema.
    --- End diff --
    
    an ORC



[GitHub] spark issue #13763: [SPARK-16051][R] Add `read.orc/write.orc` to SparkR

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/13763
  
    LGTM



[GitHub] spark pull request #13763: [SPARK-16051][R] Add `read.orc/write.orc` to Spar...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13763#discussion_r67722070
  
    --- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
    @@ -1667,6 +1668,25 @@ test_that("mutate(), transform(), rename() and names()", {
       detach(airquality)
     })
     
    +test_that("read/write ORC files", {
    +  df <- read.df(jsonPath, "json")
    +
    +  # Test write.df and read.df
    +  write.df(df, orcPath, "orc", mode = "overwrite")
    +  df2 <- read.df(orcPath, "orc")
    +  expect_is(df2, "SparkDataFrame")
    +  expect_equal(count(df), count(df2))
    +
    +  # Test write.orc and read.orc
    +  orcPath2 <- tempfile(pattern = "orcPath2", fileext = ".orc")
    +  write.orc(df, orcPath2)
    +  orcDF <- read.orc(orcPath2)
    +  expect_is(orcDF, "SparkDataFrame")
    +  expect_equal(count(orcDF), count(df))
    +
    +  unlink(orcPath2)
    +})
    --- End diff --
    
    You should probably add an `unlink(orcPath)` too; in the Parquet case it's at the bottom of this R file. In the case of ORC, perhaps we could move L71 `orcPath` into this test and unlink it here as well, since it is not used elsewhere.
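
One way to keep a temporary path local to a test and guarantee its cleanup is sketched below in base R. The helper name `with_temp_path` and the stand-in writer are hypothetical and are not part of the patch or of SparkR; the sketch only illustrates the scoping idea raised in the comment.

```r
# Hypothetical helper: run `body` with a fresh temp path and always clean up.
with_temp_path <- function(body, fileext = ".orc") {
  path <- tempfile(pattern = "orcPath", fileext = fileext)
  # on.exit runs even if `body` throws, so the path never outlives the call.
  on.exit(unlink(path, recursive = TRUE), add = TRUE)
  body(path)
}

# Stand-in for write.orc: Spark writers produce a directory of part files.
leaked <- NULL
existed <- with_temp_path(function(path) {
  leaked <<- path
  dir.create(path)
  writeLines("dummy", file.path(path, "part-00000"))
  file.exists(path)
})

stopifnot(existed)              # the path existed while the body ran
stopifnot(!file.exists(leaked)) # and was removed afterwards
```

In a real test the body would call `write.orc` and `read.orc` instead of the stand-in; `recursive = TRUE` matters because the output is a directory.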




[GitHub] spark issue #13763: [SPARK-16051][R] Add `read.orc/write.orc` to SparkR

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13763
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60785/
    Test PASSed.



[GitHub] spark issue #13763: [SPARK-16051][R] Add `read.orc/write.orc` to SparkR

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/13763
  
    Thank you, @felixcheung !
    By the way, unfortunately, `DataFrameReader.scala` exposes the ORC and Parquet readers differently.
    For ORC, only a single path is accepted for now.
    ```
      @scala.annotation.varargs
      def parquet(paths: String*): DataFrame = {
        format("parquet").load(paths: _*)
      }
    
      def orc(path: String): DataFrame = format("orc").load(path)
    ```



[GitHub] spark issue #13763: [SPARK-16051][R] Add `read.orc/write.orc` to SparkR

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13763
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #13763: [SPARK-16051][R] Add `read.orc/write.orc` to SparkR

Posted by shivaram <gi...@git.apache.org>.

Github user shivaram commented on the issue:

    https://github.com/apache/spark/pull/13763
  
    LGTM. Merging this to master and branch-2.0



[GitHub] spark issue #13763: [SPARK-16051][R] Add `read.orc/write.orc` to SparkR

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13763
  
    **[Test build #60848 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60848/consoleFull)** for PR 13763 at commit [`a035425`](https://github.com/apache/spark/commit/a035425b4cf344da69fdfd9633cb0192d262549a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #13763: [SPARK-16051][R] Add `read.orc/write.orc` to SparkR

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13763
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #13763: [SPARK-16051][R] Add `read.orc/write.orc` to SparkR

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13763
  
    **[Test build #60802 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60802/consoleFull)** for PR 13763 at commit [`5cacdb7`](https://github.com/apache/spark/commit/5cacdb7990c0dd09b4f788c9c14a1c50d2759cd5).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #13763: [SPARK-16051][R] Add `read.orc/write.orc` to SparkR

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13763
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60848/
    Test PASSed.



[GitHub] spark issue #13763: [SPARK-16051][R] Add `read.orc/write.orc` to SparkR

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13763
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60802/
    Test PASSed.



[GitHub] spark pull request #13763: [SPARK-16051][R] Add `read.orc/write.orc` to Spar...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13763#discussion_r67659762
  
    --- Diff: R/pkg/R/SQLContext.R ---
    @@ -330,6 +330,30 @@ jsonRDD <- function(sqlContext, rdd, schema = NULL, samplingRatio = 1.0) {
       }
     }
     
    +#' Create a SparkDataFrame from an ORC file.
    +#'
    +#' Loads an ORC file, returning the result as a SparkDataFrame.
    +#'
    +#' @param path Path of file to read.
    +#' @return SparkDataFrame
    +#' @rdname read.orc
    +#' @export
    +#' @name read.orc
    +#' @method read.orc default
    +#' @note read.orc since 2.0.0
    +read.orc.default <- function(path) {
    +  sparkSession <- getSparkSession()
    +  # Allow the user to have a more flexible definiton of the text file path
    +  path <- suppressWarnings(normalizePath(path))
    +  read <- callJMethod(sparkSession, "read")
    +  sdf <- callJMethod(read, "orc", path)
    +  dataFrame(sdf)
    +}
    +
    +read.orc <- function(x, ...) {
    --- End diff --
    
    ... and remove this function, which is for backward compatibility only



[GitHub] spark issue #13763: [SPARK-16051][R] Add `read.orc/write.orc` to SparkR

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/13763
  
    Thank you for the review, @sun-rui.
    I fixed all occurrences, replacing `a ORC` with `an ORC`.



[GitHub] spark pull request #13763: [SPARK-16051][R] Add `read.orc/write.orc` to Spar...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13763#discussion_r67659702
  
    --- Diff: R/pkg/R/SQLContext.R ---
    @@ -330,6 +330,30 @@ jsonRDD <- function(sqlContext, rdd, schema = NULL, samplingRatio = 1.0) {
       }
     }
     
    +#' Create a SparkDataFrame from a ORC file.
    +#'
    +#' Loads a ORC file, returning the result as a SparkDataFrame.
    +#'
    +#' @param path Path of file to read.
    +#' @return SparkDataFrame
    +#' @rdname read.orc
    +#' @export
    +#' @name read.orc
    +#' @method read.orc default
    +#' @note read.orc since 2.0.0
    +read.orc.default <- function(path) {
    --- End diff --
    
    Correct, you can name this read.orc



[GitHub] spark issue #13763: [SPARK-16051][R] Add `read.orc/write.orc` to SparkR

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/13763
  
    I see. Perhaps that's a good opportunity to make it the same in Scala/Python/R? 😄 



[GitHub] spark issue #13763: [SPARK-16051][R] Add `read.orc/write.orc` to SparkR

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/13763
  
    Hi, @shivaram , @felixcheung , @sun-rui .
    Could you review this PR when you have some time?



[GitHub] spark issue #13763: [SPARK-16051][R] Add `read.orc/write.orc` to SparkR

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/13763
  
    Thank you so much, @felixcheung !



[GitHub] spark issue #13763: [SPARK-16051][R] Add `read.orc/write.orc` to SparkR

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13763
  
    **[Test build #60853 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60853/consoleFull)** for PR 13763 at commit [`197eaaa`](https://github.com/apache/spark/commit/197eaaad456cae6a0bd7b0ced914b5f0b0750741).



[GitHub] spark issue #13763: [SPARK-16051][R] Add `read.orc/write.orc` to SparkR

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13763
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60853/



[GitHub] spark pull request #13763: [SPARK-16051][R] Add `read.orc/write.orc` to Spar...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/13763



[GitHub] spark pull request #13763: [SPARK-16051][R] Add `read.orc/write.orc` to Spar...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13763#discussion_r67659976
  
    --- Diff: R/pkg/R/SQLContext.R ---
    @@ -330,6 +330,30 @@ jsonRDD <- function(sqlContext, rdd, schema = NULL, samplingRatio = 1.0) {
       }
     }
     
    +#' Create a SparkDataFrame from an ORC file.
    +#'
    +#' Loads an ORC file, returning the result as a SparkDataFrame.
    +#'
    +#' @param path Path of file to read.
    --- End diff --
    
    "Path of file to read. A vector of multiple paths is allowed."?



[GitHub] spark issue #13763: [SPARK-16051][R] Add `read.orc/write.orc` to SparkR

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/13763
  
    Actually, for ORC, the reason I didn't try to support multiple files is API consistency.
    Scala/Python also only support a single ORC file, so R should do the same.
    I didn't dig further, but I guessed there might be some limitation on ORC.



[GitHub] spark issue #13763: [SPARK-16051][R] Add `read.orc/write.orc` to SparkR

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13763
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #13763: [SPARK-16051][R] Add `read.orc/write.orc` to SparkR

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13763
  
    **[Test build #60802 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60802/consoleFull)** for PR 13763 at commit [`5cacdb7`](https://github.com/apache/spark/commit/5cacdb7990c0dd09b4f788c9c14a1c50d2759cd5).



[GitHub] spark pull request #13763: [SPARK-16051][R] Add `read.orc/write.orc` to Spar...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13763#discussion_r67659892
  
    --- Diff: R/pkg/R/SQLContext.R ---
    @@ -330,6 +330,30 @@ jsonRDD <- function(sqlContext, rdd, schema = NULL, samplingRatio = 1.0) {
       }
     }
     
    +#' Create a SparkDataFrame from an ORC file.
    +#'
    +#' Loads an ORC file, returning the result as a SparkDataFrame.
    +#'
    +#' @param path Path of file to read.
    +#' @return SparkDataFrame
    +#' @rdname read.orc
    +#' @export
    +#' @name read.orc
    +#' @method read.orc default
    --- End diff --
    
    You wouldn't need this `@method` line either.

