You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/05/04 02:18:44 UTC

[GitHub] [arrow] nealrichardson commented on a change in pull request #10141: ARROW-11787: [R] Implement write csv

nealrichardson commented on a change in pull request #10141:
URL: https://github.com/apache/arrow/pull/10141#discussion_r625288659



##########
File path: r/R/csv.R
##########
@@ -585,3 +602,52 @@ readr_to_csv_convert_options <- function(na,
     include_columns = include_columns
   )
 }
+
+#' Write CSV file to disk
+#'
+#' @param x `data.frame`, [RecordBatch], or [Table]
+#' @param sink A string file path, URI, or [OutputStream], or path in a file
+#' system (`SubTreeFileSystem`)
+#' @param include_header Whether to write an initial header line with column names
+#' @param batch_size Maximum number of rows processed at a time. Default is 1024.
+#'
+#' @return The input `x`, invisibly. Note that if `sink` is an [OutputStream],
+#' the stream will be left open.
+#' @export
+#' @examples
+#' \donttest{
+#' tf <- tempfile()
+#' on.exit(unlink(tf))
+#' write_csv_arrow(mtcars, tf)
+#' }
+#' @include arrow-package.R
+write_csv_arrow <- function(x,
+                            sink,
+                            include_header = TRUE,
+                            batch_size = 1024L) {
+  # Handle and validate options before touching data
+  batch_size <- as.integer(batch_size)
+  assert_that(batch_size > 0)

Review comment:
       Let's move this to CsvWriteOptions$create(). I like to keep the input validation as close to the C++ call site as possible, just so that if someone uses the lower-level bindings, they still have protection from segfaults.
   ```suggestion
   ```

##########
File path: r/R/csv.R
##########
@@ -408,6 +413,18 @@ CsvReadOptions$create <- function(use_threads = option_use_threads(),
   )
 }
 
+#' @rdname CsvReadOptions
+#' @export
+CsvWriteOptions <- R6Class("CsvWriteOptions", inherit = ArrowObject)
+CsvWriteOptions$create <- function(include_header = TRUE, batch_size = 1024L){
+  csv___WriteOptions__initialize(
+    list(
+      include_header = include_header,
+      batch_size = batch_size
+    )
+  )

Review comment:
       `rlang::is_integerish` can handle a few other validation cases we want (length 1, not missing):
   
   ```
   > rlang::is_integerish(1:4)
   [1] TRUE
   > rlang::is_integerish(1:4, n = 1)
   [1] FALSE
   > rlang::is_integerish(NA_integer_)
   [1] TRUE
   > rlang::is_integerish(NA_integer_, finite = TRUE)
   [1] FALSE
   ```
   
   ```suggestion
     assert_that(is_integerish(batch_size, n = 1, finite = TRUE), batch_size > 0)
     csv___WriteOptions__initialize(
       list(
         include_header = include_header,
         batch_size = as.integer(batch_size)
       )
     )
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org