You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "westonpace (via GitHub)" <gi...@apache.org> on 2023/06/13 16:10:22 UTC

[GitHub] [arrow] westonpace commented on a diff in pull request #35055: GH-34436: [R] Bindings for JSON Dataset

westonpace commented on code in PR #35055:
URL: https://github.com/apache/arrow/pull/35055#discussion_r1228379795


##########
r/R/dataset-format.R:
##########
@@ -113,6 +115,45 @@ ParquetFileFormat$create <- function(...,
 #' @export
 IpcFileFormat <- R6Class("IpcFileFormat", inherit = FileFormat)
 
+#' JSON dataset file format
+#'
+#' @description
+#' A `JsonFileFormat` is a [FileFormat] subclass which holds information about how to
+#' read and parse the files included in a JSON `Dataset`.
+#'
+#' @section Factory:
+#' `JsonFileFormat$create()` can take options in the form of lists passed through as `parse_options`,
+#'  or `read_options` parameters.
+#'
+#'  Available `read_options` parameters:
+#'  * `use_threads`: Whether to use the global CPU thread pool. Default `TRUE`. If `FALSE`, JSON input must end with an
+#'  empty line.

Review Comment:
   It seems odd these two things are related.  Am I missing something?



##########
r/R/dataset-format.R:
##########
@@ -113,6 +115,45 @@ ParquetFileFormat$create <- function(...,
 #' @export
 IpcFileFormat <- R6Class("IpcFileFormat", inherit = FileFormat)
 
+#' JSON dataset file format
+#'
+#' @description
+#' A `JsonFileFormat` is a [FileFormat] subclass which holds information about how to
+#' read and parse the files included in a JSON `Dataset`.
+#'
+#' @section Factory:
+#' `JsonFileFormat$create()` can take options in the form of lists passed through as `parse_options`,
+#'  or `read_options` parameters.
+#'
+#'  Available `read_options` parameters:
+#'  * `use_threads`: Whether to use the global CPU thread pool. Default `TRUE`. If `FALSE`, JSON input must end with an
+#'  empty line.
+#'  * `block_size`: Block size we request from the IO layer; also determines size of chunks when `use_threads`
+#'   is `TRUE`.

Review Comment:
   This leads me to wonder what determines the size of chunks when `use_threads` is false?  Although maybe there just aren't chunks?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org