Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/01/16 17:50:00 UTC

[GitHub] [arrow] thisisnic commented on a change in pull request #12083: ARROW-14744: [R] open_dataset() error when `schema` argument supplied, but `column_names` not supplied to `CSVReadOptions`

thisisnic commented on a change in pull request #12083:
URL: https://github.com/apache/arrow/pull/12083#discussion_r785474474



##########
File path: r/R/dataset.R
##########
@@ -123,6 +123,7 @@
 #' or call [`$NewScan()`][Scanner] to construct a query directly.
 #' @export
 #' @seealso `vignette("dataset", package = "arrow")`
+#' See [read_csv_arrow()] on how to specify column names and types for "csv"/"text" and "tsv" -formats.

Review comment:
       This is great, though I think it might fit better after the content on line 121, along with links to `read_feather()` and `read_parquet()` and a more generic comment about viewing those pages for format-specific options.

##########
File path: r/R/dataset-format.R
##########
@@ -122,6 +122,18 @@ CsvFileFormat$create <- function(...,
                                  opts = csv_file_format_parse_options(...),
                                  convert_options = csv_file_format_convert_opts(...),
                                  read_options = csv_file_format_read_opts(...)) {
+
+  options <- list(...)
+  schema  <- options[["schema"]]
+
+  if (length(read_options$column_names) > 0 & !is.null(schema) & !identical(names(schema), read_options$column_names)) {
+    abort(c(
+        '"column_names" in read_options do not match the schema.',
+      i = "Set column_names in read_options to match the schema",
+      i = "Omit the read_options argument"
+    ))

Review comment:
       Looking good, just a couple of suggestions:
   - the user of the `open_dataset()` function won't necessarily have set `read_options` directly, so we should remove the reference to it from the error message to avoid confusion
   - to give the end user as much useful information as possible, could you print out which values differ between `column_names` and the schema?
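
As a rough illustration of the second suggestion, here is a minimal sketch of how the mismatched names could be collected and reported. It uses only base R (`setdiff()` and `stop()`) rather than `abort()`, and the helper name `check_column_names()` and the message wording are hypothetical, not part of arrow or of this PR:

```r
# Hypothetical helper (not part of arrow): compares the field names of a
# schema with user-supplied column_names and reports which values differ.
check_column_names <- function(schema_names, column_names) {
  # Nothing to check if no column_names were supplied or everything matches.
  if (length(column_names) == 0 || identical(schema_names, column_names)) {
    return(invisible(TRUE))
  }

  not_in_schema <- setdiff(column_names, schema_names)
  not_in_names <- setdiff(schema_names, column_names)

  msg <- "Values in `column_names` must match the `schema` field names."
  if (length(not_in_schema) > 0) {
    msg <- c(msg, paste0(
      "`column_names` not in `schema`: ",
      paste(not_in_schema, collapse = ", ")
    ))
  }
  if (length(not_in_names) > 0) {
    msg <- c(msg, paste0(
      "`schema` fields not in `column_names`: ",
      paste(not_in_names, collapse = ", ")
    ))
  }
  if (length(not_in_schema) == 0 && length(not_in_names) == 0) {
    # Same set of names, so the difference is ordering or duplication.
    msg <- c(msg, "The names are the same but their order or count differs.")
  }

  stop(paste(msg, collapse = "\n"), call. = FALSE)
}
```

In the PR itself the inputs would presumably be `names(schema)` and `read_options$column_names`, and the collected lines could be passed to `abort()` as bullets instead of being pasted together.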



