Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/05/18 17:16:46 UTC

[GitHub] [arrow] nealrichardson commented on a change in pull request #10326: ARROW-12791: [R] Better error handling for DatasetFactory$Finish() when no format specified

nealrichardson commented on a change in pull request #10326:
URL: https://github.com/apache/arrow/pull/10326#discussion_r634597442



##########
File path: r/R/dataset.R
##########
@@ -93,8 +93,19 @@ open_dataset <- function(sources,
     return(dataset___UnionDataset__create(sources, schema))
   }
   factory <- DatasetFactory$create(sources, partitioning = partitioning, ...)
-  # Default is _not_ to inspect/unify schemas
-  factory$Finish(schema, isTRUE(unify_schemas))
+  
+  tryCatch(
+    # Default is _not_ to inspect/unify schemas
+    factory$Finish(schema, isTRUE(unify_schemas)),
+    error = function (e) {
+      msg <- conditionMessage(e)
+      if(grep("Parquet magic bytes not found in footer", msg)){
+        stop("Looks like these are not parquet files, did you mean to specify a 'format'?", call. = FALSE)

Review comment:
       The error isn't wrong; it's just not informative at this higher layer. "Invalid parquet magic bytes" is meaningful if you called read_parquet(file) yourself, but in this case you never intended to read Parquet: the files aren't Parquet files.
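
       To illustrate the idea of translating a low-level error at the higher layer, here is a minimal, self-contained sketch. It is not the code from this PR: `finish_with_hint()` and `fake_finish()` are hypothetical names, and the simulated error text is only an assumption about what the underlying call might report. The sketch uses grepl() rather than grep(), because grep() returns integer(0) when nothing matches and if() then fails with "argument is of length zero".

```r
# Hypothetical helper, not part of the arrow package: run `finish` and
# translate a low-level Parquet footer error into a hint about `format`.
finish_with_hint <- function(finish) {
  tryCatch(
    finish(),
    error = function(e) {
      msg <- conditionMessage(e)
      # grepl() returns a single TRUE/FALSE for a single message,
      # so it is safe to use directly inside if().
      if (grepl("Parquet magic bytes not found in footer", msg, fixed = TRUE)) {
        stop(
          "Looks like these are not parquet files, did you mean to specify a 'format'?",
          call. = FALSE
        )
      }
      # Any other error is re-raised unchanged.
      stop(e)
    }
  )
}

# Stand-in for factory$Finish() failing on non-Parquet input (simulated message).
fake_finish <- function() {
  stop("Invalid: Parquet magic bytes not found in footer.")
}

result <- tryCatch(
  finish_with_hint(fake_finish),
  error = function(e) conditionMessage(e)
)
print(result)
```

       Re-raising unmatched errors with stop(e) keeps unrelated failures visible instead of silently swallowing them.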




