Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/04/01 14:27:13 UTC

[GitHub] [arrow] ianmcook commented on a change in pull request #9800: ARROW-12082: [R][Dataset] Allow create dataset from vector of file paths [WIP]

ianmcook commented on a change in pull request #9800:
URL: https://github.com/apache/arrow/pull/9800#discussion_r605701805



##########
File path: r/R/filesystem.R
##########
@@ -273,33 +273,56 @@ FileSystem$from_uri <- function(uri) {
   fs___FileSystemFromUri(uri)
 }
 
-get_path_and_filesystem <- function(x, filesystem = NULL) {
+get_paths_and_filesystem <- function(x, filesystem = NULL) {
   # Wrapper around FileSystem$from_uri that handles local paths
   # and an optional explicit filesystem
   if (inherits(x, "SubTreeFileSystem")) {
     return(list(fs = x$base_fs, path = x$base_path))
   }
-  assert_that(is.string(x))
-  if (is_url(x)) {
+  assert_that(is.character(x))
+  are_urls <- are_urls(x)
+  if (any(are_urls)) {
+    if (!all(are_urls)) {
+      stop(
+        "Vectors of paths and URIs for different file systems are not supported",
+        call. = FALSE
+      )
+    }
     if (!is.null(filesystem)) {
       # Stop? Can't have URL (which yields a fs) and another fs
     }
-    FileSystem$from_uri(x)
+    # TODO: do this more efficiently?
+    x <- lapply(x, FileSystem$from_uri)

Review comment:
       Based on some tests I ran, each call to `FileSystem$from_uri()` consumes about 10–20 KB of memory and takes about 500–1000 μs of wall-clock time, so there would be little practical benefit in trying to do this more efficiently.
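
       To put those per-call figures in perspective, here is a rough back-of-the-envelope sketch in base R. The helper name `estimate_from_uri_cost()` and the vector length of 1000 URIs are hypothetical, chosen only for illustration; the per-call ranges are the ones quoted above, and the sketch assumes the cost scales linearly with the number of URIs.

```r
# Rough estimate only: scales the per-call figures quoted in the comment
# linearly with the number of URIs. Assumes no caching or reuse between calls.
# `estimate_from_uri_cost` is a hypothetical helper, not part of arrow.
estimate_from_uri_cost <- function(n_uris,
                                   kb_per_call = c(10, 20),
                                   us_per_call = c(500, 1000)) {
  list(
    memory_mb = n_uris * kb_per_call / 1024, # KB -> MB, low and high bound
    seconds   = n_uris * us_per_call / 1e6   # microseconds -> seconds
  )
}

# Hypothetical case: a vector of 1000 URIs
est <- estimate_from_uri_cost(1000)
est$seconds    # 0.5 to 1 second in total
est$memory_mb  # roughly 9.8 to 19.5 MB in total
```

       Under these assumptions, even a vector of a thousand URIs would add on the order of a second and a few tens of megabytes, which is likely small next to the cost of actually reading the files.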




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org