Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/08/27 15:15:55 UTC

[GitHub] [arrow] jorisvandenbossche opened a new pull request #8067: ARROW-9875: [Python] Let FileSystem.get_file_info accept a single path

jorisvandenbossche opened a new pull request #8067:
URL: https://github.com/apache/arrow/pull/8067


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #8067: ARROW-9875: [Python] Let FileSystem.get_file_info accept a single path

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #8067:
URL: https://github.com/apache/arrow/pull/8067#discussion_r478525557



##########
File path: python/pyarrow/_fs.pyx
##########
@@ -419,17 +420,27 @@ cdef class FileSystem(_Weakrefable):
             vector[c_string] paths
             CFileSelector selector
 
+        single_path = False
         if isinstance(paths_or_selector, FileSelector):
             with nogil:
                 selector = (<FileSelector>paths_or_selector).selector
                 infos = GetResultValue(self.fs.GetFileInfo(selector))
-        elif isinstance(paths_or_selector, (list, tuple)):
-            paths = [_path_as_bytes(s) for s in paths_or_selector]
+        else:
+            if isinstance(paths_or_selector, (list, tuple)):
+                paths = [_path_as_bytes(s) for s in paths_or_selector]
+            else:
+                try:
+                    paths = [_path_as_bytes(paths_or_selector)]
+                except TypeError:
+                    raise TypeError(
+                        "Must pass either path(s) or a FileSelector"
+                    )
+                single_path = True
             with nogil:
                 infos = GetResultValue(self.fs.GetFileInfo(paths))

Review comment:
       Yup :-)







[GitHub] [arrow] github-actions[bot] commented on pull request #8067: ARROW-9875: [Python] Let FileSystem.get_file_info accept a single path

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8067:
URL: https://github.com/apache/arrow/pull/8067#issuecomment-682034382


   https://issues.apache.org/jira/browse/ARROW-9875





[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #8067: ARROW-9875: [Python] Let FileSystem.get_file_info accept a single path

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on a change in pull request #8067:
URL: https://github.com/apache/arrow/pull/8067#discussion_r478871227



##########
File path: python/pyarrow/_fs.pyx
##########
@@ -419,17 +420,27 @@ cdef class FileSystem(_Weakrefable):
             vector[c_string] paths
             CFileSelector selector
 
+        single_path = False
         if isinstance(paths_or_selector, FileSelector):
             with nogil:
                 selector = (<FileSelector>paths_or_selector).selector
                 infos = GetResultValue(self.fs.GetFileInfo(selector))
-        elif isinstance(paths_or_selector, (list, tuple)):
-            paths = [_path_as_bytes(s) for s in paths_or_selector]
+        else:
+            if isinstance(paths_or_selector, (list, tuple)):
+                paths = [_path_as_bytes(s) for s in paths_or_selector]
+            else:
+                try:
+                    paths = [_path_as_bytes(paths_or_selector)]
+                except TypeError:
+                    raise TypeError(
+                        "Must pass either path(s) or a FileSelector"
+                    )
+                single_path = True
             with nogil:
                 infos = GetResultValue(self.fs.GetFileInfo(paths))

Review comment:
       OK, much better now ;)







[GitHub] [arrow] jorisvandenbossche commented on pull request #8067: ARROW-9875: [Python] Let FileSystem.get_file_info accept a single path

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on pull request #8067:
URL: https://github.com/apache/arrow/pull/8067#issuecomment-682018886


   @pitrou the docstring of `get_file_info` says it accepts a list of "path-likes". I assumed this meant it stringifies things like `pathlib.Path`, but that is apparently not the case.
   Do you remember what you meant by "path-like"? I suppose not supporting pathlib (at this level of the interface) was intentional? (Other methods of the FileSystem also accept only actual strings.)





[GitHub] [arrow] pitrou closed pull request #8067: ARROW-9875: [Python] Let FileSystem.get_file_info accept a single path

Posted by GitBox <gi...@apache.org>.
pitrou closed pull request #8067:
URL: https://github.com/apache/arrow/pull/8067


   





[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #8067: ARROW-9875: [Python] Let FileSystem.get_file_info accept a single path

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on a change in pull request #8067:
URL: https://github.com/apache/arrow/pull/8067#discussion_r481089460



##########
File path: python/pyarrow/_fs.pyx
##########
@@ -405,16 +405,19 @@ cdef class FileSystem(_Weakrefable):
 
         Parameters
         ----------
-        paths_or_selector: FileSelector or list of path-likes
-            Either a selector object or a list of path-like objects.
-            The selector's base directory will not be part of the results, even
-            if it exists. If it doesn't exist, use `allow_not_found`.
+        paths_or_selector: FileSelector, path-like or list of path-likes
+            Either a selector object, a path-like object or a list of
+            path-like objects. The selector's base directory will not be
+            part of the results, even if it exists. If it doesn't exist,
+            use `allow_not_found`.
 
         Returns
         -------
         file_infos : list of FileInfo

Review comment:
       Good point. I would personally keep it the way I did it in the PR (single path -> single FileInfo, list of paths / selector -> list of FileInfo objects), but in that case I need to update the docstring accordingly.
   
   Alternatively, we could always return a list, but IMO that partly defeats the convenience of being able to pass a single path, as you would still need to unpack the result (`fs.get_file_info(path)[0]`).
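
To make the two options concrete, here is a minimal pure-Python sketch of the "single path -> single FileInfo" behaviour. It is not the Cython code from the PR: `FileInfo`, `FileSelector` and `_path_as_bytes` below are hypothetical stand-ins that only mimic the dispatch shown in the diff, and the selector branch returns made-up entries instead of listing a real directory.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    """Hypothetical stand-in for pyarrow.fs.FileInfo (path only)."""
    path: str


@dataclass
class FileSelector:
    """Hypothetical stand-in for pyarrow.fs.FileSelector."""
    base_dir: str


def _path_as_bytes(path):
    # Assumed behaviour: accept str or bytes only, reject everything else.
    if isinstance(path, bytes):
        return path
    if isinstance(path, str):
        return path.encode("utf-8")
    raise TypeError("Path must be a string")


def get_file_info(paths_or_selector):
    if isinstance(paths_or_selector, FileSelector):
        # Placeholder listing: pretend the base directory holds two entries.
        base = paths_or_selector.base_dir
        return [FileInfo(f"{base}/a"), FileInfo(f"{base}/b")]
    if isinstance(paths_or_selector, (list, tuple)):
        # List or tuple of paths -> list of FileInfo, one per path.
        paths = [_path_as_bytes(p) for p in paths_or_selector]
        return [FileInfo(p.decode("utf-8")) for p in paths]
    try:
        path = _path_as_bytes(paths_or_selector)
    except TypeError:
        raise TypeError("Must pass either path(s) or a FileSelector")
    # Single path -> single FileInfo, so the caller has nothing to unpack.
    return FileInfo(path.decode("utf-8"))
```

With this shape, `get_file_info("data.csv")` gives one `FileInfo`, while `get_file_info(["data.csv"])` gives a one-element list. The always-return-a-list alternative would drop the last branch's special case and require `[0]` at every single-path call site.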







[GitHub] [arrow] jorisvandenbossche commented on pull request #8067: ARROW-9875: [Python] Let FileSystem.get_file_info accept a single path

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on pull request #8067:
URL: https://github.com/apache/arrow/pull/8067#issuecomment-684529945


   @pitrou this should be good now, I think





[GitHub] [arrow] jorisvandenbossche commented on pull request #8067: ARROW-9875: [Python] Let FileSystem.get_file_info accept a single path

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on pull request #8067:
URL: https://github.com/apache/arrow/pull/8067#issuecomment-682365331


   > The reason it doesn't accept pathlib.Path is that pathlib represents local paths.
   
   Ah, yes, I always forget that part ..





[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #8067: ARROW-9875: [Python] Let FileSystem.get_file_info accept a single path

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on a change in pull request #8067:
URL: https://github.com/apache/arrow/pull/8067#discussion_r478501543



##########
File path: python/pyarrow/_fs.pyx
##########
@@ -419,17 +420,27 @@ cdef class FileSystem(_Weakrefable):
             vector[c_string] paths
             CFileSelector selector
 
+        single_path = False
         if isinstance(paths_or_selector, FileSelector):
             with nogil:
                 selector = (<FileSelector>paths_or_selector).selector
                 infos = GetResultValue(self.fs.GetFileInfo(selector))
-        elif isinstance(paths_or_selector, (list, tuple)):
-            paths = [_path_as_bytes(s) for s in paths_or_selector]
+        else:
+            if isinstance(paths_or_selector, (list, tuple)):
+                paths = [_path_as_bytes(s) for s in paths_or_selector]
+            else:
+                try:
+                    paths = [_path_as_bytes(paths_or_selector)]
+                except TypeError:
+                    raise TypeError(
+                        "Must pass either path(s) or a FileSelector"
+                    )
+                single_path = True
             with nogil:
                 infos = GetResultValue(self.fs.GetFileInfo(paths))

Review comment:
       I only noticed afterwards that `GetFileInfo` can also accept a single string instead of a vector (this overload is also exposed in libarrow_fs.pxd). That might be a bit cleaner than the above.







[GitHub] [arrow] pitrou commented on pull request #8067: ARROW-9875: [Python] Let FileSystem.get_file_info accept a single path

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #8067:
URL: https://github.com/apache/arrow/pull/8067#issuecomment-682023137


   It's likely the docstring was written by @kszucs actually. The reason it doesn't accept `pathlib.Path` is that `pathlib` represents local paths.
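
As a small illustration of why `pathlib` fits poorly with non-local filesystems (this is only one possible reading of the remark above, and the bucket and file names are made up), the pure path classes apply local-path rules to whatever string they are given:

```python
from pathlib import PurePosixPath, PureWindowsPath

# A hypothetical remote location, written the way a URI-style path would be.
uri = "s3://bucket/dir/file.parquet"

# POSIX path rules collapse the repeated slash, so the scheme separator is lost.
as_posix = str(PurePosixPath(uri))

# Windows path rules render components with backslashes, which a remote
# filesystem expecting forward slashes would not understand.
as_windows = str(PureWindowsPath("bucket/dir/file.parquet"))

print(as_posix)    # s3:/bucket/dir/file.parquet
print(as_windows)  # bucket\dir\file.parquet
```

Plain strings avoid this kind of platform-dependent normalisation, which is consistent with the other FileSystem methods accepting only actual strings.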





[GitHub] [arrow] pitrou commented on a change in pull request #8067: ARROW-9875: [Python] Let FileSystem.get_file_info accept a single path

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #8067:
URL: https://github.com/apache/arrow/pull/8067#discussion_r481086303



##########
File path: python/pyarrow/_fs.pyx
##########
@@ -405,16 +405,19 @@ cdef class FileSystem(_Weakrefable):
 
         Parameters
         ----------
-        paths_or_selector: FileSelector or list of path-likes
-            Either a selector object or a list of path-like objects.
-            The selector's base directory will not be part of the results, even
-            if it exists. If it doesn't exist, use `allow_not_found`.
+        paths_or_selector: FileSelector, path-like or list of path-likes
+            Either a selector object, a path-like object or a list of
+            path-like objects. The selector's base directory will not be
+            part of the results, even if it exists. If it doesn't exist,
+            use `allow_not_found`.
 
         Returns
         -------
         file_infos : list of FileInfo

Review comment:
       Nit: should we update the result type here? I'm not sure what the convention is when the result type can vary...







[GitHub] [arrow] pitrou commented on a change in pull request #8067: ARROW-9875: [Python] Let FileSystem.get_file_info accept a single path

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #8067:
URL: https://github.com/apache/arrow/pull/8067#discussion_r481090653



##########
File path: python/pyarrow/_fs.pyx
##########
@@ -405,16 +405,19 @@ cdef class FileSystem(_Weakrefable):
 
         Parameters
         ----------
-        paths_or_selector: FileSelector or list of path-likes
-            Either a selector object or a list of path-like objects.
-            The selector's base directory will not be part of the results, even
-            if it exists. If it doesn't exist, use `allow_not_found`.
+        paths_or_selector: FileSelector, path-like or list of path-likes
+            Either a selector object, a path-like object or a list of
+            path-like objects. The selector's base directory will not be
+            part of the results, even if it exists. If it doesn't exist,
+            use `allow_not_found`.
 
         Returns
         -------
         file_infos : list of FileInfo

Review comment:
       I agree with changing the docstring.



