Posted to jira@arrow.apache.org by "Carlos O'Ryan (Jira)" <ji...@apache.org> on 2021/12/15 18:05:00 UTC

[jira] [Updated] (ARROW-15121) [C++] Implement max recursion for GcsFileSystem

     [ https://issues.apache.org/jira/browse/ARROW-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carlos O'Ryan updated ARROW-15121:
----------------------------------
    Description: 
The current implementation ignores the {{max_recursion}} attribute in the selector.  Honoring it seems like a useful thing to do.

In GCS it is *more* expensive to do {{ls foo/}} and then recurse over the results than to do a {{ls -R foo/}}.  The running time of a (recursive or non-recursive) operation is proportional to the number of objects in the prefix, not to the number of objects returned.

Therefore, the implementation will probably list all the objects and directories, and simply filter out those that are "too deep" in the recursion hierarchy.

  was:
The current implementation ignores the {{max_recursion}} attribute in the selector.  Seems like a useful thing to do.

In GCS it is *more* expensive to do {{ls foo/*}} and then recurse over the results than to do a {{ls foo/**}}.  The running time of a (recursive or non-recursive) operation is proportional to the number of objects in the prefix, not to the number of objects returned.

Therefore, the implementation will probably list all the objects and directories, and simply filter out those that are "too deep" in the recursion hierarchy.


> [C++] Implement max recursion for GcsFileSystem
> -----------------------------------------------
>
>                 Key: ARROW-15121
>                 URL: https://issues.apache.org/jira/browse/ARROW-15121
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Carlos O'Ryan
>            Priority: Major
>
> The current implementation ignores the {{max_recursion}} attribute in the selector.  Honoring it seems like a useful thing to do.
> In GCS it is *more* expensive to do {{ls foo/}} and then recurse over the results than to do a {{ls -R foo/}}.  The running time of a (recursive or non-recursive) operation is proportional to the number of objects in the prefix, not to the number of objects returned.
> Therefore, the implementation will probably list all the objects and directories, and simply filter out those that are "too deep" in the recursion hierarchy.
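
The "list everything, then filter" idea in the description could be sketched roughly as below. This is only an illustration, not the GcsFileSystem code: {{RelativeDepth}} and {{FilterByDepth}} are hypothetical helpers, the object names are assumed to have already been fetched by a single recursive listing, and the prefix is assumed to end in '/'. The depth convention (direct children are depth 0, and an entry is kept when its depth is at most {{max_recursion}}) is also an assumption about how the selector should be interpreted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper, not part of Arrow or google-cloud-cpp.
// Returns the nesting depth of `name` relative to `prefix`:
// 0 for a direct child ("foo/a"), 1 for a grandchild ("foo/d/b"), and so on.
// Returns -1 if `name` is not strictly under `prefix`.
int RelativeDepth(const std::string& prefix, const std::string& name) {
  // Assumption: a non-empty prefix always ends with the '/' delimiter.
  assert(prefix.empty() || prefix.back() == '/');
  if (name.size() <= prefix.size() ||
      name.compare(0, prefix.size(), prefix) != 0) {
    return -1;
  }
  std::string rest = name.substr(prefix.size());
  // A trailing '/' marks a directory placeholder; it does not add a level.
  if (rest.back() == '/') rest.pop_back();
  if (rest.empty()) return -1;
  return static_cast<int>(std::count(rest.begin(), rest.end(), '/'));
}

// Hypothetical helper: keeps only the names that are at most
// `max_recursion` levels below `prefix`, preserving the input order.
std::vector<std::string> FilterByDepth(const std::string& prefix,
                                       const std::vector<std::string>& names,
                                       int32_t max_recursion) {
  std::vector<std::string> kept;
  for (const auto& name : names) {
    const int depth = RelativeDepth(prefix, name);
    if (depth >= 0 && depth <= max_recursion) kept.push_back(name);
  }
  return kept;
}
```

Under these assumptions, filtering a full listing of {{foo/}} with {{max_recursion}} set to 0 keeps only the direct children, which matches what a non-recursive listing would return, at the cost of one pass over every object in the prefix.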


