Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/07/21 19:34:56 UTC

[GitHub] [arrow] wjones127 commented on pull request #13681: ARROW-16818: [Doc][Python] Document GCS filesystem for PyArrow

wjones127 commented on PR #13681:
URL: https://github.com/apache/arrow/pull/13681#issuecomment-1191859530

   Based on my experience with the R docs, here are a few specific things we should note:
   
    * The default value for `retry_time_limit` is 15 minutes, but users will often want to lower it to something closer to 15 seconds. This is especially important to note because of https://issues.apache.org/jira/browse/ARROW-17020
    * To connect to public buckets, you *must* pass `anonymous=True`. See https://issues.apache.org/jira/browse/ARROW-17069
    * Unlike `S3FileSystem`, `GcsFileSystem` only returns directories from `GetFileInfo` when special directory markers created by Arrow are present. In practice, this means that if a directory structure was created by some mechanism other than Arrow's filesystem interface (as is the case in our `voltrondata-labs-datasets` bucket), those "directories" will be invisible. Users are therefore encouraged to always list with `recursive=True` in `GcsFileSystem`. See https://issues.apache.org/jira/browse/ARROW-17020
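
A minimal PyArrow sketch of these three notes, offered as an illustration rather than final doc text. It assumes a PyArrow build with GCS support and the `pyarrow.fs.GcsFileSystem` parameters `anonymous` and `retry_time_limit` (a `datetime.timedelta`). The helper names `make_public_gcs_filesystem`, `list_files_recursively`, and `implied_directories` are hypothetical. Because objects written without Arrow's directory markers do not show up as directories, the last helper reconstructs the implied directory prefixes from a recursive file listing.

```python
from datetime import timedelta


def make_public_gcs_filesystem(retry_seconds=15):
    """Build a GcsFileSystem for reading public buckets (sketch)."""
    # Imported lazily so implied_directories() below works without pyarrow.
    from pyarrow import fs

    return fs.GcsFileSystem(
        # Public buckets require anonymous access (ARROW-17069).
        anonymous=True,
        # Lower the retry budget from the 15-minute default (ARROW-17020).
        retry_time_limit=timedelta(seconds=retry_seconds),
    )


def list_files_recursively(gcs, base_dir):
    """Return paths of all files under base_dir, e.g. "my-bucket/some/prefix"."""
    from pyarrow import fs

    # recursive=True walks every object, so files are found even when no
    # Arrow directory markers exist for their parent "directories".
    selector = fs.FileSelector(base_dir, recursive=True)
    return [
        info.path
        for info in gcs.get_file_info(selector)
        if info.type == fs.FileType.File
    ]


def implied_directories(file_paths, base_dir):
    """Derive the directory prefixes implied by object paths under base_dir."""
    base = base_dir.rstrip("/")
    dirs = set()
    for path in file_paths:
        parent = path.rsplit("/", 1)[0]
        # Walk up one level at a time until we reach base_dir (or leave it).
        while parent != base and parent.startswith(base + "/"):
            dirs.add(parent)
            parent = parent.rsplit("/", 1)[0]
    return sorted(dirs)
```

For example, `list_files_recursively(make_public_gcs_filesystem(), "voltrondata-labs-datasets")` should list the objects in that bucket, assuming it is publicly readable, and passing the result to `implied_directories(..., "voltrondata-labs-datasets")` recovers the "directories" that a non-recursive listing would miss.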


-- 
This is an automated message from the Apache Git Service.