Posted to issues@arrow.apache.org by "amoeba (via GitHub)" <gi...@apache.org> on 2023/03/10 00:58:20 UTC

[GitHub] [arrow] amoeba opened a new issue, #34525: [C++] GcsFileSystem either can't list a particular bucket or hangs indefinitely when doing so

amoeba opened a new issue, #34525:
URL: https://github.com/apache/arrow/issues/34525

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   As reported in a comment in another issue, https://github.com/apache/arrow/issues/33106#issuecomment-1378129897, GcsFileSystem is behaving in a surprising way for @cboettig's `neon-is-transition-output` bucket. This bucket is part of a [NEON/NSF/Google partnership](https://www.neonscience.org/data-samples/data-management/neon-google), so it's reasonable to assume another user might run into this. More generally, I think any difference in behavior between non-Arrow GCS tools and Arrow's GCS tools may confuse users.
   
   Some high-level facts are:
   
   - The bucket is `neon-is-transition-output` and is non-public
   - Auth is being done via Service Account JSON credentials (other forms of auth aren't offered by the admins yet)
   - Objects in the bucket are deeply-nested and there are no objects at the root (just a single top level "folder")
   - Non-Arrow GCS tools (rclone, Ruby's google-cloud-storage gem) work fine with this bucket
   - I've tried to replicate a similar bucket structure, aside from the sheer number of objects, in a bucket I control and I haven't been able to reproduce the behavior
   - The issue seems to be in the C++ implementation rather than any wrappers (R, Python)
   
   The behaviors I'm seeing are:
   
   - Listing objects at the root of the bucket either returns zero results with no error or, when listing recursively, hangs indefinitely
   - The behavior goes away entirely when you list objects deep in the hierarchy. That isn't a workable solution because you have to know the hierarchy in advance
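
One way to tell the two root-listing outcomes apart from a script is to run the listing under a timeout. The following is a minimal Python sketch under stated assumptions, not a fix: `classify_listing` and `list_with_pyarrow` are hypothetical helper names, the 30-second default timeout is arbitrary, and the pyarrow wiring assumes credentials are discoverable through Application Default Credentials (for example via the `GOOGLE_APPLICATION_CREDENTIALS` environment variable) rather than passed in as JSON.

```python
import threading
from typing import Callable, Sequence


def classify_listing(list_fn: Callable[[], Sequence], timeout_s: float = 30.0) -> str:
    """Run list_fn in a daemon thread and classify the outcome.

    Returns "hang" if no result arrives within timeout_s, "error" if the
    call raises, "empty" if it returns zero entries, and "ok" otherwise.
    """
    outcome = {}

    def worker():
        try:
            outcome["entries"] = list(list_fn())
        except Exception as exc:  # record the failure instead of crashing the probe
            outcome["error"] = exc

    # Daemon thread: a listing that never returns will not block interpreter exit.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)

    if thread.is_alive():
        return "hang"
    if "error" in outcome:
        return "error"
    return "empty" if not outcome["entries"] else "ok"


def list_with_pyarrow(path: str = "neon-is-transition-output", recursive: bool = False):
    """Hypothetical wiring to pyarrow; requires pyarrow and valid credentials."""
    from pyarrow import fs  # imported lazily so the probe itself has no dependencies

    gcs = fs.GcsFileSystem()  # assumption: Application Default Credentials are set up
    return gcs.get_file_info(fs.FileSelector(path, recursive=recursive))
```

With credentials in place, `classify_listing(lambda: list_with_pyarrow(recursive=True), timeout_s=60)` would be expected to report `"hang"` for the recursive root listing described above, while the non-recursive call would report `"empty"`. The thread is left running on a hang because Python cannot cancel it, so the probe suits short-lived diagnostic scripts.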
   
   **Listing objects at the root returns no results:**
   
   ```r
   > library(arrow)
   > library(readr)
   > 
   > bucket <- gs_bucket(
   +     "neon-is-transition-output",
   +     json_credentials = readr::read_file("credentials.json"))
   > (bucket$ls())
   character(0)
   ```
   
   **Listing objects more specifically works fine:**
   
   ```r
   > library(arrow)
   > library(readr)
   > 
   > bucket <- arrow::gs_bucket(
   +     "neon-is-transition-output",
   +     json_credentials = readr::read_file("credentials.json")
   + )
   > bucket$ls("provisional/dpid=DP1.00001.001/ms=2016-03/site=DSNY/")
    [1] "provisional/dpid=DP1.00001.001/ms=2016-03/site=DSNY/DSNY_L0_to_L1_2D_Wind_REVB_DP1.00001.001__2016-03-01.avro"
    [2] "provisional/dpid=DP1.00001.001/ms=2016-03/site=DSNY/DSNY_L0_to_L1_2D_Wind_REVB_DP1.00001.001__2016-03-02.avro"
    [3] "provisional/dpid=DP1.00001.001/ms=2016-03/site=DSNY/DSNY_L0_to_L1_2D_Wind_REVB_DP1.00001.001__2016-03-03.avro"
    [4] "provisional/dpid=DP1.00001.001/ms=2016-03/site=DSNY/DSNY_L0_to_L1_2D_Wind_REVB_DP1.00001.001__2016-03-04.avro"
    [5] "provisional/dpid=DP1.00001.001/ms=2016-03/site=DSNY/DSNY_L0_to_L1_2D_Wind_REVB_DP1.00001.001__2016-03-05.avro"
    [6] "provisional/dpid=DP1.00001.001/ms=2016-03/site=DSNY/DSNY_L0_to_L1_2D_Wind_REVB_DP1.00001.001__2016-03-06.avro"
    [7] "provisional/dpid=DP1.00001.001/ms=2016-03/site=DSNY/DSNY_L0_to_L1_2D_Wind_REVB_DP1.00001.001__2016-03-07.avro"
    [8] "provisional/dpid=DP1.00001.001/ms=2016-03/site=DSNY/DSNY_L0_to_L1_2D_Wind_REVB_DP1.00001.001__2016-03-08.avro"
    [9] "provisional/dpid=DP1.00001.001/ms=2016-03/site=DSNY/DSNY_L0_to_L1_2D_Wind_REVB_DP1.00001.001__2016-03-09.avro"
   [10] "provisional/dpid=DP1.00001.001/ms=2016-03/site=DSNY/DSNY_L0_to_L1_2D_Wind_REVB_DP1.00001.001__2016-03-10.avro"
   [11] "provisional/dpid=DP1.00001.001/ms=2016-03/site=DSNY/DSNY_L0_to_L1_2D_Wind_REVB_DP1.00001.001__2016-03-11.avro"
   [12] "provisional/dpid=DP1.00001.001/ms=2016-03/site=DSNY/DSNY_L0_to_L1_2D_Wind_REVB_DP1.00001.001__2016-03-12.avro"
   [13] "provisional/dpid=DP1.00001.001/ms=2016-03/site=DSNY/DSNY_L0_to_L1_2D_Wind_REVB_DP1.00001.001__2016-03-13.avro"
   [14] "provisional/dpid=DP1.00001.001/ms=2016-03/site=DSNY/DSNY_L0_to_L1_2D_Wind_REVB_DP1.00001.001__2016-03-14.avro"
   [15] "provisional/dpid=DP1.00001.001/ms=2016-03/site=DSNY/DSNY_L0_to_L1_2D_Wind_REVB_DP1.00001.001__2016-03-15.avro"
   [16] "provisional/dpid=DP1.00001.001/ms=2016-03/site=DSNY/DSNY_L0_to_L1_2D_Wind_REVB_DP1.00001.001__2016-03-16.avro"
   [17] "provisional/dpid=DP1.00001.001/ms=2016-03/site=DSNY/DSNY_L0_to_L1_2D_Wind_REVB_DP1.00001.001__2016-03-17.avro"
   [18] "provisional/dpid=DP1.00001.001/ms=2016-03/site=DSNY/DSNY_L0_to_L1_2D_Wind_REVB_DP1.00001.001__2016-03-18.avro"
   [19] "provisional/dpid=DP1.00001.001/ms=2016-03/site=DSNY/DSNY_L0_to_L1_2D_Wind_REVB_DP1.00001.001__2016-03-19.avro"
   [20] "provisional/dpid=DP1.00001.001/ms=2016-03/site=DSNY/DSNY_L0_to_L1_2D_Wind_REVB_DP1.00001.001__2016-03-20.avro"
   [21] "provisional/dpid=DP1.00001.001/ms=2016-03/site=DSNY/DSNY_L0_to_L1_2D_Wind_REVB_DP1.00001.001__2016-03-21.avro"
   [22] "provisional/dpid=DP1.00001.001/ms=2016-03/site=DSNY/DSNY_L0_to_L1_2D_Wind_REVB_DP1.00001.001__2016-03-22.avro"
   [23] "provisional/dpid=DP1.00001.001/ms=2016-03/site=DSNY/DSNY_L0_to_L1_2D_Wind_REVB_DP1.00001.001__2016-03-23.avro"
   [24] "provisional/dpid=DP1.00001.001/ms=2016-03/site=DSNY/DSNY_L0_to_L1_2D_Wind_REVB_DP1.00001.001__2016-03-24.avro"
   [25] "provisional/dpid=DP1.00001.001/ms=2016-03/site=DSNY/DSNY_L0_to_L1_2D_Wind_REVB_DP1.00001.001__2016-03-25.avro"
   [26] "provisional/dpid=DP1.00001.001/ms=2016-03/site=DSNY/DSNY_L0_to_L1_2D_Wind_REVB_DP1.00001.001__2016-03-26.avro"
   [27] "provisional/dpid=DP1.00001.001/ms=2016-03/site=DSNY/DSNY_L0_to_L1_2D_Wind_REVB_DP1.00001.001__2016-03-27.avro"
   [28] "provisional/dpid=DP1.00001.001/ms=2016-03/site=DSNY/DSNY_L0_to_L1_2D_Wind_REVB_DP1.00001.001__2016-03-28.avro"
   [29] "provisional/dpid=DP1.00001.001/ms=2016-03/site=DSNY/DSNY_L0_to_L1_2D_Wind_REVB_DP1.00001.001__2016-03-29.avro"
   [30] "provisional/dpid=DP1.00001.001/ms=2016-03/site=DSNY/DSNY_L0_to_L1_2D_Wind_REVB_DP1.00001.001__2016-03-30.avro"
   [31] "provisional/dpid=DP1.00001.001/ms=2016-03/site=DSNY/DSNY_L0_to_L1_2D_Wind_REVB_DP1.00001.001__2016-03-31.avro"
   ```
   
   I've put together [a short C++ program](https://github.com/amoeba/arrow-gcs-test/blob/main/gcs-test.cc) that replicates the behaviors I see in R and Python as a start.
   
   ### Component(s)
   
   C++

