Posted to jira@arrow.apache.org by "Will Jones (Jira)" <ji...@apache.org> on 2022/07/11 20:34:00 UTC

[jira] [Updated] (ARROW-17045) [C++] GCS doesn't drop ending slash for files

     [ https://issues.apache.org/jira/browse/ARROW-17045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Will Jones updated ARROW-17045:
-------------------------------
    Description: 
There is inconsistent behavior between GCS and S3 when creating a file at a path that ends with a slash. I'm not yet sure whether this is an implementation difference or a difference between MinIO and the GCS testbench.

Example:

{code:python}
import pyarrow.fs
from pyarrow.fs import FileSelector
from datetime import timedelta

gcs = pyarrow.fs.GcsFileSystem(
    endpoint_override="localhost:9001",
    scheme="http",
    anonymous=True,
    retry_time_limit=timedelta(seconds=1),
)

gcs.create_dir("py_test")
with gcs.open_output_stream("py_test/test.txt") as out_stream:
    out_stream.write(b"Hello world!")

with gcs.open_output_stream("py_test/test.txt/") as out_stream:
    out_stream.write(b"Hello world!")

gcs.get_file_info(FileSelector("py_test"))
# [<FileInfo for 'py_test/test.txt': type=FileType.File, size=12>, <FileInfo for 'py_test/test.txt': type=FileType.Directory>]

s3 = pyarrow.fs.S3FileSystem(
    access_key="minioadmin",
    secret_key="minioadmin",
    scheme="http",
    endpoint_override="localhost:9000",
    allow_bucket_creation=True,
    allow_bucket_deletion=True,
)

s3.create_dir("py-test")
with s3.open_output_stream("py-test/test.txt") as out_stream:
    out_stream.write(b"Hello world!")
with s3.open_output_stream("py-test/test.txt/") as out_stream:
    out_stream.write(b"Hello world!")

s3.get_file_info(FileSelector("py-test"))
# [<FileInfo for 'py-test/test.txt': type=FileType.File, size=12>]
{code}
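
Until the two filesystems agree, one possible client-side workaround is to drop the trailing slash before opening an output stream. The sketch below is only an illustration: strip_trailing_slash is a hypothetical helper, not a pyarrow API, and it assumes that a path meant to name a file should never end in "/".

```python
def strip_trailing_slash(path: str) -> str:
    """Hypothetical helper (not part of pyarrow): drop trailing '/' from a file path."""
    stripped = path.rstrip("/")
    # A path made only of slashes (e.g. "/") is returned unchanged
    # instead of being collapsed to an empty string.
    return stripped if stripped else path


# The two paths written in the example above collapse to a single file path.
paths = ["py_test/test.txt", "py_test/test.txt/"]
normalized = sorted({strip_trailing_slash(p) for p in paths})
print(normalized)  # ['py_test/test.txt']
```

With this in place, a call such as gcs.open_output_stream(strip_trailing_slash("py_test/test.txt/")) would target the same object as the slash-free path. Whether the filesystem implementation itself should normalize the path is the open question of this issue.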

  was:
There is inconsistent behavior between GCS and S3 when it comes to creating files.

Example:

{code:python}
import pyarrow.fs
from pyarrow.fs import FileSelector
from datetime import timedelta

gcs = pyarrow.fs.GcsFileSystem(
    endpoint_override="localhost:9001",
    scheme="http",
    anonymous=True,
    retry_time_limit=timedelta(seconds=1),
)

gcs.create_dir("py_test")
with gcs.open_output_stream("py_test/test.txt") as out_stream:
    out_stream.write(b"Hello world!")

with gcs.open_output_stream("py_test/test.txt/") as out_stream:
    out_stream.write(b"Hello world!")

gcs.get_file_info(FileSelector("py_test"))
# [<FileInfo for 'py_test/test.txt': type=FileType.File, size=12>, <FileInfo for 'py_test/test.txt': type=FileType.Directory>]

s3 = pyarrow.fs.S3FileSystem(
    access_key="minioadmin",
    secret_key="minioadmin",
    scheme="http",
    endpoint_override="localhost:9000",
    allow_bucket_creation=True,
    allow_bucket_deletion=True,
)

s3.create_dir("py-test")
with s3.open_output_stream("py-test/test.txt") as out_stream:
    out_stream.write(b"Hello world!")
with s3.open_output_stream("py-test/test.txt/") as out_stream:
    out_stream.write(b"Hello world!")

s3.get_file_info(FileSelector("py-test"))
# [<FileInfo for 'py-test/test.txt': type=FileType.File, size=12>]
{code}


> [C++] GCS doesn't drop ending slash for files
> ---------------------------------------------
>
>                 Key: ARROW-17045
>                 URL: https://issues.apache.org/jira/browse/ARROW-17045
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>    Affects Versions: 8.0.0
>            Reporter: Will Jones
>            Assignee: Will Jones
>            Priority: Critical
>             Fix For: 9.0.0
>
>
> There is inconsistent behavior between GCS and S3 when creating a file at a path that ends with a slash. I'm not yet sure whether this is an implementation difference or a difference between MinIO and the GCS testbench.
> Example:
> {code:python}
> import pyarrow.fs
> from pyarrow.fs import FileSelector
> from datetime import timedelta
> gcs = pyarrow.fs.GcsFileSystem(
>     endpoint_override="localhost:9001",
>     scheme="http",
>     anonymous=True,
>     retry_time_limit=timedelta(seconds=1),
> )
> gcs.create_dir("py_test")
> with gcs.open_output_stream("py_test/test.txt") as out_stream:
>     out_stream.write(b"Hello world!")
> with gcs.open_output_stream("py_test/test.txt/") as out_stream:
>     out_stream.write(b"Hello world!")
> gcs.get_file_info(FileSelector("py_test"))
> # [<FileInfo for 'py_test/test.txt': type=FileType.File, size=12>, <FileInfo for 'py_test/test.txt': type=FileType.Directory>]
> s3 = pyarrow.fs.S3FileSystem(
>     access_key="minioadmin",
>     secret_key="minioadmin",
>     scheme="http",
>     endpoint_override="localhost:9000",
>     allow_bucket_creation=True,
>     allow_bucket_deletion=True,
> )
> s3.create_dir("py-test")
> with s3.open_output_stream("py-test/test.txt") as out_stream:
>     out_stream.write(b"Hello world!")
> with s3.open_output_stream("py-test/test.txt/") as out_stream:
>     out_stream.write(b"Hello world!")
> s3.get_file_info(FileSelector("py-test"))
> # [<FileInfo for 'py-test/test.txt': type=FileType.File, size=12>]
> {code}


