Posted to jira@arrow.apache.org by "Mark van der Broek (Jira)" <ji...@apache.org> on 2021/10/07 09:31:00 UTC

[jira] [Commented] (ARROW-13685) [C++] Cannot write dataset to S3FileSystem if bucket already exists

    [ https://issues.apache.org/jira/browse/ARROW-13685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17425435#comment-17425435 ] 

Mark van der Broek commented on ARROW-13685:
--------------------------------------------

Not checking whether the bucket already exists causes a second problem: when multiple processes try to create the same bucket at the same time, the concurrent create requests conflict.

I get the following error:

OSError: When creating bucket '<MY-BUCKET>': AWS Error [code 100]: Unable to parse ExceptionName: OperationAborted Message: A conflicting conditional operation is currently in progress against this resource. Please try again.
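
Until this is handled inside Arrow, one possible workaround is to retry the call when this specific conflict is reported. The following is only a sketch: the helper name `create_dir_with_retry`, its parameters, and the idea of matching the `OperationAborted` substring in the error message are my assumptions, based solely on the error text above, and are not part of the pyarrow API.

```python
import time


def create_dir_with_retry(create_dir, path, retries=5, base_delay=0.5, sleep=time.sleep):
    """Call create_dir(path, recursive=True), retrying on a bucket-creation conflict.

    create_dir is any callable with the shape of S3FileSystem.create_dir.
    This helper is hypothetical and not part of pyarrow.
    """
    for attempt in range(retries):
        try:
            return create_dir(path, recursive=True)
        except OSError as exc:
            # Assumption: the conflict is identified by the "OperationAborted"
            # substring seen in the error message quoted above.
            is_conflict = "OperationAborted" in str(exc)
            if not is_conflict or attempt == retries - 1:
                raise
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** attempt)
```

Usage would look like `create_dir_with_retry(s3.create_dir, "<MY-BUCKET>/test_dir/")`. Any other `OSError`, such as Access Denied, is re-raised immediately.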

> [C++] Cannot write dataset to S3FileSystem if bucket already exists
> -------------------------------------------------------------------
>
>                 Key: ARROW-13685
>                 URL: https://issues.apache.org/jira/browse/ARROW-13685
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>    Affects Versions: 5.0.0
>            Reporter: Caleb Overman
>            Assignee: Weston Pace
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 6.0.0
>
>          Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> I'm trying to write a parquet file to an existing S3 bucket using the new S3FileSystem interface. However, this is failing with an AWS Access Denied error (I do have necessary access). It appears to be trying to recreate the bucket which already exists.
> {code:python}
> import numpy as np
> import pyarrow as pa
> from pyarrow import fs
> import pyarrow.dataset as ds
> s3 = fs.S3FileSystem(region="us-west-2")
> table = pa.table({"a": range(10), "b": np.random.randn(10), "c": [1, 2] * 5})
> ds.write_dataset(
>     table,
>     "my-bucket/test.parquet",
>     format="parquet",
>     filesystem=s3,
> ){code}
> {code:java}
> OSError: When creating bucket 'my-bucket': AWS Error [code 15]: Access Denied
> {code}
> I'm seeing the same behavior using {{S3FileSystem.create_dir}} when {{recursive=True}}.
> {code:python}
> s3.create_dir("my-bucket/test_dir/", recursive=True) # Fails
> s3.create_dir("my-bucket/test_dir/", recursive=False) # Succeeds
> {code}
>  
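
Since the report above says the non-recursive call succeeds, a possible interim workaround on affected versions is to create each directory level below the bucket with recursive=False, so that bucket creation is never attempted. This is a sketch only: the helper name `create_dirs_below_bucket` is hypothetical, and it assumes the bucket already exists and that a non-recursive create_dir on an already existing directory does not fail.

```python
def create_dirs_below_bucket(create_dir, path):
    """Create every directory level under an existing bucket, non-recursively.

    create_dir is any callable with the shape of S3FileSystem.create_dir.
    This helper is hypothetical and not part of pyarrow.
    """
    bucket, *parts = path.strip("/").split("/")
    current = bucket
    for part in parts:
        current = f"{current}/{part}"
        # recursive=False avoids the bucket-creation attempt described above.
        create_dir(current + "/", recursive=False)
```

For example, `create_dirs_below_bucket(s3.create_dir, "my-bucket/test_dir/nested/")` would issue one non-recursive call for `my-bucket/test_dir/` and one for `my-bucket/test_dir/nested/`.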


