You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/04/14 02:58:26 UTC

[GitHub] [arrow] westonpace commented on a diff in pull request #12701: ARROW-15892: [C++] Dataset APIs require s3:ListBucket Permissions

westonpace commented on code in PR #12701:
URL: https://github.com/apache/arrow/pull/12701#discussion_r850042491


##########
python/pyarrow/includes/libarrow_dataset.pxd:
##########
@@ -18,6 +18,7 @@
 # distutils: language = c++
 
 from libcpp.unordered_map cimport unordered_map
+from libcpp cimport bool

Review Comment:
   So this works ([it creates an alias for `bint` named `bool`](https://github.com/cython/cython/blob/d2d2ea33a21e1915a52790ef64fdd1f28867854c/Cython/Includes/libcpp/__init__.pxd#L2)) but elsewhere we either use `bint` directly or call it `c_bool` with: `from libcpp cimport bool as c_bool`.  Lets stick with one of the two existing approaches instead of adding a third.



##########
python/pyarrow/tests/conftest.py:
##########
@@ -311,3 +313,23 @@ def s3_server(s3_connection):
         finally:
             if proc is not None:
                 proc.kill()
+
+
+@pytest.fixture(scope='session')
+def limited_s3_user(request, s3_server):

Review Comment:
   Why is this method here instead of `util.py`?



##########
python/pyarrow/tests/test_dataset.py:
##########
@@ -4286,6 +4287,55 @@ def test_write_dataset_s3(s3_example_simple):
     assert result.equals(table)
 
 
+_minio_put_only_policy = """{
+    "Version": "2012-10-17",
+    "Statement": [
+        {
+            "Effect": "Allow",
+            "Action": [
+                "s3:PutObject",
+                "s3:ListBucket",
+                "s3:GetObjectVersion"
+            ],
+            "Resource": [
+                "arn:aws:s3:::*"
+            ]
+        }
+    ]
+}"""
+
+
+@pytest.mark.parquet
+@pytest.mark.s3
+def test_write_dataset_s3_put_only(s3_server, limited_s3_user):
+    from pyarrow.fs import S3FileSystem
+
+    # write dataset with s3 filesystem
+    host, port, _, _ = s3_server['connection']
+    fs = S3FileSystem(
+        access_key='limited',
+        secret_key='limited123',
+        endpoint_override='{}:{}'.format(host, port),
+        scheme='http'
+    )
+    limited_s3_user(_minio_put_only_policy)
+    table = pa.table([
+        pa.array(range(20)), pa.array(np.random.randn(20)),
+        pa.array(np.repeat(['a', 'b'], 10))],
+        names=["f1", "f2", "part"]
+    )
+    # writing with filesystem object with create_dir flag set to false

Review Comment:
   Can you add a check to make sure that writing the dataset with `create_dir=True` fails?  It doesn't really give us any coverage of our code but it allows us to make sure we've configured the limited server correctly so the test is more robust.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org