You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/02/10 15:03:47 UTC

[GitHub] [arrow] pitrou commented on a change in pull request #10450: ARROW-9947: [Python] High-level Python API for Parquet encryption of files.

pitrou commented on a change in pull request #10450:
URL: https://github.com/apache/arrow/pull/10450#discussion_r803753924



##########
File path: docs/source/python/parquet.rst
##########
@@ -48,9 +48,9 @@ support bundled:
    import pyarrow.parquet as pq
 
 If you are building ``pyarrow`` from source, you must use
-``-DARROW_PARQUET=ON`` when compiling the C++ libraries and enable the Parquet
-extensions when building ``pyarrow``. See the :ref:`Python Development
-<python-development>` page for more details.
+``-DARROW_PARQUET=ON -DPARQUET_REQUIRE_ENCRYPTION=ON`` when compiling the C++

Review comment:
       This is a bit misleading. "Must use" applies to `-DARROW_PARQUET=ON` but encryption is only needed if people actually want to encrypt/decrypt files.

##########
File path: docs/source/python/api/formats.rst
##########
@@ -88,6 +88,11 @@ Parquet Files
    write_metadata
    write_table
    write_to_dataset
+   CryptoFactory

Review comment:
       These would deserve a better categorization than just appending them to this list. Perhaps add a sub-heading?

##########
File path: python/pyarrow/tests/parquet/common_parquet_encryption.py
##########
@@ -0,0 +1,63 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+import base64
+
+try:
+    import pyarrow.parquet_encryption as pe
+except ImportError:
+    pe = None

Review comment:
       This is not useful since it will error out just below (because of `pe.KmsClient`).

##########
File path: python/examples/parquet_encryption/sample_vault_kms_client.py
##########
@@ -0,0 +1,161 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""A sample KmsClient implementation."""
+import argparse
+import base64
+import os
+
+import requests
+
+import pyarrow as pa
+try:
+    import pyarrow.parquet as pq
+except ImportError:
+    pq = None
+
+
+class VaultClient(pq.KmsClient):

Review comment:
       Does this actually work? `KmsClient` is in `pyarrow.parquet_encryption`, not `pyarrow.parquet`, right?

##########
File path: docs/source/developers/python.rst
##########
@@ -486,6 +492,7 @@ Now, we can build pyarrow:
 
    pushd arrow\python
    set PYARROW_WITH_PARQUET=1
+   set PYARROW_WITH_PARQUET_ENCRYPTION=1

Review comment:
       Similar comment here and below.

##########
File path: docs/source/developers/python.rst
##########
@@ -475,6 +480,7 @@ Let's configure, build and install the Arrow C++ libraries:
        -DARROW_WITH_ZLIB=on ^
        -DARROW_WITH_ZSTD=on ^
        -DARROW_PARQUET=on ^
+       -DPARQUET_REQUIRE_ENCRYPTION=on ^

Review comment:
       I'm not convinced it is useful to add this here. Most people developing on PyArrow aren't concerned with encryption.

##########
File path: python/examples/parquet_encryption/sample_vault_kms_client.py
##########
@@ -0,0 +1,161 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""A sample KmsClient implementation."""
+import argparse
+import base64
+import os
+
+import requests
+
+import pyarrow as pa
+try:
+    import pyarrow.parquet as pq
+except ImportError:
+    pq = None

Review comment:
       This is not useful since it will error out just below (because of `pq.KmsClient`).

##########
File path: python/asv-build.sh
##########
@@ -68,6 +69,7 @@ export PYARROW_PARALLEL=8
 export PYARROW_WITH_FLIGHT=1
 export PYARROW_WITH_ORC=1
 export PYARROW_WITH_PARQUET=1
+export PYARROW_WITH_PARQUET_ENCRYPTION=1

Review comment:
       There is no need to enable encryption here.

##########
File path: python/examples/minimal_build/build_conda.sh
##########
@@ -105,6 +106,7 @@ export PYARROW_BUILD_TYPE=Debug
 export PYARROW_CMAKE_GENERATOR=Ninja
 export PYARROW_WITH_FLIGHT=1
 export PYARROW_WITH_PARQUET=1
+export PYARROW_WITH_PARQUET_ENCRYPTION=1

Review comment:
       Similarly, I don't think we want to enable encryption on a minimal build.

##########
File path: python/pyarrow/parquet.py
##########
@@ -1933,7 +1950,10 @@ def partitioning(self):
     resolution (e.g. 'ms'). Setting to None is equivalent to 'ns'
     and therefore INT96 timestamps will be infered as timestamps
     in nanoseconds.
-
+decryption_properties : FileDecryptionProperties or None
+    File-level decryption properties.
+    The decryption properties can be created using
+    ``CryptoFactory.file_decryption_properties()``.
 Returns

Review comment:
       Add a line break here:
   ```suggestion
       ``CryptoFactory.file_decryption_properties()``.
   
   Returns
   ```

##########
File path: docs/source/developers/python.rst
##########
@@ -348,6 +350,9 @@ Now, build pyarrow:
 If you did build one of the optional components (in C++), you need to set the
 corresponding ``PYARROW_WITH_$COMPONENT`` environment variable to 1.
 
+If you built with ``PARQUET_REQUIRE_ENCRYPTION`` (in C++), you need to set the
+corresponding ``PYARROW_WITH_PARQUET_ENCRYPTION`` environment variable to 1.

Review comment:
       ```suggestion
   Similarly, if you built with ``PARQUET_REQUIRE_ENCRYPTION`` (in C++), you need to set the
   corresponding ``PYARROW_WITH_PARQUET_ENCRYPTION`` environment variable to 1.
   ```

##########
File path: python/examples/minimal_build/build_venv.sh
##########
@@ -54,6 +54,7 @@ cmake -GNinja \
       -DARROW_WITH_SNAPPY=ON \
       -DARROW_WITH_BROTLI=ON \
       -DARROW_PARQUET=ON \
+      -DPARQUET_REQUIRE_ENCRYPTION=ON \

Review comment:
       I would expect this change to be reverted, so as to keep the "minimal" aspect.

##########
File path: python/pyarrow/tests/parquet/test_parquet_encryption.py
##########
@@ -0,0 +1,541 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+import pytest
+from datetime import timedelta
+

Review comment:
       So, for the record, these tests fail if encryption is not enabled. I would expect them to be skipped instead:
   ```
   pyarrow/tests/parquet/test_parquet_encryption.py FFFFFFEEEEFFFxxs        [100%]
   
   ==================================== ERRORS ====================================
   ___________ ERROR at setup of test_encrypted_parquet_write_kms_error ___________
   Traceback (most recent call last):
     File "/home/antoine/arrow/dev/python/pyarrow/tests/parquet/test_parquet_encryption.py", line 51, in basic_encryption_config
       basic_encryption_config = pe.EncryptionConfiguration(
   AttributeError: 'NoneType' object has no attribute 'EncryptionConfiguration'
   
   [etc.]
   ```

##########
File path: python/pyarrow/tests/parquet/test_parquet_encryption.py
##########
@@ -0,0 +1,541 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+import pytest
+from datetime import timedelta
+

Review comment:
       If you need help with that, please say so and I can take a look.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org