You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "adamreeve (via GitHub)" <gi...@apache.org> on 2024/03/04 00:52:27 UTC

[PR] GH-40328: [C++][Parquet] Allow use of FileDecryptionProperties after the CryptoFactory is destroyed [arrow]

adamreeve opened a new pull request, #40329:
URL: https://github.com/apache/arrow/pull/40329

   ### Rationale for this change
   
   See #40328
   
   ### What changes are included in this PR?
   
   Introduces a new wrapper class that holds a `FileKeyUnwrapper` and a shared pointer to a `KeyToolkit` so that the `FileKeyUnwrapper` remains usable after the `CryptoFactory` that created it is destroyed, and changes `CryptoFactory` to store a shared pointer to a `KeyToolkit` instead of storing a `KeyToolkit` inline.
   
   I'd be open to other suggestions on how to handle this, as this does feel like a slightly hacky workaround.
   
   I considered making the `FileKeyUnwrapper` constructors take a `std::shared_ptr<KeyToolkit>` instead of `KeyToolkit*`, but this caused issues with `KeyToolkit::RotateMasterKeys`, which needs to construct a `FileKeyUnwrapper` using a reference to itself. I could possibly have worked around this by making `KeyToolkit` inherit from `std::enable_shared_from_this`, but `KeyToolkit` is publicly exported so it would be difficult to ensure it's always managed by a shared pointer. `FileKeyUnwrapper` is also publicly exported so this would be a breaking change, although it's not clear that these classes would be used by many users directly, I'd imagine most users would only interact with a `CryptoFactory` instance.
   
   ### Are these changes tested?
   
   Yes, I've added a new unit test for this, although the test is currently skipped due to #40327.
   
   ### Are there any user-facing changes?
   
   This changes functionality for users of the C++ API in a non-breaking way, allowing them to use the `FileDecryptionProperties` returned from a `CryptoFactory` without having to keep the `CryptoFactory` alive.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] GH-40328: [C++][Parquet] Allow use of FileDecryptionProperties after the CryptoFactory is destroyed [arrow]

Posted by "wgtmac (via GitHub)" <gi...@apache.org>.
wgtmac commented on code in PR #40329:
URL: https://github.com/apache/arrow/pull/40329#discussion_r1525715601


##########
cpp/src/parquet/encryption/crypto_factory.cc:
##########
@@ -172,8 +201,8 @@ std::shared_ptr<FileDecryptionProperties> CryptoFactory::GetFileDecryptionProper
     const KmsConnectionConfig& kms_connection_config,
     const DecryptionConfiguration& decryption_config, const std::string& file_path,
     const std::shared_ptr<::arrow::fs::FileSystem>& file_system) {
-  auto key_retriever = std::make_shared<FileKeyUnwrapper>(
-      &key_toolkit_, kms_connection_config, decryption_config.cache_lifetime_seconds,
+  auto key_retriever = std::make_shared<CryptoFactoryFileKeyRetriever>(

Review Comment:
   Sorry for bearing with me. Now I think I have fully understood your case.
   
   The patch itself LGTM. Though I'm not fully convinced this is a bug rather than an unexpected use case.
   
   cc @pitrou @emkornfield @mapleFU for more opinions.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] GH-40328: [C++][Parquet] Allow use of FileDecryptionProperties after the CryptoFactory is destroyed [arrow]

Posted by "adamreeve (via GitHub)" <gi...@apache.org>.
adamreeve commented on code in PR #40329:
URL: https://github.com/apache/arrow/pull/40329#discussion_r1510518291


##########
cpp/src/parquet/encryption/key_management_test.cc:
##########
@@ -323,6 +323,36 @@ TEST_F(TestEncryptionKeyManagement, KeyRotationWithInternalMaterial) {
   EXPECT_THROW(this->RotateKeys(double_wrapping, encryption_no), ParquetException);
 }
 
+TEST_F(TestEncryptionKeyManagement, UsePropertiesAfterCrytoFactoryDestroyed) {
+  std::shared_ptr<KmsClientFactory> kms_client_factory =
+      std::make_shared<TestOnlyInMemoryKmsClientFactory>(true, key_list_);

Review Comment:
   I just noticed I'm using `wrap_locally_` lower down in this function to build the file name though, which doesn't make sense. So I've introduced a local `wrap_locally` instead.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] GH-40328: [C++][Parquet] Allow use of FileDecryptionProperties after the CryptoFactory is destroyed [arrow]

Posted by "adamreeve (via GitHub)" <gi...@apache.org>.
adamreeve commented on code in PR #40329:
URL: https://github.com/apache/arrow/pull/40329#discussion_r1526012786


##########
cpp/src/parquet/encryption/crypto_factory.cc:
##########
@@ -172,8 +199,8 @@ std::shared_ptr<FileDecryptionProperties> CryptoFactory::GetFileDecryptionProper
     const KmsConnectionConfig& kms_connection_config,
     const DecryptionConfiguration& decryption_config, const std::string& file_path,
     const std::shared_ptr<::arrow::fs::FileSystem>& file_system) {
-  auto key_retriever = std::make_shared<FileKeyUnwrapper>(
-      &key_toolkit_, kms_connection_config, decryption_config.cache_lifetime_seconds,

Review Comment:
   That's what I initially looked at doing, but this caused issues with `KeyToolkit::RotateMasterKeys`, which needs to construct a `FileKeyUnwrapper` using a reference to itself, and I couldn't find a better workaround. There are some more details on this problem in the PR description.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] GH-40328: [C++][Parquet] Allow use of FileDecryptionProperties after the CryptoFactory is destroyed [arrow]

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on code in PR #40329:
URL: https://github.com/apache/arrow/pull/40329#discussion_r1526007386


##########
cpp/src/parquet/encryption/crypto_factory.cc:
##########
@@ -172,8 +199,8 @@ std::shared_ptr<FileDecryptionProperties> CryptoFactory::GetFileDecryptionProper
     const KmsConnectionConfig& kms_connection_config,
     const DecryptionConfiguration& decryption_config, const std::string& file_path,
     const std::shared_ptr<::arrow::fs::FileSystem>& file_system) {
-  auto key_retriever = std::make_shared<FileKeyUnwrapper>(
-      &key_toolkit_, kms_connection_config, decryption_config.cache_lifetime_seconds,

Review Comment:
   I'm probably missing something, but why wouldn't change `FileKeyUnwrapper` to accept an owning pointer instead?



##########
cpp/src/parquet/encryption/crypto_factory.cc:
##########
@@ -172,8 +199,8 @@ std::shared_ptr<FileDecryptionProperties> CryptoFactory::GetFileDecryptionProper
     const KmsConnectionConfig& kms_connection_config,
     const DecryptionConfiguration& decryption_config, const std::string& file_path,
     const std::shared_ptr<::arrow::fs::FileSystem>& file_system) {
-  auto key_retriever = std::make_shared<FileKeyUnwrapper>(
-      &key_toolkit_, kms_connection_config, decryption_config.cache_lifetime_seconds,

Review Comment:
   I'm probably missing something, but why wouldn't we change `FileKeyUnwrapper` to accept an owning pointer instead?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] GH-40328: [C++][Parquet] Allow use of FileDecryptionProperties after the CryptoFactory is destroyed [arrow]

Posted by "adamreeve (via GitHub)" <gi...@apache.org>.
adamreeve commented on code in PR #40329:
URL: https://github.com/apache/arrow/pull/40329#discussion_r1523960201


##########
cpp/src/parquet/encryption/crypto_factory.cc:
##########
@@ -172,8 +201,8 @@ std::shared_ptr<FileDecryptionProperties> CryptoFactory::GetFileDecryptionProper
     const KmsConnectionConfig& kms_connection_config,
     const DecryptionConfiguration& decryption_config, const std::string& file_path,
     const std::shared_ptr<::arrow::fs::FileSystem>& file_system) {
-  auto key_retriever = std::make_shared<FileKeyUnwrapper>(
-      &key_toolkit_, kms_connection_config, decryption_config.cache_lifetime_seconds,
+  auto key_retriever = std::make_shared<CryptoFactoryFileKeyRetriever>(

Review Comment:
   My comment might have been a bit unclear, I didn't mean to suggest that it's an issue that the decryption properties can be reused without cloning first. The comment here might clarify things:
   https://github.com/apache/arrow/blob/bd3fab4333f9e95680f5ed0cd931455e323e6e27/cpp/src/parquet/encryption/encryption.h#L365-L371
   
   When using `FileDecryptionProperties` from a `CryptoFactory`, the condition that "this object keeps the keyRetrieval callback only, and no explicit keys or aadPrefix" is met, so using `DeepClone` isn't necessary and one instance of `FileDecryptionProperties` can be used for multiple files, but I don't think that's a problem.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] GH-40328: [C++][Parquet] Allow use of FileDecryptionProperties after the CryptoFactory is destroyed [arrow]

Posted by "adamreeve (via GitHub)" <gi...@apache.org>.
adamreeve commented on code in PR #40329:
URL: https://github.com/apache/arrow/pull/40329#discussion_r1525707563


##########
cpp/src/parquet/encryption/crypto_factory.cc:
##########
@@ -172,8 +201,8 @@ std::shared_ptr<FileDecryptionProperties> CryptoFactory::GetFileDecryptionProper
     const KmsConnectionConfig& kms_connection_config,
     const DecryptionConfiguration& decryption_config, const std::string& file_path,
     const std::shared_ptr<::arrow::fs::FileSystem>& file_system) {
-  auto key_retriever = std::make_shared<FileKeyUnwrapper>(
-      &key_toolkit_, kms_connection_config, decryption_config.cache_lifetime_seconds,
+  auto key_retriever = std::make_shared<CryptoFactoryFileKeyRetriever>(

Review Comment:
   No this isn't related to having access to the CryptoFactory instance, I think this discussion has gotten a bit sidetracked from the main point of the PR.
   
   My use case is probably a bit non-standard as I'm trying to provide a .NET binding to the encryption API without users having to deal with manual memory management. I have no control over the CryptoFactory lifetime so if users of ParquetSharp misuse the API they could get a segfault, which isn't usually expected in a managed language/runtime.
   
   This got me thinking, I wonder how pyarrow handles this. And it turns out pyarrow has the same issue. This code causes a hang on my machine, but works if I don't have the `del crypto_factory` line:
   
   ```python
   import base64
   import gc
   import pyarrow as pa
   import pyarrow.parquet as pq
   import pyarrow.parquet.encryption as pe
   
   
   class TestKmsClient(pe.KmsClient):
       """ Dummy KMS client that base64 encodes keys """
   
       def __init__(self, kms_connection_configuration):
           pe.KmsClient.__init__(self)
   
       def wrap_key(self, key_bytes, master_key_identifier):
           return f"{base64.b64encode(key_bytes).decode('utf-8')}"
   
       def unwrap_key(self, wrapped_key, master_key_identifier):
           return base64.b64decode(wrapped_key)
   
   
   # Create test data
   x = pa.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], type=pa.int32())
   y = pa.array([0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9], type=pa.float32())
   schema = pa.schema([
       pa.field("x", pa.int32(), nullable=False),
       pa.field("y", pa.float32(), nullable=False),
   ])
   table = pa.table([x, y], schema=schema)
   
   
   # Write encrypted file
   crypto_factory = pe.CryptoFactory(lambda config: TestKmsClient(config))
   config = pe.KmsConnectionConfig()
   encryption_config = pe.EncryptionConfiguration(
      footer_key="Key0",
      column_keys={
         "Key0": ["x", "y"],
      },
      double_wrapping=True,
      internal_key_material=True
   )
   encryption_properties = crypto_factory.file_encryption_properties(config, encryption_config)
   
   with pq.ParquetWriter("./encrypted_py.parquet", table.schema,
                        encryption_properties=encryption_properties) as writer:
      writer.write_table(table)
   
   # Read encrypted file
   decryption_config = pe.DecryptionConfiguration()
   decryption_properties = crypto_factory.file_decryption_properties(config, decryption_config)
   
   del crypto_factory
   gc.collect()
   
   with pq.ParquetFile("./encrypted_py.parquet",
                        decryption_properties=decryption_properties) as reader:
      read_table = reader.read()
   
   print(read_table)
   ```
   
   A similar example using a more realistic encrypted file caused a segfault.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] GH-40328: [C++][Parquet] Allow use of FileDecryptionProperties after the CryptoFactory is destroyed [arrow]

Posted by "adamreeve (via GitHub)" <gi...@apache.org>.
adamreeve commented on code in PR #40329:
URL: https://github.com/apache/arrow/pull/40329#discussion_r1510540438


##########
cpp/src/parquet/encryption/crypto_factory.cc:
##########
@@ -172,8 +201,8 @@ std::shared_ptr<FileDecryptionProperties> CryptoFactory::GetFileDecryptionProper
     const KmsConnectionConfig& kms_connection_config,
     const DecryptionConfiguration& decryption_config, const std::string& file_path,
     const std::shared_ptr<::arrow::fs::FileSystem>& file_system) {
-  auto key_retriever = std::make_shared<FileKeyUnwrapper>(
-      &key_toolkit_, kms_connection_config, decryption_config.cache_lifetime_seconds,
+  auto key_retriever = std::make_shared<CryptoFactoryFileKeyRetriever>(

Review Comment:
   I don't follow what you're suggesting here sorry. I think the usual behaviour would be for users to generate new `FileDecryptionProperties` per-file, especially if external key material is used, as in that case the `FileKeyUnwrapper` holds a `FileKeyMaterialStore` that is file specific.
   
   But from from some brief testing, it looks like when internal key material is used, it is possible to reuse `FileDecryptionProperties` that have been obtained from a `CryptoFactory` with multiple files. The key encryption ids are randomly generated, so they are different between files and don't conflict in the cache stored in the `KeyToolkit`. It also looks like calling `DeepClone` is unnecessary, as the decryption properties use a `DecryptionKeyRetriever` rather than using the the `column_decryption_properties_` and `footer_key_` fields that get wiped after a read.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] GH-40328: [C++][Parquet] Allow use of FileDecryptionProperties after the CryptoFactory is destroyed [arrow]

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on code in PR #40329:
URL: https://github.com/apache/arrow/pull/40329#discussion_r1526001846


##########
cpp/src/parquet/encryption/crypto_factory.cc:
##########
@@ -172,8 +201,8 @@ std::shared_ptr<FileDecryptionProperties> CryptoFactory::GetFileDecryptionProper
     const KmsConnectionConfig& kms_connection_config,
     const DecryptionConfiguration& decryption_config, const std::string& file_path,
     const std::shared_ptr<::arrow::fs::FileSystem>& file_system) {
-  auto key_retriever = std::make_shared<FileKeyUnwrapper>(
-      &key_toolkit_, kms_connection_config, decryption_config.cache_lifetime_seconds,
+  auto key_retriever = std::make_shared<CryptoFactoryFileKeyRetriever>(

Review Comment:
   Thanks for the clarification @adamreeve and thanks for the ping @wgtmac .
   
   1. I agree that we should avoid unowning pointers. as far as possible, except where performance-critical (which isn't the case here).
   2. If this fixes a crash in Python, then we should also add a Python test. Can you do that @adamreeve ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] GH-40328: [C++][Parquet] Allow use of FileDecryptionProperties after the CryptoFactory is destroyed [arrow]

Posted by "adamreeve (via GitHub)" <gi...@apache.org>.
adamreeve commented on code in PR #40329:
URL: https://github.com/apache/arrow/pull/40329#discussion_r1532680079


##########
cpp/src/parquet/encryption/crypto_factory.cc:
##########
@@ -172,8 +199,8 @@ std::shared_ptr<FileDecryptionProperties> CryptoFactory::GetFileDecryptionProper
     const KmsConnectionConfig& kms_connection_config,
     const DecryptionConfiguration& decryption_config, const std::string& file_path,
     const std::shared_ptr<::arrow::fs::FileSystem>& file_system) {
-  auto key_retriever = std::make_shared<FileKeyUnwrapper>(
-      &key_toolkit_, kms_connection_config, decryption_config.cache_lifetime_seconds,

Review Comment:
   That's a good point, I'll try that instead. Thanks.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] GH-40328: [C++][Parquet] Allow use of FileDecryptionProperties after the CryptoFactory is destroyed [arrow]

Posted by "wgtmac (via GitHub)" <gi...@apache.org>.
wgtmac commented on code in PR #40329:
URL: https://github.com/apache/arrow/pull/40329#discussion_r1510479145


##########
cpp/src/parquet/encryption/crypto_factory.cc:
##########
@@ -28,9 +28,38 @@
 
 namespace parquet::encryption {
 
+namespace {
+
+/// Holds a FileKeyUnwrapper and shared pointer to a KeyToolkit to allow
+/// keeping the KeyToolkit alive alongside the FileKeyUnwrapper
+class CryptoFactoryFileKeyRetriever : public DecryptionKeyRetriever {
+ public:
+  CryptoFactoryFileKeyRetriever(
+      std::shared_ptr<KeyToolkit> key_toolkit,
+      const KmsConnectionConfig& kms_connection_config, double cache_lifetime_seconds,
+      const std::string& file_path = "",
+      const std::shared_ptr<::arrow::fs::FileSystem>& file_system = NULLPTR)
+      : file_key_unwrapper_(key_toolkit.get(), kms_connection_config,
+                            cache_lifetime_seconds, file_path, file_system) {
+    key_toolkit_ = key_toolkit;

Review Comment:
   nit: this can use std::move() instead.



##########
cpp/src/parquet/encryption/crypto_factory.cc:
##########
@@ -172,8 +201,8 @@ std::shared_ptr<FileDecryptionProperties> CryptoFactory::GetFileDecryptionProper
     const KmsConnectionConfig& kms_connection_config,
     const DecryptionConfiguration& decryption_config, const std::string& file_path,
     const std::shared_ptr<::arrow::fs::FileSystem>& file_system) {
-  auto key_retriever = std::make_shared<FileKeyUnwrapper>(
-      &key_toolkit_, kms_connection_config, decryption_config.cache_lifetime_seconds,
+  auto key_retriever = std::make_shared<CryptoFactoryFileKeyRetriever>(

Review Comment:
   It looks like we hit the issue of member variable lifecycle in the FileDecryptionProperties class. I see there is already a `FileDecryptionProperties::DeepClone()` function. Should we expect the end user to explicitly clone it?



##########
cpp/src/parquet/encryption/crypto_factory.cc:
##########
@@ -28,9 +28,38 @@
 
 namespace parquet::encryption {
 
+namespace {
+
+/// Holds a FileKeyUnwrapper and shared pointer to a KeyToolkit to allow
+/// keeping the KeyToolkit alive alongside the FileKeyUnwrapper
+class CryptoFactoryFileKeyRetriever : public DecryptionKeyRetriever {
+ public:
+  CryptoFactoryFileKeyRetriever(
+      std::shared_ptr<KeyToolkit> key_toolkit,
+      const KmsConnectionConfig& kms_connection_config, double cache_lifetime_seconds,
+      const std::string& file_path = "",
+      const std::shared_ptr<::arrow::fs::FileSystem>& file_system = NULLPTR)
+      : file_key_unwrapper_(key_toolkit.get(), kms_connection_config,
+                            cache_lifetime_seconds, file_path, file_system) {
+    key_toolkit_ = key_toolkit;
+  }
+
+  std::string GetKey(const std::string& key_metadata) override {
+    return file_key_unwrapper_.GetKey(key_metadata);
+  }
+
+ private:
+  FileKeyUnwrapper file_key_unwrapper_;
+  std::shared_ptr<KeyToolkit> key_toolkit_;
+};
+
+}  // namespace
+
+CryptoFactory::CryptoFactory() { key_toolkit_ = std::make_shared<KeyToolkit>(); }

Review Comment:
   nit: move the assignment to initialization list, or simply initializing it in the header file to get rid of the explicit default constructor.



##########
cpp/src/parquet/encryption/key_management_test.cc:
##########
@@ -323,6 +323,36 @@ TEST_F(TestEncryptionKeyManagement, KeyRotationWithInternalMaterial) {
   EXPECT_THROW(this->RotateKeys(double_wrapping, encryption_no), ParquetException);
 }
 
+TEST_F(TestEncryptionKeyManagement, UsePropertiesAfterCrytoFactoryDestroyed) {
+  std::shared_ptr<KmsClientFactory> kms_client_factory =
+      std::make_shared<TestOnlyInMemoryKmsClientFactory>(true, key_list_);

Review Comment:
   ```suggestion
         std::make_shared<TestOnlyInMemoryKmsClientFactory>(/*wrap_locally=*/true, key_list_);
   ```
   
   Or should we directly use `wrap_locally_`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] GH-40328: [C++][Parquet] Allow use of FileDecryptionProperties after the CryptoFactory is destroyed [arrow]

Posted by "wgtmac (via GitHub)" <gi...@apache.org>.
wgtmac commented on code in PR #40329:
URL: https://github.com/apache/arrow/pull/40329#discussion_r1520846640


##########
cpp/src/parquet/encryption/key_management_test.cc:
##########
@@ -324,6 +324,37 @@ TEST_F(TestEncryptionKeyManagement, KeyRotationWithInternalMaterial) {
   EXPECT_THROW(this->RotateKeys(double_wrapping, encryption_no), ParquetException);
 }
 
+TEST_F(TestEncryptionKeyManagement, UsePropertiesAfterCrytoFactoryDestroyed) {
+  constexpr bool wrap_locally = true;
+  std::shared_ptr<KmsClientFactory> kms_client_factory =
+      std::make_shared<TestOnlyInMemoryKmsClientFactory>(wrap_locally, key_list_);
+  std::shared_ptr<CryptoFactory> crypto_factory = std::make_shared<CryptoFactory>();
+  crypto_factory->RegisterKmsClientFactory(kms_client_factory);
+
+  constexpr bool double_wrapping = true;
+  constexpr bool internal_key_material = true;
+  constexpr int encryption_no = 0;
+
+  std::string file_name =
+      GetFileName(double_wrapping, wrap_locally, internal_key_material, encryption_no);
+  auto encryption_config =
+      GetEncryptionConfiguration(double_wrapping, internal_key_material, encryption_no);
+  auto decryption_config = GetDecryptionConfiguration();
+
+  auto file_encryption_properties = crypto_factory->GetFileEncryptionProperties(
+      kms_connection_config_, encryption_config);
+  auto file_decryption_properties = crypto_factory->GetFileDecryptionProperties(
+      kms_connection_config_, decryption_config);
+
+  // Free CryptoFactory
+  crypto_factory.reset();

Review Comment:
   Is it possible to add a real case where the issue happens?



##########
cpp/src/parquet/encryption/crypto_factory.h:
##########
@@ -148,7 +148,7 @@ class PARQUET_EXPORT CryptoFactory {
       int dek_length, const std::string& column_keys, FileKeyWrapper* key_wrapper);
 
   /// Key utilities object for kms client initialization and cache control
-  KeyToolkit key_toolkit_;
+  std::shared_ptr<KeyToolkit> key_toolkit_;

Review Comment:
   ```suggestion
     std::shared_ptr<KeyToolkit> key_toolkit_ = std::make_shared<KeyToolkit>();
   ```
   And remove the explicit ctor at line 103. WDYT?



##########
cpp/src/parquet/encryption/crypto_factory.cc:
##########
@@ -28,9 +28,36 @@
 
 namespace parquet::encryption {
 
+namespace {
+
+/// Holds a FileKeyUnwrapper and shared pointer to a KeyToolkit to allow
+/// keeping the KeyToolkit alive alongside the FileKeyUnwrapper
+class CryptoFactoryFileKeyRetriever : public DecryptionKeyRetriever {
+ public:
+  CryptoFactoryFileKeyRetriever(
+      std::shared_ptr<KeyToolkit> key_toolkit,
+      const KmsConnectionConfig& kms_connection_config, double cache_lifetime_seconds,
+      const std::string& file_path = "",

Review Comment:
   nit: please remove the default parameter



##########
cpp/src/parquet/encryption/crypto_factory.cc:
##########
@@ -172,8 +201,8 @@ std::shared_ptr<FileDecryptionProperties> CryptoFactory::GetFileDecryptionProper
     const KmsConnectionConfig& kms_connection_config,
     const DecryptionConfiguration& decryption_config, const std::string& file_path,
     const std::shared_ptr<::arrow::fs::FileSystem>& file_system) {
-  auto key_retriever = std::make_shared<FileKeyUnwrapper>(
-      &key_toolkit_, kms_connection_config, decryption_config.cache_lifetime_seconds,
+  auto key_retriever = std::make_shared<CryptoFactoryFileKeyRetriever>(

Review Comment:
   Thanks for the explanation! Could you point me to a real world code or test case having this issue if possible?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] GH-40328: [C++][Parquet] Allow use of FileDecryptionProperties after the CryptoFactory is destroyed [arrow]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #40329:
URL: https://github.com/apache/arrow/pull/40329#issuecomment-2012506001

   Revision: 7547c79b05e0657f6cd3961d01226c4fc84053af
   
   Submitted crossbow builds: [ursacomputing/crossbow @ actions-67652940da](https://github.com/ursacomputing/crossbow/branches/all?query=actions-67652940da)
   
   |Task|Status|
   |----|------|
   |test-alpine-linux-cpp|[![GitHub Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-67652940da-github-test-alpine-linux-cpp)](https://github.com/ursacomputing/crossbow/actions/runs/8376966024/job/22937898563)|
   |test-build-cpp-fuzz|[![GitHub Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-67652940da-github-test-build-cpp-fuzz)](https://github.com/ursacomputing/crossbow/actions/runs/8376965506/job/22937896626)|
   |test-conda-cpp|[![GitHub Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-67652940da-github-test-conda-cpp)](https://github.com/ursacomputing/crossbow/actions/runs/8376966379/job/22937899930)|
   |test-conda-cpp-valgrind|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-67652940da-azure-test-conda-cpp-valgrind)](https://github.com/ursacomputing/crossbow/runs/22937903427)|
   |test-cuda-cpp|[![GitHub Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-67652940da-github-test-cuda-cpp)](https://github.com/ursacomputing/crossbow/actions/runs/8376965834/job/22937897938)|
   |test-debian-12-cpp-amd64|[![GitHub Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-67652940da-github-test-debian-12-cpp-amd64)](https://github.com/ursacomputing/crossbow/actions/runs/8376965952/job/22937898402)|
   |test-debian-12-cpp-i386|[![GitHub Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-67652940da-github-test-debian-12-cpp-i386)](https://github.com/ursacomputing/crossbow/actions/runs/8376965566/job/22937897086)|
   |test-fedora-39-cpp|[![GitHub Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-67652940da-github-test-fedora-39-cpp)](https://github.com/ursacomputing/crossbow/actions/runs/8376965411/job/22937896277)|
   |test-ubuntu-20.04-cpp|[![GitHub Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-67652940da-github-test-ubuntu-20.04-cpp)](https://github.com/ursacomputing/crossbow/actions/runs/8376965326/job/22937896055)|
   |test-ubuntu-20.04-cpp-bundled|[![GitHub Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-67652940da-github-test-ubuntu-20.04-cpp-bundled)](https://github.com/ursacomputing/crossbow/actions/runs/8376965821/job/22937897901)|
   |test-ubuntu-20.04-cpp-minimal-with-formats|[![GitHub Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-67652940da-github-test-ubuntu-20.04-cpp-minimal-with-formats)](https://github.com/ursacomputing/crossbow/actions/runs/8376965680/job/22937897309)|
   |test-ubuntu-20.04-cpp-thread-sanitizer|[![GitHub Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-67652940da-github-test-ubuntu-20.04-cpp-thread-sanitizer)](https://github.com/ursacomputing/crossbow/actions/runs/8376965986/job/22937898433)|
   |test-ubuntu-22.04-cpp|[![GitHub Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-67652940da-github-test-ubuntu-22.04-cpp)](https://github.com/ursacomputing/crossbow/actions/runs/8376965737/job/22937897654)|
   |test-ubuntu-22.04-cpp-20|[![GitHub Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-67652940da-github-test-ubuntu-22.04-cpp-20)](https://github.com/ursacomputing/crossbow/actions/runs/8376965515/job/22937896762)|
   |test-ubuntu-22.04-cpp-no-threading|[![GitHub Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-67652940da-github-test-ubuntu-22.04-cpp-no-threading)](https://github.com/ursacomputing/crossbow/actions/runs/8376965870/job/22937898029)|
   |test-ubuntu-24.04-cpp|[![GitHub Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-67652940da-github-test-ubuntu-24.04-cpp)](https://github.com/ursacomputing/crossbow/actions/runs/8376966036/job/22937898712)|
   |test-ubuntu-24.04-cpp-gcc-14|[![GitHub Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-67652940da-github-test-ubuntu-24.04-cpp-gcc-14)](https://github.com/ursacomputing/crossbow/actions/runs/8376965530/job/22937896783)|


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] GH-40328: [C++][Parquet] Allow use of FileDecryptionProperties after the CryptoFactory is destroyed [arrow]

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on PR #40329:
URL: https://github.com/apache/arrow/pull/40329#issuecomment-2012496330

   @github-actions crossbow submit -g cpp


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] GH-40328: [C++][Parquet] Allow use of FileDecryptionProperties after the CryptoFactory is destroyed [arrow]

Posted by "wgtmac (via GitHub)" <gi...@apache.org>.
wgtmac commented on code in PR #40329:
URL: https://github.com/apache/arrow/pull/40329#discussion_r1525697683


##########
cpp/src/parquet/encryption/crypto_factory.cc:
##########
@@ -172,8 +201,8 @@ std::shared_ptr<FileDecryptionProperties> CryptoFactory::GetFileDecryptionProper
     const KmsConnectionConfig& kms_connection_config,
     const DecryptionConfiguration& decryption_config, const std::string& file_path,
     const std::shared_ptr<::arrow::fs::FileSystem>& file_system) {
-  auto key_retriever = std::make_shared<FileKeyUnwrapper>(
-      &key_toolkit_, kms_connection_config, decryption_config.cache_lifetime_seconds,
+  auto key_retriever = std::make_shared<CryptoFactoryFileKeyRetriever>(

Review Comment:
   CMIW, did you mean that in this case users cannot have access to the CryptoFactory instance? In the parquet dataset implementation, we always keep all the required instances in the config: https://github.com/apache/arrow/blob/main/cpp/src/arrow/dataset/parquet_encryption_config.h#L63



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] GH-40328: [C++][Parquet] Allow use of FileDecryptionProperties after the CryptoFactory is destroyed [arrow]

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on code in PR #40329:
URL: https://github.com/apache/arrow/pull/40329#discussion_r1526001846


##########
cpp/src/parquet/encryption/crypto_factory.cc:
##########
@@ -172,8 +201,8 @@ std::shared_ptr<FileDecryptionProperties> CryptoFactory::GetFileDecryptionProper
     const KmsConnectionConfig& kms_connection_config,
     const DecryptionConfiguration& decryption_config, const std::string& file_path,
     const std::shared_ptr<::arrow::fs::FileSystem>& file_system) {
-  auto key_retriever = std::make_shared<FileKeyUnwrapper>(
-      &key_toolkit_, kms_connection_config, decryption_config.cache_lifetime_seconds,
+  auto key_retriever = std::make_shared<CryptoFactoryFileKeyRetriever>(

Review Comment:
   Thanks for the clarification @adamreeve and thanks for the ping @wgtmac .
   
   1. I agree that we should avoid unowning pointers, as far as possible, except where performance-critical (which isn't the case here).
   2. If this fixes a crash in Python, then we should also add a Python test. Can you do that @adamreeve ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] GH-40328: [C++][Parquet] Allow use of FileDecryptionProperties after the CryptoFactory is destroyed [arrow]

Posted by "adamreeve (via GitHub)" <gi...@apache.org>.
adamreeve commented on code in PR #40329:
URL: https://github.com/apache/arrow/pull/40329#discussion_r1526013945


##########
cpp/src/parquet/encryption/crypto_factory.cc:
##########
@@ -172,8 +201,8 @@ std::shared_ptr<FileDecryptionProperties> CryptoFactory::GetFileDecryptionProper
     const KmsConnectionConfig& kms_connection_config,
     const DecryptionConfiguration& decryption_config, const std::string& file_path,
     const std::shared_ptr<::arrow::fs::FileSystem>& file_system) {
-  auto key_retriever = std::make_shared<FileKeyUnwrapper>(
-      &key_toolkit_, kms_connection_config, decryption_config.cache_lifetime_seconds,
+  auto key_retriever = std::make_shared<CryptoFactoryFileKeyRetriever>(

Review Comment:
   Yes sure, I will add a Python test for this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] GH-40328: [C++][Parquet] Allow use of FileDecryptionProperties after the CryptoFactory is destroyed [arrow]

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on code in PR #40329:
URL: https://github.com/apache/arrow/pull/40329#discussion_r1526005401


##########
cpp/src/parquet/encryption/crypto_factory.h:
##########
@@ -148,7 +146,7 @@ class PARQUET_EXPORT CryptoFactory {
       int dek_length, const std::string& column_keys, FileKeyWrapper* key_wrapper);
 
   /// Key utilities object for kms client initialization and cache control
-  KeyToolkit key_toolkit_;
+  std::shared_ptr<KeyToolkit> key_toolkit_ = std::make_shared<KeyToolkit>();

Review Comment:
   Is it desirable for this to be default-constructed? I suppose concrete use requires a particular key toolkit, not a default-constructed one?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] GH-40328: [C++][Parquet] Allow use of FileDecryptionProperties after the CryptoFactory is destroyed [arrow]

Posted by "conbench-apache-arrow[bot] (via GitHub)" <gi...@apache.org>.
conbench-apache-arrow[bot] commented on PR #40329:
URL: https://github.com/apache/arrow/pull/40329#issuecomment-2013734830

   After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 54c4cedd45bdd9738288792b4813c97a1b82fe6c.
   
   There was 1 benchmark result indicating a performance regression:
   
   - Commit Run on `ursa-i9-9960x` at [2024-03-21 16:49:21Z](https://conbench.ursa.dev/compare/runs/dd02de0fb1f645a49d3dba5c097efbbb...3b58649a6da84b57835fbf88937f85a1/)
     - [`tpch` (R) with engine=arrow, format=parquet, language=R, memory_map=False, query_id=TPCH-08, scale_factor=1](https://conbench.ursa.dev/compare/benchmarks/065fbc062b3b7bc9800017ccf9697cc6...065fc7756bea7914800049a466a48783)
   
   The [full Conbench report](https://github.com/apache/arrow/runs/22953988450) has more details. It also includes information about 6 possible false positives for unstable benchmarks that are known to sometimes produce them.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] GH-40328: [C++][Parquet] Allow use of FileDecryptionProperties after the CryptoFactory is destroyed [arrow]

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou merged PR #40329:
URL: https://github.com/apache/arrow/pull/40329


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] GH-40328: [C++][Parquet] Allow use of FileDecryptionProperties after the CryptoFactory is destroyed [arrow]

Posted by "adamreeve (via GitHub)" <gi...@apache.org>.
adamreeve commented on code in PR #40329:
URL: https://github.com/apache/arrow/pull/40329#discussion_r1523944660


##########
cpp/src/parquet/encryption/key_management_test.cc:
##########
@@ -324,6 +324,37 @@ TEST_F(TestEncryptionKeyManagement, KeyRotationWithInternalMaterial) {
   EXPECT_THROW(this->RotateKeys(double_wrapping, encryption_no), ParquetException);
 }
 
+TEST_F(TestEncryptionKeyManagement, UsePropertiesAfterCrytoFactoryDestroyed) {
+  constexpr bool wrap_locally = true;
+  std::shared_ptr<KmsClientFactory> kms_client_factory =
+      std::make_shared<TestOnlyInMemoryKmsClientFactory>(wrap_locally, key_list_);
+  std::shared_ptr<CryptoFactory> crypto_factory = std::make_shared<CryptoFactory>();
+  crypto_factory->RegisterKmsClientFactory(kms_client_factory);
+
+  constexpr bool double_wrapping = true;
+  constexpr bool internal_key_material = true;
+  constexpr int encryption_no = 0;
+
+  std::string file_name =
+      GetFileName(double_wrapping, wrap_locally, internal_key_material, encryption_no);
+  auto encryption_config =
+      GetEncryptionConfiguration(double_wrapping, internal_key_material, encryption_no);
+  auto decryption_config = GetDecryptionConfiguration();
+
+  auto file_encryption_properties = crypto_factory->GetFileEncryptionProperties(
+      kms_connection_config_, encryption_config);
+  auto file_decryption_properties = crypto_factory->GetFileDecryptionProperties(
+      kms_connection_config_, decryption_config);
+
+  // Free CryptoFactory
+  crypto_factory.reset();

Review Comment:
   I've tried to make the test more realistic by creating the file encryption properties and decryption properties in methods that use a local crypto factory, which I think is the main scenario where this issue is likely to happen.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] GH-40328: [C++][Parquet] Allow use of FileDecryptionProperties after the CryptoFactory is destroyed [arrow]

Posted by "wgtmac (via GitHub)" <gi...@apache.org>.
wgtmac commented on code in PR #40329:
URL: https://github.com/apache/arrow/pull/40329#discussion_r1525713870


##########
cpp/src/parquet/encryption/crypto_factory.cc:
##########
@@ -172,8 +201,8 @@ std::shared_ptr<FileDecryptionProperties> CryptoFactory::GetFileDecryptionProper
     const KmsConnectionConfig& kms_connection_config,
     const DecryptionConfiguration& decryption_config, const std::string& file_path,
     const std::shared_ptr<::arrow::fs::FileSystem>& file_system) {
-  auto key_retriever = std::make_shared<FileKeyUnwrapper>(
-      &key_toolkit_, kms_connection_config, decryption_config.cache_lifetime_seconds,
+  auto key_retriever = std::make_shared<CryptoFactoryFileKeyRetriever>(

Review Comment:
   > My use case is probably a bit non-standard as I'm trying to provide a .NET binding to the encryption API without users having to deal with manual memory management.
   
   This is similar to the java binding via JNI where the lifetime management is tricky. Usually we provide a `releaseXXX` native call for users to explicitly release the resource managed by the native code. I'm not sure if you can do the similar thing.
   
   > This got me thinking, I wonder how pyarrow handles this. And it turns out pyarrow has the same issue.
   
   pyarrow users have access to the crypto_factory, so I don't think this is a problem.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] GH-40328: [C++][Parquet] Allow use of FileDecryptionProperties after the CryptoFactory is destroyed [arrow]

Posted by "adamreeve (via GitHub)" <gi...@apache.org>.
adamreeve commented on code in PR #40329:
URL: https://github.com/apache/arrow/pull/40329#discussion_r1525823189


##########
cpp/src/parquet/encryption/crypto_factory.cc:
##########
@@ -172,8 +201,8 @@ std::shared_ptr<FileDecryptionProperties> CryptoFactory::GetFileDecryptionProper
     const KmsConnectionConfig& kms_connection_config,
     const DecryptionConfiguration& decryption_config, const std::string& file_path,
     const std::shared_ptr<::arrow::fs::FileSystem>& file_system) {
-  auto key_retriever = std::make_shared<FileKeyUnwrapper>(
-      &key_toolkit_, kms_connection_config, decryption_config.cache_lifetime_seconds,
+  auto key_retriever = std::make_shared<CryptoFactoryFileKeyRetriever>(

Review Comment:
   Yes, users can work around this by holding a reference to the CryptoFactory. But I think it would be good to improve the API here so that users didn't have to worry about this, especially for users of managed languages/runtimes like C# and Python where they don't expect usage errors to lead to segfaults or undefined behaviour.
   
   It's fair to say this isn't exactly a bug, but I'd say that if this behaviour isn't changed then at the very least the Python API documentation for `CryptoFactory.file_decryption_properties` should be updated to warn users about the possibility of undefined behaviour if they don't keep their CryptoFactory alive as long as the decryption properties may be in use.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] GH-40328: [C++][Parquet] Allow use of FileDecryptionProperties after the CryptoFactory is destroyed [arrow]

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on code in PR #40329:
URL: https://github.com/apache/arrow/pull/40329#discussion_r1532280405


##########
cpp/src/parquet/encryption/crypto_factory.cc:
##########
@@ -172,8 +199,8 @@ std::shared_ptr<FileDecryptionProperties> CryptoFactory::GetFileDecryptionProper
     const KmsConnectionConfig& kms_connection_config,
     const DecryptionConfiguration& decryption_config, const std::string& file_path,
     const std::shared_ptr<::arrow::fs::FileSystem>& file_system) {
-  auto key_retriever = std::make_shared<FileKeyUnwrapper>(
-      &key_toolkit_, kms_connection_config, decryption_config.cache_lifetime_seconds,

Review Comment:
   You can have `FileKeyUnwrapper` optionally take an owning pointer, but also leave the non-owning variant.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] GH-40328: [C++][Parquet] Allow use of FileDecryptionProperties after the CryptoFactory is destroyed [arrow]

Posted by "adamreeve (via GitHub)" <gi...@apache.org>.
adamreeve commented on code in PR #40329:
URL: https://github.com/apache/arrow/pull/40329#discussion_r1532680079


##########
cpp/src/parquet/encryption/crypto_factory.cc:
##########
@@ -172,8 +199,8 @@ std::shared_ptr<FileDecryptionProperties> CryptoFactory::GetFileDecryptionProper
     const KmsConnectionConfig& kms_connection_config,
     const DecryptionConfiguration& decryption_config, const std::string& file_path,
     const std::shared_ptr<::arrow::fs::FileSystem>& file_system) {
-  auto key_retriever = std::make_shared<FileKeyUnwrapper>(
-      &key_toolkit_, kms_connection_config, decryption_config.cache_lifetime_seconds,

Review Comment:
   That's a good point, I'll try that instead.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] GH-40328: [C++][Parquet] Allow use of FileDecryptionProperties after the CryptoFactory is destroyed [arrow]

Posted by "adamreeve (via GitHub)" <gi...@apache.org>.
adamreeve commented on code in PR #40329:
URL: https://github.com/apache/arrow/pull/40329#discussion_r1533080648


##########
cpp/src/parquet/encryption/crypto_factory.cc:
##########
@@ -172,8 +199,8 @@ std::shared_ptr<FileDecryptionProperties> CryptoFactory::GetFileDecryptionProper
     const KmsConnectionConfig& kms_connection_config,
     const DecryptionConfiguration& decryption_config, const std::string& file_path,
     const std::shared_ptr<::arrow::fs::FileSystem>& file_system) {
-  auto key_retriever = std::make_shared<FileKeyUnwrapper>(
-      &key_toolkit_, kms_connection_config, decryption_config.cache_lifetime_seconds,

Review Comment:
   I've changed this now, much nicer thanks



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] GH-40328: [C++][Parquet] Allow use of FileDecryptionProperties after the CryptoFactory is destroyed [arrow]

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on PR #40329:
URL: https://github.com/apache/arrow/pull/40329#issuecomment-2012443028

   @github-actions crossbow submit -g cpp
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] GH-40328: [C++][Parquet] Allow use of FileDecryptionProperties after the CryptoFactory is destroyed [arrow]

Posted by "adamreeve (via GitHub)" <gi...@apache.org>.
adamreeve commented on code in PR #40329:
URL: https://github.com/apache/arrow/pull/40329#discussion_r1510502517


##########
cpp/src/parquet/encryption/key_management_test.cc:
##########
@@ -323,6 +323,36 @@ TEST_F(TestEncryptionKeyManagement, KeyRotationWithInternalMaterial) {
   EXPECT_THROW(this->RotateKeys(double_wrapping, encryption_no), ParquetException);
 }
 
+TEST_F(TestEncryptionKeyManagement, UsePropertiesAfterCrytoFactoryDestroyed) {
+  std::shared_ptr<KmsClientFactory> kms_client_factory =
+      std::make_shared<TestOnlyInMemoryKmsClientFactory>(true, key_list_);

Review Comment:
   `wrap_locally_` is only initialized by `SetupCryptoFactory`, which I'm purposely not using in this test due to using a local CryptoFactory instead, so your suggested change makes sense I think.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] GH-40328: [C++][Parquet] Allow use of FileDecryptionProperties after the CryptoFactory is destroyed [arrow]

Posted by "adamreeve (via GitHub)" <gi...@apache.org>.
adamreeve commented on code in PR #40329:
URL: https://github.com/apache/arrow/pull/40329#discussion_r1526013945


##########
cpp/src/parquet/encryption/crypto_factory.cc:
##########
@@ -172,8 +201,8 @@ std::shared_ptr<FileDecryptionProperties> CryptoFactory::GetFileDecryptionProper
     const KmsConnectionConfig& kms_connection_config,
     const DecryptionConfiguration& decryption_config, const std::string& file_path,
     const std::shared_ptr<::arrow::fs::FileSystem>& file_system) {
-  auto key_retriever = std::make_shared<FileKeyUnwrapper>(
-      &key_toolkit_, kms_connection_config, decryption_config.cache_lifetime_seconds,
+  auto key_retriever = std::make_shared<CryptoFactoryFileKeyRetriever>(

Review Comment:
   Yes sure, I will add a Python test for this, probably next week.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] GH-40328: [C++][Parquet] Allow use of FileDecryptionProperties after the CryptoFactory is destroyed [arrow]

Posted by "adamreeve (via GitHub)" <gi...@apache.org>.
adamreeve commented on code in PR #40329:
URL: https://github.com/apache/arrow/pull/40329#discussion_r1526020211


##########
cpp/src/parquet/encryption/crypto_factory.h:
##########
@@ -148,7 +146,7 @@ class PARQUET_EXPORT CryptoFactory {
       int dek_length, const std::string& column_keys, FileKeyWrapper* key_wrapper);
 
   /// Key utilities object for kms client initialization and cache control
-  KeyToolkit key_toolkit_;
+  std::shared_ptr<KeyToolkit> key_toolkit_ = std::make_shared<KeyToolkit>();

Review Comment:
   I think this is fine. The `key_toolkit_` is always left as the default constructed value, and it is initialized when `CryptoFactory::RegisterKmsClientFactory` is called, which in turn calls `key_toolkit_->RegisterKmsClientFactory`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] GH-40328: [C++][Parquet] Allow use of FileDecryptionProperties after the CryptoFactory is destroyed [arrow]

Posted by "adamreeve (via GitHub)" <gi...@apache.org>.
adamreeve commented on code in PR #40329:
URL: https://github.com/apache/arrow/pull/40329#discussion_r1527653161


##########
cpp/src/parquet/encryption/crypto_factory.cc:
##########
@@ -172,8 +201,8 @@ std::shared_ptr<FileDecryptionProperties> CryptoFactory::GetFileDecryptionProper
     const KmsConnectionConfig& kms_connection_config,
     const DecryptionConfiguration& decryption_config, const std::string& file_path,
     const std::shared_ptr<::arrow::fs::FileSystem>& file_system) {
-  auto key_retriever = std::make_shared<FileKeyUnwrapper>(
-      &key_toolkit_, kms_connection_config, decryption_config.cache_lifetime_seconds,
+  auto key_retriever = std::make_shared<CryptoFactoryFileKeyRetriever>(

Review Comment:
   I've added a Python test for this now



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] GH-40328: [C++][Parquet] Allow use of FileDecryptionProperties after the CryptoFactory is destroyed [arrow]

Posted by "adamreeve (via GitHub)" <gi...@apache.org>.
adamreeve commented on code in PR #40329:
URL: https://github.com/apache/arrow/pull/40329#discussion_r1526029254


##########
cpp/src/parquet/encryption/crypto_factory.cc:
##########
@@ -172,8 +199,8 @@ std::shared_ptr<FileDecryptionProperties> CryptoFactory::GetFileDecryptionProper
     const KmsConnectionConfig& kms_connection_config,
     const DecryptionConfiguration& decryption_config, const std::string& file_path,
     const std::shared_ptr<::arrow::fs::FileSystem>& file_system) {
-  auto key_retriever = std::make_shared<FileKeyUnwrapper>(
-      &key_toolkit_, kms_connection_config, decryption_config.cache_lifetime_seconds,

Review Comment:
   Maybe this could work if `KeyToolkit` could inherit from `std::enable_shared_from_this`, and we remove the public `KeyTookit` constructor and enforce that it can only be created inside a `shared_ptr`? I'd be concerned that this could be an annoying breaking change for some users though if anyone is using `KeyToolkit` directly rather than via a `CryptoFactory`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] GH-40328: [C++][Parquet] Allow use of FileDecryptionProperties after the CryptoFactory is destroyed [arrow]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #40329:
URL: https://github.com/apache/arrow/pull/40329#issuecomment-2012454361

   Revision: 387e1f2c53f3be1a415d143e74645089ff550af5
   
   Submitted crossbow builds: [ursacomputing/crossbow @ actions-2e333fc810](https://github.com/ursacomputing/crossbow/branches/all?query=actions-2e333fc810)
   
   |Task|Status|
   |----|------|
   |test-alpine-linux-cpp|[![GitHub Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-2e333fc810-github-test-alpine-linux-cpp)](https://github.com/ursacomputing/crossbow/actions/runs/8376677621/job/22936909362)|
   |test-build-cpp-fuzz|[![GitHub Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-2e333fc810-github-test-build-cpp-fuzz)](https://github.com/ursacomputing/crossbow/actions/runs/8376677158/job/22936907395)|
   |test-conda-cpp|[![GitHub Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-2e333fc810-github-test-conda-cpp)](https://github.com/ursacomputing/crossbow/actions/runs/8376677273/job/22936908449)|
   |test-conda-cpp-valgrind|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-2e333fc810-azure-test-conda-cpp-valgrind)](https://github.com/ursacomputing/crossbow/runs/22936910058)|
   |test-cuda-cpp|[![GitHub Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-2e333fc810-github-test-cuda-cpp)](https://github.com/ursacomputing/crossbow/actions/runs/8376677659/job/22936909287)|
   |test-debian-12-cpp-amd64|[![GitHub Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-2e333fc810-github-test-debian-12-cpp-amd64)](https://github.com/ursacomputing/crossbow/actions/runs/8376677202/job/22936907662)|
   |test-debian-12-cpp-i386|[![GitHub Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-2e333fc810-github-test-debian-12-cpp-i386)](https://github.com/ursacomputing/crossbow/actions/runs/8376677034/job/22936907026)|
   |test-fedora-39-cpp|[![GitHub Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-2e333fc810-github-test-fedora-39-cpp)](https://github.com/ursacomputing/crossbow/actions/runs/8376676896/job/22936906495)|
   |test-ubuntu-20.04-cpp|[![GitHub Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-2e333fc810-github-test-ubuntu-20.04-cpp)](https://github.com/ursacomputing/crossbow/actions/runs/8376677210/job/22936907677)|
   |test-ubuntu-20.04-cpp-bundled|[![GitHub Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-2e333fc810-github-test-ubuntu-20.04-cpp-bundled)](https://github.com/ursacomputing/crossbow/actions/runs/8376677439/job/22936908610)|
   |test-ubuntu-20.04-cpp-minimal-with-formats|[![GitHub Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-2e333fc810-github-test-ubuntu-20.04-cpp-minimal-with-formats)](https://github.com/ursacomputing/crossbow/actions/runs/8376677114/job/22936907386)|
   |test-ubuntu-20.04-cpp-thread-sanitizer|[![GitHub Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-2e333fc810-github-test-ubuntu-20.04-cpp-thread-sanitizer)](https://github.com/ursacomputing/crossbow/actions/runs/8376677146/job/22936907394)|
   |test-ubuntu-22.04-cpp|[![GitHub Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-2e333fc810-github-test-ubuntu-22.04-cpp)](https://github.com/ursacomputing/crossbow/actions/runs/8376677218/job/22936907703)|
   |test-ubuntu-22.04-cpp-20|[![GitHub Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-2e333fc810-github-test-ubuntu-22.04-cpp-20)](https://github.com/ursacomputing/crossbow/actions/runs/8376677222/job/22936907871)|
   |test-ubuntu-22.04-cpp-no-threading|[![GitHub Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-2e333fc810-github-test-ubuntu-22.04-cpp-no-threading)](https://github.com/ursacomputing/crossbow/actions/runs/8376677387/job/22936908774)|
   |test-ubuntu-24.04-cpp|[![GitHub Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-2e333fc810-github-test-ubuntu-24.04-cpp)](https://github.com/ursacomputing/crossbow/actions/runs/8376677026/job/22936907074)|
   |test-ubuntu-24.04-cpp-gcc-14|[![GitHub Actions](https://github.com/ursacomputing/crossbow/actions/workflows/crossbow.yml/badge.svg?branch=actions-2e333fc810-github-test-ubuntu-24.04-cpp-gcc-14)](https://github.com/ursacomputing/crossbow/actions/runs/8376677362/job/22936908552)|


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] GH-40328: [C++][Parquet] Allow use of FileDecryptionProperties after the CryptoFactory is destroyed [arrow]

Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU commented on code in PR #40329:
URL: https://github.com/apache/arrow/pull/40329#discussion_r1534125782


##########
cpp/src/parquet/encryption/key_management_test.cc:
##########
@@ -193,6 +193,26 @@ class TestEncryptionKeyManagement : public ::testing::Test {
     TestOnlyInServerWrapKms::FinishKeyRotation();
     crypto_factory_.RemoveCacheEntriesForAllTokens();
   }
+
+  // Create encryption properties without keeping the creating CryptoFactory alive
+  std::shared_ptr<FileEncryptionProperties> GetOrphanedFileEncryptionProperties(
+      std::shared_ptr<KmsClientFactory> kms_client_factory,
+      EncryptionConfiguration encryption_config) {
+    std::shared_ptr<CryptoFactory> crypto_factory = std::make_shared<CryptoFactory>();
+    crypto_factory->RegisterKmsClientFactory(kms_client_factory);

Review Comment:
   Nit: should we use std::move for kms_client_factory?



##########
cpp/src/parquet/encryption/key_management_test.cc:
##########
@@ -193,6 +193,26 @@ class TestEncryptionKeyManagement : public ::testing::Test {
     TestOnlyInServerWrapKms::FinishKeyRotation();
     crypto_factory_.RemoveCacheEntriesForAllTokens();
   }
+
+  // Create encryption properties without keeping the creating CryptoFactory alive
+  std::shared_ptr<FileEncryptionProperties> GetOrphanedFileEncryptionProperties(
+      std::shared_ptr<KmsClientFactory> kms_client_factory,
+      EncryptionConfiguration encryption_config) {
+    std::shared_ptr<CryptoFactory> crypto_factory = std::make_shared<CryptoFactory>();
+    crypto_factory->RegisterKmsClientFactory(kms_client_factory);
+    return crypto_factory->GetFileEncryptionProperties(kms_connection_config_,
+                                                       encryption_config);
+  }
+
+  // Create decryption properties without keeping the creating CryptoFactory alive
+  std::shared_ptr<FileDecryptionProperties> GetOrphanedFileDecryptionProperties(
+      std::shared_ptr<KmsClientFactory> kms_client_factory,
+      DecryptionConfiguration decryption_config) {
+    std::shared_ptr<CryptoFactory> crypto_factory = std::make_shared<CryptoFactory>();
+    crypto_factory->RegisterKmsClientFactory(kms_client_factory);

Review Comment:
   ditto



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] GH-40328: [C++][Parquet] Allow use of FileDecryptionProperties after the CryptoFactory is destroyed [arrow]

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on PR #40329:
URL: https://github.com/apache/arrow/pull/40329#issuecomment-2012534460

   The duckdb test failures on R builds are unrelated.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org