Posted to dev@parquet.apache.org by GitBox <gi...@apache.org> on 2022/11/09 19:55:23 UTC

[GitHub] [parquet-mr] parthchandra opened a new pull request, #1008: PARQUET-2212: Add ByteBuffer api for decryptors to allow direct memory to be decrypted

parthchandra opened a new pull request, #1008:
URL: https://github.com/apache/parquet-mr/pull/1008

   This PR adds the new ByteBuffer API for decryptors and updates ColumnChunkPageReadStore.readPage to use it.
   A few additional classes (ParquetReader.Builder, BytesInput) were touched to allow an allocator to be specified and/or to avoid ByteBuffer-to-byte-array copying. These changes were necessary to enable the unit test.
   A user option has been added to explicitly enable or disable the use of the ByteBuffer API for decryption.
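   From the description, the new decryptor surface presumably adds a ByteBuffer overload next to the existing byte[] method. A hedged sketch of the call shape (interface and names here are illustrative, not the merged parquet-mr code; the actual BlockCipher diff appears later in this thread):

   ```java
   import java.nio.ByteBuffer;

   public class DecryptorApiSketch {
       // Shape of the described addition: a ByteBuffer overload beside the
       // existing byte[] method, so direct memory need not be copied out.
       interface Decryptor {
           byte[] decrypt(byte[] lengthAndCiphertext, byte[] AAD);
           ByteBuffer decrypt(ByteBuffer from, byte[] AAD);
       }

       public static void main(String[] args) {
           // Illustrative no-op decryptor: a real implementation would run
           // the block cipher; this just echoes input to show the call shape.
           Decryptor identity = new Decryptor() {
               @Override public byte[] decrypt(byte[] in, byte[] aad) { return in; }
               @Override public ByteBuffer decrypt(ByteBuffer in, byte[] aad) { return in; }
           };

           ByteBuffer page = ByteBuffer.allocateDirect(8);
           ByteBuffer plain = identity.decrypt(page, new byte[0]);
           System.out.println("direct result: " + plain.isDirect()); // prints "direct result: true"
       }
   }
   ```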
   ### Jira
   
   - My PR addresses [PARQUET-2212](https://issues.apache.org/jira/browse/PARQUET-2212)
   
   ### Tests
   
   - Updated unit test(s) in `org.apache.parquet.crypto.TestPropertiesDrivenEncryption`
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [parquet-mr] ggershinsky commented on a diff in pull request #1008: PARQUET-2212: Add ByteBuffer api for decryptors to allow direct memory to be decrypted

Posted by GitBox <gi...@apache.org>.
ggershinsky commented on code in PR #1008:
URL: https://github.com/apache/parquet-mr/pull/1008#discussion_r1022648985


##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ColumnChunkPageReadStore.java:
##########
@@ -133,11 +135,33 @@ public DataPage readPage() {
         public DataPage visit(DataPageV1 dataPageV1) {
           try {
             BytesInput bytes = dataPageV1.getBytes();
-            if (null != blockDecryptor) {
-              bytes = BytesInput.from(blockDecryptor.decrypt(bytes.toByteArray(), dataPageAAD));
+            ByteBuffer byteBuffer = bytes.toByteBuffer();
+            BytesInput decompressed;
+
+            if (byteBuffer.isDirect() && options.useOffHeapDecryptBuffer()) {
+              if (blockDecryptor != null) {
+                byteBuffer = blockDecryptor.decrypt(byteBuffer, dataPageAAD);
+              }
+              long compressedSize = byteBuffer.limit();
+
+              ByteBuffer decompressedBuffer =
+                  ByteBuffer.allocateDirect(dataPageV1.getUncompressedSize());
+              decompressor.decompress(byteBuffer, (int) compressedSize, decompressedBuffer,

Review Comment:
   SGTM





[GitHub] [parquet-mr] shangxinli commented on a diff in pull request #1008: PARQUET-2212: Add ByteBuffer api for decryptors to allow direct memory to be decrypted

Posted by GitBox <gi...@apache.org>.
shangxinli commented on code in PR #1008:
URL: https://github.com/apache/parquet-mr/pull/1008#discussion_r1038820534


##########
parquet-hadoop/src/main/java/org/apache/parquet/ParquetReadOptions.java:
##########
@@ -44,6 +44,8 @@ public class ParquetReadOptions {
   private static final int ALLOCATION_SIZE_DEFAULT = 8388608; // 8MB
   private static final boolean PAGE_VERIFY_CHECKSUM_ENABLED_DEFAULT = false;
   private static final boolean BLOOM_FILTER_ENABLED_DEFAULT = true;
+  // Default to true if JDK 17 or newer.

Review Comment:
   I don't quite understand this comment. Where is it set to true?





[GitHub] [parquet-mr] wgtmac commented on pull request #1008: PARQUET-2212: Add ByteBuffer api for decryptors to allow direct memory to be decrypted

Posted by "wgtmac (via GitHub)" <gi...@apache.org>.
wgtmac commented on PR #1008:
URL: https://github.com/apache/parquet-mr/pull/1008#issuecomment-1504398847

   @parthchandra Do you want to revive this?




[GitHub] [parquet-mr] parthchandra commented on pull request #1008: PARQUET-2212: Add ByteBuffer api for decryptors to allow direct memory to be decrypted

Posted by GitBox <gi...@apache.org>.
parthchandra commented on PR #1008:
URL: https://github.com/apache/parquet-mr/pull/1008#issuecomment-1309294080

   @ggershinsky your review would be appreciated!
   




[GitHub] [parquet-mr] parthchandra commented on a diff in pull request #1008: PARQUET-2212: Add ByteBuffer api for decryptors to allow direct memory to be decrypted

Posted by GitBox <gi...@apache.org>.
parthchandra commented on code in PR #1008:
URL: https://github.com/apache/parquet-mr/pull/1008#discussion_r1039958501


##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ColumnChunkPageReadStore.java:
##########
@@ -133,11 +135,36 @@ public DataPage readPage() {
         public DataPage visit(DataPageV1 dataPageV1) {
           try {
             BytesInput bytes = dataPageV1.getBytes();
-            if (null != blockDecryptor) {
-              bytes = BytesInput.from(blockDecryptor.decrypt(bytes.toByteArray(), dataPageAAD));
+            BytesInput decompressed;
+
+            if (options.getAllocator().isDirect() && options.useOffHeapDecryptBuffer()) {
+              ByteBuffer byteBuffer = bytes.toByteBuffer();
+              if (!byteBuffer.isDirect()) {
+                throw new ParquetDecodingException("Expected a direct buffer");
+              }
+              if (blockDecryptor != null) {
+                byteBuffer = blockDecryptor.decrypt(byteBuffer, dataPageAAD);
+              }
+              long compressedSize = byteBuffer.limit();
+
+              ByteBuffer decompressedBuffer =
+                  options.getAllocator().allocate(dataPageV1.getUncompressedSize());
+              decompressor.decompress(byteBuffer, (int) compressedSize, decompressedBuffer,
+                  dataPageV1.getUncompressedSize());
+
+              // HACKY: sometimes we need to do `flip` because the position of output bytebuffer is

Review Comment:
   The output buffer is filled, but the position is not reset after the call by some direct-buffer decompressors. (It is not clear to me where in the direct decompression this happens; it might be worth looking into.) It is safe (and not expensive) to call flip.
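   A minimal illustration of the position/limit mechanics being discussed (plain java.nio, not the Parquet decompressor itself): a buffer that a producer has written into must be flipped before a consumer can read the written bytes.

   ```java
   import java.nio.ByteBuffer;

   public class FlipDemo {
       public static void main(String[] args) {
           ByteBuffer out = ByteBuffer.allocateDirect(16);

           // A decompressor-style producer writes into the buffer,
           // advancing the position past the written bytes.
           out.put(new byte[] {1, 2, 3, 4});
           // position == 4, limit == 16: a reader starting at the current
           // position would see none of the written data.

           // flip() sets limit = position and position = 0,
           // exposing exactly the bytes that were written.
           out.flip();
           System.out.println("remaining after flip: " + out.remaining()); // prints "remaining after flip: 4"
       }
   }
   ```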



##########
parquet-hadoop/src/main/java/org/apache/parquet/ParquetReadOptions.java:
##########
@@ -44,6 +44,8 @@ public class ParquetReadOptions {
   private static final int ALLOCATION_SIZE_DEFAULT = 8388608; // 8MB
   private static final boolean PAGE_VERIFY_CHECKSUM_ENABLED_DEFAULT = false;
   private static final boolean BLOOM_FILTER_ENABLED_DEFAULT = true;
+  // Default to true if JDK 17 or newer.

Review Comment:
   Oops. Comment got left behind from the original. I changed the initialization after some review comments.





[GitHub] [parquet-mr] ggershinsky commented on a diff in pull request #1008: PARQUET-2212: Add ByteBuffer api for decryptors to allow direct memory to be decrypted

Posted by GitBox <gi...@apache.org>.
ggershinsky commented on code in PR #1008:
URL: https://github.com/apache/parquet-mr/pull/1008#discussion_r1021580279


##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ColumnChunkPageReadStore.java:
##########
@@ -133,11 +135,33 @@ public DataPage readPage() {
         public DataPage visit(DataPageV1 dataPageV1) {
           try {
             BytesInput bytes = dataPageV1.getBytes();
-            if (null != blockDecryptor) {
-              bytes = BytesInput.from(blockDecryptor.decrypt(bytes.toByteArray(), dataPageAAD));
+            ByteBuffer byteBuffer = bytes.toByteBuffer();
+            BytesInput decompressed;
+
+            if (byteBuffer.isDirect() && options.useOffHeapDecryptBuffer()) {
+              if (blockDecryptor != null) {
+                byteBuffer = blockDecryptor.decrypt(byteBuffer, dataPageAAD);
+              }
+              long compressedSize = byteBuffer.limit();
+
+              ByteBuffer decompressedBuffer =
+                  ByteBuffer.allocateDirect(dataPageV1.getUncompressedSize());
+              decompressor.decompress(byteBuffer, (int) compressedSize, decompressedBuffer,

Review Comment:
   Looks like this PR goes beyond decryption, and adds direct buffers to decompression too. Please update the title (or split the PR in two).





[GitHub] [parquet-mr] ggershinsky commented on a diff in pull request #1008: PARQUET-2212: Add ByteBuffer api for decryptors to allow direct memory to be decrypted

Posted by GitBox <gi...@apache.org>.
ggershinsky commented on code in PR #1008:
URL: https://github.com/apache/parquet-mr/pull/1008#discussion_r1021582761


##########
parquet-hadoop/src/main/java/org/apache/parquet/ParquetReadOptions.java:
##########
@@ -44,6 +46,9 @@ public class ParquetReadOptions {
   private static final int ALLOCATION_SIZE_DEFAULT = 8388608; // 8MB
   private static final boolean PAGE_VERIFY_CHECKSUM_ENABLED_DEFAULT = false;
   private static final boolean BLOOM_FILTER_ENABLED_DEFAULT = true;
+  // Default to true if JDK 17 or newer.
+  private static final boolean USE_OFF_HEAP_DECRYPT_BUFFER_DEFAULT =
+    SystemUtils.isJavaVersionAtLeast(JavaVersion.JAVA_17);

Review Comment:
   Will direct buffers have better performance than byte arrays or indirect byte buffers? Or maybe the default can be off; with "on" to be set explicitly by users?





[GitHub] [parquet-mr] ggershinsky commented on pull request #1008: PARQUET-2212: Add ByteBuffer api for decryptors to allow direct memory to be decrypted

Posted by GitBox <gi...@apache.org>.
ggershinsky commented on PR #1008:
URL: https://github.com/apache/parquet-mr/pull/1008#issuecomment-1310268825

   sure, will be glad to review




[GitHub] [parquet-mr] ggershinsky commented on a diff in pull request #1008: PARQUET-2212: Add ByteBuffer api for decryptors to allow direct memory to be decrypted

Posted by GitBox <gi...@apache.org>.
ggershinsky commented on code in PR #1008:
URL: https://github.com/apache/parquet-mr/pull/1008#discussion_r1022637484


##########
parquet-hadoop/src/main/java/org/apache/parquet/ParquetReadOptions.java:
##########
@@ -44,6 +46,9 @@ public class ParquetReadOptions {
   private static final int ALLOCATION_SIZE_DEFAULT = 8388608; // 8MB
   private static final boolean PAGE_VERIFY_CHECKSUM_ENABLED_DEFAULT = false;
   private static final boolean BLOOM_FILTER_ENABLED_DEFAULT = true;
+  // Default to true if JDK 17 or newer.
+  private static final boolean USE_OFF_HEAP_DECRYPT_BUFFER_DEFAULT =
+    SystemUtils.isJavaVersionAtLeast(JavaVersion.JAVA_17);

Review Comment:
   Yep, let's do it. Given the current state of byte buffer decryption in Java (not all problems are resolved in JVM 17), it'd be safer to keep this off and let users set it to on explicitly.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [parquet-mr] shangxinli merged pull request #1008: PARQUET-2212: Add ByteBuffer api for decryptors to allow direct memory to be decrypted

Posted by "shangxinli (via GitHub)" <gi...@apache.org>.
shangxinli merged PR #1008:
URL: https://github.com/apache/parquet-mr/pull/1008




[GitHub] [parquet-mr] parthchandra commented on pull request #1008: PARQUET-2212: Add ByteBuffer api for decryptors to allow direct memory to be decrypted

Posted by GitBox <gi...@apache.org>.
parthchandra commented on PR #1008:
URL: https://github.com/apache/parquet-mr/pull/1008#issuecomment-1381826030

   CI is failing at the pre-build step. Anyone know why?




[GitHub] [parquet-mr] shangxinli commented on a diff in pull request #1008: PARQUET-2212: Add ByteBuffer api for decryptors to allow direct memory to be decrypted

Posted by GitBox <gi...@apache.org>.
shangxinli commented on code in PR #1008:
URL: https://github.com/apache/parquet-mr/pull/1008#discussion_r1038821325


##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ColumnChunkPageReadStore.java:
##########
@@ -133,11 +135,36 @@ public DataPage readPage() {
         public DataPage visit(DataPageV1 dataPageV1) {
           try {
             BytesInput bytes = dataPageV1.getBytes();
-            if (null != blockDecryptor) {
-              bytes = BytesInput.from(blockDecryptor.decrypt(bytes.toByteArray(), dataPageAAD));
+            BytesInput decompressed;
+
+            if (options.getAllocator().isDirect() && options.useOffHeapDecryptBuffer()) {
+              ByteBuffer byteBuffer = bytes.toByteBuffer();
+              if (!byteBuffer.isDirect()) {
+                throw new ParquetDecodingException("Expected a direct buffer");
+              }
+              if (blockDecryptor != null) {
+                byteBuffer = blockDecryptor.decrypt(byteBuffer, dataPageAAD);
+              }
+              long compressedSize = byteBuffer.limit();
+
+              ByteBuffer decompressedBuffer =
+                  options.getAllocator().allocate(dataPageV1.getUncompressedSize());
+              decompressor.decompress(byteBuffer, (int) compressedSize, decompressedBuffer,
+                  dataPageV1.getUncompressedSize());
+
+              // HACKY: sometimes we need to do `flip` because the position of output bytebuffer is

Review Comment:
   Do we know in what scenario the output byte buffer is not set? 





[GitHub] [parquet-mr] wgtmac commented on pull request #1008: PARQUET-2212: Add ByteBuffer api for decryptors to allow direct memory to be decrypted

Posted by "wgtmac (via GitHub)" <gi...@apache.org>.
wgtmac commented on PR #1008:
URL: https://github.com/apache/parquet-mr/pull/1008#issuecomment-1507838674

   @parthchandra Thanks for updating it!
   
   @ggershinsky @shangxinli Do you have any comments?




[GitHub] [parquet-mr] shangxinli commented on pull request #1008: PARQUET-2212: Add ByteBuffer api for decryptors to allow direct memory to be decrypted

Posted by "shangxinli (via GitHub)" <gi...@apache.org>.
shangxinli commented on PR #1008:
URL: https://github.com/apache/parquet-mr/pull/1008#issuecomment-1559507176

   LGTM




[GitHub] [parquet-mr] parthchandra commented on pull request #1008: PARQUET-2212: Add ByteBuffer api for decryptors to allow direct memory to be decrypted

Posted by "parthchandra (via GitHub)" <gi...@apache.org>.
parthchandra commented on PR #1008:
URL: https://github.com/apache/parquet-mr/pull/1008#issuecomment-1506055435

   Sure. What does one need to do? 
   I believe all the comments are addressed and CI seems to be failing for unrelated reasons (is there a way to re-trigger the failed tests?). 




[GitHub] [parquet-mr] parthchandra commented on a diff in pull request #1008: PARQUET-2212: Add ByteBuffer api for decryptors to allow direct memory to be decrypted

Posted by GitBox <gi...@apache.org>.
parthchandra commented on code in PR #1008:
URL: https://github.com/apache/parquet-mr/pull/1008#discussion_r1023355113


##########
parquet-hadoop/src/main/java/org/apache/parquet/ParquetReadOptions.java:
##########
@@ -44,6 +46,9 @@ public class ParquetReadOptions {
   private static final int ALLOCATION_SIZE_DEFAULT = 8388608; // 8MB
   private static final boolean PAGE_VERIFY_CHECKSUM_ENABLED_DEFAULT = false;
   private static final boolean BLOOM_FILTER_ENABLED_DEFAULT = true;
+  // Default to true if JDK 17 or newer.
+  private static final boolean USE_OFF_HEAP_DECRYPT_BUFFER_DEFAULT =
+    SystemUtils.isJavaVersionAtLeast(JavaVersion.JAVA_17);

Review Comment:
   Done





[GitHub] [parquet-mr] wgtmac commented on pull request #1008: PARQUET-2212: Add ByteBuffer api for decryptors to allow direct memory to be decrypted

Posted by GitBox <gi...@apache.org>.
wgtmac commented on PR #1008:
URL: https://github.com/apache/parquet-mr/pull/1008#issuecomment-1376928785

   > @wgtmac Do you have time to have a look?
   
   @shangxinli Thanks for mentioning me. Sure, I will take a look this week.




[GitHub] [parquet-mr] parthchandra commented on a diff in pull request #1008: PARQUET-2212: Add ByteBuffer api for decryptors to allow direct memory to be decrypted

Posted by GitBox <gi...@apache.org>.
parthchandra commented on code in PR #1008:
URL: https://github.com/apache/parquet-mr/pull/1008#discussion_r1021927002


##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ColumnChunkPageReadStore.java:
##########
@@ -133,11 +135,33 @@ public DataPage readPage() {
         public DataPage visit(DataPageV1 dataPageV1) {
           try {
             BytesInput bytes = dataPageV1.getBytes();
-            if (null != blockDecryptor) {
-              bytes = BytesInput.from(blockDecryptor.decrypt(bytes.toByteArray(), dataPageAAD));
+            ByteBuffer byteBuffer = bytes.toByteBuffer();
+            BytesInput decompressed;
+
+            if (byteBuffer.isDirect() && options.useOffHeapDecryptBuffer()) {
+              if (blockDecryptor != null) {
+                byteBuffer = blockDecryptor.decrypt(byteBuffer, dataPageAAD);
+              }
+              long compressedSize = byteBuffer.limit();
+
+              ByteBuffer decompressedBuffer =
+                  ByteBuffer.allocateDirect(dataPageV1.getUncompressedSize());
+              decompressor.decompress(byteBuffer, (int) compressedSize, decompressedBuffer,

Review Comment:
   This is not really adding direct buffer support for decompression; that is already implemented (see `FullDirectDecompressor`).
   For this path only (i.e., the buffer is direct and direct-buffer decryption is enabled), the buffer returned by decrypt is a direct byte buffer. In this case, it is efficient to call the direct byte buffer decompress API, which is already available.



##########
parquet-hadoop/src/test/java/org/apache/parquet/crypto/TestPropertiesDrivenEncryption.java:
##########
@@ -196,13 +205,13 @@ public static Collection<Object[]> data() {
     .append(COLUMN_MASTER_KEY_IDS[5]).append(": ").append(SingleRow.FIXED_LENGTH_BINARY_FIELD_NAME)
     .toString();
 
-  private static final int NUM_THREADS = 4;
+  private static final int NUM_THREADS = 1;

Review Comment:
   Oops. My mistake. 



##########
parquet-hadoop/src/main/java/org/apache/parquet/ParquetReadOptions.java:
##########
@@ -44,6 +46,9 @@ public class ParquetReadOptions {
   private static final int ALLOCATION_SIZE_DEFAULT = 8388608; // 8MB
   private static final boolean PAGE_VERIFY_CHECKSUM_ENABLED_DEFAULT = false;
   private static final boolean BLOOM_FILTER_ENABLED_DEFAULT = true;
+  // Default to true if JDK 17 or newer.
+  private static final boolean USE_OFF_HEAP_DECRYPT_BUFFER_DEFAULT =
+    SystemUtils.isJavaVersionAtLeast(JavaVersion.JAVA_17);

Review Comment:
   It looks like ByteBuffers do not have hardware-accelerated decryption until JDK 17. So, if and only if the file read buffer is a direct byte buffer and the JDK is 17 or newer, decryption will be as fast as with byte arrays, and we avoid the copy from a direct byte buffer to a byte array in `ColumnChunkPageReadStore`.
   I can default to false if you prefer.
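   For context, the JDK Cipher API already accepts ByteBuffers directly, which is what lets direct memory be decrypted without a byte[] copy (the JDK 17+ point is about when the GCM intrinsics cover this path). A self-contained round-trip sketch, not parquet-mr code:

   ```java
   import java.nio.ByteBuffer;
   import java.nio.charset.StandardCharsets;
   import java.security.SecureRandom;
   import javax.crypto.Cipher;
   import javax.crypto.KeyGenerator;
   import javax.crypto.SecretKey;
   import javax.crypto.spec.GCMParameterSpec;

   public class ByteBufferCipherDemo {
       public static void main(String[] args) throws Exception {
           SecretKey key = KeyGenerator.getInstance("AES").generateKey();
           byte[] iv = new byte[12];
           new SecureRandom().nextBytes(iv);

           byte[] plaintext = "page bytes".getBytes(StandardCharsets.UTF_8);

           Cipher enc = Cipher.getInstance("AES/GCM/NoPadding");
           enc.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
           // GCM appends a 16-byte tag, so ciphertext = plaintext + 16.
           ByteBuffer ct = ByteBuffer.allocateDirect(plaintext.length + 16);
           enc.doFinal(ByteBuffer.wrap(plaintext), ct);
           ct.flip();

           Cipher dec = Cipher.getInstance("AES/GCM/NoPadding");
           dec.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, iv));
           ByteBuffer pt = ByteBuffer.allocateDirect(plaintext.length);
           dec.doFinal(ct, pt); // direct-to-direct, no byte[] copy
           pt.flip();

           byte[] roundTrip = new byte[pt.remaining()];
           pt.get(roundTrip);
           System.out.println(new String(roundTrip, StandardCharsets.UTF_8)); // prints "page bytes"
       }
   }
   ```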
   



##########
parquet-format-structures/src/main/java/org/apache/parquet/format/BlockCipher.java:
##########
@@ -51,17 +52,26 @@
      * @param AAD - Additional Authenticated Data for the decryption (ignored in case of CTR cipher)
      * @return plaintext - starts at offset 0 of the output value, and fills up the entire byte array.
      */
-    public byte[] decrypt(byte[] lengthAndCiphertext, byte[] AAD);
+    byte[] decrypt(byte[] lengthAndCiphertext, byte[] AAD);
 
     /**
+     * Convenience decryption method that reads the length and ciphertext from a ByteBuffer
+     *
+     * @param from ByteBuffer with length and ciphertext.
+     * @param AAD - Additional Authenticated Data for the decryption (ignored in case of CTR cipher)
+     * @return plaintext -  starts at offset 0 of the output, and fills up the entire byte array.

Review Comment:
   Done



##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ColumnChunkPageReadStore.java:
##########
@@ -133,11 +135,33 @@ public DataPage readPage() {
         public DataPage visit(DataPageV1 dataPageV1) {
           try {
             BytesInput bytes = dataPageV1.getBytes();
-            if (null != blockDecryptor) {
-              bytes = BytesInput.from(blockDecryptor.decrypt(bytes.toByteArray(), dataPageAAD));
+            ByteBuffer byteBuffer = bytes.toByteBuffer();

Review Comment:
   Done.



##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ColumnChunkPageReadStore.java:
##########
@@ -133,11 +135,33 @@ public DataPage readPage() {
         public DataPage visit(DataPageV1 dataPageV1) {
           try {
             BytesInput bytes = dataPageV1.getBytes();
-            if (null != blockDecryptor) {
-              bytes = BytesInput.from(blockDecryptor.decrypt(bytes.toByteArray(), dataPageAAD));
+            ByteBuffer byteBuffer = bytes.toByteBuffer();
+            BytesInput decompressed;
+
+            if (byteBuffer.isDirect() && options.useOffHeapDecryptBuffer()) {
+              if (blockDecryptor != null) {
+                byteBuffer = blockDecryptor.decrypt(byteBuffer, dataPageAAD);
+              }
+              long compressedSize = byteBuffer.limit();
+
+              ByteBuffer decompressedBuffer =
+                  ByteBuffer.allocateDirect(dataPageV1.getUncompressedSize());

Review Comment:
   Hmm. Using a buffer pool here is a major PR by itself. The big problem is managing the lifecycle of the allocated buffer: the buffer is passed on to the caller, which can keep a reference as long as it wants. Somewhere we have to guarantee that the buffer is returned to the pool when it is no longer needed, but we cannot do that here.
   I've changed this to use the allocator specified in the ParquetReadOptions (which I should have done in the first place). The allocator API has a `release` method which the caller needs to call; the provided allocator is the best place for a buffer pool implementation.
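   The pooling idea described here could be sketched roughly as below. This is a hypothetical illustration modeled on the shape of the allocator API mentioned (allocate/release/isDirect); the class name and pooling policy are inventions for this example, not parquet-mr code:

   ```java
   import java.nio.ByteBuffer;
   import java.util.ArrayDeque;
   import java.util.Iterator;

   // Hypothetical pooling allocator: allocate() hands ownership to the
   // caller; release() is the caller's promise the buffer is done with,
   // at which point it becomes eligible for reuse.
   public class PoolingDirectAllocator {
       private final ArrayDeque<ByteBuffer> pool = new ArrayDeque<>();

       public boolean isDirect() {
           return true;
       }

       public synchronized ByteBuffer allocate(int size) {
           // Reuse a pooled buffer when one is large enough.
           Iterator<ByteBuffer> it = pool.iterator();
           while (it.hasNext()) {
               ByteBuffer b = it.next();
               if (b.capacity() >= size) {
                   it.remove();
                   b.clear();
                   b.limit(size);
                   return b;
               }
           }
           return ByteBuffer.allocateDirect(size);
       }

       public synchronized void release(ByteBuffer buffer) {
           // The caller must not touch the buffer after this point.
           pool.addLast(buffer);
       }
   }
   ```

   The key design point from the comment above: only the allocator's owner can decide when pooling is safe, because only the caller knows when it has dropped its last reference.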





[GitHub] [parquet-mr] ggershinsky commented on pull request #1008: PARQUET-2212: Add ByteBuffer api for decryptors to allow direct memory to be decrypted

Posted by GitBox <gi...@apache.org>.
ggershinsky commented on PR #1008:
URL: https://github.com/apache/parquet-mr/pull/1008#issuecomment-1318625808

   Thank you. This PR is significant enough to be reviewed by more than one person. Cc @shangxinli 




[GitHub] [parquet-mr] parthchandra commented on pull request #1008: PARQUET-2212: Add ByteBuffer api for decryptors to allow direct memory to be decrypted

Posted by GitBox <gi...@apache.org>.
parthchandra commented on PR #1008:
URL: https://github.com/apache/parquet-mr/pull/1008#issuecomment-1319405608

   Thank you @ggershinsky  !




[GitHub] [parquet-mr] shangxinli commented on a diff in pull request #1008: PARQUET-2212: Add ByteBuffer api for decryptors to allow direct memory to be decrypted

Posted by GitBox <gi...@apache.org>.
shangxinli commented on code in PR #1008:
URL: https://github.com/apache/parquet-mr/pull/1008#discussion_r1065346689


##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ColumnChunkPageReadStore.java:
##########
@@ -133,11 +135,36 @@ public DataPage readPage() {
         public DataPage visit(DataPageV1 dataPageV1) {
           try {
             BytesInput bytes = dataPageV1.getBytes();
-            if (null != blockDecryptor) {
-              bytes = BytesInput.from(blockDecryptor.decrypt(bytes.toByteArray(), dataPageAAD));
+            BytesInput decompressed;
+
+            if (options.getAllocator().isDirect() && options.useOffHeapDecryptBuffer()) {
+              ByteBuffer byteBuffer = bytes.toByteBuffer();
+              if (!byteBuffer.isDirect()) {
+                throw new ParquetDecodingException("Expected a direct buffer");
+              }
+              if (blockDecryptor != null) {
+                byteBuffer = blockDecryptor.decrypt(byteBuffer, dataPageAAD);
+              }
+              long compressedSize = byteBuffer.limit();
+
+              ByteBuffer decompressedBuffer =
+                  options.getAllocator().allocate(dataPageV1.getUncompressedSize());
+              decompressor.decompress(byteBuffer, (int) compressedSize, decompressedBuffer,
+                  dataPageV1.getUncompressedSize());
+
+              // HACKY: sometimes we need to do `flip` because the position of output bytebuffer is

Review Comment:
   Ok  I see





[GitHub] [parquet-mr] wgtmac commented on pull request #1008: PARQUET-2212: Add ByteBuffer api for decryptors to allow direct memory to be decrypted

Posted by "wgtmac (via GitHub)" <gi...@apache.org>.
wgtmac commented on PR #1008:
URL: https://github.com/apache/parquet-mr/pull/1008#issuecomment-1506231596

   > Sure. What does one need to do? I believe all the comments are addressed and CI seems to be failing for unrelated reasons (is there a way to re-trigger the failed tests?).
   
   Let's try rebasing and force-pushing to see if the CIs go green?




[GitHub] [parquet-mr] parthchandra commented on pull request #1008: PARQUET-2212: Add ByteBuffer api for decryptors to allow direct memory to be decrypted

Posted by "parthchandra (via GitHub)" <gi...@apache.org>.
parthchandra commented on PR #1008:
URL: https://github.com/apache/parquet-mr/pull/1008#issuecomment-1507327789

   Force-pushed after rebase to retrigger CI.




[GitHub] [parquet-mr] ggershinsky commented on a diff in pull request #1008: PARQUET-2212: Add ByteBuffer api for decryptors to allow direct memory to be decrypted

Posted by GitBox <gi...@apache.org>.
ggershinsky commented on code in PR #1008:
URL: https://github.com/apache/parquet-mr/pull/1008#discussion_r1021589628


##########
parquet-hadoop/src/test/java/org/apache/parquet/crypto/TestPropertiesDrivenEncryption.java:
##########
@@ -196,13 +205,13 @@ public static Collection<Object[]> data() {
     .append(COLUMN_MASTER_KEY_IDS[5]).append(": ").append(SingleRow.FIXED_LENGTH_BINARY_FIELD_NAME)
     .toString();
 
-  private static final int NUM_THREADS = 4;
+  private static final int NUM_THREADS = 1;

Review Comment:
   Why remove the multiple threads?



##########
parquet-format-structures/src/main/java/org/apache/parquet/format/BlockCipher.java:
##########
@@ -51,17 +52,26 @@
      * @param AAD - Additional Authenticated Data for the decryption (ignored in case of CTR cipher)
      * @return plaintext - starts at offset 0 of the output value, and fills up the entire byte array.
      */
-    public byte[] decrypt(byte[] lengthAndCiphertext, byte[] AAD);
+    byte[] decrypt(byte[] lengthAndCiphertext, byte[] AAD);
 
     /**
+     * Convenience decryption method that reads the length and ciphertext from a ByteBuffer
+     *
+     * @param from ByteBuffer with length and ciphertext.
+     * @param AAD - Additional Authenticated Data for the decryption (ignored in case of CTR cipher)
+     * @return plaintext -  starts at offset 0 of the output, and fills up the entire byte array.

Review Comment:
   nit: please update for byte buffer



##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ColumnChunkPageReadStore.java:
##########
@@ -133,11 +135,33 @@ public DataPage readPage() {
         public DataPage visit(DataPageV1 dataPageV1) {
           try {
             BytesInput bytes = dataPageV1.getBytes();
-            if (null != blockDecryptor) {
-              bytes = BytesInput.from(blockDecryptor.decrypt(bytes.toByteArray(), dataPageAAD));
+            ByteBuffer byteBuffer = bytes.toByteBuffer();

Review Comment:
   Can you call this after `if (options.useOffHeapDecryptBuffer())`?



##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ColumnChunkPageReadStore.java:
##########
@@ -133,11 +135,33 @@ public DataPage readPage() {
         public DataPage visit(DataPageV1 dataPageV1) {
           try {
             BytesInput bytes = dataPageV1.getBytes();
-            if (null != blockDecryptor) {
-              bytes = BytesInput.from(blockDecryptor.decrypt(bytes.toByteArray(), dataPageAAD));
+            ByteBuffer byteBuffer = bytes.toByteBuffer();
+            BytesInput decompressed;
+
+            if (byteBuffer.isDirect() && options.useOffHeapDecryptBuffer()) {
+              if (blockDecryptor != null) {
+                byteBuffer = blockDecryptor.decrypt(byteBuffer, dataPageAAD);
+              }
+              long compressedSize = byteBuffer.limit();
+
+              ByteBuffer decompressedBuffer =
+                  ByteBuffer.allocateDirect(dataPageV1.getUncompressedSize());

Review Comment:
   This is expensive. Can we use a buffer pool here, or some other means to reduce the allocations?
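A minimal sketch of the kind of pooling being suggested (hypothetical code, not an existing parquet-mr class): released direct buffers are kept and handed back out when a later request fits, amortizing the cost of `ByteBuffer.allocateDirect`.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Iterator;

public class DirectBufferPool {
  private final ArrayDeque<ByteBuffer> free = new ArrayDeque<>();

  // Returns a pooled direct buffer of at least `size` bytes, or
  // allocates a new one when nothing in the pool is large enough.
  public synchronized ByteBuffer acquire(int size) {
    for (Iterator<ByteBuffer> it = free.iterator(); it.hasNext(); ) {
      ByteBuffer b = it.next();
      if (b.capacity() >= size) {
        it.remove();
        b.clear();        // position = 0, limit = capacity
        b.limit(size);    // expose only the requested window
        return b;
      }
    }
    return ByteBuffer.allocateDirect(size);
  }

  // Returns a buffer to the pool for reuse by a later acquire().
  public synchronized void release(ByteBuffer b) {
    free.addLast(b);
  }
}
```

A real pool would also bound its size and possibly bucket buffers by capacity; this only illustrates the allocate-once/reuse-many idea.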





[GitHub] [parquet-mr] wgtmac commented on a diff in pull request #1008: PARQUET-2212: Add ByteBuffer api for decryptors to allow direct memory to be decrypted

Posted by GitBox <gi...@apache.org>.
wgtmac commented on code in PR #1008:
URL: https://github.com/apache/parquet-mr/pull/1008#discussion_r1067699349


##########
parquet-format-structures/src/main/java/org/apache/parquet/format/BlockCipher.java:
##########
@@ -51,17 +52,26 @@
      * @param AAD - Additional Authenticated Data for the decryption (ignored in case of CTR cipher)
      * @return plaintext - starts at offset 0 of the output value, and fills up the entire byte array.
      */
-    public byte[] decrypt(byte[] lengthAndCiphertext, byte[] AAD);
+    byte[] decrypt(byte[] lengthAndCiphertext, byte[] AAD);
 
     /**
+     * Convenience decryption method that reads the length and ciphertext from a ByteBuffer
+     *
+     * @param from ByteBuffer with length and ciphertext.
+     * @param AAD - Additional Authenticated Data for the decryption (ignored in case of CTR cipher)
+     * @return plaintext -  starts at offset 0 of the output, and fills up the entire byte buffer.

Review Comment:
   ```suggestion
        * @return plaintext - starts at offset 0 of the output, and fills up the entire byte buffer.
   ```



##########
parquet-hadoop/src/main/java/org/apache/parquet/crypto/AesCtrDecryptor.java:
##########
@@ -86,6 +87,48 @@ public byte[] decrypt(byte[] ciphertext, int cipherTextOffset, int cipherTextLen
 
     return plainText;
   }
+  public ByteBuffer decrypt(ByteBuffer ciphertext, byte[] AAD) {
+

Review Comment:
   Remove the blank line



##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetReader.java:
##########
@@ -187,6 +189,7 @@ public static <T> Builder<T> builder(ReadSupport<T> readSupport, Path path) {
     private final InputFile file;
     private final Path path;
     private Filter filter = null;
+    private ByteBufferAllocator  allocator = new HeapByteBufferAllocator();

Review Comment:
   ```suggestion
       private ByteBufferAllocator allocator = new HeapByteBufferAllocator();
   ```



##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ColumnChunkPageReadStore.java:
##########
@@ -133,11 +135,36 @@ public DataPage readPage() {
         public DataPage visit(DataPageV1 dataPageV1) {
           try {
             BytesInput bytes = dataPageV1.getBytes();
-            if (null != blockDecryptor) {
-              bytes = BytesInput.from(blockDecryptor.decrypt(bytes.toByteArray(), dataPageAAD));
+            BytesInput decompressed;
+
+            if (options.getAllocator().isDirect() && options.useOffHeapDecryptBuffer()) {
+              ByteBuffer byteBuffer = bytes.toByteBuffer();
+              if (!byteBuffer.isDirect()) {

Review Comment:
   Should we fall back to the old path in this case?



##########
parquet-hadoop/src/test/java/org/apache/parquet/crypto/TestPropertiesDrivenEncryption.java:
##########
@@ -202,7 +211,7 @@ public static Collection<Object[]> data() {
 
   private static final boolean plaintextFilesAllowed = true;
 
-  private static final int ROW_COUNT = 10000;
+  private static final int ROW_COUNT = 100;

Review Comment:
   Why decrease the row count?



##########
parquet-hadoop/src/main/java/org/apache/parquet/crypto/AesCtrDecryptor.java:
##########
@@ -86,6 +87,48 @@ public byte[] decrypt(byte[] ciphertext, int cipherTextOffset, int cipherTextLen
 
     return plainText;
   }
+  public ByteBuffer decrypt(ByteBuffer ciphertext, byte[] AAD) {

Review Comment:
   There is a lot of code duplicated from `byte[] decrypt(byte[] ciphertext, int cipherTextOffset, int cipherTextLength, byte[] AAD)`. Is it possible to refactor them to share the common logic?



##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ColumnChunkPageReadStore.java:
##########
@@ -133,11 +135,36 @@ public DataPage readPage() {
         public DataPage visit(DataPageV1 dataPageV1) {

Review Comment:
   Does it apply to data page v2 as well?



##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetInputFormat.java:
##########
@@ -144,6 +144,11 @@
    */
   public static final String BLOOM_FILTERING_ENABLED = "parquet.filter.bloom.enabled";
 
+  /**
+    * Key to configure if off-heap buffer should be used for decryption
+    */
+  public static final String OFF_HEAP_DECRYPT_BUFFER_ENABLED = "parquet.decrypt.off-heap.buffer.enabled";

Review Comment:
   Please document the new config parameter to the [README](https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/README.md)



##########
parquet-hadoop/src/main/java/org/apache/parquet/crypto/AesCtrDecryptor.java:
##########
@@ -86,6 +87,48 @@ public byte[] decrypt(byte[] ciphertext, int cipherTextOffset, int cipherTextLen
 
     return plainText;
   }
+  public ByteBuffer decrypt(ByteBuffer ciphertext, byte[] AAD) {
+
+    int cipherTextOffset = SIZE_LENGTH;
+    int cipherTextLength = ciphertext.limit() - ciphertext.position() - SIZE_LENGTH;
+
+    int plainTextLength = cipherTextLength - NONCE_LENGTH;
+    if (plainTextLength < 1) {
+      throw new ParquetCryptoRuntimeException("Wrong input length " + plainTextLength);
+    }
+
+    // skip size
+    ciphertext.position(ciphertext.position() + cipherTextOffset);
+    // Get the nonce from ciphertext
+    ciphertext.get(ctrIV, 0, NONCE_LENGTH);
+
+    // Reuse the input buffer as the output buffer
+    ByteBuffer plainText = ciphertext.slice();
+    plainText.limit(plainTextLength);
+    int inputLength = cipherTextLength - NONCE_LENGTH;
+    int inputOffset = cipherTextOffset + NONCE_LENGTH;
+    try {
+      IvParameterSpec spec = new IvParameterSpec(ctrIV);
+      cipher.init(Cipher.DECRYPT_MODE, aesKey, spec);
+
+      // Breaking decryption into multiple updates, to trigger h/w acceleration in Java 9+
+      while (inputLength > CHUNK_LENGTH) {
+        ciphertext.position(inputOffset);
+        ciphertext.limit(inputOffset + CHUNK_LENGTH);
+        cipher.update(ciphertext, plainText);
+        inputOffset += CHUNK_LENGTH;
+        inputLength -= CHUNK_LENGTH;
+      }
+      ciphertext.position(inputOffset);
+      ciphertext.limit(inputOffset + inputLength);
+      cipher.doFinal(ciphertext, plainText);
+      plainText.flip();
+    } catch (GeneralSecurityException e) {
+      throw new ParquetCryptoRuntimeException("Failed to decrypt", e);

Review Comment:
   ```suggestion
         throw new ParquetCryptoRuntimeException("Failed to decrypt ", e);
   ```
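The chunked `cipher.update` loop quoted in the diff above can be illustrated standalone. This sketch (dummy key/IV, a plain `AES/CTR/NoPadding` cipher, and an illustrative `CHUNK` size — not the parquet-mr constants) shows the pattern of sliding the input buffer's position/limit window so each `update` processes one chunk, which lets the JIT apply AES intrinsics on Java 9+:

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;

public class ChunkedCtr {
  static final int CHUNK = 4 * 1024; // illustrative chunk size

  // Runs AES-CTR over `length` bytes of `in` starting at its current
  // position, feeding the cipher one CHUNK-sized window at a time.
  static ByteBuffer run(int mode, byte[] key, byte[] iv, ByteBuffer in, int length) {
    try {
      Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
      cipher.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
      ByteBuffer out = ByteBuffer.allocate(length); // CTR output size == input size
      int offset = in.position();
      int remaining = length;
      while (remaining > CHUNK) {
        in.position(offset);
        in.limit(offset + CHUNK);
        cipher.update(in, out); // consumes exactly the [position, limit) window
        offset += CHUNK;
        remaining -= CHUNK;
      }
      in.position(offset);
      in.limit(offset + remaining);
      cipher.doFinal(in, out);
      out.flip(); // make the written plaintext/ciphertext readable
      return out;
    } catch (GeneralSecurityException e) {
      throw new RuntimeException("AES-CTR failed", e);
    }
  }
}
```

Because CTR is a stream cipher whose state carries across `update` calls, the chunked result is byte-identical to a single `doFinal` over the whole input.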



##########
parquet-format-structures/src/main/java/org/apache/parquet/format/BlockCipher.java:
##########
@@ -36,11 +37,11 @@
      * The ciphertext includes the nonce and (in case of GCM cipher) the tag, as detailed in the 
      * Parquet Modular Encryption specification.
      */
-    public byte[] encrypt(byte[] plaintext, byte[] AAD);
+    byte[] encrypt(byte[] plaintext, byte[] AAD);

Review Comment:
   Will the encryptor benefit from a similar overload of `ByteBuffer`?



##########
parquet-hadoop/src/main/java/org/apache/parquet/crypto/AesGcmDecryptor.java:
##########
@@ -78,6 +79,38 @@ public byte[] decrypt(byte[] ciphertext, int cipherTextOffset, int cipherTextLen
     return plainText;
   }
 
+  public ByteBuffer decrypt(ByteBuffer ciphertext, byte[] AAD) {

Review Comment:
   ditto



##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetReader.java:
##########
@@ -242,6 +245,12 @@ public Builder<T> withFilter(Filter filter) {
       return this;
     }
 
+    public Builder<T> withAllocator(ByteBufferAllocator allocator) {

Review Comment:
   What is the purpose of setting it here? It will override the one in the `ParquetReadOptions`.





[GitHub] [parquet-mr] parthchandra commented on a diff in pull request #1008: PARQUET-2212: Add ByteBuffer api for decryptors to allow direct memory to be decrypted

Posted by GitBox <gi...@apache.org>.
parthchandra commented on code in PR #1008:
URL: https://github.com/apache/parquet-mr/pull/1008#discussion_r1069182893


##########
parquet-format-structures/src/main/java/org/apache/parquet/format/BlockCipher.java:
##########
@@ -51,17 +52,26 @@
      * @param AAD - Additional Authenticated Data for the decryption (ignored in case of CTR cipher)
      * @return plaintext - starts at offset 0 of the output value, and fills up the entire byte array.
      */
-    public byte[] decrypt(byte[] lengthAndCiphertext, byte[] AAD);
+    byte[] decrypt(byte[] lengthAndCiphertext, byte[] AAD);
 
     /**
+     * Convenience decryption method that reads the length and ciphertext from a ByteBuffer
+     *
+     * @param from ByteBuffer with length and ciphertext.
+     * @param AAD - Additional Authenticated Data for the decryption (ignored in case of CTR cipher)
+     * @return plaintext -  starts at offset 0 of the output, and fills up the entire byte buffer.

Review Comment:
   Done



##########
parquet-format-structures/src/main/java/org/apache/parquet/format/BlockCipher.java:
##########
@@ -36,11 +37,11 @@
      * The ciphertext includes the nonce and (in case of GCM cipher) the tag, as detailed in the 
      * Parquet Modular Encryption specification.
      */
-    public byte[] encrypt(byte[] plaintext, byte[] AAD);
+    byte[] encrypt(byte[] plaintext, byte[] AAD);

Review Comment:
   Probably. This PR only focuses on the read path, though. I can add the encryptor API when I look at the write path.



##########
parquet-hadoop/src/main/java/org/apache/parquet/crypto/AesCtrDecryptor.java:
##########
@@ -86,6 +87,48 @@ public byte[] decrypt(byte[] ciphertext, int cipherTextOffset, int cipherTextLen
 
     return plainText;
   }
+  public ByteBuffer decrypt(ByteBuffer ciphertext, byte[] AAD) {
+

Review Comment:
   Done



##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetInputFormat.java:
##########
@@ -144,6 +144,11 @@
    */
   public static final String BLOOM_FILTERING_ENABLED = "parquet.filter.bloom.enabled";
 
+  /**
+    * Key to configure if off-heap buffer should be used for decryption
+    */
+  public static final String OFF_HEAP_DECRYPT_BUFFER_ENABLED = "parquet.decrypt.off-heap.buffer.enabled";

Review Comment:
   Done



##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ColumnChunkPageReadStore.java:
##########
@@ -133,11 +135,36 @@ public DataPage readPage() {
         public DataPage visit(DataPageV1 dataPageV1) {

Review Comment:
   Actually, it does. Added it for V2 and updated the unit tests as well. 



##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetReader.java:
##########
@@ -242,6 +245,12 @@ public Builder<T> withFilter(Filter filter) {
       return this;
     }
 
+    public Builder<T> withAllocator(ByteBufferAllocator allocator) {

Review Comment:
   Well, yes. ParquetReader has an instance of `ParquetReadOptions`, and this builder simply passes the allocator down to that instance. Adding this here mirrors the other options set in `ParquetReader.Builder`. It is used in the unit tests.



##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ColumnChunkPageReadStore.java:
##########
@@ -133,11 +135,36 @@ public DataPage readPage() {
         public DataPage visit(DataPageV1 dataPageV1) {
           try {
             BytesInput bytes = dataPageV1.getBytes();
-            if (null != blockDecryptor) {
-              bytes = BytesInput.from(blockDecryptor.decrypt(bytes.toByteArray(), dataPageAAD));
+            BytesInput decompressed;
+
+            if (options.getAllocator().isDirect() && options.useOffHeapDecryptBuffer()) {
+              ByteBuffer byteBuffer = bytes.toByteBuffer();
+              if (!byteBuffer.isDirect()) {

Review Comment:
   I think there was a previous discussion on this, and we simplified the code to throw an error. Basically, we expect that a user of the ParquetReader will be using either a direct buffer or a heap buffer all the way through, so encountering both a direct buffer and a heap buffer is an error.
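The expectation described here amounts to a fail-fast guard in the off-heap path. A standalone sketch (names are illustrative, not the exact parquet-mr code):

```java
import java.nio.ByteBuffer;

public class BufferGuard {
  // Off-heap decryption requires a direct buffer; a heap buffer
  // arriving on this path indicates a misconfigured reader, so fail
  // fast rather than silently copying into direct memory.
  static ByteBuffer requireDirect(ByteBuffer buf) {
    if (!buf.isDirect()) {
      throw new IllegalStateException(
          "Expected a direct buffer because off-heap decryption is enabled");
    }
    return buf;
  }
}
```

Throwing here keeps the contract simple: one allocator kind (direct or heap) end to end, instead of a mixed-mode fallback that could mask configuration mistakes.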



##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetReader.java:
##########
@@ -187,6 +189,7 @@ public static <T> Builder<T> builder(ReadSupport<T> readSupport, Path path) {
     private final InputFile file;
     private final Path path;
     private Filter filter = null;
+    private ByteBufferAllocator  allocator = new HeapByteBufferAllocator();

Review Comment:
   Done



##########
parquet-hadoop/src/main/java/org/apache/parquet/crypto/AesGcmDecryptor.java:
##########
@@ -78,6 +79,38 @@ public byte[] decrypt(byte[] ciphertext, int cipherTextOffset, int cipherTextLen
     return plainText;
   }
 
+  public ByteBuffer decrypt(ByteBuffer ciphertext, byte[] AAD) {

Review Comment:
   Done



##########
parquet-hadoop/src/main/java/org/apache/parquet/crypto/AesCtrDecryptor.java:
##########
@@ -86,6 +87,48 @@ public byte[] decrypt(byte[] ciphertext, int cipherTextOffset, int cipherTextLen
 
     return plainText;
   }
+  public ByteBuffer decrypt(ByteBuffer ciphertext, byte[] AAD) {
+
+    int cipherTextOffset = SIZE_LENGTH;
+    int cipherTextLength = ciphertext.limit() - ciphertext.position() - SIZE_LENGTH;
+
+    int plainTextLength = cipherTextLength - NONCE_LENGTH;
+    if (plainTextLength < 1) {
+      throw new ParquetCryptoRuntimeException("Wrong input length " + plainTextLength);
+    }
+
+    // skip size
+    ciphertext.position(ciphertext.position() + cipherTextOffset);
+    // Get the nonce from ciphertext
+    ciphertext.get(ctrIV, 0, NONCE_LENGTH);
+
+    // Reuse the input buffer as the output buffer
+    ByteBuffer plainText = ciphertext.slice();
+    plainText.limit(plainTextLength);
+    int inputLength = cipherTextLength - NONCE_LENGTH;
+    int inputOffset = cipherTextOffset + NONCE_LENGTH;
+    try {
+      IvParameterSpec spec = new IvParameterSpec(ctrIV);
+      cipher.init(Cipher.DECRYPT_MODE, aesKey, spec);
+
+      // Breaking decryption into multiple updates, to trigger h/w acceleration in Java 9+
+      while (inputLength > CHUNK_LENGTH) {
+        ciphertext.position(inputOffset);
+        ciphertext.limit(inputOffset + CHUNK_LENGTH);
+        cipher.update(ciphertext, plainText);
+        inputOffset += CHUNK_LENGTH;
+        inputLength -= CHUNK_LENGTH;
+      }
+      ciphertext.position(inputOffset);
+      ciphertext.limit(inputOffset + inputLength);
+      cipher.doFinal(ciphertext, plainText);
+      plainText.flip();
+    } catch (GeneralSecurityException e) {
+      throw new ParquetCryptoRuntimeException("Failed to decrypt", e);

Review Comment:
   The original implementation in   `decrypt(byte[] ciphertext, int cipherTextOffset, int cipherTextLength, byte[] AAD)` has it the same way (without the trailing space). Leaving it as it is to keep it consistent. 
   If you want, we can change both. I would rather we terminate the message with a `.` (period).
   



##########
parquet-hadoop/src/main/java/org/apache/parquet/crypto/AesCtrDecryptor.java:
##########
@@ -86,6 +87,48 @@ public byte[] decrypt(byte[] ciphertext, int cipherTextOffset, int cipherTextLen
 
     return plainText;
   }
+  public ByteBuffer decrypt(ByteBuffer ciphertext, byte[] AAD) {

Review Comment:
   I've tried to, but the changes are rather intertwined and we'll end up with multiple functions abstracting out the common lines of code, with none of the functions seeming to be a complete unit by itself. I feel the current version is easier to read and maintain.
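One detail both decryptor variants rely on (see `ByteBuffer plainText = ciphertext.slice()` in the diff above) is that `slice()` creates a view sharing the source buffer's storage, which is what makes reusing the input buffer as the output buffer possible. A standalone illustration:

```java
import java.nio.ByteBuffer;

public class SliceDemo {
  public static void main(String[] args) {
    ByteBuffer buf = ByteBuffer.allocate(8);
    buf.position(2);                 // e.g. skip a length prefix
    ByteBuffer view = buf.slice();   // starts at buf's position, shares storage
    view.put(0, (byte) 42);          // a write through the view...
    System.out.println(buf.get(2));  // ...is visible in the original
  }
}
```

The view has its own independent position/limit (capacity 6 here), so the decryptor can shrink `plainText`'s limit without disturbing the ciphertext buffer's indices.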



##########
parquet-hadoop/src/test/java/org/apache/parquet/crypto/TestPropertiesDrivenEncryption.java:
##########
@@ -202,7 +211,7 @@ public static Collection<Object[]> data() {
 
   private static final boolean plaintextFilesAllowed = true;
 
-  private static final int ROW_COUNT = 10000;
+  private static final int ROW_COUNT = 100;

Review Comment:
   TBH, I forget why. But on looking at it again, and rerunning the tests, I've _increased_ the value to test a part of the `AesCtrDecryptor` that was not getting tested before.





[GitHub] [parquet-mr] shangxinli commented on pull request #1008: PARQUET-2212: Add ByteBuffer api for decryptors to allow direct memory to be decrypted

Posted by GitBox <gi...@apache.org>.
shangxinli commented on PR #1008:
URL: https://github.com/apache/parquet-mr/pull/1008#issuecomment-1376755881

   @wgtmac Do you have time to have a look? 

