You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@parquet.apache.org by GitBox <gi...@apache.org> on 2022/09/30 09:22:05 UTC

[GitHub] [parquet-mr] ggershinsky commented on a diff in pull request #959: PARQUET-2126: Make cached (de)compressors thread-safe

ggershinsky commented on code in PR #959:
URL: https://github.com/apache/parquet-mr/pull/959#discussion_r984394753


##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/CodecFactory.java:
##########
@@ -244,16 +272,60 @@ protected CompressionCodec getCodec(CompressionCodecName codecName) {
     }
   }
 
+  /**
+   * Modified for https://issues.apache.org/jira/browse/PARQUET-2126
+   * This releases all cached instances of all compressors and decompressors created by all threads that share
+   * this CodeFactory instance.
+   * Note: A problem might occur if release() were called while some codec instances were still in use, but it
+   * would not make sense to call close() or release() on a shared CodecFactory while some threads are still
+   * actively using it. The usage pattern should be:

Review Comment:
   It would be also good to have a look at an example of such callers, eg in Apache Spark, to start analyzing the implications.



##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/CodecFactory.java:
##########
@@ -244,16 +272,60 @@ protected CompressionCodec getCodec(CompressionCodecName codecName) {
     }
   }
 
+  /**
+   * Modified for https://issues.apache.org/jira/browse/PARQUET-2126
+   * This releases all cached instances of all compressors and decompressors created by all threads that share
+   * this CodeFactory instance.
+   * Note: A problem might occur if release() were called while some codec instances were still in use, but it
+   * would not make sense to call close() or release() on a shared CodecFactory while some threads are still
+   * actively using it. The usage pattern should be:

Review Comment:
   regarding the code parts that call this release() method. For those parts inside the parquet-mr codebase, can  some of them implement/enforce this pattern fully internally? I'd guess most/all of them eventually pass this responsibility to the "app" code above parquet-mr API; so probably this pattern documentation should be moved/copied/referenced in these APIs. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org