Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/07/29 19:07:50 UTC

[GitHub] [beam] grufino-blackbird commented on a diff in pull request #22419: Add zstd compression/decompression support

grufino-blackbird commented on code in PR #22419:
URL: https://github.com/apache/beam/pull/22419#discussion_r933550128


##########
sdks/python/apache_beam/io/filesystem.py:
##########
@@ -166,6 +177,9 @@ def _initialize_decompressor(self):
       self._decompressor = bz2.BZ2Decompressor()
     elif self._compression_type == CompressionTypes.DEFLATE:
       self._decompressor = zlib.decompressobj()
+    elif self._compression_type == CompressionTypes.ZSTD:
+      self._decompressor = zstandard.ZstdDecompressor(
+          max_window_size=2147483648).decompressobj()

Review Comment:
   Thank you for the suggestion. I agree, and I found this: https://github.com/indygreg/python-zstandard/issues/157
   Apparently the problem is related to the compression level used. What do you think about adding this link as a code comment? The library doesn't seem to intend to fix this, since the issue was closed, but I still think it's better to keep the value, as it seems to work as a general recommendation (for my use case I tested files of 10GB+ compressed, or 100GB+ decompressed, and it works fine).
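
   As a rough sketch of what such a comment could look like, the literal could be given a name and the issue link placed next to it. The constant name `ZSTD_MAX_WINDOW_SIZE` and the helper `make_zstd_decompressobj` are hypothetical and not part of the PR; only the value and the `zstandard.ZstdDecompressor(max_window_size=...).decompressobj()` call come from the diff.

```python
# Hypothetical constant name; the value is the one used in the diff.
# 2147483648 == 2**31.
ZSTD_MAX_WINDOW_SIZE = 2**31


def make_zstd_decompressobj(max_window_size=ZSTD_MAX_WINDOW_SIZE):
  """Return a streaming zstd decompressor with a raised window limit.

  Hypothetical helper for illustration only; it is not part of Beam.
  """
  # Imported lazily so this sketch loads even when the optional
  # zstandard dependency is not installed.
  import zstandard

  # Per the issue linked below, decompression failures under the default
  # limit appear to be related to the compression level used when the
  # file was written, so the limit is raised here.
  # https://github.com/indygreg/python-zstandard/issues/157
  return zstandard.ZstdDecompressor(
      max_window_size=max_window_size).decompressobj()
```

   Keeping the link beside the value would let a later reader see why the limit was raised without having to find this thread.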


