Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/10/27 19:16:23 UTC

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #2943: Added support for LZ4_RAW compression. (#1604)

tustvold commented on code in PR #2943:
URL: https://github.com/apache/arrow-rs/pull/2943#discussion_r1007271432


##########
parquet/src/compression.rs:
##########
@@ -325,6 +327,65 @@ mod zstd_codec {
 #[cfg(any(feature = "zstd", test))]
 pub use zstd_codec::*;
 
+#[cfg(any(feature = "lz4", test))]
+mod lz4_raw_codec {
+    use std::io::{Read, Write};
+
+    use crate::compression::Codec;
+    use crate::errors::Result;
+
+    /// Codec for LZ4 Raw compression algorithm.
+    pub struct LZ4RawCodec {}
+
+    impl LZ4RawCodec {
+        /// Creates new LZ4 Raw compression codec.
+        pub(crate) fn new() -> Self {
+            Self {}
+        }
+    }
+
+    // Compute max LZ4 uncompress size.
+    // Check https://stackoverflow.com/questions/25740471/lz4-library-decompressed-data-upper-bound-size-estimation
+    fn max_uncompressed_size(compressed_size: usize) -> usize {
+        (compressed_size << 8) - compressed_size - 2526
+    }
+
+    impl Codec for LZ4RawCodec {
+        fn decompress(
+            &mut self,
+            input_buf: &[u8],
+            output_buf: &mut Vec<u8>,
+        ) -> Result<usize> {
+            let offset = output_buf.len();
+            let required_len = max_uncompressed_size(input_buf.len());

Review Comment:
   Longer term, it would be nice to plumb the decompressed size down to the codec, since we already know it from the page header.
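
   A minimal sketch of what that plumbing could look like, assuming the caller passes the size from the page header as an `Option<usize>`. The names `reserve_output` and `fallback_bound` are hypothetical and not part of this PR or of the `Codec` trait; the fallback is a deliberately loose stand-in for the case where the size is unknown.

```rust
/// Loose stand-in for a worst-case LZ4 output bound (an assumption for this
/// sketch): 255 bytes of output per input byte, which is never smaller than
/// the `max_uncompressed_size` estimate in the diff above.
fn fallback_bound(compressed_len: usize) -> usize {
    compressed_len.saturating_mul(255)
}

/// Hypothetical helper: grow `output_buf` so a decompressor can write into
/// `output_buf[offset..offset + len]`, and return `(offset, len)`.
///
/// When the caller knows the decompressed size (for example from the page
/// header), exactly that many bytes are reserved; otherwise the worst-case
/// fallback is used.
fn reserve_output(
    output_buf: &mut Vec<u8>,
    compressed_len: usize,
    uncompressed_size: Option<usize>,
) -> (usize, usize) {
    let offset = output_buf.len();
    let len = uncompressed_size.unwrap_or_else(|| fallback_bound(compressed_len));
    output_buf.resize(offset + len, 0);
    (offset, len)
}
```

   With a known size, a call such as `reserve_output(&mut buf, input_buf.len(), Some(n))` grows the buffer by exactly `n` bytes, so no over-allocation or later truncation is needed.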


