Posted to issues@commons.apache.org by "Mehdi Ennaime (Jira)" <ji...@apache.org> on 2023/06/19 12:37:00 UTC

[jira] [Created] (COMPRESS-646) Improve performance of the Snappy Framed I/O streams

Mehdi Ennaime created COMPRESS-646:
--------------------------------------

             Summary: Improve performance of the Snappy Framed I/O streams
                 Key: COMPRESS-646
                 URL: https://issues.apache.org/jira/browse/COMPRESS-646
             Project: Commons Compress
          Issue Type: Wish
          Components: Compressors
    Affects Versions: 1.22
         Environment: Java 11.0.2 (OpenJDK)
tested on both Windows 10 and Linux (Ubuntu 20.04)
            Reporter: Mehdi Ennaime
         Attachments: Tools.java

Hello,

I've been using the Snappy format as a way to quickly compress/decompress JSON files, and have been using the
{{FramedSnappyCompressorOutputStream}} and
{{FramedSnappyCompressorInputStream}} provided by Apache Commons Compress to do so, since I already had several dependencies on the Commons Compress module.

Although the compression/decompression works fine for every file, feedback regarding performance issues for large files started to emerge.

The performance of these streams was very underwhelming upon testing.

The decompression of a 90 MB json.sz file (1.5 GB of decompressed .json) was taking 2 minutes, which is far from the performance expected of a Snappy stream, which "[...] does not aim for maximum compression, or compatibility with any other compression library; instead, it aims for very high speeds and reasonable compression.".

Switching to xerial/snappy-java 's Framed IO Streams reduced the compression/decompression times by two orders of magnitude.

Running the same code from the attached [^Tools.java] through a Maven command took 1.5 s after switching the stream implementation to {{org.xerial.snappy.SnappyFramedInputStream}}, versus a consistent 125+ s with {{FramedSnappyCompressorInputStream}}.
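
For readers without the attachment, a comparison of this kind can be sketched as follows. This is not the attached Tools.java, only a minimal timing harness under stated assumptions: the class {{DecompressTiming}}, the {{Opener}} interface and the helper names are hypothetical, and {{java.util.zip}} GZIP streams stand in for the Snappy streams so that the sketch runs with the JDK alone. With the respective libraries on the classpath, the same harness should accept {{FramedSnappyCompressorInputStream::new}} or {{SnappyFramedInputStream::new}} as the opener.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class DecompressTiming {

    /** Hypothetical helper type: wraps a raw stream in a decompressing stream. */
    @FunctionalInterface
    public interface Opener {
        InputStream open(InputStream raw) throws IOException;
    }

    /** Reads the stream to its end in 64 KiB chunks and returns the byte count. */
    public static long drain(InputStream in) throws IOException {
        byte[] buf = new byte[64 * 1024];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            total += n;
        }
        return total;
    }

    /** Times one full decompression pass and returns the elapsed nanoseconds. */
    public static long timeNanos(byte[] compressed, Opener opener) throws IOException {
        long start = System.nanoTime();
        try (InputStream in = opener.open(new ByteArrayInputStream(compressed))) {
            drain(in);
        }
        return System.nanoTime() - start;
    }

    /** Stand-in compressor (GZIP, not Snappy) so the sketch needs no dependency. */
    public static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bos)) {
            out.write(data);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // 8 MiB of zeros as placeholder input; a real test would read a file.
        byte[] compressed = gzip(new byte[8 * 1024 * 1024]);
        // Assumption: swap in the Snappy stream constructor under test here.
        long nanos = timeNanos(compressed, GZIPInputStream::new);
        System.out.println("decompressed in " + nanos / 1_000_000 + " ms");
    }
}
```

Reading in large chunks rather than byte by byte keeps the measurement focused on the decompressor rather than on per-call overhead. A single pass like this is only a rough indication; several warm-up runs would be needed for a fair comparison.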
Since it's not a bug, I'm not flagging this ticket as such, but it makes using the Apache Commons Compress library pointless for that format, and even counter-productive.

Bringing performance up to par with other implementations, or deprecating the decompressor, would be greatly appreciated.

I tried to upload the aforementioned file, but Jira refuses it because the direct upload limit is 60 MB. I should, however, be able to provide a file of around 40 MB if necessary.

Best Regards,

Mehdi Ennaïme



--
This message was sent by Atlassian Jira
(v8.20.10#820010)