Posted to issues@commons.apache.org by "Gilles Sadowski (Jira)" <ji...@apache.org> on 2023/06/19 12:59:00 UTC

[jira] [Commented] (COMPRESS-646) Improve performance of the Snappy Framed I/O streams

    [ https://issues.apache.org/jira/browse/COMPRESS-646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17734172#comment-17734172 ] 

Gilles Sadowski commented on COMPRESS-646:
------------------------------------------

Since speed seems to be a necessary feature for that format, we should probably set up a JMH benchmark.
See how it has been done for e.g. ["Commons RNG"|https://github.com/apache/commons-rng/tree/master/commons-rng-examples/examples-jmh].  You are welcome to do something similar for "Commons Compress".
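
Before writing a proper JMH benchmark, a crude timing harness can help confirm the reported numbers and narrow down the cause. The following is only a sketch under stated assumptions, not a substitute for JMH: it uses {{System.nanoTime()}} with no warm-up, the class and method names ({{StreamDrainTiming}}, {{drain}}, {{timeDrainNanos}}, {{openGzip}}) are made up for illustration, and the JDK's {{GZIPInputStream}} is only a stand-in so that the code runs without extra dependencies. To measure the Snappy streams, the factory would instead create a {{FramedSnappyCompressorInputStream}} (with Commons Compress on the classpath).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.function.Supplier;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class StreamDrainTiming {

    /** Reads the stream to its end in chunks of bufferSize bytes; returns the byte count. */
    public static long drain(InputStream in, int bufferSize) {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        try (InputStream stream = in) {
            int n;
            while ((n = stream.read(buffer)) != -1) {
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    /** Times one full drain of a freshly created stream, in nanoseconds (no warm-up). */
    public static long timeDrainNanos(Supplier<InputStream> factory, int bufferSize) {
        long start = System.nanoTime();
        drain(factory.get(), bufferSize);
        return System.nanoTime() - start;
    }

    /** Stand-in compressor from the JDK (assumption: replace with the format under test). */
    public static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Opens a decompressing stream over in-memory data, wrapping checked exceptions. */
    public static InputStream openGzip(byte[] compressed) {
        try {
            return new GZIPInputStream(new ByteArrayInputStream(compressed));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] compressed = gzip(new byte[8 * 1024 * 1024]);
        // Compare a 1-byte read buffer against a larger one.
        for (int size : new int[] {1, 8192}) {
            long nanos = timeDrainNanos(() -> openGzip(compressed), size);
            System.out.printf("buffer=%d bytes: %.1f ms%n", size, nanos / 1e6);
        }
    }
}
```

Running the same loop against both Snappy implementations, with several buffer sizes, would show whether the gap depends on how the stream is read or is inherent to the decoder. A JMH benchmark would still be needed for numbers worth committing to.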

Would you investigate to find out what causes the slowness?

Then, how should it be fixed?
* Copy code from the mentioned implementation (?)
* Depend on it (?)

This should be discussed/decided on the "dev" ML.


> Improve performance of the Snappy Framed I/O streams
> ----------------------------------------------------
>
>                 Key: COMPRESS-646
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-646
>             Project: Commons Compress
>          Issue Type: Wish
>          Components: Compressors
>    Affects Versions: 1.22
>         Environment: java 11.0.2 (openjdk )
> tested on both Windows 10 and linux (Ubuntu 20.04)
>            Reporter: Mehdi Ennaime
>            Priority: Minor
>         Attachments: Tools.java
>
>
> Hello,
> I've been using the Snappy format to quickly compress and decompress JSON files, using the {{FramedSnappyCompressorOutputStream}} and {{FramedSnappyCompressorInputStream}} provided by Apache Commons Compress, since I already had several dependencies on that module.
> Although the compression/decompression works fine for every file, feedback regarding performance issues for large files started to emerge.
> The performance of these streams was very underwhelming upon testing.
> The decompression of a 90 MB json.sz file (1.5 GB as decompressed .json) was taking 2 minutes, which is far from the expected performance of a Snappy stream, which "[...] does not aim for maximum compression, or compatibility with any other compression library; instead, it aims for very high speeds and reasonable compression.".
> Switching to xerial/snappy-java's framed I/O streams reduced the compression/decompression times by two orders of magnitude.
> Running the same code from the provided [^Tools.java] through a Maven command took 1.5 s after replacing the stream implementation with {{org.xerial.snappy.SnappyFramedInputStream}}, versus a consistent 125+ s with {{FramedSnappyCompressorInputStream}}.
> Since it's not a bug, I'm not flagging this ticket as such, but it makes using the Apache Commons Compress library pointless for that format, and even counter-productive.
> Bringing performance up to par with other implementations, or deprecating the decompressor, would be greatly appreciated.
> I've tried to upload the aforementioned file, but Jira refuses to take it, as the direct upload limit is 60 MB. I should however be able to provide a 40-ish MB file if necessary.
> Best Regards,
> Mehdi Ennaïme



--
This message was sent by Atlassian Jira
(v8.20.10#820010)