Posted to issues@commons.apache.org by "Michael Osipov (Jira)" <ji...@apache.org> on 2020/04/25 14:47:00 UTC

[jira] [Commented] (IMAGING-257) Investigate speed improvements to LZW decompression

    [ https://issues.apache.org/jira/browse/IMAGING-257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17092260#comment-17092260 ] 

Michael Osipov commented on IMAGING-257:
----------------------------------------

Why don't you provide a PR for this?

> Investigate speed improvements to LZW decompression
> ---------------------------------------------------
>
>                 Key: IMAGING-257
>                 URL: https://issues.apache.org/jira/browse/IMAGING-257
>             Project: Commons Imaging
>          Issue Type: Improvement
>          Components: Format: TIFF
>            Reporter: Gary Lucas
>            Priority: Minor
>
> When reading large TIFF files (10812-by-10812 pixels), read times were about 11 seconds (with a solid-state disk drive), and I was looking for ways to reduce that.  I ran the NetBeans profiler and discovered that 87% of the read time was spent in the MyLzwDecompressor decompress() method.
> Inspecting MyLzwDecompressor, I saw that it used the Java ByteArrayOutputStream, which is kind of famous for being slow.  You can find lots of examples of classes named FastByteArrayOutputStream on the web, including one right here in the Commons Imaging project.  
> I tried a number of different experiments using the ApacheImagingSpeedAndMemoryTest class (from the examples directory).
> Replacing ByteArrayOutputStream with FastByteArrayOutputStream produced a 4 percent reduction in run time.
> I then tried using a local array instead of a "byte array" class.  That improved things to about an 8 percent reduction in time.  Finally, I tried a few more aggressive changes, reducing the number of conditional tests and replacing calls such as stringFromCode(), which wraps the class member "table", with direct access.  The final result was an 11 percent total reduction in time.
> 11 percent isn't all that impressive, but I haven't been able to find anything else.  Modern compilers are so smart and do such a good job optimizing code that it's hard to find "easy wins."
> Anyway, this is a potential area for improvement in the Commons Imaging API.  Care will be required because there are some features that my test bypassed. For example, there's a diagnostic "listener" in the current implementation that would have to be supported.  Also, I took out a lot of bounds checking, and just assumed that the input compressed data would produce correct output.  In real life, that's not a safe assumption.  I would probably try wrapping the logic of the decompress method in a try/catch block that catches ArrayIndexOutOfBoundsException and re-throws it as an IOException (an IOException is what the method throws now).  It will also be challenging to find a way of properly testing modifications to this class.
> I will be looking at this, but probably will not move on it until I get feedback from the community.   I don't view this change as unduly risky provided that proper care is taken in making the modifications.  But the gain in performance is small enough that I'm not sure it's worth it.
> I will also take a look at Commons Compression to see what they do.
> If you have any thoughts on this matter, please let me know.
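
For reference, the "local array instead of a byte array class" idea from the description could look roughly like the sketch below. This is only an illustration, not the code from the reporter's experiment: the class name LocalByteBuffer and its methods are hypothetical and do not exist in Commons Imaging. The point it shows is a growable buffer backed by a plain array, which avoids the synchronized write methods of java.io.ByteArrayOutputStream.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of Commons Imaging.
// A growable byte buffer backed by a plain array, with no synchronization.
final class LocalByteBuffer {
    private byte[] data;
    private int size;

    LocalByteBuffer(int initialCapacity) {
        // Assumption: a small minimum capacity avoids repeated early growth.
        data = new byte[Math.max(16, initialCapacity)];
    }

    // Appends a single byte (the low 8 bits of b).
    void write(int b) {
        ensureCapacity(size + 1);
        data[size++] = (byte) b;
    }

    // Appends length bytes from src, starting at offset.
    void write(byte[] src, int offset, int length) {
        ensureCapacity(size + length);
        System.arraycopy(src, offset, data, size, length);
        size += length;
    }

    // Grows the backing array to at least 'needed' bytes, doubling when possible.
    private void ensureCapacity(int needed) {
        if (needed > data.length) {
            data = Arrays.copyOf(data, Math.max(needed, data.length * 2));
        }
    }

    // Returns a copy trimmed to the number of bytes written so far.
    byte[] toByteArray() {
        return Arrays.copyOf(data, size);
    }
}
```

Whether this beats FastByteArrayOutputStream in practice would have to be measured; the description reports only single-digit percentage gains from this kind of change.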

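The try/catch idea in the description (drop the per-access bounds checks, then convert an ArrayIndexOutOfBoundsException into an IOException) might be sketched as follows. This is a simplified stand-in, not the MyLzwDecompressor logic: the class UncheckedExpander, the expand() method, and the pre-built table are all hypothetical, and a real LZW decoder builds its table while decoding.

```java
import java.io.IOException;

// Hypothetical sketch for illustration only; not the Commons Imaging decoder.
final class UncheckedExpander {

    // Copies table[code] into the output for each code, with no explicit
    // bounds checks. Assumption: expectedSize is the known uncompressed size.
    static byte[] expand(int[] codes, byte[][] table, int expectedSize) throws IOException {
        byte[] out = new byte[expectedSize];
        int pos = 0;
        try {
            for (int code : codes) {
                byte[] entry = table[code];   // fails if the code is not in the table
                for (byte b : entry) {
                    out[pos++] = b;           // fails if the data overruns the output
                }
            }
        } catch (ArrayIndexOutOfBoundsException e) {
            // Corrupt input surfaces as a checked exception, as callers expect.
            throw new IOException("Corrupt compressed data: index out of range", e);
        }
        return out;
    }
}
```

Note that this only catches corruption that actually causes an out-of-range access. Input that decodes to fewer bytes than expected would pass silently here, so some explicit validation would probably still be needed.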


--
This message was sent by Atlassian Jira
(v8.3.4#803005)