Posted to issues@commons.apache.org by "Gary Lucas (Jira)" <ji...@apache.org> on 2023/07/01 09:14:00 UTC

[jira] [Commented] (IMAGING-356) TIFF reading extremely slow in version 1.0-SNAPSHOT

    [ https://issues.apache.org/jira/browse/IMAGING-356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17739288#comment-17739288 ] 

Gary Lucas commented on IMAGING-356:
------------------------------------

I haven't studied the changes that were made, so I can't offer any authoritative recommendations on the approach.  Instead, I have a few general observations about the way TIFF files work that may be useful in figuring out how to tackle the problem.  Or perhaps not, so take them with a grain of salt.

TIFF files are kind of a special case in terms of image formats. First off, one can never assume that a TIFF file is going to be accessed in order.  It is common for the "directory" section of the file (which tells how it's organized) to come last rather than first. And, of course, a TIFF file may have multiple directories (because it may contain multiple images).  Second, TIFF files are typically quite large, often in the hundreds of megabytes range, and sometimes in the gigabyte range.  So it is often preferable not to keep the entire thing in memory. In many cases, an application will not access the entire file, but only a subsection.  For example, a mapping program displaying an aerial photograph might only access the subsection of the photograph that is actually visible on the map. And finally, I note that TIFF files are often not images at all, but are used to store numerical raster data (such as Earth elevation and ocean depth data).
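
To make the random-access point concrete, here is a minimal Java sketch (not taken from Commons Imaging) of how a reader locates the first directory. It assumes a classic TIFF file, whose 8-byte header holds a byte-order mark, the magic number 42, and a 32-bit offset to the first directory. The class and method names are hypothetical, chosen only for illustration.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of the Commons Imaging API.
class TiffHeaderSketch {

    /** Returns the byte offset of the first image file directory (IFD). */
    static long firstDirectoryOffset(RandomAccessFile file) throws IOException {
        byte[] header = new byte[8];
        file.seek(0);
        file.readFully(header);

        // Bytes 0-1: "II" means little-endian, "MM" means big-endian.
        ByteOrder order;
        if (header[0] == 'I' && header[1] == 'I') {
            order = ByteOrder.LITTLE_ENDIAN;
        } else if (header[0] == 'M' && header[1] == 'M') {
            order = ByteOrder.BIG_ENDIAN;
        } else {
            throw new IOException("Not a TIFF file: bad byte-order mark");
        }

        ByteBuffer buffer = ByteBuffer.wrap(header).order(order);

        // Bytes 2-3: magic number 42 for classic TIFF (this sketch ignores BigTIFF).
        if (buffer.getShort(2) != 42) {
            throw new IOException("Not a classic TIFF file: bad magic number");
        }

        // Bytes 4-7: unsigned 32-bit offset of the first IFD, which may be
        // anywhere in the file, including near the end.
        return buffer.getInt(4) & 0xFFFFFFFFL;
    }
}
```

Because the returned offset can point anywhere in the file, the reader's next step is a seek rather than a sequential read, which is why a random-access file is the more natural model.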

All of this means that the file-access pattern for a TIFF file is a closer fit to a random-access file than to a sequential IO channel such as a network socket or a serial device.  I know that the PNG format (the only other one I've studied in depth) was designed with network access specifically in mind.  The TIFF format evolved before network access was in the ascendancy as it is today.

That being said, even the original Commons Imaging approach to TIFF file IO wasn't quite a perfect fit. For one thing, the original authors open and close a file multiple times (once for each part of the file they access). That is suboptimal since opening and closing a file carries its own performance overhead.  Also, when I was looking at refactoring Commons Imaging IO to implement Closeable to support try-with-resources blocks, I didn't see a way to accomplish that without a significant rewrite and compatibility-breaking changes to the public API.
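
For illustration only, the open-once idea might look something like the following Java sketch. It is not the Commons Imaging API and not a proposal for how the refactoring should be done. The class name SingleOpenReader and its read method are hypothetical. The point is simply that one file handle serves every read and that implementing Closeable lets callers use try-with-resources.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;

// Hypothetical class for illustration; Commons Imaging has no such type.
class SingleOpenReader implements Closeable {

    // Opened once in the constructor and reused for every read.
    private final RandomAccessFile file;

    SingleOpenReader(Path path) throws IOException {
        this.file = new RandomAccessFile(path.toFile(), "r");
    }

    /** Reads exactly {@code length} bytes starting at {@code offset}. */
    byte[] read(long offset, int length) throws IOException {
        byte[] data = new byte[length];
        file.seek(offset);
        file.readFully(data);
        return data;
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```

A caller would wrap it as `try (SingleOpenReader r = new SingleOpenReader(path)) { ... }` and issue as many read calls as needed (header, directories, strips) inside the block, paying the open and close cost only once.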



> TIFF reading extremely slow in version 1.0-SNAPSHOT
> ---------------------------------------------------
>
>                 Key: IMAGING-356
>                 URL: https://issues.apache.org/jira/browse/IMAGING-356
>             Project: Commons Imaging
>          Issue Type: Bug
>          Components: Format: TIFF
>    Affects Versions: 1.0
>            Reporter: Gary Lucas
>            Priority: Major
>
> I am using the latest code from GitHub (1.0-SNAPSHOT downloaded from GitHub in June 2023) to read a 300 megabyte TIFF file.  Version 1.0-alpha3 required 673 milliseconds to read that file.  The new code requires upward of 15 minutes.   Clearly something got broken since the last release.
> The TIFF file is a 10000x10000 pixel 4 byte image format organized in strips.  The bottleneck appears to occur in the TiffReader getTiffRawImageData method, which reads raw data from the file in preparation for creating a BufferedImage object.
> I suspect that there may be a general slowness of file access.  In debugging, even reading the initial metadata (22 TIFF tags) took a couple of seconds.  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)