Posted to issues@commons.apache.org by "Gary Lucas (JIRA)" <ji...@apache.org> on 2012/04/30 14:54:49 UTC

[jira] [Created] (SANSELAN-78) Improve speed of random-access-file handling for TIFF format, potentially others

Gary Lucas created SANSELAN-78:
----------------------------------

             Summary: Improve speed of random-access-file handling for TIFF format, potentially others
                 Key: SANSELAN-78
                 URL: https://issues.apache.org/jira/browse/SANSELAN-78
             Project: Commons Sanselan
          Issue Type: Improvement
          Components: Format: TIFF
            Reporter: Gary Lucas



Large TIFF files can be organized into chunks (either strips or tiles) so that the image can be read a piece at a time. In the Apache Imaging implementation, each time one of these pieces is read, the TiffReader uses the getBlock() method of the ByteSourceFile class. This method opens the file using the Java RandomAccessFile class, seeks to the position of the data in the file, reads its content, and closes the file. Although this operation can be performed many times for a single image and thus entails a lot of redundant file opens and reads, the file cache performance on modern computers is truly amazing, and for files of less than 5 megabytes it often doesn't make a difference. On larger files, however, the cost can be significant.
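
For illustration only, the per-call pattern described above might look roughly like the following sketch. The class name PerCallBlockReader is hypothetical, and this is not the actual ByteSourceFile source; it only shows the open, seek, read, close sequence that is repeated for every strip or tile.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical stand-in for the per-call pattern; not the actual ByteSourceFile source.
class PerCallBlockReader {
    private final File file;

    PerCallBlockReader(File file) {
        this.file = file;
    }

    // Opens the file, seeks, reads 'length' bytes, and closes the file again on every call.
    byte[] getBlock(long start, int length) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            if (start < 0 || length < 0 || start + length > raf.length()) {
                throw new IOException("Block is outside the file bounds");
            }
            raf.seek(start);
            byte[] block = new byte[length];
            raf.readFully(block);
            return block;
        } // try-with-resources closes the file here, even if the read fails
    }
}
```

With this shape, an image stored as N strips costs N separate opens and closes of the same file.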

This Tracker Item proposes to modify the ByteSourceFile class so that an access routine can optionally hold the file open between getBlock() method calls. It will accomplish this by adding a new method called setPersistent(boolean). By default, persistence will be set to false and the ByteSourceFile class will continue to work just as it always has (existing code will not be affected). If persistence is set to true, the RandomAccessFile will be held open between calls.
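
A minimal sketch of how such a switch could behave is shown below. Only the method name setPersistent(boolean) comes from the proposal; the class name PersistentBlockReader, the field names, and the isFileOpen() helper are assumptions made for illustration, and the actual patch may differ.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical sketch of the proposed behavior; names other than setPersistent are illustrative.
class PersistentBlockReader {
    private final File file;
    private boolean persistent;     // false by default, so existing callers see no change
    private RandomAccessFile held;  // non-null only while a persistent handle is open

    PersistentBlockReader(File file) {
        this.file = file;
    }

    // Turning persistence off closes any handle that is currently being held open.
    synchronized void setPersistent(boolean persistent) throws IOException {
        this.persistent = persistent;
        if (!persistent && held != null) {
            held.close();
            held = null;
        }
    }

    // Illustrative helper so callers (and tests) can observe the handle state.
    synchronized boolean isFileOpen() {
        return held != null;
    }

    synchronized byte[] getBlock(long start, int length) throws IOException {
        RandomAccessFile raf = held;
        if (raf == null) {
            raf = new RandomAccessFile(file, "r");
            if (persistent) {
                held = raf; // keep the handle for later calls
            }
        }
        try {
            raf.seek(start);
            byte[] block = new byte[length];
            raf.readFully(block);
            return block;
        } finally {
            if (!persistent) {
                raf.close(); // original behavior: close after every read
            }
        }
    }
}
```

In the default mode this behaves like the per-call pattern; only callers that opt in with setPersistent(true) keep the handle open.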

To get some sense of the performance difference, I ran several tests. For the sample "ron and andy.tif" file provided with the Apache Imaging package, which is under 5 megabytes, the change made little difference. However, when I tested with larger files, such as the Apache Imaging sample 2560-by-1920 pixel PICT2833.TIF file (a blurry picture of a pretty girl) and a 2500-by-2500 pixel file I downloaded from the US Geological Survey (USGS), I saw notable differences.

I also tested on a fast local disk (my PC) and on a network disk.  Not surprisingly, the network disk showed the biggest change (in order to keep the test environment clean, I ran the network test early in the morning when the network was lightly used).

As you can see in the tests below, on the local disk the savings are modest even for the largest file. However, when dealing with a network file system, the change becomes significant.

{code}
ron and andy.tif   1500-by-1125   4.8 MB       
    local  original:     25.9 ms.   
    local  modified:     24.8 ms.
    network original:   122.7 ms.
    network modified:   117.6 ms.

PICT2833.TIF   2560-by-1920  14.1 MB
    local  original:     77.7 ms.   
    local  modified:     61.7 ms.
    network original:   774.1 ms.
    network modified:   463.8 ms.

USGS1   2500-by-2500   18.8 MB
    local  original:    192.3 ms.   
    local  modified:     94.5 ms.
    network original:  3992.8 ms.
    network modified:  1807.1 ms.

USGS2  10000-by-10000  286 MB
    local  original:   1930.5 ms.   
    local  modified:   1344.5 ms.
    network original: 26627.6 ms.
    network modified: 13402.1 ms.

{code}
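
To put the table above in relative terms, the saving for each case can be expressed as (original - modified) / original. The following sketch just applies that formula to the timings listed above; the class and method names are hypothetical.

```java
// Hypothetical helper; the timings below are copied from the table above.
class TimingComparison {
    // Percentage of the original time that was saved by the modification.
    static double reductionPercent(double originalMs, double modifiedMs) {
        return 100.0 * (originalMs - modifiedMs) / originalMs;
    }

    public static void main(String[] args) {
        String[] labels = {"ron and andy.tif", "PICT2833.TIF", "USGS1", "USGS2"};
        // Each pair is {original ms, modified ms}.
        double[][] local = {{25.9, 24.8}, {77.7, 61.7}, {192.3, 94.5}, {1930.5, 1344.5}};
        double[][] network = {{122.7, 117.6}, {774.1, 463.8}, {3992.8, 1807.1}, {26627.6, 13402.1}};
        for (int i = 0; i < labels.length; i++) {
            System.out.printf("%-18s local %5.1f%%  network %5.1f%%%n",
                    labels[i],
                    reductionPercent(local[i][0], local[i][1]),
                    reductionPercent(network[i][0], network[i][1]));
        }
    }
}
```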
One consequence of this change is that if persistence is set to true, the file will be held open until the ByteSourceFile goes out of scope and is garbage collected. This change will therefore also make sure that the TiffReader sets the persistence back to false when it is done reading the file, in order to expedite the release of file resources.
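
One way a reader could guarantee that persistence is switched back off is a try/finally block around the strip or tile loop. The sketch below assumes a hypothetical PersistableSource interface and StripLoader class; neither exists in Apache Imaging, and the real TiffReader change may be structured differently.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal view of a byte source with the proposed persistence switch.
interface PersistableSource {
    void setPersistent(boolean persistent) throws IOException;

    byte[] getBlock(long start, int length) throws IOException;
}

// Hypothetical caller, standing in for the part of a reader that loads strips or tiles.
class StripLoader {
    // Reads every strip with the file held open, then always restores persistence to false.
    static List<byte[]> readStrips(PersistableSource source, long[] offsets, int[] lengths)
            throws IOException {
        if (offsets.length != lengths.length) {
            throw new IllegalArgumentException("offsets and lengths must have the same size");
        }
        List<byte[]> strips = new ArrayList<>();
        source.setPersistent(true);
        try {
            for (int i = 0; i < offsets.length; i++) {
                strips.add(source.getBlock(offsets[i], lengths[i]));
            }
        } finally {
            // Runs on success and on failure, so the handle is not left to the garbage collector.
            source.setPersistent(false);
        }
        return strips;
    }
}
```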


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (SANSELAN-78) Improve speed of random-access-file handling for TIFF format, potentially others

Posted by "Damjan Jovanovic (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SANSELAN-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267182#comment-13267182 ] 

Damjan Jovanovic commented on SANSELAN-78:
------------------------------------------

Well what we really want here is an interface that will allow seeking as well as I/O on any backend representation (byte[], InputStream or File). Such an interface doesn't exist in Java - RandomAccessFile and FileChannel both require local files, while InputStream doesn't allow seeking.

Ideally we'd have a SeekableInputStream and some way to get it from a ByteSource and then keep reusing it.
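
As a rough sketch of that idea, a seekable stream could be an InputStream subclass with an added seek operation. Everything below is hypothetical: the method names seek() and position() and the ByteArraySeekableInputStream class are assumptions for illustration, and only the byte[] backend is shown.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical abstraction: an InputStream that can also be repositioned.
abstract class SeekableInputStream extends InputStream {
    abstract void seek(long position) throws IOException;

    abstract long position() throws IOException;
}

// In-memory backend; a file backend could wrap RandomAccessFile in the same way.
class ByteArraySeekableInputStream extends SeekableInputStream {
    private final byte[] data;
    private int pos;

    ByteArraySeekableInputStream(byte[] data) {
        this.data = data;
    }

    @Override
    void seek(long position) throws IOException {
        if (position < 0 || position > data.length) {
            throw new IOException("Seek position out of range: " + position);
        }
        pos = (int) position;
    }

    @Override
    long position() {
        return pos;
    }

    @Override
    public int read() {
        // Mask to return an unsigned byte value, or -1 at end of data.
        return pos < data.length ? (data[pos++] & 0xFF) : -1;
    }

    @Override
    public int read(byte[] buffer, int offset, int length) {
        if (length == 0) {
            return 0;
        }
        if (pos >= data.length) {
            return -1;
        }
        int count = Math.min(length, data.length - pos);
        System.arraycopy(data, pos, buffer, offset, count);
        pos += count;
        return count;
    }
}
```

A reader written against SeekableInputStream could then obtain one stream from a ByteSource and reuse it for every strip or tile, whatever the backend.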

                

        

[jira] [Resolved] (IMAGING-63) Improve speed of random-access-file handling for TIFF format, potentially others

Posted by "Damjan Jovanovic (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/IMAGING-63?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Damjan Jovanovic resolved IMAGING-63.
-------------------------------------

    Resolution: Later

Deferring to after the 1.0 release.
                
> Improve speed of random-access-file handling for TIFF format, potentially others
> --------------------------------------------------------------------------------
>
>                 Key: IMAGING-63
>                 URL: https://issues.apache.org/jira/browse/IMAGING-63
>             Project: Commons Imaging
>          Issue Type: Improvement
>          Components: Format: TIFF
>            Reporter: Gary Lucas
>