Posted to issues@commons.apache.org by "Gary Lucas (JIRA)" <ji...@apache.org> on 2012/04/29 02:40:48 UTC

[jira] [Issue Comment Edited] (SANSELAN-76) Reduce memory use of TIFF readers

    [ https://issues.apache.org/jira/browse/SANSELAN-76?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264443#comment-13264443 ] 

Gary Lucas edited comment on SANSELAN-76 at 4/29/12 12:40 AM:
--------------------------------------------------------------

I've basically completed the first of the two changes proposed above. This change reduces the amount of memory needed to load a TIFF image. I think it would be best to treat the second proposed change as a separate tracker item, to keep the scope of this change small and manageable.

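The following shows a comparison of the Before and After versions. Both the memory used by live objects and the total memory held by the JVM are significantly reduced: used memory drops from about 670 MB to about 383 MB, and total JVM memory drops from between 1015 and 1160 MB to about 397 MB after the first few runs. The average load time rises slightly, from about 1797 ms to about 1897 ms.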

 
The test uses a 10000-by-10000-pixel TIFF file from the U.S. Geological Survey. The source file is in 24-bit RGB format and is 286.2 MB; the output image is 381 MB. Memory statistics are taken from the Java Runtime class and collected before the TiffParser goes out of scope.
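
As a quick consistency check on these sizes, the sketch below computes raw pixel-buffer sizes from the image dimensions. It assumes that "MB" here means 2^20 bytes, that the source stores 3 bytes per pixel without compression, and that the output BufferedImage uses 4 bytes per pixel; the class and method names are illustrative only.

```java
/** Back-of-the-envelope buffer sizes for the 10000-by-10000 test image (illustrative only). */
public class TiffSizeEstimate {
    static final double MB = 1024.0 * 1024.0; // assumes "MB" means 2^20 bytes

    /** Size in MB of a width x height raster at the given bytes per pixel. */
    public static double sizeMb(long width, long height, int bytesPerPixel) {
        return width * height * bytesPerPixel / MB;
    }

    public static void main(String[] args) {
        long w = 10_000, h = 10_000;
        double source = sizeMb(w, h, 3); // 24-bit RGB, assumed uncompressed
        double output = sizeMb(w, h, 4); // assumed 4 bytes per pixel in the BufferedImage
        System.out.printf("source pixels: %.1f MB%n", source);
        System.out.printf("output image:  %.1f MB%n", output);
        System.out.printf("both at once:  %.1f MB%n", source + output);
    }
}
```

It prints about 286.1 MB for the source pixels and 381.5 MB for the output, in line with the sizes quoted above (if the file is indeed uncompressed, the remaining 0.1 MB would be header and directory data). Their sum, about 667.6 MB, is close to the "used" figures in the Before table that follows, while the After figures sit close to the output image alone. That pattern is consistent with the change no longer keeping the whole source file in memory, although the measurements by themselves do not prove it.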


Before change
 time to load image   --         memory
  time ms    avg ms   --   used MB   total MB
 2391.575     0.000   --   670.951   1015.375
 1797.042     0.000   --   675.350   1160.000
 1703.935  1703.935   --   670.298   1045.633
 1924.843  1814.389   --   671.955   1015.188
 1708.914  1779.231   --   672.305   1160.000
 1687.799  1756.373   --   670.298   1045.633
 1927.832  1790.665   --   672.176   1015.188
 1794.254  1791.263   --   670.789   1160.000
 1698.290  1777.981   --   670.298   1045.633
 1928.838  1796.838   --   672.220   1015.188


After change
 time to load image   --         memory
  time ms    avg ms   --   used MB   total MB
 2128.425     0.000   --   382.990    397.035
 1823.000     0.000   --   528.913    568.723
 1845.152  1845.152   --   413.471    695.723
 1904.049  1874.601   --   383.010    397.039
 1904.234  1884.478   --   383.210    397.039
 1907.394  1890.207   --   383.210    397.039
 1905.385  1893.243   --   383.197    397.039
 1907.052  1895.544   --   383.197    397.039
 1902.848  1896.588   --   383.197    397.039
 1898.601  1896.840   --   383.197    397.039
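
The "avg ms" column appears to be a running mean that skips the first two runs (for example, 1814.389 is the mean of 1703.935 and 1924.843), and the memory columns are presumably derived from Runtime.totalMemory() and Runtime.freeMemory(). A minimal harness along those lines might look like this; the class name, the two-run warm-up, and the stand-in workload are assumptions for illustration, not the code that produced the tables.

```java
import java.lang.ref.Reference;
import java.util.function.Supplier;

/** Hypothetical timing harness in the style of the tables above (not the code that produced them). */
public class LoadBenchmark {
    static final double MB = 1024.0 * 1024.0; // assumes "MB" means 2^20 bytes

    /** Running mean of timesMs that ignores the first `warmup` runs; those entries stay 0.0. */
    public static double[] runningAverages(double[] timesMs, int warmup) {
        double[] avg = new double[timesMs.length];
        double sum = 0.0;
        for (int i = warmup; i < timesMs.length; i++) {
            sum += timesMs[i];
            avg[i] = sum / (i - warmup + 1);
        }
        return avg;
    }

    /** Heap occupied right now (live objects plus not-yet-collected garbage), in MB. */
    public static double usedMb() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / MB;
    }

    /** Heap currently reserved by the JVM, in MB. */
    public static double totalMb() {
        return Runtime.getRuntime().totalMemory() / MB;
    }

    /** Times `runs` calls to load and prints one row per run in the tables' column order. */
    public static <T> void run(Supplier<T> load, int runs, int warmup) {
        double[] times = new double[runs];
        double[] used = new double[runs];
        double[] total = new double[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            T result = load.get();
            times[i] = (System.nanoTime() - start) / 1e6;
            used[i] = usedMb();   // sampled while the loaded result is still reachable,
            total[i] = totalMb(); // analogous to sampling before the parser goes out of scope
            Reference.reachabilityFence(result); // keeps result reachable up to this point
        }
        double[] avg = runningAverages(times, warmup);
        for (int i = 0; i < runs; i++) {
            System.out.printf("%9.3f %9.3f   --   %9.3f %10.3f%n", times[i], avg[i], used[i], total[i]);
        }
    }

    public static void main(String[] args) {
        // Stand-in workload (hypothetical); a real test would load the TIFF image here instead.
        run(() -> new int[10_000_000], 10, 2);
    }
}
```

Note that totalMemory() minus freeMemory() also counts garbage that has not yet been collected, so the "used" figure is an upper bound on live data unless a collection happens to run just before sampling.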

                
> Reduce memory use of TIFF readers
> ---------------------------------
>
>                 Key: SANSELAN-76
>                 URL: https://issues.apache.org/jira/browse/SANSELAN-76
>             Project: Commons Sanselan
>          Issue Type: Improvement
>          Components: Format: TIFF
>            Reporter: Gary Lucas
>   Original Estimate: 80h
>  Remaining Estimate: 80h
>
> This Tracker Item proposes changes to the TIFF file readers to address memory issues when reading very large images from TIFF files.  The TIFF format is used extensively in technical applications such as aerial photographs, satellite images, and digital raster maps which feature very large image sizes.  For example, the public-domain Natural Earth Data set features raster files sized 21,600 by 10,800 pixels (222.5 megapixels).   Although this example is unusually large, image sizes of 25 to 100 megapixels are common for such applications.
> Unfortunately, when Sanselan reads a TIFF image, it consumes nearly twice as much memory as is necessary. The reader operates in two stages: first it reads the entire source file into memory, then it builds the output image, also in memory. For the example file mentioned above, the source data runs from 83.19 to 373 megabytes (depending on compression). Thus Sanselan would require a minimum of 83.19 + 4*222.5 = 973.19 megabytes to produce an image for one of these files (allowing 4 bytes per pixel in the output BufferedImage).
> Fortunately, TIFF files are organized so that they can be read a piece at a time. TIFF files are divided into either strips or tiles and, if data compression is used, each piece is compressed individually. Thus no piece depends on any other.
> This item proposes to implement two changes:
> 1) Allow the TIFF data reader to read the file one piece at a time while constructing the BufferedImage. Thus the memory used for the source data would be no larger than a single piece. This would be an internal change, so the external behavior of the Sanselan getBufferedImage methods would not change.
> 2) Provide new API elements that permit applications to read the strips or tiles from TIFF files individually. This change would support applications that need to access very large TIFF files without committing the memory to store a BufferedImage for the entire file (at 4 bytes per pixel, a 222.5-megapixel image requires 890 megabytes, which is a lot even by contemporary standards).
> There is one minor issue in this implementation that is easily addressed. Sanselan reads images from ByteSources that can be either random-access files or sequential-access input streams. With sequential-access input streams, it may be hard to perform a partial read of a TIFF directory. In such a case, the TIFF access routines might have to fall back to reading the entire source data into memory, as they currently do. This would simply be a limitation of the implementation.
> There is one issue that may make this change a bit problematic. The TIFF processors depend on accessing a class called TiffDataElement that contains a public array of bytes called "data". The most expeditious way of implementing the enhancement is to make this element private and add an accessor that either returns the data from internal memory or else loads it on demand. Unfortunately, because the data element is declared public, there is a chance that some existing applications are using it directly. In hindsight, declaring this element public was clearly a mistake, but it may be too late to fix it. So care will be required to preserve compatibility. The most likely solution seems to be to implement a new class for passing raw data from the source TIFF files to the DataReader implementations.
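
The new-class approach suggested in the last paragraph could look roughly like the sketch below: a holder that records only a piece's offset and length and reads the bytes on demand from a random-access source, with a fallback to an already-loaded byte array for sequential streams. The class and method names (RawDataBlock, getData, readPieces) are hypothetical and not part of the Sanselan API; in a real reader the offsets and byte counts would come from the StripOffsets/StripByteCounts or TileOffsets/TileByteCounts fields of the TIFF directory.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Arrays;
import java.util.function.Consumer;

/** Hypothetical holder for one strip or tile of raw TIFF data; not part of the Sanselan API. */
public final class RawDataBlock {
    private final long offset;
    private final int length;
    private final RandomAccessFile file; // set for random-access sources
    private final byte[] source;         // set when the whole source is already in memory

    /** Backed by a random-access file: nothing is read until getData() is called. */
    public RawDataBlock(RandomAccessFile file, long offset, int length) {
        this.file = file;
        this.source = null;
        this.offset = offset;
        this.length = length;
    }

    /** Fallback for sequential-access sources that had to be read fully into memory. */
    public RawDataBlock(byte[] source, long offset, int length) {
        this.file = null;
        this.source = source;
        this.offset = offset;
        this.length = length;
    }

    /** Returns this block's bytes, reading them from the file on demand. */
    public byte[] getData() throws IOException {
        byte[] data = new byte[length];
        if (file != null) {
            file.seek(offset);
            file.readFully(data);
        } else {
            System.arraycopy(source, Math.toIntExact(offset), data, 0, length);
        }
        return data;
    }

    /** Hands blocks to the decoder one at a time, so only one block's raw bytes must stay reachable. */
    public static void readPieces(RawDataBlock[] blocks, Consumer<byte[]> decoder) throws IOException {
        for (RawDataBlock block : blocks) {
            decoder.accept(block.getData());
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("strips", ".bin");
        tmp.deleteOnExit();
        try (RandomAccessFile raf = new RandomAccessFile(tmp, "rw")) {
            raf.write(new byte[] {10, 11, 12, 20, 21, 30});
            // Made-up strip table standing in for values read from the TIFF directory.
            long[] offsets = {0, 3, 5};
            int[] counts = {3, 2, 1};
            RawDataBlock[] blocks = new RawDataBlock[offsets.length];
            for (int i = 0; i < blocks.length; i++) {
                blocks[i] = new RawDataBlock(raf, offsets[i], counts[i]);
            }
            readPieces(blocks, piece -> System.out.println(Arrays.toString(piece)));
        }
    }
}
```

Run as-is, it prints [10, 11, 12], [20, 21] and [30], one strip at a time. Because the random-access constructor keeps a file handle instead of bytes, peak memory for the source data scales with the largest single piece rather than the whole file, while the byte-array constructor mirrors the sequential-stream limitation described above. A RandomAccessFile is not thread-safe, so decoding pieces concurrently would need separate handles or synchronization.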
