Posted to dev@tika.apache.org by "Pavel Micka (JIRA)" <ji...@apache.org> on 2015/05/20 09:49:00 UTC

[jira] [Updated] (TIKA-1631) OutOfMemoryException in ZipContainerDetector

     [ https://issues.apache.org/jira/browse/TIKA-1631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pavel Micka updated TIKA-1631:
------------------------------
    Description: 
When I try to detect a ZIP container, I occasionally get this exception. It happens when a file looks like a compressed container according to its magic bytes but is in fact random noise. Apache Commons Compress, expecting a valid stream, reads the table size from the header, happens to load a huge number (the bytes at that position can be anything), and tries to allocate an array several GB in size, hence the exception.
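
For illustration, a minimal, assumption-laden sketch of the failure mode (not a test from this report): the byte values rely on the .Z header layout (magic 0x1F 0x9D followed by a flag byte whose low five bits declare the maximum LZW code size), and the class name OomRepro is made up.

    import java.io.BufferedInputStream;
    import java.io.ByteArrayInputStream;
    import java.io.InputStream;
    import org.apache.commons.compress.compressors.CompressorStreamFactory;

    public class OomRepro {
        public static void main(String[] args) throws Exception {
            // Noise that happens to begin with the .Z magic bytes (0x1F 0x9D).
            // The flag byte 0x1E declares a 30-bit max code size, so the LZW
            // decoder tries to allocate tables with 2^30 entries up front.
            byte[] noise = { 0x1f, (byte) 0x9d, 0x1e, 0x42, 0x13, 0x37 };
            InputStream in = new BufferedInputStream(new ByteArrayInputStream(noise));
            // Detection alone constructs ZCompressorInputStream (see the stack
            // trace below), which is where the OutOfMemoryError surfaces.
            new CompressorStreamFactory().createCompressorInputStream(in);
        }
    }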

This bug hurts the stability of systems running Tika: the decompressor can accidentally allocate nearly all of the available memory, and other parts of the system may then be unable to allocate their own objects.

A solution might be to add a parameter to the Tika config that limits the size of these arrays; if a stream declared a bigger size, an exception would be thrown instead. This change should not be hard, as the method InternalLZWInputStream.initializeTables() is protected, so the check can be added in an override (see the sketch below).
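
A rough sketch of such a guard, illustrative only: the class LzwTableGuard and its method are hypothetical, not actual Commons Compress or Tika API, and the real check would have to live in (or around) initializeTables(). The 9..16 range comes from the compress(1) format, which never declares code sizes outside those bounds.

    import java.io.IOException;

    // Hypothetical helper, not actual Commons Compress or Tika API.
    public final class LzwTableGuard {
        // compress(1) only emits LZW code sizes of 9..16 bits; a declared size
        // near 30 would mean tables of 2^30 entries, i.e. several GB.
        public static void checkCodeSize(int maxCodeSize) throws IOException {
            if (maxCodeSize < 9 || maxCodeSize > 16) {
                throw new IOException("Declared LZW code size " + maxCodeSize
                        + " is out of range; the stream is probably not a real .Z file");
            }
        }
    }

A Tika config property could make the upper bound adjustable, per the proposal above.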

Exception in thread "pool-2-thread-2" java.lang.OutOfMemoryError: Java heap space
	at org.apache.commons.compress.compressors.z._internal_.InternalLZWInputStream.initializeTables(InternalLZWInputStream.java:111)
	at org.apache.commons.compress.compressors.z.ZCompressorInputStream.<init>(ZCompressorInputStream.java:52)
	at org.apache.commons.compress.compressors.CompressorStreamFactory.createCompressorInputStream(CompressorStreamFactory.java:186)
	at org.apache.tika.parser.pkg.ZipContainerDetector.detectCompressorFormat(ZipContainerDetector.java:106)
	at org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:92)
	at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:61)

  was:
When I try to detect a ZIP container, I occasionally get this exception. It happens when a file looks like a compressed container according to its magic bytes but is in fact random noise. Apache Commons Compress, expecting a valid stream, reads the table size from the header, happens to load a huge number (the bytes at that position can be anything), and tries to allocate an array several GB in size, hence the exception.

A solution might be to add a parameter to the Tika config that limits the size of these arrays; if a stream declared a bigger size, an exception would be thrown instead. This change should not be hard, as the method InternalLZWInputStream.initializeTables() is protected, so the check can be added in an override.

Exception in thread "pool-2-thread-2" java.lang.OutOfMemoryError: Java heap space
	at org.apache.commons.compress.compressors.z._internal_.InternalLZWInputStream.initializeTables(InternalLZWInputStream.java:111)
	at org.apache.commons.compress.compressors.z.ZCompressorInputStream.<init>(ZCompressorInputStream.java:52)
	at org.apache.commons.compress.compressors.CompressorStreamFactory.createCompressorInputStream(CompressorStreamFactory.java:186)
	at org.apache.tika.parser.pkg.ZipContainerDetector.detectCompressorFormat(ZipContainerDetector.java:106)
	at org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:92)
	at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:61)


> OutOfMemoryException in ZipContainerDetector
> --------------------------------------------
>
>                 Key: TIKA-1631
>                 URL: https://issues.apache.org/jira/browse/TIKA-1631
>             Project: Tika
>          Issue Type: Bug
>          Components: detector
>    Affects Versions: 1.8
>            Reporter: Pavel Micka
>
> When I try to detect a ZIP container, I occasionally get this exception. It happens when a file looks like a compressed container according to its magic bytes but is in fact random noise. Apache Commons Compress, expecting a valid stream, reads the table size from the header, happens to load a huge number (the bytes at that position can be anything), and tries to allocate an array several GB in size, hence the exception.
> This bug hurts the stability of systems running Tika: the decompressor can accidentally allocate nearly all of the available memory, and other parts of the system may then be unable to allocate their own objects.
> A solution might be to add a parameter to the Tika config that limits the size of these arrays; if a stream declared a bigger size, an exception would be thrown instead. This change should not be hard, as the method InternalLZWInputStream.initializeTables() is protected, so the check can be added in an override.
> Exception in thread "pool-2-thread-2" java.lang.OutOfMemoryError: Java heap space
> 	at org.apache.commons.compress.compressors.z._internal_.InternalLZWInputStream.initializeTables(InternalLZWInputStream.java:111)
> 	at org.apache.commons.compress.compressors.z.ZCompressorInputStream.<init>(ZCompressorInputStream.java:52)
> 	at org.apache.commons.compress.compressors.CompressorStreamFactory.createCompressorInputStream(CompressorStreamFactory.java:186)
> 	at org.apache.tika.parser.pkg.ZipContainerDetector.detectCompressorFormat(ZipContainerDetector.java:106)
> 	at org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:92)
> 	at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:61)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)