Posted to issues@commons.apache.org by "Stefan Bodewig (JIRA)" <ji...@apache.org> on 2017/02/04 14:15:52 UTC

[jira] [Commented] (COMPRESS-382) OutOfMemoryError from CompressorStreamFactory

    [ https://issues.apache.org/jira/browse/COMPRESS-382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15852806#comment-15852806 ] 

Stefan Bodewig commented on COMPRESS-382:
-----------------------------------------

LZMA is identified by a three-byte header, which is very short and may lead to quite a few false positives. I'm not sure how to avoid this without removing LZMA autodetection completely.
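
To illustrate why such a short signature misfires, here is a minimal sketch and not the Commons Compress implementation. It assumes the signature is the byte sequence 0x5D 0x00 0x00 and that bytes 1 to 4 of a .lzma header hold the little-endian dictionary size, which the decoder presumably uses to size its buffer. The class {{LzmaHeaderCheck}}, its methods, and the {{maxDictBytes}} limit are hypothetical names introduced only for this example.

```java
/** Hypothetical helper for illustration; not part of Commons Compress. */
final class LzmaHeaderCheck {
    // Assumption: the three-byte signature under discussion is 0x5D 0x00 0x00.
    private static final byte[] MAGIC = {0x5D, 0x00, 0x00};

    /** Returns true if the buffer starts with the assumed three-byte signature. */
    static boolean matchesMagic(byte[] header) {
        if (header == null || header.length < MAGIC.length) {
            return false;
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (header[i] != MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    /** Reads the unsigned little-endian 32-bit value stored in bytes 1..4. */
    static long dictionarySize(byte[] header) {
        if (header == null || header.length < 5) {
            throw new IllegalArgumentException("need at least 5 header bytes");
        }
        long size = 0;
        for (int i = 0; i < 4; i++) {
            size |= (header[1 + i] & 0xFFL) << (8 * i);
        }
        return size;
    }

    /** Signature match plus a sanity limit on the declared dictionary size. */
    static boolean looksPlausible(byte[] header, long maxDictBytes) {
        return header != null
                && header.length >= 5
                && matchesMagic(header)
                && dictionarySize(header) <= maxDictBytes;
    }
}
```

Under these assumptions, the signature pins down only the two low bytes of the dictionary size, so arbitrary data that happens to start with these three bytes can still declare a dictionary of several gigabytes. A size limit like the one in {{looksPlausible}} would reject some false positives but not all of them.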

What you describe makes me think Tika could benefit from us adding a method to {{CompressorStreamFactory}} that answers "does this look like a compressor stream?" or even "what kind of format do you think this is?".
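
A rough sketch of what such a detection-only method could look like follows. It is an illustration under stated assumptions, not an existing {{CompressorStreamFactory}} API: the class {{FormatSniffer}}, the method {{detect}}, and the returned format names are hypothetical. It peeks at the leading bytes of a mark-supporting stream, compares them against the well-known gzip, bzip2 and xz signatures, and rewinds the stream without ever constructing a decompressor.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical detection-only helper; not part of Commons Compress. */
final class FormatSniffer {
    // Longest signature checked below (xz uses six bytes).
    private static final int MAX_SIGNATURE = 6;

    /**
     * Returns a format name for a recognized signature, or null if none
     * matches. The stream is reset to its original position afterwards.
     */
    static String detect(InputStream in) throws IOException {
        if (in == null || !in.markSupported()) {
            throw new IllegalArgumentException("a stream supporting mark/reset is required");
        }
        byte[] sig = new byte[MAX_SIGNATURE];
        int n = 0;
        in.mark(MAX_SIGNATURE);
        try {
            int r;
            // Loop because a single read may return fewer bytes than requested.
            while (n < sig.length && (r = in.read(sig, n, sig.length - n)) != -1) {
                n += r;
            }
        } finally {
            in.reset();
        }
        if (startsWith(sig, n, 0x1F, 0x8B)) {
            return "gz";
        }
        if (startsWith(sig, n, 'B', 'Z', 'h')) {
            return "bzip2";
        }
        if (startsWith(sig, n, 0xFD, '7', 'z', 'X', 'Z', 0x00)) {
            return "xz";
        }
        return null;
    }

    /** Compares the first bytes of buf (valid up to len) with the given magic. */
    private static boolean startsWith(byte[] buf, int len, int... magic) {
        if (len < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if ((buf[i] & 0xFF) != magic[i]) {
                return false;
            }
        }
        return true;
    }
}
```

A caller such as Tika could then decide for itself whether to open a decompressor for the reported format. Because no decoder is instantiated during detection, a misidentified file would not trigger a large allocation at this stage.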

> OutOfMemoryError from CompressorStreamFactory
> ---------------------------------------------
>
>                 Key: COMPRESS-382
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-382
>             Project: Commons Compress
>          Issue Type: Bug
>          Components: Compressors
>    Affects Versions: 1.10, 1.11, 1.12
>         Environment: Windows7, jre1.8.0_101 x64
>            Reporter: Luis Filipe Nassif
>         Attachments: data.mui
>
>
> While using Tika-1.14 to detect file types, the attached 1KB file triggered an OOME with a 1GB heap. Tika calls CompressorStreamFactory.createCompressorInputStream(in) to detect whether the file is a compressor stream, but CompressorStreamFactory erroneously detects it as an LZMA stream, and the OOME is thrown when the LZMACompressorInputStream is instantiated. This error does not happen with commons-compress versions prior to 1.10, when auto-detection of LZMA streams was added. OOME stacktrace below:
> {code}
> Caused by: java.lang.OutOfMemoryError: Java heap space
> 	at org.tukaani.xz.lz.LZDecoder.<init>(Unknown Source) ~[xz-1.5.jar:1.5]
> 	at org.tukaani.xz.LZMAInputStream.initialize(Unknown Source) ~[xz-1.5.jar:1.5]
> 	at org.tukaani.xz.LZMAInputStream.initialize(Unknown Source) ~[xz-1.5.jar:1.5]
> 	at org.tukaani.xz.LZMAInputStream.<init>(Unknown Source) ~[xz-1.5.jar:1.5]
> 	at org.tukaani.xz.LZMAInputStream.<init>(Unknown Source) ~[xz-1.5.jar:1.5]
> 	at org.apache.commons.compress.compressors.lzma.LZMACompressorInputStream.<init>(LZMACompressorInputStream.java:48) ~[commons-compress-1.10.jar:1.10]
> 	at org.apache.commons.compress.compressors.CompressorStreamFactory.createCompressorInputStream(CompressorStreamFactory.java:251) ~[commons-compress-1.10.jar:1.10]
> 	at org.apache.tika.parser.pkg.ZipContainerDetector.detectCompressorFormat(ZipContainerDetector.java:109) ~[tika-parsers-1.14.jar:1.14]
> 	at org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:95) ~[tika-parsers-1.14.jar:1.14]
> 	at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:77) ~[tika-core-1.14.jar:1.14]
> 	at dpf.sp.gpinf.indexer.process.task.SignatureTask.process(SignatureTask.java:50) ~[iped.jar:?]
> 	at dpf.sp.gpinf.indexer.process.task.AbstractTask.processMonitorTimeout(AbstractTask.java:203) ~[iped.jar:?]
> 	at dpf.sp.gpinf.indexer.process.task.AbstractTask.processAndSendToNextTask(AbstractTask.java:152) ~[iped.jar:?]
> 	at dpf.sp.gpinf.indexer.process.task.AbstractTask.sendToNextTask(AbstractTask.java:190) ~[iped.jar:?]
> 	at dpf.sp.gpinf.indexer.process.task.AbstractTask.processAndSendToNextTask(AbstractTask.java:160) ~[iped.jar:?]
> 	at dpf.sp.gpinf.indexer.process.task.AbstractTask.sendToNextTask(AbstractTask.java:190) ~[iped.jar:?]
> 	at dpf.sp.gpinf.indexer.process.task.AbstractTask.processAndSendToNextTask(AbstractTask.java:160) ~[iped.jar:?]
> 	at dpf.sp.gpinf.indexer.process.task.AbstractTask.sendToNextTask(AbstractTask.java:190) ~[iped.jar:?]
> 	at dpf.sp.gpinf.indexer.process.task.AbstractTask.processAndSendToNextTask(AbstractTask.java:160) ~[iped.jar:?]
> 	at dpf.sp.gpinf.indexer.process.Worker.process(Worker.java:174) ~[iped.jar:?]
> 	... 1 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)