Posted to dev@tika.apache.org by "Nick Burch (Commented) (JIRA)" <ji...@apache.org> on 2011/12/05 12:41:39 UTC
[jira] [Commented] (TIKA-800) mark/reset not supported from POIFSContainerDetector
[ https://issues.apache.org/jira/browse/TIKA-800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162730#comment-13162730 ]
Nick Burch commented on TIKA-800:
---------------------------------
Looks like the issue is that ArchiveInputStream (from Commons Compress) doesn't support mark/reset.
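The bottom of the stack trace in the report below shows java.io.InputStream.reset() throwing "mark/reset not supported". The following is a minimal JDK-only sketch of that failure mode. It assumes that a stream which implements only read(), like the ArchiveInputStream described above, inherits InputStream's default mark/reset behaviour. The names MarkResetDemo, nonMarkableStream and tryReset are hypothetical and are not part of Tika or Commons Compress.

```java
import java.io.IOException;
import java.io.InputStream;

public class MarkResetDemo {
    // Hypothetical stand-in for a non-rewindable archive entry stream:
    // it implements read() only, so mark() is a no-op, markSupported()
    // returns false and reset() throws, exactly as java.io.InputStream defines.
    static InputStream nonMarkableStream(byte[] data) {
        return new InputStream() {
            private int pos = 0;

            @Override
            public int read() {
                return pos < data.length ? (data[pos++] & 0xFF) : -1;
            }
        };
    }

    // Mimics what a detector does: mark, peek at some bytes, then rewind.
    // Returns "reset ok" on success or the IOException message on failure.
    static String tryReset(InputStream in) {
        try {
            in.mark(8);
            in.read();
            in.reset();
            return "reset ok";
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        InputStream in = nonMarkableStream(new byte[] {1, 2, 3});
        System.out.println(in.markSupported()); // false
        System.out.println(tryReset(in));       // mark/reset not supported
    }
}
```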
My hunch is that there are two fixes needed here:
* If the POIFS detector (now run by default if the parser jar is available) can't mark/reset, it should decline to detect
* The TikaCLI extractor should wrap the InputStreams it gets so that they support mark/reset, ensuring that all detectors can run
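The two proposed fixes could look roughly like the following plain-Java sketch. This is not Tika's actual code. detectOle2 is a hypothetical stand-in for a container detector that declines by returning null when the stream cannot be rewound, rather than calling reset() and failing. ensureMarkSupported is a hypothetical helper showing the wrapping idea, here using java.io.BufferedInputStream because it supports mark/reset. The only fact relied on beyond the JDK is the standard 8-byte OLE2 (POIFS) header signature.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class DetectorSketch {
    // Standard OLE2 / POIFS header signature: D0 CF 11 E0 A1 B1 1A E1
    static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Fix 1 (hypothetical detector): decline (null) on a non-rewindable
    // stream instead of consuming bytes and then failing in reset().
    static String detectOle2(InputStream in) throws IOException {
        if (!in.markSupported()) {
            return null; // decline: leave the decision to other detectors
        }
        in.mark(OLE2_MAGIC.length);
        try {
            byte[] header = new byte[OLE2_MAGIC.length];
            int n = 0;
            while (n < header.length) { // read() may return fewer bytes than asked
                int r = in.read(header, n, header.length - n);
                if (r < 0) break;
                n += r;
            }
            return n == header.length && Arrays.equals(header, OLE2_MAGIC) ? "ole2" : "other";
        } finally {
            in.reset(); // safe: markSupported() was checked above
        }
    }

    // Fix 2 (hypothetical extractor helper): make every stream rewindable
    // before handing it to the detector chain.
    static InputStream ensureMarkSupported(InputStream in) {
        return in.markSupported() ? in : new BufferedInputStream(in);
    }

    // Hypothetical stand-in for a non-rewindable archive entry stream.
    static InputStream nonMarkable(byte[] data) {
        ByteArrayInputStream src = new ByteArrayInputStream(data);
        return new InputStream() {
            @Override
            public int read() {
                return src.read();
            }
        };
    }

    public static void main(String[] args) throws IOException {
        byte[] data = Arrays.copyOf(OLE2_MAGIC, 16); // OLE2 header plus zero padding
        System.out.println(detectOle2(nonMarkable(data))); // null (declined)
        InputStream wrapped = ensureMarkSupported(nonMarkable(data));
        System.out.println(detectOle2(wrapped));           // ole2
        System.out.println(wrapped.read() == 0xD0);        // true: stream was rewound
    }
}
```

Both fixes are useful together: declining keeps a detector from crashing the whole parse, while wrapping means the detector does not have to decline in the first place.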
If no-one spots a snag with these, I'll make the changes in a little bit.
> mark/reset not supported from POIFSContainerDetector
> ----------------------------------------------------
>
> Key: TIKA-800
> URL: https://issues.apache.org/jira/browse/TIKA-800
> Project: Tika
> Issue Type: Bug
> Affects Versions: 1.0, 1.1
> Reporter: Andrzej Bialecki
>
> {code}
> bash-3.2$ touch test.txt
> bash-3.2$ zip test.zip test.txt
> adding: test.txt (stored 0%)
> bash-3.2$ java -jar tika-app-1.1-SNAPSHOT.jar -z test.zip
> Exception in thread "main" org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.pkg.PackageParser@2d58f9d3
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:249)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:243)
> at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
> at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:130)
> at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:397)
> at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:101)
> Caused by: java.io.IOException: mark/reset not supported
> at java.io.InputStream.reset(InputStream.java:330)
> at org.apache.tika.parser.microsoft.POIFSContainerDetector.detect(POIFSContainerDetector.java:116)
> at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:61)
> at org.apache.tika.cli.TikaCLI$FileEmbeddedDocumentExtractor.parseEmbedded(TikaCLI.java:676)
> at org.apache.tika.parser.pkg.PackageExtractor.unpack(PackageExtractor.java:167)
> at org.apache.tika.parser.pkg.PackageExtractor.parse(PackageExtractor.java:96)
> at org.apache.tika.parser.pkg.PackageParser.parse(PackageParser.java:64)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:243)
> ... 5 more
> bash-3.2$
> {code}