Posted to dev@tika.apache.org by "Nick Burch (Commented) (JIRA)" <ji...@apache.org> on 2011/12/05 12:41:39 UTC

[jira] [Commented] (TIKA-800) mark/reset not supported from POIFSContainerDetector

    [ https://issues.apache.org/jira/browse/TIKA-800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162730#comment-13162730 ] 

Nick Burch commented on TIKA-800:
---------------------------------

Looks like the issue is that ArchiveInputStream (from Commons Compress) doesn't support mark/reset

My hunch is that there are two fixes needed here:
 * If the POIFS detector (now run by default if the parser jar is available) can't mark/reset, it should decline to detect
 * The TikaCLI extractor should wrap the InputStreams it gets to ensure that all detectors can run

If no-one spots a snag with these, I'll make the changes in a little bit
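
Roughly, the two changes could look like the sketch below. This is only an illustration using plain java.io, not the actual Tika code: MarkResetSketch, Result, detect, ensureMarkSupported and NoMarkStream are all made-up names, and the real detector would return a media type rather than an enum. The only facts relied on are that a plain InputStream reports markSupported() == false and throws from reset(), and that BufferedInputStream adds mark/reset to any stream it wraps.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical sketch of both fixes; none of these names come from Tika.
final class MarkResetSketch {

    enum Result { DECLINED, OLE2, OTHER }

    // OLE2 compound document signature, used here as the header a detector peeks at.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Fix 1: decline to detect when the stream cannot be rewound afterwards.
    static Result detect(InputStream in) throws IOException {
        if (in == null || !in.markSupported()) {
            return Result.DECLINED;
        }
        in.mark(OLE2_MAGIC.length);
        try {
            byte[] header = new byte[OLE2_MAGIC.length];
            int total = 0;
            while (total < header.length) {
                int n = in.read(header, total, header.length - total);
                if (n < 0) {
                    break; // stream ended before a full header was read
                }
                total += n;
            }
            boolean match = total == header.length && Arrays.equals(header, OLE2_MAGIC);
            return match ? Result.OLE2 : Result.OTHER;
        } finally {
            in.reset(); // leave the stream where we found it
        }
    }

    // Fix 2: the caller wraps streams that lack mark/reset before running detectors.
    static InputStream ensureMarkSupported(InputStream in) {
        return in.markSupported() ? in : new BufferedInputStream(in);
    }

    // Stand-in for a stream without mark/reset support (inherits the InputStream defaults).
    static final class NoMarkStream extends InputStream {
        private final InputStream delegate;

        NoMarkStream(byte[] data) {
            this.delegate = new ByteArrayInputStream(data);
        }

        @Override
        public int read() throws IOException {
            return delegate.read();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = Arrays.copyOf(OLE2_MAGIC, OLE2_MAGIC.length + 1);
        InputStream raw = new NoMarkStream(data);
        System.out.println("raw stream:     " + detect(raw));
        System.out.println("wrapped stream: " + detect(ensureMarkSupported(raw)));
    }
}
```

With only the first change the detector stays quiet on such streams instead of throwing; the second change is what lets it actually see the embedded entries.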
                
> mark/reset not supported from POIFSContainerDetector
> ----------------------------------------------------
>
>                 Key: TIKA-800
>                 URL: https://issues.apache.org/jira/browse/TIKA-800
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.0, 1.1
>            Reporter: Andrzej Bialecki 
>
> {code}
> bash-3.2$ touch test.txt
> bash-3.2$ zip test.zip test.txt
>   adding: test.txt (stored 0%)
> bash-3.2$ java -jar tika-app-1.1-SNAPSHOT.jar -z test.zip
> Exception in thread "main" org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.pkg.PackageParser@2d58f9d3
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:249)
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:243)
> 	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
> 	at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:130)
> 	at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:397)
> 	at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:101)
> Caused by: java.io.IOException: mark/reset not supported
> 	at java.io.InputStream.reset(InputStream.java:330)
> 	at org.apache.tika.parser.microsoft.POIFSContainerDetector.detect(POIFSContainerDetector.java:116)
> 	at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:61)
> 	at org.apache.tika.cli.TikaCLI$FileEmbeddedDocumentExtractor.parseEmbedded(TikaCLI.java:676)
> 	at org.apache.tika.parser.pkg.PackageExtractor.unpack(PackageExtractor.java:167)
> 	at org.apache.tika.parser.pkg.PackageExtractor.parse(PackageExtractor.java:96)
> 	at org.apache.tika.parser.pkg.PackageParser.parse(PackageParser.java:64)
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:243)
> 	... 5 more
> bash-3.2$ 
> {code}
