Posted to dev@tika.apache.org by "Tim Allison (JIRA)" <ji...@apache.org> on 2019/01/18 17:01:00 UTC

[jira] [Commented] (TIKA-2818) RarParser throws EncryptedDocumentException only when whole archive is encrypted

    [ https://issues.apache.org/jira/browse/TIKA-2818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16746482#comment-16746482 ] 

Tim Allison commented on TIKA-2818:
-----------------------------------

Something like this from the RecursiveParserWrapper?

{noformat}
0: X-Parsed-By : org.apache.tika.parser.DefaultParser
0: X-Parsed-By : org.apache.tika.parser.pkg.RarParser
0: X-TIKA:content_handler : ToXMLContentHandler
0: X-TIKA:parse_time_millis : 195
0: X-TIKA:content : <html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="X-Parsed-By" content="org.apache.tika.parser.DefaultParser" />
<meta name="X-Parsed-By" content="org.apache.tika.parser.pkg.RarParser" />
<meta name="Content-Type" content="application/x-rar-compressed" />
<title></title>
</head>
<body><div> </div>
<div class="embedded" id="encrypted.txt" />
<div class="package-entry"><h1>encrypted.txt</h1></div></body></html>
0: Content-Type : application/x-rar-compressed
1: embeddedRelationshipId : encrypted.txt
1: X-TIKA:EXCEPTION:embedded_exception : org.apache.tika.exception.EncryptedDocumentException: Unable to process: document is encrypted
	at org.apache.tika.parser.pkg.RarParser$EncryptedDocumentExceptionInputStream.read(RarParser.java:119)
	at java.io.InputStream.read(InputStream.java:170)
	at org.apache.tika.io.ProxyInputStream.read(ProxyInputStream.java:99)
	at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
	at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
	at java.io.FilterInputStream.read(FilterInputStream.java:107)
	at org.apache.tika.io.ProxyInputStream.read(ProxyInputStream.java:78)
	at org.apache.tika.io.TikaInputStream.peek(TikaInputStream.java:572)
	at org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:149)
	at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:84)
	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:116)
	at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:147)
	at org.apache.tika.parser.RecursiveParserWrapper$EmbeddedParserDecorator.parse(RecursiveParserWrapper.java:370)
	at org.apache.tika.parser.DelegatingParser.parse(DelegatingParser.java:72)
	at org.apache.tika.extractor.ParsingEmbeddedDocumentExtractor.parseEmbedded(ParsingEmbeddedDocumentExtractor.java:105)
	at org.apache.tika.parser.pkg.RarParser.parse(RarParser.java:90)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:277)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:277)
	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
	at org.apache.tika.parser.RecursiveParserWrapper.parse(RecursiveParserWrapper.java:224)
	at org.apache.tika.TikaTest.getRecursiveMetadata(TikaTest.java:263)
	at org.apache.tika.TikaTest.getRecursiveMetadata(TikaTest.java:219)
	at org.apache.tika.parser.pkg.RarParserTest.testSingleEncryptedRar(RarParserTest.java:163)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
	at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
	at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
	at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
	at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
	at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)

1: meta:save-date : 2019-01-18T10:17:30Z
1: X-TIKA:EXCEPTION:embedded_parser : org.apache.tika.parser.AutoDetectParser
1: X-TIKA:parse_time_millis : 5
1: resourceName : encrypted.txt
1: dcterms:modified : 2019-01-18T10:17:30Z
1: Last-Modified : 2019-01-18T10:17:30Z
1: Content-Length : 23
1: X-TIKA:embedded_resource_path : /encrypted.txt
{noformat}
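
The behaviour in that output can be sketched roughly as below: a failure on one embedded entry is stored in that entry's metadata and the remaining entries are still processed, instead of the whole parse failing. This is only a minimal illustration. Entry, EncryptedEntryException, read and parseAll are hypothetical stand-ins, not the real Tika or junrar classes; only the metadata key names are copied from the output above.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-in types only: none of these are the real Tika or junrar classes.
class EmbeddedEncryptionSketch {

    /** Hypothetical stand-in for an archive entry header. */
    static final class Entry {
        final String name;
        final boolean encrypted;
        final String text;

        Entry(String name, boolean encrypted, String text) {
            this.name = name;
            this.encrypted = encrypted;
            this.text = text;
        }
    }

    /** Hypothetical stand-in for Tika's EncryptedDocumentException. */
    static final class EncryptedEntryException extends IOException {
        EncryptedEntryException() {
            super("Unable to process: document is encrypted");
        }
    }

    /** Reading an encrypted entry fails; reading a plain one returns its text. */
    static String read(Entry entry) throws EncryptedEntryException {
        if (entry.encrypted) {
            throw new EncryptedEntryException();
        }
        return entry.text;
    }

    /**
     * Returns one metadata map per entry. A failure on one entry is stored
     * in that entry's map and the loop moves on to the next entry.
     */
    static List<Map<String, String>> parseAll(List<Entry> entries) {
        List<Map<String, String>> results = new ArrayList<>();
        for (Entry entry : entries) {
            Map<String, String> metadata = new LinkedHashMap<>();
            metadata.put("resourceName", entry.name);
            try {
                metadata.put("X-TIKA:content", read(entry));
            } catch (EncryptedEntryException e) {
                // Key name copied from the output above.
                metadata.put("X-TIKA:EXCEPTION:embedded_exception",
                        e.getClass().getSimpleName() + ": " + e.getMessage());
            }
            results.add(metadata);
        }
        return results;
    }

    public static void main(String[] args) {
        List<Entry> entries = new ArrayList<>();
        entries.add(new Entry("encrypted.txt", true, null));
        entries.add(new Entry("plain.txt", false, "hello"));
        for (Map<String, String> metadata : parseAll(entries)) {
            System.out.println(metadata);
        }
    }
}
```

With this shape, a caller can tell per entry whether content was extracted or an encryption failure was recorded, which is roughly what the "1:" rows above show for encrypted.txt.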

> RarParser throws EncryptedDocumentException only when whole archive is encrypted
> --------------------------------------------------------------------------------
>
>                 Key: TIKA-2818
>                 URL: https://issues.apache.org/jira/browse/TIKA-2818
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.20
>            Reporter: Pavel Arnošt
>            Priority: Minor
>         Attachments: rar4_encrypted_content_only.rar
>
>
> RarParser throws EncryptedDocumentException only if the whole archive is encrypted. If only individual files are encrypted, the parser fails with org.apache.tika.exception.TikaException: RarParser Exception:
> Caused by: org.apache.tika.exception.TikaException: RarParser Exception
>  at org.apache.tika.parser.pkg.RarParser.parse(RarParser.java:99)
>  at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>  at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>  at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>  at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:159)
>  at ... 43 more
> Caused by: com.github.junrar.exception.RarException: ioError
>  at com.github.junrar.Archive.getInputStream(Archive.java:525)
>  at org.apache.tika.parser.pkg.RarParser.parse(RarParser.java:81)
>  ... 48 more
> Caused by: com.github.junrar.exception.RarException: crcError
>  at com.github.junrar.Archive.doExtractFile(Archive.java:557)
>  at com.github.junrar.Archive.extractFile(Archive.java:498)
>  at com.github.junrar.Archive.getInputStream(Archive.java:523)
>  ... 49 more
> File encryption should be checked before trying to extract content on line 79, like this:
> FileHeader header = rar.nextFileHeader();
> // Guard against an empty archive, where nextFileHeader() returns null.
> if (header != null && header.isEncrypted()) {
>     throw new EncryptedDocumentException();
> }
> while (header != null && !Thread.currentThread().isInterrupted()) {
> Alternatively, the exception could be recorded in the metadata under the TikaCoreProperties.TIKA_META_EXCEPTION_EMBEDDED_STREAM key. I am not sure which is better, but the current behaviour is not correct (parsing fails).
> A sample document is attached.


