Posted to dev@tika.apache.org by "Pavel Arnošt (JIRA)" <ji...@apache.org> on 2019/01/18 09:29:00 UTC

[jira] [Updated] (TIKA-2818) RarParser throws EncryptedDocumentException only when whole archive is encrypted

     [ https://issues.apache.org/jira/browse/TIKA-2818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pavel Arnošt updated TIKA-2818:
-------------------------------
    Summary: RarParser throws EncryptedDocumentException only when whole archive is encrypted  (was: RarParser throws EncryptedDocumentException only when whole archiveis encrypted)

> RarParser throws EncryptedDocumentException only when whole archive is encrypted
> --------------------------------------------------------------------------------
>
>                 Key: TIKA-2818
>                 URL: https://issues.apache.org/jira/browse/TIKA-2818
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.20
>            Reporter: Pavel Arnošt
>            Priority: Minor
>         Attachments: rar4_encrypted_content_only.rar
>
>
> RarParser throws EncryptedDocumentException only if the whole archive is encrypted. If encryption is applied to individual files, the parser fails with org.apache.tika.exception.TikaException: RarParser Exception:
> Caused by: org.apache.tika.exception.TikaException: RarParser Exception
>  at org.apache.tika.parser.pkg.RarParser.parse(RarParser.java:99)
>  at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>  at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>  at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>  at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:159)
>  at ... 43 more
> Caused by: com.github.junrar.exception.RarException: ioError
>  at com.github.junrar.Archive.getInputStream(Archive.java:525)
>  at org.apache.tika.parser.pkg.RarParser.parse(RarParser.java:81)
>  ... 48 more
> Caused by: com.github.junrar.exception.RarException: crcError
>  at com.github.junrar.Archive.doExtractFile(Archive.java:557)
>  at com.github.junrar.Archive.extractFile(Archive.java:498)
>  at com.github.junrar.Archive.getInputStream(Archive.java:523)
>  ... 49 more
> File encryption should be checked before trying to extract content on line 79 like this:
> FileHeader header = rar.nextFileHeader();
> // Guard against an empty archive, where nextFileHeader() returns null.
> if (header != null && header.isEncrypted()) {
>     throw new EncryptedDocumentException();
> }
> while (header != null && !Thread.currentThread().isInterrupted()) {
> Or maybe record it in the metadata under the TikaCoreProperties.TIKA_META_EXCEPTION_EMBEDDED_STREAM key? I don't know which is better, but the current behaviour is not correct (parsing fails).
> Sample document is attached.
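
A minimal sketch of the per-entry check described above, assuming the encryption flag should be tested for every entry inside the loop and not only for the first header. The types Entry and Result and the method process are hypothetical stand-ins so the example runs with the JDK alone; they are not the junrar or Tika API. Only isEncrypted() mirrors the FileHeader method named in the report. Here encrypted entries are recorded and skipped, which roughly corresponds to the "put it into metadata" option; throwing EncryptedDocumentException at that point would be the other option.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins only: nothing here is the junrar or Tika API.
final class EncryptedEntrySketch {

    // Mimics the single FileHeader property the report relies on.
    static final class Entry {
        final String name;
        final boolean encrypted;

        Entry(String name, boolean encrypted) {
            this.name = name;
            this.encrypted = encrypted;
        }

        boolean isEncrypted() {
            return encrypted;
        }
    }

    // Collects what was processed and what was skipped as encrypted.
    static final class Result {
        final List<String> extracted = new ArrayList<>();
        final List<String> skippedEncrypted = new ArrayList<>();
    }

    // Checks the encryption flag before any extraction attempt, so an
    // encrypted entry never reaches the step that fails with a CRC error.
    static Result process(List<Entry> entries) {
        Result result = new Result();
        for (Entry entry : entries) {
            if (Thread.currentThread().isInterrupted()) {
                break;
            }
            if (entry.isEncrypted()) {
                // Alternative: throw an exception here instead of recording.
                result.skippedEncrypted.add(entry.name);
                continue;
            }
            // Placeholder for the real extraction of the entry content.
            result.extracted.add(entry.name);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Entry> entries = new ArrayList<>();
        entries.add(new Entry("readme.txt", false));
        entries.add(new Entry("secret.txt", true));

        Result result = process(entries);
        System.out.println("extracted: " + result.extracted);
        System.out.println("skipped (encrypted): " + result.skippedEncrypted);
    }
}
```

Recording and continuing lets the unencrypted entries of a mixed archive still be parsed; whether that or failing fast is preferable is the open question raised in the report.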



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)