Posted to dev@tika.apache.org by "Tim Allison (Jira)" <ji...@apache.org> on 2021/03/18 19:19:00 UTC

[jira] [Commented] (TIKA-3331) Return a more informative error when trying to parse encrypted ODT

    [ https://issues.apache.org/jira/browse/TIKA-3331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304373#comment-17304373 ] 

Tim Allison commented on TIKA-3331:
-----------------------------------

This is now fixed in {{branch_1x}} and {{main}}.  However, there's a bit of a catch (no pun intended).  If you handle the file as a stream, neither the JDK's ZipInputStream nor Commons Compress's ZipArchiveInputStream can read it, because there's a data descriptor on one of the entries. In that case you'll get a zip exception, not an EncryptedDocumentException.

 

The full fix for this problem would be to cache the stream, catch the zip exception, use ZipSalvager to save a salvaged copy to a file, and carry on parsing from that file.  That is beyond the scope of this issue, as is adding decryption for OpenOffice files.  Please open a new issue or, better, a PR if you need either of these capabilities.  Reference for the zip issue when streaming: https://issues.apache.org/jira/browse/ODFTOOLKIT-402
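
For anyone who wants to experiment with the cache-and-retry idea, here is a rough sketch using only JDK classes. It is not the Tika implementation and it does not use ZipSalvager: as a stand-in, it spools the stream to a temp file and, if the streaming reader throws a ZipException, retries with java.util.zip.ZipFile, which reads the central directory instead of walking the local headers. The class and method names ({{ZipFallbackSketch}}, {{listEntryNames}}) are made up for illustration, and whether the ZipFile fallback copes with a given problem file is an assumption you would need to verify.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipException;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not part of Tika: illustrates "cache, catch, retry from file".
final class ZipFallbackSketch {

    /** Lists entry names, trying the streaming reader first and a file-based reader second. */
    static List<String> listEntryNames(InputStream in) throws IOException {
        // Cache the stream on disk so that it can be read more than once.
        Path tmp = Files.createTempFile("zip-fallback-", ".zip");
        try {
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            try {
                return listStreaming(tmp);
            } catch (ZipException e) {
                // The streaming reader gave up; retry via the central directory.
                return listFromCentralDirectory(tmp);
            }
        } finally {
            Files.deleteIfExists(tmp);
        }
    }

    /** Walks the local file headers in order, as a pure stream consumer would. */
    static List<String> listStreaming(Path zip) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zis = new ZipInputStream(Files.newInputStream(zip))) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    /** Uses random access to the file, so entries come from the central directory. */
    static List<String> listFromCentralDirectory(Path zip) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zf = new ZipFile(zip.toFile())) {
            for (ZipEntry entry : Collections.list(zf.entries())) {
                names.add(entry.getName());
            }
        }
        return names;
    }
}
```

A caller would pass the original stream to {{ZipFallbackSketch.listEntryNames(stream)}}. A real fix would hand the cached file on to the parser rather than just listing names, and would need to decide how large a stream it is willing to spool to disk.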

> Return a more informative error when trying to parse encrypted ODT
> ------------------------------------------------------------------
>
>                 Key: TIKA-3331
>                 URL: https://issues.apache.org/jira/browse/TIKA-3331
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.24.1
>         Environment: See enclosed picture.
>            Reporter: Bertrand Caron
>            Assignee: Tim Allison
>            Priority: Minor
>         Attachments: encrypte.odt, manifest.xml, system.png
>
>
> When parsing an encrypted PDF or ODF file, Tika returns a long, cryptic error message. A more informative message would be useful for the user: at least mention the encryption, and perhaps the algorithm used?
>  
> I enclose a fabricated example, but real-world examples can be found in a similar issue for the JHOVE tool: [https://github.com/openpreserve/jhove/issues/640]
>  
> The error log obtained:
>  
> Apache Tika was unable to parse the document
> at /home/bertrand/Téléchargements/Toponymic guidelines_Instituto geografico nacional_2011.pdf.
> The full exception stack trace is included below:
> org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.PDFParser@5e7e878d
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:293)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>     at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:188)
>     at org.apache.tika.parser.DigestingParser.parse(DigestingParser.java:84)
>     at org.apache.tika.gui.TikaGUI.handleStream(TikaGUI.java:358)
>     at org.apache.tika.gui.TikaGUI.openFile(TikaGUI.java:309)
>     at org.apache.tika.gui.TikaGUI.actionPerformed(TikaGUI.java:267)
>     at java.desktop/javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:1967)
>     at java.desktop/javax.swing.AbstractButton$Handler.actionPerformed(AbstractButton.java:2308)
>     at java.desktop/javax.swing.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:405)
>     at java.desktop/javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:262)
>     at java.desktop/javax.swing.AbstractButton.doClick(AbstractButton.java:369)
>     at java.desktop/javax.swing.plaf.basic.BasicMenuItemUI.doClick(BasicMenuItemUI.java:1020)
>     at java.desktop/javax.swing.plaf.basic.BasicMenuItemUI$Handler.mouseReleased(BasicMenuItemUI.java:1064)
>     at java.desktop/java.awt.Component.processMouseEvent(Component.java:6636)
>     at java.desktop/javax.swing.JComponent.processMouseEvent(JComponent.java:3342)
>     at java.desktop/java.awt.Component.processEvent(Component.java:6401)
>     at java.desktop/java.awt.Container.processEvent(Container.java:2263)
>     at java.desktop/java.awt.Component.dispatchEventImpl(Component.java:5012)
>     at java.desktop/java.awt.Container.dispatchEventImpl(Container.java:2321)
>     at java.desktop/java.awt.Component.dispatchEvent(Component.java:4844)
>     at java.desktop/java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4919)
>     at java.desktop/java.awt.LightweightDispatcher.processMouseEvent(Container.java:4548)
>     at java.desktop/java.awt.LightweightDispatcher.dispatchEvent(Container.java:4489)
>     at java.desktop/java.awt.Container.dispatchEventImpl(Container.java:2307)
>     at java.desktop/java.awt.Window.dispatchEventImpl(Window.java:2764)
>     at java.desktop/java.awt.Component.dispatchEvent(Component.java:4844)
>     at java.desktop/java.awt.EventQueue.dispatchEventImpl(EventQueue.java:772)
>     at java.desktop/java.awt.EventQueue$4.run(EventQueue.java:721)
>     at java.desktop/java.awt.EventQueue$4.run(EventQueue.java:715)
>     at java.base/java.security.AccessController.doPrivileged(AccessController.java:391)
>     at java.base/java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:85)
>     at java.base/java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:95)
>     at java.desktop/java.awt.EventQueue$5.run(EventQueue.java:745)
>     at java.desktop/java.awt.EventQueue$5.run(EventQueue.java:743)
>     at java.base/java.security.AccessController.doPrivileged(AccessController.java:391)
>     at java.base/java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:85)
>     at java.desktop/java.awt.EventQueue.dispatchEvent(EventQueue.java:742)
>     at java.desktop/java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:203)
>     at java.desktop/java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:124)
>     at java.desktop/java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:113)
>     at java.desktop/java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:109)
>     at java.desktop/java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101)
>     at java.desktop/java.awt.EventDispatchThread.run(EventDispatchThread.java:90)
> Caused by: java.lang.NullPointerException
>     at org.apache.tika.parser.pdf.AbstractPDF2XHTML.extractXMPXFA(AbstractPDF2XHTML.java:209)
>     at org.apache.tika.parser.pdf.AbstractPDF2XHTML.endDocument(AbstractPDF2XHTML.java:678)
>     at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:267)
>     at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:96)
>     at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:174)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     ... 44 more


