Posted to dev@tika.apache.org by "Tim Allison (JIRA)" <ji...@apache.org> on 2017/01/13 12:33:26 UTC

[jira] [Resolved] (TIKA-2159) Handle pre-parse embedded object exceptions uniformly and more robustly

     [ https://issues.apache.org/jira/browse/TIKA-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Allison resolved TIKA-2159.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 1.15
                   2.0

I added the TikaCoreProperties.TIKA_META_EXCEPTION_EMBEDDED_STREAM property to store stack traces in the parent file's metadata when an exception is thrown while trying to read the stream of an embedded file.
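A minimal sketch of the idea, assuming metadata is modeled as a plain multi-valued map rather than Tika's Metadata class. Each embedded stream is opened through a Callable, and if opening it throws, the stack trace is appended to the parent's metadata under a stand-in key instead of aborting the parent parse. The class EmbeddedStreamErrors, its methods and the key string are hypothetical and illustrative only; they are not Tika's API.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;

// Hypothetical sketch: metadata is a plain multi-valued map, not Tika's Metadata class.
class EmbeddedStreamErrors {
    // Stand-in key; in Tika the property is TikaCoreProperties.TIKA_META_EXCEPTION_EMBEDDED_STREAM.
    static final String EMBEDDED_STREAM_EXCEPTION_KEY = "embedded_stream_exception";

    // Render the full stack trace of t as a String.
    static String stackTrace(Throwable t) {
        StringWriter sw = new StringWriter();
        t.printStackTrace(new PrintWriter(sw, true));
        return sw.toString();
    }

    // Try to open each embedded stream. On failure, record the stack trace in the
    // *parent's* metadata and continue with the next embedded file.
    static <T> List<T> openAll(List<Callable<T>> openers,
                               Map<String, List<String>> parentMetadata) {
        List<T> opened = new ArrayList<>();
        for (Callable<T> opener : openers) {
            try {
                opened.add(opener.call());
            } catch (Exception e) {
                parentMetadata
                    .computeIfAbsent(EMBEDDED_STREAM_EXCEPTION_KEY, k -> new ArrayList<>())
                    .add(stackTrace(e));
            }
        }
        return opened;
    }
}
```

Recording the failure on the parent, rather than on an embedded document that was never produced, lets the caller see that something was lost while the remaining embedded files are still processed.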

There may be some areas for further work; I focused on the MSOffice, PDF and RTF parsers.

> Handle pre-parse embedded object exceptions uniformly and more robustly
> -----------------------------------------------------------------------
>
>                 Key: TIKA-2159
>                 URL: https://issues.apache.org/jira/browse/TIKA-2159
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>            Reporter: Tim Allison
>            Priority: Minor
>             Fix For: 2.0, 1.15
>
>
> When an embedded document is parsed and causes an exception, we're currently catching that and swallowing it in ParsingEmbeddedDocumentExtractor (the default) or reporting it in the RecursiveParserWrapper by storing the stacktrace in the Metadata of the embedded document.
> However, if there's an exception during detection on the embedded stream or on getting the stream _before_ the stream hits the parser, we aren't handling that uniformly or robustly across parsers.
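The distinction above can be sketched as two phases per embedded document, assuming hypothetical functional interfaces for getting the stream, detecting its type and parsing it. Anything that fails before the parser runs is recorded on the parent's metadata, while a parse failure is recorded on the embedded document's own metadata, mirroring what the description says RecursiveParserWrapper does. Treating detection failures exactly like stream failures is an assumption of this sketch, and EmbeddedPipeline, its interfaces and its key names are illustrative stand-ins, not Tika classes.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these types are Tika classes.
class EmbeddedPipeline {
    // Stand-in metadata keys (assumptions, not Tika property names).
    static final String PRE_PARSE_KEY = "embedded_stream_exception";
    static final String PARSE_KEY = "embedded_parse_exception";
    static final String CONTENT_KEY = "content";

    interface StreamSource { byte[] open() throws Exception; }
    interface TypeDetector { String detect(byte[] bytes) throws Exception; }
    interface ContentParser { String parse(byte[] bytes, String mediaType) throws Exception; }

    static String stackTrace(Throwable t) {
        StringWriter sw = new StringWriter();
        t.printStackTrace(new PrintWriter(sw));
        return sw.toString();
    }

    static void add(Map<String, List<String>> metadata, String key, String value) {
        metadata.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    // Handles one embedded document. Exceptions from both phases are caught,
    // so the caller can move on to the next embedded document.
    static void handle(StreamSource source, TypeDetector detector, ContentParser parser,
                       Map<String, List<String>> parentMetadata,
                       Map<String, List<String>> embeddedMetadata) {
        byte[] bytes;
        String mediaType;
        try {
            // Pre-parse phase: getting the stream and detecting its type.
            bytes = source.open();
            mediaType = detector.detect(bytes);
        } catch (Exception e) {
            // No embedded document reached the parser, so report on the parent.
            add(parentMetadata, PRE_PARSE_KEY, stackTrace(e));
            return;
        }
        try {
            // Parse phase: failures belong to the embedded document itself.
            add(embeddedMetadata, CONTENT_KEY, parser.parse(bytes, mediaType));
        } catch (Exception e) {
            add(embeddedMetadata, PARSE_KEY, stackTrace(e));
        }
    }
}
```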



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)