Posted to dev@tika.apache.org by "Tim Allison (JIRA)" <ji...@apache.org> on 2019/07/11 20:15:00 UTC

[jira] [Resolved] (TIKA-2899) org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.rtf.RTFParser@375a26af

     [ https://issues.apache.org/jira/browse/TIKA-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Allison resolved TIKA-2899.
-------------------------------
       Resolution: Fixed
         Assignee: Tim Allison
    Fix Version/s: 1.22

I added a stack that tracks p, li, ol and ul elements written to the xml handler.  It ensures alignment of elements in the output even if the RTF is corrupt.

I am not convinced that the attached file has any problems, but the change will ensure matched elements in the output.
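
For illustration, the stack-based approach can be sketched roughly as follows. This is not the code committed for TIKA-2899: the class name BalancedElementWriter and its methods are hypothetical, and it appends to a StringBuilder rather than a SAX content handler to stay self-contained. The policy of silently dropping a stray close tag is likewise an assumption.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Set;

/** Hypothetical sketch of element tracking; not the actual RTFParser change. */
final class BalancedElementWriter {
    // Only these element names are tracked, matching the description above.
    private static final Set<String> TRACKED = Set.of("p", "li", "ol", "ul");

    // Most recently opened tracked element sits at the head of the deque.
    private final Deque<String> open = new ArrayDeque<>();
    private final StringBuilder out = new StringBuilder();

    void start(String name) {
        if (TRACKED.contains(name)) {
            open.push(name);
        }
        out.append('<').append(name).append('>');
    }

    void end(String name) {
        if (!TRACKED.contains(name)) {
            // Untracked elements are passed through unchanged.
            out.append("</").append(name).append('>');
            return;
        }
        if (!open.contains(name)) {
            // Assumed policy: a close with no matching open is dropped.
            return;
        }
        // Close everything opened after `name`, then `name` itself.
        String top;
        do {
            top = open.pop();
            out.append("</").append(top).append('>');
        } while (!top.equals(name));
    }

    void text(String s) {
        out.append(s);
    }

    /** Closes any tracked elements still open and returns the output. */
    String finish() {
        while (!open.isEmpty()) {
            out.append("</").append(open.pop()).append('>');
        }
        return out.toString();
    }
}
```

With this sketch, a close for ul issued while an li is still open first closes the li, and a p that is never closed is closed by finish(), so the output stays well-formed regardless of the order of events coming from the RTF.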

If there are any objections to this fix, please let me know, and I can revert.

> org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.rtf.RTFParser@375a26af
> -----------------------------------------------------------------------------------------------------------------------
>
>                 Key: TIKA-2899
>                 URL: https://issues.apache.org/jira/browse/TIKA-2899
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.19
>            Reporter: Pandurang
>            Assignee: Tim Allison
>            Priority: Critical
>             Fix For: 1.22
>
>         Attachments: ABC_PL_WI.rtf
>
>
> I am using Solr 8.0 through the SolrNet library to extract text from binary data, and we are getting the error below.
> It works fine for 99% of documents but fails for about 1% of them.
> Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.rtf.RTFParser@375a26af
>  at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
>  at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>  at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>  at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:228)
>  ... 41 more



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)