Posted to dev@tika.apache.org by "Michael McCandless (Commented) (JIRA)" <ji...@apache.org> on 2011/09/28 12:41:45 UTC

[jira] [Commented] (TIKA-733) [PATCH] RTF TextExtractor processGroupEnd() NoSuchElementException

    [ https://issues.apache.org/jira/browse/TIKA-733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116331#comment-13116331 ] 

Michael McCandless commented on TIKA-733:
-----------------------------------------

Hmm, it makes me a little nervous to just silently skip popping the
group state when the list is empty, since this could be masking a more
serious bug.

I.e., it's possible we are not correctly tokenizing the open / close
group tokens.

The other explanation is that the RTF doc is corrupt (has more
closing } than opening {).

Can you look at the doc and figure out if it's corrupt?
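
One way to check this without sharing the file might be a small
standalone scan like the sketch below. It is not Tika code; the class
name RtfBraceCheck and its methods are made up for illustration. It
assumes that a backslash escapes the character after it (so \\, \{ and
\} are not counted as group tokens) and it ignores \bin data, which
could contain raw braces, so treat the result as a hint only.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical helper (not part of Tika): counts unescaped { and } in an RTF file. */
public final class RtfBraceCheck {

    /** Final nesting depth plus the offset of the first unmatched '}' (or -1 if none). */
    public static final class Result {
        public final int finalDepth;
        public final int firstUnmatchedClose;

        Result(int finalDepth, int firstUnmatchedClose) {
            this.finalDepth = finalDepth;
            this.firstUnmatchedClose = firstUnmatchedClose;
        }

        public boolean isBalanced() {
            return finalDepth == 0 && firstUnmatchedClose < 0;
        }
    }

    public static Result scan(CharSequence rtf) {
        int depth = 0;
        int firstUnmatchedClose = -1;
        for (int i = 0; i < rtf.length(); i++) {
            char c = rtf.charAt(i);
            if (c == '\\') {
                i++; // assumption: skip the escaped character, so \\ \{ \} are literals
            } else if (c == '{') {
                depth++;
            } else if (c == '}') {
                if (depth == 0) {
                    // a close with nothing open: remember only the first one
                    if (firstUnmatchedClose < 0) {
                        firstUnmatchedClose = i;
                    }
                } else {
                    depth--;
                }
            }
        }
        return new Result(depth, firstUnmatchedClose);
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        // ISO-8859-1 maps each byte to one char, so offsets are byte offsets
        Result r = scan(new String(bytes, StandardCharsets.ISO_8859_1));
        System.out.println("final depth = " + r.finalDepth
                + ", first unmatched '}' at offset " + r.firstUnmatchedClose);
    }
}
```

If this reports an unmatched } then the doc itself is likely corrupt;
if it reports balanced groups, the tokenizer becomes the more likely
suspect.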

Does this RTF document work with older versions of Tika (before
TIKA-683 was committed)?
                
> [PATCH] RTF TextExtractor processGroupEnd() NoSuchElementException
> ------------------------------------------------------------------
>
>                 Key: TIKA-733
>                 URL: https://issues.apache.org/jira/browse/TIKA-733
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.0
>            Reporter: Jeremy Anderson
>            Assignee: Michael McCandless
>              Labels: patch
>             Fix For: 1.0
>
>         Attachments: TIKA-733-rtf_TextExtractor_processGroupEnd-NoSuchElementException.patch
>
>
> Parsing some RTF documents triggers a removeLast() call on the groupStates() list while the list is empty.  The patch adds a check that skips this logic when the list is empty, so the group state is not restored in that case. With the check in place, text extraction completes without further downstream errors.
> Unable to include sample file due to sensitive nature of file contents.
> StackTrace (TIKA-0.9)
> Caused by: java.util.NoSuchElementException
> 	at java.util.LinkedList.remove(LinkedList.java:788)
> 	at java.util.LinkedList.removeLast(LinkedList.java:144)
> 	at org.apache.tika.parser.rtf.TextExtractor.processGroupEnd(TextExtractor.java:1010)
> 	at org.apache.tika.parser.rtf.TextExtractor.extract(TextExtractor.java:352)
> 	at org.apache.tika.parser.rtf.RTFParser.parse(RTFParser.java:53)
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> 	... 45 more
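
For reference, the failure in the trace and the kind of guard the issue
describes can be reproduced in isolation with a plain
java.util.LinkedList. The sketch below is a stand-in, not the actual
TextExtractor code or the attached patch: the class GroupStackSketch,
its String "state" and the unmatchedCloses counter are all
hypothetical. The counter is an extra, added here so that a skipped pop
stays visible instead of being silently ignored.

```java
import java.util.LinkedList;

/** Hypothetical stand-in for a group-state stack; not Tika's TextExtractor. */
public final class GroupStackSketch {
    private final LinkedList<String> groupStates = new LinkedList<String>();
    private String current = "root";
    private int unmatchedCloses = 0;

    /** On '{': save the current state so the matching '}' can restore it. */
    public void groupStart() {
        groupStates.add(current);
    }

    /** On '}' without a guard: throws NoSuchElementException if nothing was saved. */
    public void groupEndUnguarded() {
        current = groupStates.removeLast();
    }

    /** On '}' with a guard: skips the restore when nothing was saved, but counts it. */
    public void groupEndGuarded() {
        if (groupStates.isEmpty()) {
            unmatchedCloses++; // assumption: record the anomaly rather than hide it
            return;
        }
        current = groupStates.removeLast();
    }

    public int unmatchedCloses() {
        return unmatchedCloses;
    }
}
```

A non-zero unmatchedCloses after parsing would indicate either a
document with extra closing braces or a tokenizing problem, which is
the distinction raised in the comment above.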
