Posted to dev@tika.apache.org by "Jukka Zitting (JIRA)" <ji...@apache.org> on 2011/05/23 18:46:47 UTC

[jira] [Commented] (TIKA-666) Unable to extract content from RTF files

    [ https://issues.apache.org/jira/browse/TIKA-666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038036#comment-13038036 ] 

Jukka Zitting commented on TIKA-666:
------------------------------------

The exception I get when parsing this document is:

{quote}
Exception in thread "main" org.apache.tika.exception.TikaException: Error parsing an RTF document
	at org.apache.tika.parser.rtf.RTFParser.parse(RTFParser.java:135)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:129)
	at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:125)
	at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:339)
	at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:96)
Caused by: java.lang.NullPointerException
	at java.util.Hashtable.put(Hashtable.java:394)
	at javax.swing.text.rtf.RTFReader$AttributeTrackingDestination.handleKeyword(RTFReader.java:1279)
	at javax.swing.text.rtf.RTFReader.handleKeyword(RTFReader.java:470)
	at javax.swing.text.rtf.RTFParser.write(RTFParser.java:232)
	at javax.swing.text.rtf.RTFParser.write(RTFParser.java:117)
	at javax.swing.text.rtf.AbstractFilter.write(AbstractFilter.java:155)
	at javax.swing.text.rtf.AbstractFilter.readFromStream(AbstractFilter.java:88)
	at javax.swing.text.rtf.RTFEditorKit.read(RTFEditorKit.java:65)
	at org.apache.tika.parser.rtf.RTFParser.parse(RTFParser.java:117)
	... 6 more
{quote}

The error seems to be pretty deep inside the RTF parser in javax.swing, so there isn't much we can do about this in Tika.
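
To check that the failure does not depend on Tika, the same code path can be exercised directly. The following is only a minimal sketch, assuming (as the stack trace suggests) that Tika's RTFParser hands the stream to javax.swing.text.rtf.RTFEditorKit.read. The class name RtfSwingCheck and the inline sample document are hypothetical and used for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.swing.text.Document;
import javax.swing.text.rtf.RTFEditorKit;

// Hypothetical helper class, not part of Tika.
public class RtfSwingCheck {

    // Reads an RTF stream with the Swing RTF reader and returns its plain text.
    static String extractText(InputStream in) throws Exception {
        RTFEditorKit kit = new RTFEditorKit();
        Document doc = kit.createDefaultDocument();
        // Same entry point that appears in the stack trace above.
        kit.read(in, doc, 0);
        return doc.getText(0, doc.getLength());
    }

    public static void main(String[] args) throws Exception {
        InputStream in;
        if (args.length > 0) {
            // Path to an RTF file, for example the attached Redline.rtf.
            in = new FileInputStream(args[0]);
        } else {
            // Tiny inline sample (an assumption for illustration).
            byte[] sample = "{\\rtf1\\ansi Hello}".getBytes(StandardCharsets.US_ASCII);
            in = new ByteArrayInputStream(sample);
        }
        try {
            System.out.println(extractText(in));
        } finally {
            in.close();
        }
    }
}
```

If running this against the attached document fails with the same NullPointerException, the problem is reproducible without any Tika classes on the stack, which would support the conclusion above.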

> Unable to extract content from RTF files
> ----------------------------------------
>
>                 Key: TIKA-666
>                 URL: https://issues.apache.org/jira/browse/TIKA-666
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.8, 0.9
>         Environment: Windows 32 bit OS, JDK 1.6.19
>            Reporter: samraj
>              Labels: RTF
>         Attachments: Redline.rtf
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Hi,
>  I have tried to extract the text content from various RTF documents, using several techniques. They all failed. I have attached one of the RTF documents here.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira