Posted to dev@tika.apache.org by "Jukka Zitting (JIRA)" <ji...@apache.org> on 2011/05/23 18:46:47 UTC
[jira] [Commented] (TIKA-666) Unable to extract content from RTF files
[ https://issues.apache.org/jira/browse/TIKA-666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038036#comment-13038036 ]
Jukka Zitting commented on TIKA-666:
------------------------------------
The exception I get when parsing this document is:
{quote}
Exception in thread "main" org.apache.tika.exception.TikaException: Error parsing an RTF document
at org.apache.tika.parser.rtf.RTFParser.parse(RTFParser.java:135)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:129)
at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:125)
at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:339)
at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:96)
Caused by: java.lang.NullPointerException
at java.util.Hashtable.put(Hashtable.java:394)
at javax.swing.text.rtf.RTFReader$AttributeTrackingDestination.handleKeyword(RTFReader.java:1279)
at javax.swing.text.rtf.RTFReader.handleKeyword(RTFReader.java:470)
at javax.swing.text.rtf.RTFParser.write(RTFParser.java:232)
at javax.swing.text.rtf.RTFParser.write(RTFParser.java:117)
at javax.swing.text.rtf.AbstractFilter.write(AbstractFilter.java:155)
at javax.swing.text.rtf.AbstractFilter.readFromStream(AbstractFilter.java:88)
at javax.swing.text.rtf.RTFEditorKit.read(RTFEditorKit.java:65)
at org.apache.tika.parser.rtf.RTFParser.parse(RTFParser.java:117)
... 6 more
{quote}
The error seems to be pretty deep inside the RTF parser in javax.swing, so there isn't much we can do about this in Tika.
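One way to check that the failure comes from javax.swing rather than from Tika is to feed the document straight to RTFEditorKit, which is the call at the bottom of the Tika frames in the trace above. The following is only a minimal sketch: the class name SwingRtfCheck and its extractText helper are hypothetical and not part of Tika, and the tiny inline RTF sample is just a placeholder used when no file is given. For background, java.util.Hashtable rejects null keys and values, which is consistent with the NullPointerException raised from Hashtable.put.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultStyledDocument;
import javax.swing.text.Document;
import javax.swing.text.rtf.RTFEditorKit;

// Hypothetical helper for isolating the problem; not part of Tika.
class SwingRtfCheck {

    // Parses RTF with the Swing RTF reader only and returns the plain text.
    static String extractText(InputStream in)
            throws IOException, BadLocationException {
        RTFEditorKit kit = new RTFEditorKit();
        Document doc = new DefaultStyledDocument();
        kit.read(in, doc, 0);
        return doc.getText(0, doc.getLength());
    }

    public static void main(String[] args) throws Exception {
        // With a path argument, read that file (for example the attached
        // Redline.rtf); otherwise fall back to a placeholder sample.
        InputStream in;
        if (args.length > 0) {
            in = Files.newInputStream(Paths.get(args[0]));
        } else {
            byte[] sample = "{\\rtf1\\ansi Hello}"
                    .getBytes(StandardCharsets.US_ASCII);
            in = new ByteArrayInputStream(sample);
        }
        try {
            System.out.println(extractText(in));
        } finally {
            in.close();
        }
    }
}
```

If running this against the attached document raises the same NullPointerException with no Tika classes on the stack, that would support the reading that the bug sits in the JDK's RTF reader.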
> Unable to extract content from RTF files
> ----------------------------------------
>
> Key: TIKA-666
> URL: https://issues.apache.org/jira/browse/TIKA-666
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 0.8, 0.9
> Environment: Windows 32 bit OS, JDK 1.6.19
> Reporter: samraj
> Labels: RTF
> Attachments: Redline.rtf
>
> Original Estimate: 48h
> Remaining Estimate: 48h
>
> Hi,
> I have tried to extract the text content from various RTF documents using several techniques, and each attempt failed. I have attached the RTF document here.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira