Posted to dev@tika.apache.org by "Nick Burch (JIRA)" <ji...@apache.org> on 2011/09/23 22:59:26 UTC

[jira] [Commented] (TIKA-632) Rtf parsing ignores links

    [ https://issues.apache.org/jira/browse/TIKA-632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113761#comment-13113761 ] 

Nick Burch commented on TIKA-632:
---------------------------------

Now that we have our own RTF parser, it may be possible to add this. As an example, the RTF from /test-documents/test-outlook2003.msg for a part containing a hyperlink is the delightful:

-----------
{\*\htmltag84 <I>}\htmlrtf {\i \htmlrtf0 If you want to let us know what you think about Outlook 2003, reply to this message. We're always looking for feedback from the people who use Outlook every day! If you would like to keep up with the latest information about Outlook, sign up for a free subscription to the 

{\*\htmltag84 <A HREF="http://r.office.microsoft.com/r/rlidNewsletterSignUp?clid=1033">}\htmlrtf {\field{\*\fldinst{HYPERLINK "http://r.office.microsoft.com/r/rlidNewsletterSignUp?clid=1033"}}{\fldrslt\cf1\ul \htmlrtf0 Inside Office Newsletter\htmlrtf }\htmlrtf0 \htmlrtf }\htmlrtf0 

{\*\htmltag92 </A>}. The newsletter will be sent to you by e-mail on a regular basis.
-----------
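
To illustrate where the link data sits in that snippet, here is a rough Java sketch that pulls the target out of the \fldinst group and the visible text out of the \fldrslt group. It is an assumption-laden illustration, not how Tika's RTF parser works or should work: the class name RtfLinkSketch and the method extractLinks are hypothetical, it uses a regex rather than a real RTF tokenizer, it assumes the \fldrslt group contains no nested groups, and it ignores the \htmlrtf on/off semantics beyond stripping the control words.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Tika.
final class RtfLinkSketch {

    // Matches {\field{\*\fldinst{HYPERLINK "url"}}{\fldrslt ...text...}
    // Group 1 is the link target, group 2 the raw (still RTF-encoded) link text.
    // Assumption: the \fldrslt group has no nested { } groups.
    private static final Pattern HYPERLINK_FIELD = Pattern.compile(
            "\\{\\\\field\\{\\\\\\*\\\\fldinst\\{HYPERLINK \"([^\"]*)\"\\}\\}"
            + "\\{\\\\fldrslt([^{}]*)\\}");

    // An RTF control word: backslash, letters, optional numeric parameter,
    // optional single delimiting space.
    private static final Pattern CONTROL_WORD =
            Pattern.compile("\\\\[a-zA-Z]+-?\\d* ?");

    // Returns one {href, text} pair per HYPERLINK field found in the input.
    static List<String[]> extractLinks(String rtf) {
        List<String[]> links = new ArrayList<>();
        Matcher m = HYPERLINK_FIELD.matcher(rtf);
        while (m.find()) {
            String href = m.group(1);
            // Drop control words such as \cf1, \ul and \htmlrtf0 from the text.
            String text = CONTROL_WORD.matcher(m.group(2)).replaceAll("").trim();
            links.add(new String[] { href, text });
        }
        return links;
    }

    public static void main(String[] args) {
        // The field group from the sample above, escaped as a Java string.
        String sample = "{\\field{\\*\\fldinst{HYPERLINK "
                + "\"http://r.office.microsoft.com/r/rlidNewsletterSignUp?clid=1033\"}}"
                + "{\\fldrslt\\cf1\\ul \\htmlrtf0 Inside Office Newsletter\\htmlrtf }"
                + "\\htmlrtf0 \\htmlrtf }\\htmlrtf0 ";
        for (String[] link : extractLinks(sample)) {
            System.out.println("<a href=\"" + link[0] + "\">" + link[1] + "</a>");
        }
    }
}
```

A real implementation would presumably hook into the parser's group and control-word handling instead, so that nested groups and escaped characters inside the field result are dealt with properly.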



> Rtf parsing ignores links
> -------------------------
>
>                 Key: TIKA-632
>                 URL: https://issues.apache.org/jira/browse/TIKA-632
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.9
>            Reporter: Nick Burch
>         Attachments: test.rtf
>
>
> I spotted this while working on TIKA-631 - an RTF file containing links has the link skipped over - neither the link text nor the link href is output.
> In the attached sample file (which is the RTF contents of /test-documents/test-outlook2003.msg), we should see things like:
> <a href="http://r.office.microsoft.com/r/rlidOutlookWelcomeMail1?clid=1033">Streamlined Mail Experience</a> - Outlook
> Instead, all we get is " - Outlook"

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira