Posted to user@tika.apache.org by Robert Kaulbach <ro...@vipre.com> on 2020/08/06 00:24:02 UTC

Missing hyperlink after parsing .odt file

The attached file was created in Google Docs with an image inside and saved
as an .odt file. After saving, I opened the file with LibreOffice and added
a hyperlink to the image.

When I parse the file with Tika, neither LinkContentHandler nor
ToXMLContentHandler shows any trace of the hyperlink.

The link is clickable when I open the document, and I can see it inside
content.xml after extracting the document with 7zip:
<draw:a xlink:type="simple" xlink:href="http://example.test/">
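For anyone who wants to repeat the content.xml check without unzipping by hand, here is a minimal sketch that uses only the JDK (no Tika). It assumes Java 9+ and that the link is stored as a draw:a element in the standard ODF drawing namespace, as in the snippet above. The class name OdtDrawLinks and the method extract are hypothetical names for illustration, not part of any Tika API.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper (not part of Tika): lists the xlink:href values of
// all <draw:a> elements found in an ODF package's content.xml.
class OdtDrawLinks {
    static final String DRAW_NS = "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0";
    static final String XLINK_NS = "http://www.w3.org/1999/xlink";

    static List<String> extract(InputStream odt) throws Exception {
        List<String> links = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(odt)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!"content.xml".equals(entry.getName())) {
                    continue;
                }
                // Buffer the entry so the SAX parser cannot close the zip stream.
                byte[] xml = zip.readAllBytes();

                SAXParserFactory factory = SAXParserFactory.newInstance();
                factory.setNamespaceAware(true);
                // Refuse DOCTYPE declarations, since the input may be untrusted.
                factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);

                factory.newSAXParser().parse(new ByteArrayInputStream(xml), new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName,
                                             String qName, Attributes atts) {
                        if (DRAW_NS.equals(uri) && "a".equals(localName)) {
                            String href = atts.getValue(XLINK_NS, "href");
                            if (href != null) {
                                links.add(href);
                            }
                        }
                    }
                });
                break; // content.xml appears once per package
            }
        }
        return links;
    }
}
```

Calling OdtDrawLinks.extract(Files.newInputStream(Path.of("test.odt"))) (the file name is a placeholder) should return the image link if it is stored as described, which gives a baseline to compare against the handler output from Tika.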

I tried enabling all options in OfficeParserConfig and OOXMLParser but it
hasn't made a difference so far. The X-Parsed-By header shows it is being
parsed with org.apache.tika.parser.odf.OpenDocumentParser.

Could this be a bug with the org.apache.tika.parser.odf.OpenDocumentParser?

-- 


This email, its contents and attachments contain information from J2 
Global, Inc. and/or its affiliates which may be privileged, confidential or 
otherwise protected from disclosure. The information is intended to be for 
the addressee(s) only. If you are not an addressee, any disclosure, copy, 
distribution or use of the contents of this message is prohibited. If you 
have received this email in error, please notify the sender by reply email 
and delete the original message and any copies.


Re: Missing hyperlink after parsing .odt file

Posted by Tim Allison <ta...@apache.org>.
Yes, this looks like a bug. Please open an issue on our JIRA:
https://issues.apache.org/jira/projects/TIKA/summary

On Wed, Aug 5, 2020 at 8:27 PM Robert Kaulbach <ro...@vipre.com>
wrote:

> The attached file was created in Google Docs with an image inside and
> saved as an .odt file. After saving, I opened the file with LibreOffice and
> added a hyperlink to the image.
>
> When I parse the file with Tika, neither LinkContentHandler nor
> ToXMLContentHandler shows any trace of the hyperlink.
>
> The link is clickable when I open the document, and I can see it inside
> content.xml after extracting the document with 7zip:
> <draw:a xlink:type="simple" xlink:href="http://example.test/">
>
> I tried enabling all options in OfficeParserConfig and OOXMLParser but it
> hasn't made a difference so far. The X-Parsed-By header shows it is being
> parsed with org.apache.tika.parser.odf.OpenDocumentParser.
>
> Could this be a bug with the org.apache.tika.parser.odf.OpenDocumentParser?