Posted to dev@tika.apache.org by "Konstantin Gribov (JIRA)" <ji...@apache.org> on 2015/04/10 19:09:12 UTC
[jira] [Reopened] (TIKA-1597) RTF with embedded image parsing produces div before html
[ https://issues.apache.org/jira/browse/TIKA-1597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Konstantin Gribov reopened TIKA-1597:
-------------------------------------
> RTF with embedded image parsing produces div before html
> --------------------------------------------------------
>
> Key: TIKA-1597
> URL: https://issues.apache.org/jira/browse/TIKA-1597
> Project: Tika
> Issue Type: Bug
> Affects Versions: 1.7
> Environment: linux, oracle jdk 7u75
> Reporter: Konstantin Gribov
> Attachments: 2.rtf, 3.rtf
>
>
> On tika-1.8-rc1, running
> {{java -jar tika-app/target/tika-app-1.8.jar -x 2.rtf}} returns
> {noformat}
> <?xml version="1.0" encoding="UTF-8"?><div xmlns="http://www.w3.org/1999/xhtml">HOHcvanAHTI'Imoc
> v8 Hanemnan npfiBOBafi "DRAW
> </div>
> <html xmlns="http://www.w3.org/1999/xhtml">
> <head>
> <!-- tail omitted -->
> {noformat}
> Removing the image prevents this behavior ({{3.rtf}} does not contain an embedded image).
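
A minimal sketch of how the symptom above could be detected automatically, assuming the XHTML produced by the -x option has already been captured as a string. It uses only the JDK and does not call any Tika API; the class and method names (FirstElementCheck, firstElementName) are hypothetical and exist only for illustration. It reports the name of the first start tag in the output, which should be "html" for a correct result and is "div" in the output quoted above.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Tika: finds the first element in captured XHTML output.
class FirstElementCheck {

    // A start tag is '<' followed by a name character. This skips the XML
    // declaration ("<?xml"), comments and doctypes ("<!"), and end tags ("</").
    // Limitation: a '<name' sequence inside a leading comment would be misread.
    private static final Pattern START_TAG =
            Pattern.compile("<([A-Za-z_][A-Za-z0-9_.:-]*)");

    // Returns the name of the first start tag, or empty if there is none.
    static Optional<String> firstElementName(String xhtml) {
        Matcher m = START_TAG.matcher(xhtml);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // True when the first element of the output is <html>, as expected.
    static boolean startsWithHtmlRoot(String xhtml) {
        return firstElementName(xhtml).map("html"::equals).orElse(false);
    }

    public static void main(String[] args) {
        // Shortened stand-in for the output reported in this issue.
        String reported = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<div xmlns=\"http://www.w3.org/1999/xhtml\">text</div>\n"
                + "<html xmlns=\"http://www.w3.org/1999/xhtml\">\n<head>";
        System.out.println("first element: " + firstElementName(reported).orElse("(none)"));
        System.out.println("starts with html root: " + startsWithHtmlRoot(reported));
    }
}
```

A check like this could be run against the output for both 2.rtf and 3.rtf to confirm that only the file with the embedded image is affected.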
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)