Posted to dev@tika.apache.org by "Konstantin Gribov (JIRA)" <ji...@apache.org> on 2015/04/10 19:09:12 UTC

[jira] [Reopened] (TIKA-1597) RTF with embedded image parsing produces div before html

     [ https://issues.apache.org/jira/browse/TIKA-1597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Gribov reopened TIKA-1597:
-------------------------------------

> RTF with embedded image parsing produces div before html
> --------------------------------------------------------
>
>                 Key: TIKA-1597
>                 URL: https://issues.apache.org/jira/browse/TIKA-1597
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.7
>         Environment: linux, oracle jdk 7u75
>            Reporter: Konstantin Gribov
>         Attachments: 2.rtf, 3.rtf
>
>
> On tika-1.8-rc1, {{java -jar tika-app/target/tika-app-1.8.jar -x 2.rtf}} returns:
> {noformat}
> <?xml version="1.0" encoding="UTF-8"?><div xmlns="http://www.w3.org/1999/xhtml">HOHcvanAHTI'Imoc
> v8 Hanemnan npfiBOBafi "DRAW
> </div>
> <html xmlns="http://www.w3.org/1999/xhtml">
> <head>
> <!-- tail omitted -->
> {noformat}
> Removing the image prevents this behavior ({{3.rtf}} doesn't contain an embedded image).
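
One way to check for the symptom described above is to look at the first element the XHTML output opens: it should be html, but for 2.rtf it is div. The following is a minimal sketch using only the JDK's StAX API, not Tika itself. The class and method names (RootElementCheck, firstElementName) are hypothetical, and the sample string is an abbreviated stand-in for the reported output, not the full text.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of Tika: reports which element the output opens first.
public class RootElementCheck {

    /** Returns the local name of the first element in the text, or null if none is found. */
    public static String firstElementName(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                // The reader is pull-based, so it stops at the first start tag
                // and never reaches the second (invalid) root element.
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        return reader.getLocalName();
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not read the XML prolog", e);
        }
    }

    public static void main(String[] args) {
        // Abbreviated stand-in for the output reported for 2.rtf (assumption: text shortened).
        String reported = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<div xmlns=\"http://www.w3.org/1999/xhtml\">text</div>\n"
                + "<html xmlns=\"http://www.w3.org/1999/xhtml\"><head/></html>";
        String first = firstElementName(reported);
        System.out.println("first element: " + first);
        System.out.println("starts with html: " + "html".equals(first));
    }
}
```

A check like this only inspects the start of the output; it does not validate the rest of the document or say anything about why the stray div is emitted.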



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)