Posted to dev@tika.apache.org by "Tim Allison (JIRA)" <ji...@apache.org> on 2019/03/13 19:20:00 UTC

[jira] [Resolved] (TIKA-2838) RTF document processing glues comment fields together with text without whitespace

     [ https://issues.apache.org/jira/browse/TIKA-2838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Allison resolved TIKA-2838.
-------------------------------
       Resolution: Fixed
         Assignee: Tim Allison
    Fix Version/s: 1.21
                   2.0.0

At some point, we should add the same markup we do in docx, etc.  For now, I've added spaces.

There is no immediately obvious right answer for an annotation that falls mid-word, as happens in the test file.  For now, we add spaces, which splits the word.

{noformat}
super<div class="comment">this is a comment</div>califragili...
{noformat}
is now rendered as
{noformat}
super this is a comment califragili
{noformat}
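
To make the current behavior concrete, here is a minimal sketch of the spacing rule described above. It is only an illustration and is not the actual parser code: the class name CommentSpacing and the helper padComment are hypothetical, and the sketch assumes the text before the annotation, the annotation itself, and the text after it are already available as separate strings.

```java
// Illustrative sketch only; this is NOT Tika's RTF parser code.
// CommentSpacing and padComment are hypothetical names.
final class CommentSpacing {

    // Joins the text before an annotation, the annotation text, and the
    // text after it with single spaces, skipping null or empty parts.
    static String padComment(String before, String comment, String after) {
        StringBuilder sb = new StringBuilder();
        for (String part : new String[] {before, comment, after}) {
            if (part == null || part.isEmpty()) {
                continue;
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(part);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Mirrors the mid-word example above: the word is cut by the comment.
        System.out.println(padComment("super", "this is a comment", "califragili"));
    }
}
```

The trade-off is the one noted above: a mid-word annotation splits the word in the extracted text, but the comment no longer runs into its neighbors.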

> RTF document processing glues comment fields together with text without whitespace
> ----------------------------------------------------------------------------------
>
>                 Key: TIKA-2838
>                 URL: https://issues.apache.org/jira/browse/TIKA-2838
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.17, 1.19
>            Reporter: Karl Wright
>            Assignee: Tim Allison
>            Priority: Major
>             Fix For: 2.0.0, 1.21
>
>
> See ManifoldCF ticket CONNECTORS-1591 for a sample document and a description of the problem.  Basically, comment fields for RTF documents are glued together with no whitespace between them, while other document formats (e.g. .docx) properly put in a space.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)