Posted to dev@tika.apache.org by "Dave Meikle (JIRA)" <ji...@apache.org> on 2017/11/24 01:13:00 UTC

[jira] [Resolved] (TIKA-2347) Underlined text is not decorated as such when extracting from word documents

     [ https://issues.apache.org/jira/browse/TIKA-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dave Meikle resolved TIKA-2347.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 1.17

Committed in [639f3bf361a08210da8fae68e3eeb4e12df6c4de|https://github.com/apache/tika/commit/639f3bf361a08210da8fae68e3eeb4e12df6c4de]. Thanks Stuart!

> Underlined text is not decorated as such when extracting from word documents
> ----------------------------------------------------------------------------
>
>                 Key: TIKA-2347
>                 URL: https://issues.apache.org/jira/browse/TIKA-2347
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 2.0, 1.14
>            Reporter: Stuart Hendren
>            Assignee: Dave Meikle
>             Fix For: 1.17
>
>
> When extracting from .doc and .docx files, bold and italic text decoration is preserved, but underlining is not. This can be demonstrated in WordParserTest, or in OOXMLParserTest (using the .docx version of the file), with the following test case.
> {code:title=WordParserTest.java|borderStyle=solid}
>     @Test
>     public void testTextDecoration() throws Exception {
>       // Parse the sample document and capture the XHTML output
>       XMLResult result = getXML("testWORD_various.doc");
>       String xml = result.xml;
>       // Each decoration should appear as the matching inline XHTML element
>       assertTrue(xml.contains("<b>Bold</b>"));
>       assertTrue(xml.contains("<i>italic</i>"));
>       assertTrue(xml.contains("<u>underline</u>"));
>     }
> {code}
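To make the expected behaviour concrete, the following is a minimal, hypothetical sketch of how a parser could turn formatted text runs into XHTML carrying <b>, <i> and <u> elements. It is not Tika's implementation: the `RunDecorator` class, its `Run` type and the `render` method are illustrative names only, and a real Word extractor reads the bold/italic/underline flags from the document model rather than from hand-built objects.

```java
import java.util.List;

// Hypothetical sketch, NOT Tika's code: wraps formatted runs in inline XHTML tags.
public class RunDecorator {

    // Hypothetical text run carrying the character formatting flags a Word parser might expose.
    static final class Run {
        final String text;
        final boolean bold;
        final boolean italic;
        final boolean underline;

        Run(String text, boolean bold, boolean italic, boolean underline) {
            this.text = text;
            this.bold = bold;
            this.italic = italic;
            this.underline = underline;
        }
    }

    // Escape characters that are significant in XHTML text content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Open <b>, <i>, <u> as needed, emit the text, then close the tags in reverse
    // order so the output stays well-formed when several decorations combine.
    static String render(List<Run> runs) {
        StringBuilder out = new StringBuilder();
        for (Run r : runs) {
            if (r.bold) out.append("<b>");
            if (r.italic) out.append("<i>");
            if (r.underline) out.append("<u>");
            out.append(escape(r.text));
            if (r.underline) out.append("</u>");
            if (r.italic) out.append("</i>");
            if (r.bold) out.append("</b>");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Run> runs = List.of(
                new Run("Bold", true, false, false),
                new Run(" ", false, false, false),
                new Run("italic", false, true, false),
                new Run(" ", false, false, false),
                new Run("underline", false, false, true));
        // Prints: <b>Bold</b> <i>italic</i> <u>underline</u>
        System.out.println(render(runs));
    }
}
```

The sketch handles each run independently, so two adjacent bold runs would produce `<b>a</b><b>b</b>`; a production extractor would likely track the current formatting state across runs and only open or close tags when it changes.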



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)