Posted to dev@tika.apache.org by "Tim Allison (JIRA)" <ji...@apache.org> on 2017/09/18 13:04:00 UTC

[jira] [Resolved] (TIKA-2435) docx parser missing content when multiple body sections

     [ https://issues.apache.org/jira/browse/TIKA-2435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Allison resolved TIKA-2435.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 1.17

> docx parser missing content when multiple body sections
> -------------------------------------------------------
>
>                 Key: TIKA-2435
>                 URL: https://issues.apache.org/jira/browse/TIKA-2435
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Tim Allison
>             Fix For: 1.17
>
>
> On https://bz.apache.org/bugzilla/show_bug.cgi?id=61354, [~kramachandran@commvault.com] reported that our DOM parser was missing "body" sections after the first body section in docx files. PJ Fanning applied the patch, and the fix will be available when we upgrade to POI 3.17-beta2.
> As a side note, the experimental SAX parser was correctly extracting all text from the triggering document.
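
The failure mode described above can be illustrated with a minimal sketch. This is not the POI or Tika code; it uses only the JDK's DOM API, and the class name MultiBodySketch, its helper methods, and the sample XML are hypothetical. It assumes a document part that contains more than one w:body element and contrasts reading only the first body with iterating over all of them.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical illustration only; not taken from POI or Tika.
class MultiBodySketch {
    // WordprocessingML main namespace used by docx document parts.
    static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Parse an XML string into a namespace-aware DOM document.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Reads text from the first w:body only, so later bodies are dropped.
    static String firstBodyText(Document doc) {
        NodeList bodies = doc.getElementsByTagNameNS(W_NS, "body");
        return bodies.getLength() == 0 ? "" : bodies.item(0).getTextContent();
    }

    // Reads text from every w:body, joined with newlines.
    static String allBodiesText(Document doc) {
        NodeList bodies = doc.getElementsByTagNameNS(W_NS, "body");
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bodies.getLength(); i++) {
            if (sb.length() > 0) {
                sb.append('\n');
            }
            sb.append(bodies.item(i).getTextContent());
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample with two body sections.
        String xml = "<w:document xmlns:w=\"" + W_NS + "\">"
            + "<w:body><w:p><w:r><w:t>first</w:t></w:r></w:p></w:body>"
            + "<w:body><w:p><w:r><w:t>second</w:t></w:r></w:p></w:body>"
            + "</w:document>";
        Document doc = parse(xml);
        System.out.println(firstBodyText(doc));
        System.out.println(allBodiesText(doc));
    }
}
```

A streaming (SAX-style) reader visits every element as it is encountered, which may be why the experimental SAX parser was unaffected; that explanation is an assumption, not something stated in the issue.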



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)