Posted to dev@tika.apache.org by "Mike Rodent (JIRA)" <ji...@apache.org> on 2017/02/12 15:04:41 UTC

[jira] [Updated] (TIKA-2264) Better handling of footnotes/endnotes for ODF files

     [ https://issues.apache.org/jira/browse/TIKA-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Rodent updated TIKA-2264:
------------------------------
    Attachment: ImprovedODFContentParser.java

Note that this file is peppered with comments of mine. It also contains various methods I used while developing these changes, and it uses a LOGGER to record any anomalies. As a complete newbie to contributing to an open source project, I invite everyone to mess around with it as they see fit.

> Better handling of footnotes/endnotes for ODF files
> ---------------------------------------------------
>
>                 Key: TIKA-2264
>                 URL: https://issues.apache.org/jira/browse/TIKA-2264
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.14
>         Environment: N/A
>            Reporter: Mike Rodent
>            Priority: Minor
>              Labels: newbie
>         Attachments: ImprovedODFContentParser.java
>
>
> Springs from my question here (http://stackoverflow.com/questions/42031237/modify-apache-tika-parsing-of-old-1997-2003-ms-word-docs) ... I have improved the class OpenDocumentContentParser so that it puts footnotes/endnotes at the end of the line to which they belong and doesn't break up the line in question. As with .docx parsing, the notes can easily be linked to their references.
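A minimal sketch of the idea described above follows. It assumes plain JDK SAX parsing of an ODF content.xml, not Tika's actual OpenDocumentContentParser, and it is not the attached ImprovedODFContentParser.java. The class NoteDeferringHandler, its extract() helper and the "[n]" marker format are hypothetical. The handler leaves an inline citation marker where the note occurs, buffers the note body (text:note / text:note-citation / text:note-body in the ODF text namespace), and appends the buffered notes after the paragraph or heading that cites them, so that paragraph's text stays in one piece.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch: defers ODF footnote/endnote bodies to the end of the citing paragraph. */
class NoteDeferringHandler extends DefaultHandler {
    static final String TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";

    private final StringBuilder out = new StringBuilder();
    private final List<String> pendingNotes = new ArrayList<>(); // notes waiting for their paragraph to end
    private final StringBuilder citation = new StringBuilder();  // e.g. "1"
    private final StringBuilder noteBody = new StringBuilder();
    private boolean inNote, inCitation, inNoteBody;

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        if (!TEXT_NS.equals(uri)) return;
        switch (local) {
            case "note":
                inNote = true;
                citation.setLength(0);
                noteBody.setLength(0);
                break;
            case "note-citation": inCitation = true; break;
            case "note-body": inNoteBody = true; break;
            default: break;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inCitation) citation.append(ch, start, length);
        else if (inNoteBody) noteBody.append(ch, start, length);
        else if (!inNote) out.append(ch, start, length);
        // any other text inside text:note (e.g. whitespace between children) is dropped
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if (!TEXT_NS.equals(uri)) return;
        switch (local) {
            case "note-citation":
                inCitation = false;
                out.append('[').append(citation).append(']'); // inline marker keeps the line intact
                break;
            case "note-body": inNoteBody = false; break;
            case "note":
                inNote = false;
                pendingNotes.add("[" + citation + "] " + noteBody.toString().trim());
                break;
            case "p":
            case "h":
                if (inNoteBody) { noteBody.append(' '); break; } // paragraph inside the note itself
                for (String note : pendingNotes) out.append(' ').append(note); // notes after their line
                pendingNotes.clear();
                out.append('\n');
                break;
            default: break;
        }
    }

    public String text() { return out.toString(); }

    /** Parses a content.xml string and returns the text with notes moved to line ends. */
    public static String extract(String contentXml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // needed so uri/localName are populated
        SAXParser parser = factory.newSAXParser();
        NoteDeferringHandler handler = new NoteDeferringHandler();
        parser.parse(new InputSource(new StringReader(contentXml)), handler);
        return handler.text();
    }
}
```

With a paragraph such as `Before note<text:note>…</text:note> after note.`, the sketch yields `Before note[1] after note. [1] The note text.` on a single line, so the note can still be matched to its marker. A real Tika parser would emit XHTML through its own content handler rather than building a plain string.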



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)