Posted to commits@tika.apache.org by gr...@apache.org on 2019/04/22 16:35:10 UTC

[tika] branch branch_1x updated (15ac3da -> 55c573b)

This is an automated email from the ASF dual-hosted git repository.

grossws pushed a change to branch branch_1x
in repository https://gitbox.apache.org/repos/asf/tika.git.


    from 15ac3da  TIKA-2835 upgrade to PDFBox 2.0.15
     new 8a112a9  TIKA-2555 -- fixed overlapping tags in XWPFWordExtractorDecorator (docx parser)
     new e4ae7e9  Use tag <s> instead of <strike> for docx
     new 55c573b  TIKA-2601 -- fixed overlapping tags in WordExtractor (doc parser)

The 3 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .../tika/parser/microsoft/FormattingUtils.java     |  88 +++++++++++
 .../tika/parser/microsoft/WordExtractor.java       |  89 +----------
 .../ooxml/XWPFWordExtractorDecorator.java          | 162 ++-------------------
 .../tika/parser/microsoft/WordParserTest.java      |   4 +-
 .../parser/microsoft/ooxml/OOXMLParserTest.java    |  10 +-
 5 files changed, 116 insertions(+), 237 deletions(-)
 create mode 100644 tika-parsers/src/main/java/org/apache/tika/parser/microsoft/FormattingUtils.java