Posted to commits@tika.apache.org by ta...@apache.org on 2018/12/03 15:34:21 UTC

[tika] branch branch_1x updated (4d6bc01 -> 44165a3)

This is an automated email from the ASF dual-hosted git repository.

tallison pushed a change to branch branch_1x
in repository https://gitbox.apache.org/repos/asf/tika.git.


    from 4d6bc01  TIKA-2550 -- prevent content from script/style elements to be written in ToTextContentHandler
     add d837e1b  Upgrade to PDFBox 2.0.13 (TIKA-2788)
     add 6b56ed2  TIKA-2779: Integrate/parameterize new rotated text handling in PDFBox
     add 6322421  TIKA-2751 -- Upgrade to POI 4.0.1
     add 44165a3  TIKA-2550 -- make sure that ToTextHandler's new behavior of ignoring script/style contents doesn't harm macro extraction in HTML parser

No new revisions were added by this update.

Summary of changes:
 CHANGES.txt                                        |   7 ++
 tika-bundle/pom.xml                                |   1 +
 .../src/test/java/org/apache/tika/TikaTest.java    |   6 +-
 tika-eval/pom.xml                                  |   4 -
 tika-nlp/pom.xml                                   |   4 +
 tika-parent/pom.xml                                |   1 +
 tika-parsers/pom.xml                               |   3 +-
 .../java/org/apache/tika/parser/pdf/PDF2XHTML.java | 136 ++++++++++++++++++---
 .../java/org/apache/tika/parser/pdf/PDFParser.java |   5 +
 .../apache/tika/parser/pdf/PDFParserConfig.java    |  11 ++
 .../apache/tika/parser/pdf/PDFParser.properties    |   3 +
 .../apache/tika/parser/html/HtmlParserTest.java    |   5 +-
 .../org/apache/tika/parser/pdf/PDFParserTest.java  |  14 +++
 .../resources/test-documents/testPDF_angles.pdf    | Bin 0 -> 797493 bytes
 14 files changed, 178 insertions(+), 22 deletions(-)
 create mode 100644 tika-parsers/src/test/resources/test-documents/testPDF_angles.pdf