Posted to commits@tika.apache.org by ta...@apache.org on 2018/12/03 15:34:21 UTC
[tika] branch branch_1x updated (4d6bc01 -> 44165a3)
This is an automated email from the ASF dual-hosted git repository.
tallison pushed a change to branch branch_1x
in repository https://gitbox.apache.org/repos/asf/tika.git.
from 4d6bc01 TIKA-2550 -- prevent content from script/style elements to be written in ToTextContentHandler
add d837e1b Upgrade to PDFBox 2.0.13 (TIKA-2788)
add 6b56ed2 TIKA-2779: Integrate/parameterize new rotated text handling in PDFBox
add 6322421 TIKA-2751 -- Upgrade to POI 4.0.1
add 44165a3 TIKA-2550 -- make sure that ToTextHandler's new behavior of ignoring script/style contents doesn't harm macro extraction in HTML parser
No new revisions were added by this update.
Summary of changes:
CHANGES.txt | 7 ++
tika-bundle/pom.xml | 1 +
.../src/test/java/org/apache/tika/TikaTest.java | 6 +-
tika-eval/pom.xml | 4 -
tika-nlp/pom.xml | 4 +
tika-parent/pom.xml | 1 +
tika-parsers/pom.xml | 3 +-
.../java/org/apache/tika/parser/pdf/PDF2XHTML.java | 136 ++++++++++++++++++---
.../java/org/apache/tika/parser/pdf/PDFParser.java | 5 +
.../apache/tika/parser/pdf/PDFParserConfig.java | 11 ++
.../apache/tika/parser/pdf/PDFParser.properties | 3 +
.../apache/tika/parser/html/HtmlParserTest.java | 5 +-
.../org/apache/tika/parser/pdf/PDFParserTest.java | 14 +++
.../resources/test-documents/testPDF_angles.pdf | Bin 0 -> 797493 bytes
14 files changed, 178 insertions(+), 22 deletions(-)
create mode 100644 tika-parsers/src/test/resources/test-documents/testPDF_angles.pdf