Posted to commits@tika.apache.org by ta...@apache.org on 2019/07/11 20:15:30 UTC

[tika] branch branch_1x updated (830094e -> 02437c5)

This is an automated email from the ASF dual-hosted git repository.

tallison pushed a change to branch branch_1x
in repository https://gitbox.apache.org/repos/asf/tika.git.


    from 830094e  TIKA-1568 -- statically cache encoding detector in AutoDetectReader when default initializer is used.
     add 02437c5  TIKA-2899 -- prevent non-aligned tags in xhtml output...I am not convinced there's anything wrong with this RTF, and I may have just covered up list processing bugs in our parser, but this will guarantee balanced tags...

No new revisions were added by this update.

Summary of changes:
 .../org/apache/tika/parser/rtf/TextExtractor.java  |  80 +-
 .../org/apache/tika/parser/rtf/RTFParserTest.java  |   7 +-
 .../resources/test-documents/testRTFTIKA_2899.rtf  | 836 +++++++++++++++++++++
 3 files changed, 914 insertions(+), 9 deletions(-)
 create mode 100644 tika-parsers/src/test/resources/test-documents/testRTFTIKA_2899.rtf