Posted to commits@tika.apache.org by ta...@apache.org on 2023/09/26 12:27:46 UTC

[tika] branch TIKA-1599 updated: TIKA-1599 -- migrate to jsoup parser -- improve CHANGES.txt

This is an automated email from the ASF dual-hosted git repository.

tallison pushed a commit to branch TIKA-1599
in repository https://gitbox.apache.org/repos/asf/tika.git


The following commit(s) were added to refs/heads/TIKA-1599 by this push:
     new 21224f196 TIKA-1599 -- migrate to jsoup parser -- improve CHANGES.txt
21224f196 is described below

commit 21224f196d61037924f3548214a03cd781d5a9b1
Author: tallison <ta...@apache.org>
AuthorDate: Tue Sep 26 08:27:33 2023 -0400

    TIKA-1599 -- migrate to jsoup parser -- improve CHANGES.txt
---
 CHANGES.txt | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/CHANGES.txt b/CHANGES.txt
index 408e42676..7759b39e5 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -4,9 +4,16 @@ Release 3.0.0-BETA - ??
 
    * Require Java 11 (TIKA-4128).
 
-   * The boilerpipe handler has been moved to tika-handler-boiler-pipe
+   * The boilerpipe handler has been moved to the tika-handler-boiler-pipe
+     package (TIKA-4138).
+
+   * We've migrated HTML parsing to the JSoup parser instead of TagSoup. If
+     you have a custom configuration on the HTMLParser, you'll need to change
+     that to o.a.t.p.html.JSoupParser (TIKA-1599). The TagSoup parser is still
+     available in the tika-parser-tagsoup-module if you prefer the legacy parser.
 
    Other Changes/Updates
+
    * Fix bug in DateUtils that stripped timezone information from
      incoming Calendar objects (TIKA-4126).
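For deployments with a custom tika-config.xml, the configuration change described in the JSoup entry can be sketched as a small rewrite that uses only the JDK's DOM and transformer APIs. This is a minimal, hypothetical sketch and not part of Tika. The class TikaConfigMigration and its migrate method are illustrative names. The code assumes that the legacy parser is referenced by the fully qualified class name org.apache.tika.parser.html.HtmlParser in the class attribute of <parser> or <parser-exclude> elements. Check both assumptions against your own configuration before relying on it.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TikaConfigMigration {
    // Assumption: the class name the legacy HTML parser is configured under.
    static final String LEGACY_HTML_PARSER = "org.apache.tika.parser.html.HtmlParser";
    // Fully qualified form of o.a.t.p.html.JSoupParser from CHANGES.txt.
    static final String JSOUP_PARSER = "org.apache.tika.parser.html.JSoupParser";

    // Element names whose class attribute may name a parser (assumption).
    static final String[] PARSER_TAGS = {"parser", "parser-exclude"};

    /** Returns the config XML with legacy HTML parser references pointed at JSoupParser. */
    public static String migrate(String configXml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(configXml)));

        for (String tag : PARSER_TAGS) {
            NodeList nodes = doc.getElementsByTagName(tag);
            for (int i = 0; i < nodes.getLength(); i++) {
                Element el = (Element) nodes.item(i);
                // Only rewrite exact matches; other parsers are left untouched.
                if (LEGACY_HTML_PARSER.equals(el.getAttribute("class"))) {
                    el.setAttribute("class", JSOUP_PARSER);
                }
            }
        }

        // Serialize the modified DOM back to a string without an XML declaration.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String legacy = "<properties><parsers>"
            + "<parser class=\"org.apache.tika.parser.DefaultParser\"/>"
            + "<parser class=\"" + LEGACY_HTML_PARSER + "\"/>"
            + "</parsers></properties>";
        System.out.println(migrate(legacy));
    }
}
```

The sketch matches exact class names instead of doing a plain text substitution. A DOM rewrite leaves comments and unrelated attributes alone. It also keeps unrelated parsers such as DefaultParser unchanged.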