Posted to commits@tika.apache.org by ta...@apache.org on 2023/09/26 12:27:46 UTC
[tika] branch TIKA-1599 updated: TIKA-1599 -- migrate to jsoup parser -- improve CHANGES.txt
This is an automated email from the ASF dual-hosted git repository.
tallison pushed a commit to branch TIKA-1599
in repository https://gitbox.apache.org/repos/asf/tika.git
The following commit(s) were added to refs/heads/TIKA-1599 by this push:
new 21224f196 TIKA-1599 -- migrate to jsoup parser -- improve CHANGES.txt
21224f196 is described below
commit 21224f196d61037924f3548214a03cd781d5a9b1
Author: tallison <ta...@apache.org>
AuthorDate: Tue Sep 26 08:27:33 2023 -0400
TIKA-1599 -- migrate to jsoup parser -- improve CHANGES.txt
---
CHANGES.txt | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/CHANGES.txt b/CHANGES.txt
index 408e42676..7759b39e5 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -4,9 +4,16 @@ Release 3.0.0-BETA - ??
* Require Java 11 (TIKA-4128).
- * The boilerpipe handler has been moved to tika-handler-boiler-pipe
+ * The boilerpipe handler has been moved to the tika-handler-boiler-pipe
+ package (TIKA-4138).
+
+ * We've migrated HTML parsing to the JSoup parser instead of TagSoup. If
+ you have a custom configuration on the HTMLParser, you'll need to change
+ that to o.a.t.p.html.JSoupParser (TIKA-1599). The TagSoup parser is still
+ available in the tika-parser-tagsoup-module if you prefer the legacy parser.
Other Changes/Updates
+
* Fix bug in DateUtils that stripped timezone information from
incoming Calendar objects (TIKA-4126).
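For anyone with a custom tika-config.xml, the TIKA-1599 entry comes down to swapping one parser class name. MigrateHtmlParserConfig is a hypothetical helper, not part of Tika, that uses only the JDK's XML APIs. It rewrites every class attribute that names the legacy parser, including parser-exclude entries, to org.apache.tika.parser.html.JSoupParser. That is the expanded form of o.a.t.p.html.JSoupParser. The helper assumes the old config refers to the TagSoup-based parser as org.apache.tika.parser.html.HtmlParser, so check your own config before relying on it.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical migration helper; not part of Tika.
final class MigrateHtmlParserConfig {
    // Assumed class name of the legacy TagSoup-based parser (Tika 2.x).
    static final String OLD_CLASS = "org.apache.tika.parser.html.HtmlParser";
    // Expanded form of o.a.t.p.html.JSoupParser from the CHANGES entry.
    static final String NEW_CLASS = "org.apache.tika.parser.html.JSoupParser";

    /** Rewrites every class="...HtmlParser" attribute, on parser and parser-exclude elements alike. */
    static String migrate(String configXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(configXml)));
            // "*" matches every element in document order.
            NodeList all = doc.getElementsByTagName("*");
            for (int i = 0; i < all.getLength(); i++) {
                Element el = (Element) all.item(i);
                // getAttribute returns "" when the attribute is absent, so this is null-safe.
                if (OLD_CLASS.equals(el.getAttribute("class"))) {
                    el.setAttribute("class", NEW_CLASS);
                }
            }
            // Serialize the modified DOM back to a string.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not migrate config", e);
        }
    }

    public static void main(String[] args) {
        String config = "<properties><parsers>"
                + "<parser class=\"org.apache.tika.parser.DefaultParser\">"
                + "<parser-exclude class=\"" + OLD_CLASS + "\"/>"
                + "</parser>"
                + "<parser class=\"" + OLD_CLASS + "\"/>"
                + "</parsers></properties>";
        System.out.println(migrate(config));
    }
}
```

Any parser-specific params nested under the old element are carried over unchanged. This sketch does not check whether JSoupParser accepts them.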