Posted to dev@tika.apache.org by "Tim Allison (Jira)" <ji...@apache.org> on 2024/03/26 15:16:00 UTC
[jira] [Created] (TIKA-4226) Use jsoup for epubs
Tim Allison created TIKA-4226:
---------------------------------
Summary: Use jsoup for epubs
Key: TIKA-4226
URL: https://issues.apache.org/jira/browse/TIKA-4226
Project: Tika
Issue Type: Improvement
Reporter: Tim Allison
We're getting quite a few XML exceptions when parsing epubs (roughly 1k out of 8k total). We should use jsoup to handle the contents of epubs more robustly.
This is a proposal for 3.x. WDYT?
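To illustrate the kind of failure described above, here is a minimal sketch using only the JDK's strict XML parser. The sample markup (an unclosed <br> and an undeclared &nbsp; entity) is an assumption about what typically breaks in epub content; the issue does not list the actual exceptions. The class and method names are hypothetical and not part of Tika.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not Tika code.
class StrictXmlDemo {

    // Returns true if the JDK's strict XML parser accepts the markup,
    // false if it rejects it with a SAXException (well-formedness error).
    static boolean parsesAsXml(String markup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (Exception e) {
            // Parser configuration or I/O problems are unexpected here.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Well-formed XHTML fragment.
        System.out.println(parsesAsXml("<html><body><p>Hello</p></body></html>"));
        // Assumed sample: HTML-style <br> that is never closed.
        System.out.println(parsesAsXml("<html><body><p>One<br>Two</p></body></html>"));
        // Assumed sample: HTML entity that is not declared in plain XML.
        System.out.println(parsesAsXml("<html><body><p>a&nbsp;b</p></body></html>"));
    }
}
```

A lenient HTML parser such as jsoup (for example via Jsoup.parse(String)) is designed to build a document from inputs like the last two rather than throw, which is the motivation for the proposal. How it would be wired into the epub parser is left open here.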
--
This message was sent by Atlassian Jira
(v8.20.10#820010)