Posted to dev@tika.apache.org by "Tim Allison (Jira)" <ji...@apache.org> on 2024/03/26 15:16:00 UTC

[jira] [Created] (TIKA-4226) Use jsoup for epubs

Tim Allison created TIKA-4226:
---------------------------------

             Summary: Use jsoup for epubs
                 Key: TIKA-4226
                 URL: https://issues.apache.org/jira/browse/TIKA-4226
             Project: Tika
          Issue Type: Improvement
            Reporter: Tim Allison


We're getting quite a few XML exceptions when parsing EPUBs (roughly 1k out of 8k total). We should use jsoup to handle the contents of EPUBs more robustly.
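
For context, here is a minimal sketch of the kind of failure involved. It assumes the exceptions come from content documents that are not well-formed XML; the issue does not say which errors actually occur. The class and method names are hypothetical, the sample markup is invented for illustration, and this is not Tika's EPUB parsing code. It only uses the JDK's SAX parser.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of Tika.
class StrictXmlDemo {

    /** Returns true if the markup is well-formed XML, false if the SAX parser rejects it. */
    static boolean parsesAsXml(String markup) {
        try {
            // DefaultHandler ignores all content events and rethrows fatal errors.
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(markup)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // Well-formedness errors surface here as SAXParseException.
            return false;
        } catch (Exception e) {
            // Parser configuration or I/O problems are not what this sketch is about.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed examples of markup that browsers accept but a strict XML parser does not.
        String wellFormed = "<html><body><p>ok</p></body></html>";
        String unclosedTag = "<html><body><p>line one<br>line two</p></body></html>";
        String undeclaredEntity = "<html><body><p>a&nbsp;b</p></body></html>";

        System.out.println("well-formed:       " + parsesAsXml(wellFormed));
        System.out.println("unclosed <br>:     " + parsesAsXml(unclosedTag));
        System.out.println("undeclared &nbsp;: " + parsesAsXml(undeclaredEntity));
    }
}
```

A lenient HTML parser such as jsoup builds a document from both of the malformed inputs instead of throwing, which is the robustness this proposal is after.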

This is a proposal for 3.x. WDYT?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)