Posted to dev@tika.apache.org by Tim Allison <ta...@apache.org> on 2022/12/12 16:15:55 UTC

[DISCUSS] Some unsupported dependencies

The recent question about cyberneko reminded me that we have a couple
of really old unsupported libraries in tika-app and
tika-server-standard.

1) The boilerpipe handler uses:
de.l3s.boilerpipe:boilerpipe:jar:1.1.0, which hasn't been updated
since 2010.  It would be a breaking change, but I wonder if we should
require users to add that jar to their own classpath for tika-app and
tika-server.

2) tagsoup hasn't been updated since 2011.  Should we move the primary
HTML parser to jsoup (last updated in August 2022)?
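As a rough sketch of what making boilerpipe opt-in for (1) might look like at runtime: code could probe for a class from the boilerpipe jar and skip the boilerpipe handler if the user hasn't added the jar. This assumes the handler becomes optional. BoilerpipeCheck and isBoilerpipeAvailable are hypothetical names, not existing Tika API. de.l3s.boilerpipe.extractors.ArticleExtractor is only used as a probe class from the boilerpipe 1.1.0 jar.

```java
// Hypothetical helper, NOT part of Tika: detects whether the user has
// added the optional boilerpipe jar to the classpath.
public class BoilerpipeCheck {

    // A class that ships in de.l3s.boilerpipe:boilerpipe:1.1.0
    private static final String PROBE_CLASS =
            "de.l3s.boilerpipe.extractors.ArticleExtractor";

    static boolean isBoilerpipeAvailable() {
        try {
            // initialize=false: only locate the class, don't run its static initializers
            Class.forName(PROBE_CLASS, false, BoilerpipeCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Jar missing, or present but unusable: treat as not available
            return false;
        }
    }

    public static void main(String[] args) {
        if (isBoilerpipeAvailable()) {
            System.out.println("boilerpipe found; the boilerpipe handler can be enabled");
        } else {
            System.out.println("boilerpipe not found; add the boilerpipe 1.1.0 jar to the classpath");
        }
    }
}
```

Probing with initialize=false keeps the check cheap and avoids side effects; the actual handler would still need its own error handling if it were invoked without the jar present.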

I don't want to break anything that is working, and neither project
has any open CVEs against it.  However, more than a decade without an
update gives me pause.

What do you think?

Best,

          Tim

P.S. There's also, of course, tika-age-recogniser which we are
effectively not supporting, and it brings in a kitchen sink of ancient
dependencies.  I propose that we improve the documentation around it,
but leave it as is because it is "opt in".