Posted to dev@tika.apache.org by Jukka Zitting <ju...@gmail.com> on 2008/12/08 13:19:21 UTC

Managing the classpath (Was: XML formats vs. parser libraries)

Hi,

On Mon, Dec 8, 2008 at 8:27 AM, Uwe Schindler <uw...@thetaphi.de> wrote:
> I am OK with this, but I would like to have a simple way to
> configure/plug in/plug out parsers with their complete dependencies.

That's a valid, but IMHO mostly orthogonal, issue. Branching off to a
separate thread.

> If it were possible to correctly determine which library and parser is
> needed for which document type, there should be a way of switching other
> parsers off completely (so no ClassNotFoundExceptions are generated
> when the auto detect parser hits an unsupported document type).

Agreed, we should make the parser classes fail more gracefully when
the required parser library is not available.
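One way to get that graceful behaviour is to probe for a required library class before enabling a parser, instead of letting a ClassNotFoundException surface mid-parse. The sketch below is a hypothetical illustration, not Tika's actual API: `OptionalParserRegistry`, `register`, `isAvailable` and `supportedTypes` are invented names, and `com.example.missing.ExampleParserLib` stands in for any absent third-party jar.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical registry (not part of Tika) that only enables a parser
 * when the library class it depends on can be loaded.
 */
public class OptionalParserRegistry {

    // Media type -> fully qualified name of a class that must be on the classpath.
    private final Map<String, String> requiredClassByType = new LinkedHashMap<>();

    public void register(String mediaType, String requiredClassName) {
        requiredClassByType.put(mediaType, requiredClassName);
    }

    /** True if the class can be loaded; "false" avoids running static initializers. */
    static boolean isAvailable(String className) {
        try {
            Class.forName(className, false, OptionalParserRegistry.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Jar missing, or present but with missing/incompatible dependencies.
            return false;
        }
    }

    /** Media types whose parser dependencies are actually present. */
    public List<String> supportedTypes() {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> entry : requiredClassByType.entrySet()) {
            if (isAvailable(entry.getValue())) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        OptionalParserRegistry registry = new OptionalParserRegistry();
        // Ships with the JDK, so this entry is always enabled.
        registry.register("application/zip", "java.util.zip.ZipFile");
        // Hypothetical class standing in for a third-party library that is not installed.
        registry.register("application/x-example", "com.example.missing.ExampleParserLib");
        System.out.println(registry.supportedTypes()); // prints [application/zip]
    }
}
```

An auto-detecting parser built on something like this could simply skip (or report as unsupported) any type missing from `supportedTypes()`, which also gives users a cheap way to switch parsers off by leaving their jars out.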

> My problem with these highly sophisticated parser libraries outside of Tika
> are the classpath pollutions, [...]

Yeah, that's our version of DLL hell. We briefly touched on it last
year (see http://markmail.org/message/ji3xabugnt6wlwdh), but so far
there's been no real attempt to solve the problem.
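Until there is a real solution, a quick way to diagnose this kind of classpath pollution is to ask the class loader for every location that provides a given class. More than one hit usually means two versions of the same library are competing, and with the usual parent-first delegation the first one found wins. This is a minimal diagnostic sketch; `ClasspathDuplicates` and `locationsOf` are hypothetical names, while `ClassLoader.getResources` and `Collections.list` are standard JDK methods.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

public class ClasspathDuplicates {

    /** Every classpath location (jar or directory) that contains the given class. */
    static List<URL> locationsOf(String className) throws IOException {
        String resource = className.replace('.', '/') + ".class";
        ClassLoader loader = ClasspathDuplicates.class.getClassLoader();
        return Collections.list(loader.getResources(resource));
    }

    public static void main(String[] args) throws IOException {
        // Pass a class name from a suspect library, e.g. one used by a parser.
        String name = args.length > 0 ? args[0] : "java.lang.String";
        List<URL> urls = locationsOf(name);
        if (urls.size() > 1) {
            System.out.println("WARNING: " + name + " found in " + urls.size() + " places:");
        }
        for (URL url : urls) {
            System.out.println("  " + url);
        }
    }
}
```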

The basic dilemma is that as long as we want to keep Tika simple to
use (I think that's one of the main benefits of Tika!), we're going to
have to live with this problem. For example for Ant projects the
standalone jar is probably the easiest thing to use, but that
simplicity also comes with restrictions.

The way I see it, we should keep Tika simple to use, but also provide
alternatives (like OSGi packaging, more modular Maven components,
etc.) for people who need more control.

BR,

Jukka Zitting