Posted to dev@tika.apache.org by Robert Burrell Donkin <ro...@gmail.com> on 2009/05/15 23:52:59 UTC

MIME detection

i'd like to switch to tika for mime type detection in rat. the world
of dependencies for the org.apache.tika.parser worries me a little. i
think that it should be possible just to exclude them using maven (and
i'll probably begin by doing that) but the detection stuff is cool and
would be more generally useful without the parser dependencies.
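
Content-based MIME detection of the kind discussed here generally works by matching the leading "magic" bytes of a stream against known signatures. The sketch below illustrates that idea in plain Java with no dependencies. It is not Tika's implementation or API: the class name MagicSniffer, the detect method and the three-entry signature table are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration class; not part of Tika.
class MagicSniffer {
    // Signature table: MIME type -> leading bytes that identify it.
    // Only three well-known signatures are listed (an assumption for brevity).
    private static final Map<String, byte[]> MAGIC = new LinkedHashMap<>();
    static {
        MAGIC.put("application/pdf", "%PDF-".getBytes(StandardCharsets.US_ASCII));
        MAGIC.put("image/png", new byte[] {(byte) 0x89, 'P', 'N', 'G'});
        MAGIC.put("application/zip", new byte[] {'P', 'K', 3, 4});
    }

    // Returns the first MIME type whose signature is a prefix of the given
    // bytes, or application/octet-stream when nothing matches.
    public static String detect(byte[] prefix) {
        for (Map.Entry<String, byte[]> e : MAGIC.entrySet()) {
            byte[] magic = e.getValue();
            if (prefix.length >= magic.length
                    && Arrays.equals(Arrays.copyOf(prefix, magic.length), magic)) {
                return e.getKey();
            }
        }
        return "application/octet-stream";
    }

    public static void main(String[] args) {
        byte[] sample = "%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(detect(sample)); // prints application/pdf
    }
}
```

Nothing in this sketch needs a parser library, which is the point of the question that follows: detection alone can live in a module with very few dependencies.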

what's the consensus about modularisation?

BTW i could probably write something up on detection if that'd be
useful. (these days, i find confluence has a lot quicker document
cycle than maven. so, i wondered whether there were any plans to move
tika's main documentation to confluence)

- robert

Re: MIME detection

Posted by Robert Burrell Donkin <ro...@gmail.com>.
On Sat, May 16, 2009 at 7:18 AM, Jeremias Maerki <de...@jeremias-maerki.ch> wrote:
> Incidentally, I've started using tika-core for MIME detection only
> yesterday. At runtime anyway, the dependencies are only Commons Lang, IO
> and Logging. So, not too bad, is it? But I would also welcome it if MIME
> detection were separate, since I currently don't need text extraction in
> my application. I don't feel strongly about it, though.

i was looking at tika-0.3. now i've checked out trunk, i see that the
modularisation has already happened :-)

the tika-core dependencies are fine by me

- robert

Re: MIME detection

Posted by Jeremias Maerki <de...@jeremias-maerki.ch>.
Incidentally, I've started using tika-core for MIME detection only
yesterday. At runtime anyway, the dependencies are only Commons Lang, IO
and Logging. So, not too bad, is it? But I would also welcome it if MIME
detection were separate, since I currently don't need text extraction in
my application. I don't feel strongly about it, though.

On 15.05.2009 23:52:59 Robert Burrell Donkin wrote:
> i'd like to switch to tika for mime type detection in rat. the world
> of dependencies for the org.apache.tika.parser worries me a little. i
> think that it should be possible just to exclude them using maven (and
> i'll probably begin by doing that) but the detection stuff is cool and
> would be more generally useful without the parser dependencies.
> 
> what's the consensus about modularisation?
> 
> BTW i could probably write something up on detection if that'd be
> useful. (these days, i find confluence has a lot quicker document
> cycle than maven. so, i wondered whether there were any plans to move
> tika's main documentation to confluence)
> 
> - robert




Jeremias Maerki