Posted to dev@tika.apache.org by Carsten Ziegeler <cz...@apache.org> on 2007/07/02 12:05:36 UTC
Re: Questions
Bertrand Delacretaz wrote:
> On 6/30/07, Grant Ingersoll <gs...@apache.org> wrote:
>
>> ...My main concern w/ extracting Nutch is all the dependencies on
>> Hadoop, etc. But it does seem like the shortest path for me....
>
> I've mentioned Tika to a few colleagues lately, and one thing that
> comes up often is that there are many document/format parsing
> libraries around, which should ideally be usable as Tika plugins with
> as few changes as possible.
>
> But these libraries' dependencies are all around the place, and
> probably conflicting in many cases.
>
> It might be good to take that into account in the design of Tika, and
> use solid classloading and isolation mechanisms. OSGi comes to mind,
> assuming it doesn't bloat the whole thing.
>
Yes, in many cases a solid classloading mechanism is a must, and OSGi
definitely implements this properly.
I think we can leave this open (= do not need to require OSGi) if we
have an open way of registering the plugins. Registering in an OSGi
environment might then be slightly different compared to registering in
a non-OSGi environment. Of course, using the latter one might result in
classloading problems :) But then it's up to the developer to decide in
which environment Tika should run, with all the pros and cons that come
with this decision.
Carsten
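For readers weighing the non-OSGi option, here is a minimal Java sketch of per-plugin classloader isolation using only the JDK. The class and method names (IsolatedPluginLoader, loaderFor, instantiate) are hypothetical and not part of Tika; the sketch assumes the parent class loader exposes only the shared plugin API, so each plugin's own jars (and their possibly conflicting dependencies) are loaded separately.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;

/**
 * Hypothetical helper (not a Tika API): gives each plugin its own class loader
 * so that different plugins can bundle conflicting versions of the same library.
 */
final class IsolatedPluginLoader {
    private IsolatedPluginLoader() {}

    /**
     * Builds a loader that sees the given plugin jars plus a shared parent.
     * The parent should expose only the plugin API interfaces: anything the parent
     * can load wins over the plugin's own copy (parent-first delegation).
     */
    static URLClassLoader loaderFor(ClassLoader apiParent, Path... pluginJars) throws IOException {
        URL[] urls = new URL[pluginJars.length];
        for (int i = 0; i < pluginJars.length; i++) {
            urls[i] = pluginJars[i].toUri().toURL();
        }
        return new URLClassLoader(urls, apiParent);
    }

    /** Loads a plugin class through the given loader and checks it against the API type. */
    static <T> T instantiate(ClassLoader loader, String className, Class<T> apiType)
            throws ReflectiveOperationException {
        Class<?> pluginClass = Class.forName(className, true, loader);
        Object instance = pluginClass.getDeclaredConstructor().newInstance();
        // Throws ClassCastException if the plugin does not implement the API type.
        return apiType.cast(instance);
    }
}
```

Because URLClassLoader delegates to its parent first, a library that is also on the parent's classpath will shadow the plugin's bundled copy; keeping the parent limited to the API is what makes this form of isolation work, and it is still weaker than what an OSGi container enforces.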
Re: Questions
Posted by Rida Benjelloun <ri...@doculibre.com>.
+1
Rida Benjelloun
On 7/8/07, Chris Mattmann <ch...@jpl.nasa.gov> wrote:
>
> +1 here too. I would love to have a light-weight plugin loading mechanism,
> and like the idea of not having to pick a single mechanism.
>
> Cheers,
> Chris
Re: Questions
Posted by Chris Mattmann <ch...@jpl.nasa.gov>.
+1 here too. I would love to have a light-weight plugin loading mechanism,
and like the idea of not having to pick a single mechanism.
Cheers,
Chris
On 7/2/07 4:38 AM, "Jukka Zitting" <ju...@gmail.com> wrote:
> Hi,
>
> +1 I think that the core Tika framework should be very lightweight and
> easily composable in various different environments. I even think that
> we shouldn't mandate any "official" configuration or composition
> mechanism. We may have some simple implementation as the default, but
> it should be possible to use things like Spring or OSGi or whatever to
> manage more complex scenarios.
>
> BR,
>
> Jukka Zitting
Re: Questions
Posted by Jukka Zitting <ju...@gmail.com>.
Hi,
+1 I think that the core Tika framework should be very lightweight and
easily composable in various different environments. I even think that
we shouldn't mandate any "official" configuration or composition
mechanism. We may have some simple implementation as the default, but
it should be possible to use things like Spring or OSGi or whatever to
manage more complex scenarios.
BR,
Jukka Zitting
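To make the "simple default, but no mandated mechanism" idea concrete, here is a minimal Java sketch under stated assumptions: the FormatParser interface and ParserRegistry class are hypothetical illustrations, not Tika's actual API. The registry only stores parsers; the JDK's java.util.ServiceLoader is used as one possible lightweight default for discovery, while any other container could call register() directly.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.ServiceLoader;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical plugin contract for illustration; not an actual Tika interface. */
interface FormatParser {
    /** MIME types this parser claims, e.g. "text/plain". */
    List<String> supportedTypes();

    String extractText(byte[] document);
}

/**
 * Hypothetical core registry: it stores parsers but never decides how they are found.
 * OSGi, Spring, or plain Java code can all call register() themselves.
 */
final class ParserRegistry {
    private final Map<String, FormatParser> byType = new ConcurrentHashMap<>();

    void register(FormatParser parser) {
        for (String type : parser.supportedTypes()) {
            byType.put(type, parser); // a later registration replaces an earlier one
        }
    }

    Optional<FormatParser> lookup(String mimeType) {
        return Optional.ofNullable(byType.get(mimeType));
    }

    /** One possible simple default: discover providers listed under META-INF/services. */
    static ParserRegistry fromServiceLoader(ClassLoader loader) {
        ParserRegistry registry = new ParserRegistry();
        for (FormatParser parser : ServiceLoader.load(FormatParser.class, loader)) {
            registry.register(parser);
        }
        return registry;
    }
}
```

In this design an OSGi bundle activator or a Spring-managed bean could hold a shared ParserRegistry and register parsers into it without ever touching fromServiceLoader, so the core stays unaware of whichever composition mechanism is in use.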