Posted to dev@uima.apache.org by Marshall Schor <ms...@schor.com> on 2011/08/25 16:28:00 UTC

UIMA, Maven & OSGi, modularity

A few more high level thoughts about OSGi and UIMA.
 
Both try to address "modularity".  OSGi is about package versioning (or bundle
versioning), and expressing wiring via exports and imports (dependencies) among
packages / bundles.  (OSGi is of course about lots of other stuff, too.)

UIMA addresses modularity by externalizing metadata about components in XML
descriptors.  These can specify versions (but we don't make use of that at the
moment), and "aggregation" can depend on delegates.

The UIMA way of binding to delegate info is either by location (absolute or
path-relative), or by using an externally set-up classpath (e.g., by name).
However, UIMA doesn't have very much support for setting up the classpath (it
has PEAR files for "switching" the classpath, and the uima-bootstrap code).
Users often have trouble getting the classpath set up properly.

If we think of annotators as OSGi bundles, they can contain within them the
information needed to create a proper classpath.  They do this by importing
packages or requiring-bundles by "name" and "version" ranges.
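
As a rough illustration of importing by "name" and "version" range, the Java
sketch below builds the manifest headers such an annotator bundle might carry,
using only java.util.jar from the JDK.  The bundle name
org.example.uima.annotator.sample, the package org.example.parser, and all
version numbers are made up for the example, and whether the UIMA packages
would actually be exported with these versions is an assumption.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

class AnnotatorManifestSketch {

    /** Builds the headers for a hypothetical annotator bundle. */
    static Manifest build() {
        Manifest mf = new Manifest();
        Attributes main = mf.getMainAttributes();
        // Manifest-Version must be present, otherwise write() omits the main attributes.
        main.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        main.putValue("Bundle-ManifestVersion", "2");
        // Hypothetical bundle identity.
        main.putValue("Bundle-SymbolicName", "org.example.uima.annotator.sample");
        main.putValue("Bundle-Version", "1.0.0");
        // Dependencies are named by package plus an acceptable version range
        // (illustrative values only).
        main.putValue("Import-Package",
            "org.apache.uima.analysis_component;version=\"[2.3,3)\","
          + "org.example.parser;version=\"[1.0,2.0)\"");
        return mf;
    }

    /** Renders the manifest as text; long header lines are wrapped by the JDK. */
    static String render(Manifest mf) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            mf.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.print(render(build()));
    }
}
```

In practice these headers would normally be generated by build tooling rather
than written by hand; the point here is only what information the bundle
carries.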

When hooked up to bundle repositories (or Maven), support is already "out there"
to fetch needed dependencies at the right version level.

Sometimes the dependencies are already OSGi bundles; other times they are just
plain Java Jars.  I have found, in both Apache Karaf and pax-construct, support
for automatically wrapping the plain Java Jars into OSGi bundles.  For example,
Karaf has a feature where you can "drop" things into a monitored directory and
they will be installed into a running OSGi framework.  If you drop a plain Jar,
it is automatically wrapped into a bundle.
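
Conceptually, "wrapping" means generating bundle headers for a Jar that has
none.  The Java sketch below is a deliberately simplified illustration of that
idea, not what Karaf or pax-construct actually do: it derives the packages to
export from the class entries of a Jar and puts them into a manifest.  The
class name PlainJarWrapSketch, the entry names, the symbolic name, and the
version are all hypothetical; the real tools perform much more analysis (for
example, computing imports).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

class PlainJarWrapSketch {

    /** Derives the Java package names from the .class entries of a Jar. */
    static Set<String> packagesOf(List<String> entryNames) {
        Set<String> packages = new TreeSet<>();
        for (String name : entryNames) {
            int slash = name.lastIndexOf('/');
            // Classes in the default package (no '/') cannot be exported; skip them.
            if (name.endsWith(".class") && slash > 0) {
                packages.add(name.substring(0, slash).replace('/', '.'));
            }
        }
        return packages;
    }

    /** Builds bundle headers that export every package found in the Jar. */
    static Manifest wrap(String symbolicName, String version, List<String> entryNames) {
        Manifest mf = new Manifest();
        Attributes main = mf.getMainAttributes();
        main.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        main.putValue("Bundle-ManifestVersion", "2");
        main.putValue("Bundle-SymbolicName", symbolicName);
        main.putValue("Bundle-Version", version);
        main.putValue("Export-Package", String.join(",", packagesOf(entryNames)));
        return mf;
    }

    public static void main(String[] args) {
        // Hypothetical Jar contents.
        List<String> entries = Arrays.asList(
            "META-INF/MANIFEST.MF",
            "org/example/parser/Parser.class",
            "org/example/parser/impl/ParserImpl.class");
        Manifest mf = wrap("org.example.parser", "1.2.0", entries);
        System.out.println(mf.getMainAttributes().getValue("Export-Package"));
    }
}
```

For a real Jar, the entry names could be read with java.util.jar.JarFile
instead of being listed by hand.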

If we can get this working, a good outcome would be getting rid of "setting up
the classpath" issues for running UIMA pipelines.  One would instead construct a
top level aggregate and have that "import" the versions wanted for the delegate
components, as well as any other dependencies.  (A button could be added to the
Eclipse configurator that, when pressed, would produce the appropriate OSGi
bundle for an aggregate, using version info etc. to specify delegates.)

The various existing OSGi tools seem to support multiple styles of creating
bundles: if you have an Annotator that depends on other Jars, you can either
incorporate those Jars within the bundle, or you can depend on them (in the OSGi
import-package/bundle sense), and keep your bundle small.  This latter way seems
the preferable approach.  With this style, something like the TikaAnnotator,
which today might include the tika-core.jar and the tika-parser.jar, would
instead just have the UIMA code, and depend on these other Jars at some
version-range levels.  This would allow a more flexible evolution of the
application (e.g., one could upgrade the tika-core jar independently of other
things). 

-Marshall

Re: UIMA, Maven & OSGi, modularity

Posted by Tommaso Teofili <to...@gmail.com>.
Hi all,
just for your information: at Clerezza we're discussing how to proceed
with using the current 2.3.1 Addons annotators inside an OSGi environment
[1].
I think that could be a nice 'benchmark' for future UIMA and OSGi related
developments.
My 2 cents,
Tommaso

[1] :
http://www.mail-archive.com/clerezza-dev@incubator.apache.org/msg05420.html


Re: UIMA, Maven & OSGi, modularity

Posted by Marshall Schor <ms...@schor.com>.

On 8/25/2011 2:06 PM, Richard Eckart de Castilho wrote:
> Am 25.08.2011 um 16:28 schrieb Marshall Schor:
>
>> ...
>> The various existing OSGi tools seem to support multiple styles of creating
>> bundles: if you have an Annotator that depends on other Jars, you can either
>> incorporate those Jars within the bundle, or you can depend on them (in the OSGi
>> import-package/bundle sense), and keep your bundle small.  This latter way seems
>> the preferable approach.
>> ...
> I agree that depending on them (in the OSGi sense) is the preferable way, but the problem is that not all jars are available as OSGi bundles. Also, not all OSGi frameworks support the drop-in mechanism for JARs that you describe (I think Equinox has nothing like this).

Karaf and pax-construct both allow specifying different OSGi frameworks -
they're "outside" the framework.  So, for instance, they can work with Equinox. 
The Karaf overview page says "Supports the latest OSGi 4.2 containers: Apache
Felix Framework 3.0 and Eclipse Equinox 3.6".

> Even if a framework provides such a mechanism, I wonder how it handles cases where I have two annotators A and B depending on the same artifact but in two incompatible versions (e.g. a version 1.x and a version 2.x). Can e.g. Apache Karaf automatically generate proper versions even if the JAR dropped into the folder does not contain any version information at all, so that one is wired to A and the other to B?

I don't know.

I think pax-construct uses pom information when getting Jars from Maven to
supply this, though.
>
> I like in Maven that it automatically materializes all dependencies on my machine and I do not have to do anything. 

+1
> When I install an OSGi-bundled annotator, the same thing should be the case. 

+1
> It should come with all dependencies that are not readily available on Eclipse Update Sites - it may depend on stuff that's available out there and that Eclipse can automatically resolve and download.

+1
> Unfortunately, I believe that the Eclipse Update Site ecosystem is much smaller than that of Maven. Something that might help here is the Springsource Enterprise Bundle Repository (http://ebr.springsource.com/repository/app/), but lots of stuff is also not covered there (e.g. Tika).

One other feature available in pax-construct is the ability to treat maven repos
as if they were OSGi repos.  See:
http://www.sonatype.com/books/mcookbook/reference/ch01s04.html

They have examples of importing both OSGi bundle things, and non-OSGi dependencies.
>
> While I agree that it's better to depend on libraries, for the most part, I think bundling the dependencies is more practical for the end user. The alternative would be that the UIMA project offers an Update Site with OSGi-ified versions of the dependencies required by the annotators. I personally would not go down that road though, as I believe it causes lots of work regarding maintenance of such bundles.

+1 to not going down that road :-)

-Marshall (who's not actually tried any of this :-) )
>
> So far my thoughts.
>
> Best,
>
> -- Richard
>

Re: UIMA, Maven & OSGi, modularity

Posted by Richard Eckart de Castilho <ec...@tk.informatik.tu-darmstadt.de>.
Am 25.08.2011 um 16:28 schrieb Marshall Schor:

> ...
> The various existing OSGi tools seem to support multiple styles of creating
> bundles: if you have an Annotator that depends on other Jars, you can either
> incorporate those Jars within the bundle, or you can depend on them (in the OSGi
> import-package/bundle sense), and keep your bundle small.  This latter way seems
> the preferable approach.
> ...

I agree that depending on them (in the OSGi sense) is the preferable way, but the problem is that not all jars are available as OSGi bundles. Also, not all OSGi frameworks support the drop-in mechanism for JARs that you describe (I think Equinox has nothing like this). Even if a framework provides such a mechanism, I wonder how it handles cases where I have two annotators A and B depending on the same artifact but in two incompatible versions (e.g. a version 1.x and a version 2.x). Can e.g. Apache Karaf automatically generate proper versions even if the JAR dropped into the folder does not contain any version information at all, so that one is wired to A and the other to B?
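
To make the A/B case concrete: when both versions of the artifact do carry
version information, an importer with the range "[1.0,2.0)" and one with
"[2.0,3.0)" match different exporters, which is what allows a framework to wire
each annotator to its own copy.  The Java sketch below shows only that interval
check.  It is a simplified stand-in written for this example (numeric
major.minor.micro only, no qualifiers, interval form only), not the OSGi
framework's own implementation, and it does not answer what happens when the
dropped JAR has no version information at all.

```java
class VersionRangeSketch {

    /** Parses "major.minor.micro"; missing parts default to 0. */
    static int[] parse(String version) {
        String[] parts = version.trim().split("\\.");
        int[] v = new int[3];
        for (int i = 0; i < 3 && i < parts.length; i++) {
            v[i] = Integer.parseInt(parts[i]);
        }
        return v;
    }

    /** Compares two parsed versions segment by segment. */
    static int compare(int[] a, int[] b) {
        for (int i = 0; i < 3; i++) {
            if (a[i] != b[i]) {
                return Integer.compare(a[i], b[i]);
            }
        }
        return 0;
    }

    /** Tests a version against an interval such as "[1.0,2.0)". */
    static boolean includes(String range, String version) {
        String r = range.trim();
        boolean lowInclusive = r.charAt(0) == '[';
        boolean highInclusive = r.charAt(r.length() - 1) == ']';
        String[] bounds = r.substring(1, r.length() - 1).split(",");
        int[] v = parse(version);
        int low = compare(v, parse(bounds[0]));
        int high = compare(v, parse(bounds[1]));
        return (lowInclusive ? low >= 0 : low > 0)
            && (highInclusive ? high <= 0 : high < 0);
    }

    public static void main(String[] args) {
        // Annotator A accepts 1.x, annotator B accepts 2.x (hypothetical ranges).
        String rangeA = "[1.0,2.0)";
        String rangeB = "[2.0,3.0)";
        for (String candidate : new String[] {"1.4.2", "2.1.0"}) {
            System.out.println(candidate
                + " satisfies A: " + includes(rangeA, candidate)
                + ", satisfies B: " + includes(rangeB, candidate));
        }
    }
}
```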

I like in Maven that it automatically materializes all dependencies on my machine and I do not have to do anything. When I install an OSGi-bundled annotator, the same thing should be the case. It should come with all dependencies that are not readily available on Eclipse Update Sites - it may depend on stuff that's available out there and that Eclipse can automatically resolve and download. Unfortunately, I believe that the Eclipse Update Site ecosystem is much smaller than that of Maven. Something that might help here is the Springsource Enterprise Bundle Repository (http://ebr.springsource.com/repository/app/), but lots of stuff is also not covered there (e.g. Tika).

While I agree that it's better to depend on libraries, for the most part, I think bundling the dependencies is more practical for the end user. The alternative would be that the UIMA project offers an Update Site with OSGi-ified versions of the dependencies required by the annotators. I personally would not go down that road though, as I believe it causes lots of work regarding maintenance of such bundles.

So far my thoughts.

Best,

-- Richard

-- 
------------------------------------------------------------------- 
Richard Eckart de Castilho
Technical Lead
Ubiquitous Knowledge Processing Lab 
FB 20 Computer Science Department      
Technische Universität Darmstadt 
Hochschulstr. 10, D-64289 Darmstadt, Germany 
phone [+49] (0)6151 16-7477, fax -5455, room S2/02/B117
eckartde@tk.informatik.tu-darmstadt.de 
www.ukp.tu-darmstadt.de 
Web Research at TU Darmstadt (WeRC) www.werc.tu-darmstadt.de
-------------------------------------------------------------------