Posted to dev@tika.apache.org by Konstantin Gribov <gr...@gmail.com> on 2017/03/29 00:59:45 UTC

[Q] reason for tika-parser-*-bundle to be separated from corresponding parser modules in 2.x

Hi, folks.

I was surprised by this separation. What was the reason for separating
them?

If there are no blockers, I'd prefer to merge them: OSGi headers in
MANIFEST.MF do not affect how an artifact is used in a regular Java SE
environment, and merging would drastically reduce the number of artifacts
and simplify dependency management.
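A minimal sketch of the point about OSGi headers, using only `java.util.jar` from the JDK: the JDK parses headers such as `Bundle-SymbolicName` or `Import-Package` as ordinary manifest attributes and attaches no behavior to them, so they only matter once an OSGi framework reads them. The manifest contents and the class name are hypothetical, not taken from any Tika artifact.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

public class ManifestHeadersDemo {
    // Hypothetical manifest of a parser JAR that also carries OSGi bundle headers.
    static final String MANIFEST =
            "Manifest-Version: 1.0\n"
            + "Bundle-ManifestVersion: 2\n"
            + "Bundle-SymbolicName: org.example.parser.demo\n"
            + "Bundle-Version: 2.0.0\n"
            + "Export-Package: org.example.parser.demo\n"
            + "Import-Package: org.xml.sax\n"
            + "\n";

    // Parses manifest text with the standard java.util.jar API.
    static Manifest parse(String text) throws IOException {
        return new Manifest(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws IOException {
        Attributes main = parse(MANIFEST).getMainAttributes();
        // The OSGi headers come back as plain name/value pairs...
        System.out.println("Bundle-SymbolicName = " + main.getValue("Bundle-SymbolicName"));
        // ...while attributes the JDK does act on, such as Class-Path, are simply absent here.
        System.out.println("Class-Path = " + main.getValue(Attributes.Name.CLASS_PATH));
    }
}
```

Running `main` prints the symbolic name and `Class-Path = null`. Nothing in a plain classpath launch consults the `Bundle-*` entries.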

-- 

Best regards,
Konstantin Gribov

Re: [Q] reason for tika-parser-*-bundle to be separated from corresponding parser modules in 2.x

Posted by Bob Paulin <bo...@bobpaulin.com>.
Hey Konstantin,

Your observation is spot on, and it is also the reason why there is an
advantage to keeping them separate. The bundles are not meant to be used
outside of OSGi. The current tika-bundle has many entries in its
MANIFEST.MF due to the embedded dependencies. We also depend on Maven to
resolve the transitive dependencies of each parser correctly, and as a
result it often breaks between releases. As Tika continues to grow,
resolving conflicts between all the transitive dependencies these parsers
draw in will become more difficult.

- Bob

On Wed, Mar 29, 2017 at 4:30 AM, Konstantin Gribov <gr...@gmail.com>
wrote:

> Nick,
>
> I see now why it was done: we use `Embed-Dependency` for tika-bundle in 1.x
> and for tika-parser-*-bundle in 2.x so it produces fat/uber jar with some
> dependencies inlined in them.
> This is unsuitable for non-OSGi builds.

Re: [Q] reason for tika-parser-*-bundle to be separated from corresponding parser modules in 2.x

Posted by Konstantin Gribov <gr...@gmail.com>.
Nick,

I see now why it was done: we use `Embed-Dependency` for tika-bundle in 1.x
and for the tika-parser-*-bundle modules in 2.x, so each produces a fat/uber
JAR with some of its dependencies inlined.
That is unsuitable for non-OSGi builds.

So I withdraw my "bright" idea of merging the parser modules with their
corresponding bundles.

The testing concern isn't really an issue, because the bundles contain only
OSGi integration tests, which are not run when `mvn test` is called and
don't affect the unit tests at all.
If tests were the only issue, there would be no problem at all: each
`BundleIT` just starts an embedded OSGi container, the same way any other
JUnit test can start, e.g., a CXF JAX-RS test container, an Arquillian
container, etc.
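A minimal sketch of why an `*IT` class stays out of `mvn test`, assuming (this is not confirmed in the thread) that the bundle modules follow the usual Maven convention of running integration tests with the Failsafe plugin and its default includes. Surefire, which runs in the `test` phase, does not match a name like `BundleIT` by default, and Failsafe's `integration-test` phase comes after `test`, so `mvn test` never reaches it. The regexes are simplified stand-ins for the plugins' Ant-style default patterns in recent versions, and `HtmlParserTest` is a hypothetical class name.

```java
import java.util.regex.Pattern;

public class TestPhaseDemo {
    // Simplified regex forms of the default includes, applied to simple class names.
    // Surefire (test phase): Test*, *Test, *Tests, *TestCase
    static final Pattern SUREFIRE_DEFAULT =
            Pattern.compile("Test\\w*|\\w*Test|\\w*Tests|\\w*TestCase");
    // Failsafe (integration-test phase, only if the plugin is configured): IT*, *IT, *ITCase
    static final Pattern FAILSAFE_DEFAULT =
            Pattern.compile("IT\\w*|\\w*IT|\\w*ITCase");

    // True if Surefire's default includes would pick the class up during `mvn test`.
    static boolean runByMvnTest(String className) {
        return SUREFIRE_DEFAULT.matcher(className).matches();
    }

    // True if Failsafe's default includes would pick the class up in a later phase.
    static boolean runByFailsafe(String className) {
        return FAILSAFE_DEFAULT.matcher(className).matches();
    }

    public static void main(String[] args) {
        for (String name : new String[] {"HtmlParserTest", "BundleIT"}) {
            System.out.println(name + ": mvn test=" + runByMvnTest(name)
                    + ", failsafe=" + runByFailsafe(name));
        }
    }
}
```

Under these assumptions the unit test is matched only by Surefire and `BundleIT` only by Failsafe, which is why the OSGi integration tests can live next to the unit tests without slowing down or breaking a plain `mvn test`.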

On Wed, 29 Mar 2017 at 11:54, Nick Burch <ap...@gagravarr.org> wrote:

> Not sure on the unit testing front - we'd want the current parser unit
> tests to run fully outside OSGi, plus we want some other tests to run
> within an OSGi environment to ensure the bundle's fine. Can we easily do
> both if we merged? or would we want to merge the headers but leave the
> OSGi-specific tests in another module?
>
> Nick
>
-- 

Best regards,
Konstantin Gribov

Re: [Q] reason for tika-parser-*-bundle to be separated from corresponding parser modules in 2.x

Posted by Nick Burch <ap...@gagravarr.org>.
On Wed, 29 Mar 2017, Konstantin Gribov wrote:
> I've been surprised by such separation, what was the reason to separate 
> them?

I think it's partly history (we split them in 1.x), partly how the split was
done (the OSGi folks were amongst the most keen), and partly a desire not to
have non-OSGi users getting a load of things they didn't need or that might
confuse them.

> If there's no blockers I'd prefer to merge them: OSGi headers in 
> MANIFEST.MF do not affect artifact usage in usual Java SE environment 
> and it reduces number of artifacts drastically and simplifies dependency 
> management.

Just adding a few headers to the manifest would be fine with me; that
seems low-risk and low-impact.

I'm not sure on the unit testing front: we'd want the current parser unit
tests to run fully outside OSGi, plus we want some other tests to run
within an OSGi environment to ensure the bundle is fine. Could we easily do
both if we merged? Or would we want to merge the headers but leave the
OSGi-specific tests in another module?

Nick