Posted to dev@tika.apache.org by Bob Paulin <bo...@bobpaulin.com> on 2015/12/01 04:48:47 UTC
Re: more modular parser bundles
Created 2.x Branch.
https://svn.apache.org/repos/asf/tika/branches/2.x
On 11/30/2015 3:12 PM, Bob Paulin wrote:
> This makes sense. I think providing an "all" jar with all the parsers
> will be convenient for new developers. The modular parsers would give
> more developers a means to insulate themselves from changes and
> upgrades to other parsers. This is currently not available when all
> of the parsers are combined. So my expectation would be that the jar
> with all the parsers would be good for general applications or POC,
> while the modules would target production deployments where developers
> know what they want and would like to limit risk. Also agree that new
> documentation will be required!
>
> - Bob
>
> On Mon, Nov 30, 2015 at 2:50 PM, Nick Burch <apache@gagravarr.org> wrote:
>
> On Mon, 30 Nov 2015, Allison, Timothy B. wrote:
>
> Perhaps we could start with a tika-advanced-bundle to gather
> all of the nlp/advanced parsers? Or would this have to wait
> for Tika 2.0?
>
>
> I've noticed that there have been a lot fewer queries (on our
> list, on stackoverflow, at events etc) caused by people missing
> jars of late. Not sure if the message has got out there better,
> the right posts are getting to the top of google, the
> troubleshooting page has done its magic, or something else
> entirely! But I'm now less worried about the impact of modular
> parsers on newbies than I have been before.
>
> To try to avoid all the existing guidance (most of it external)
> from going stale, I'd lean towards either keeping "tika-parsers"
> as the full version, or make "tika-parsers" be an alias to
> "tika-parsers-all", so that current behaviour remains
>
> I'd also probably suggest we change the default load error handler
> to warn/log, so that people by default will find out more quickly
> that they've missed jars, and probably also have an extra load
> error log/check which triggers in the event of 0 parser
> definitions being found. People can turn that off if they want, as
> now, but maybe the new default should be so that newbies tend to
> get told quickly what they've done wrong!
>
> Oh, and we'll need to update the troubleshooting page too for the
> new bundles world :)
>
> Nick
>
>
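
For illustration, the load error handling Nick suggests above (warn by
default, plus an extra check that fires when 0 parsers are found) could
look roughly like the sketch below. It is self-contained and does not use
Tika's own classes: ParserLoadCheck, OnLoadError and loadParsers are
hypothetical names, and parsers are stood in for by plain class names
loaded via reflection.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not Tika's actual API.
class ParserLoadCheck {

    /** What to do when a parser class cannot be loaded (e.g. a missing jar). */
    enum OnLoadError { IGNORE, WARN, THROW }

    /**
     * Tries to instantiate each named class. Failures are handled according
     * to the policy; warning messages are appended to the supplied list.
     */
    static List<Object> loadParsers(List<String> classNames,
                                    OnLoadError policy,
                                    List<String> warnings) {
        List<Object> loaded = new ArrayList<>();
        for (String name : classNames) {
            try {
                loaded.add(Class.forName(name).getDeclaredConstructor().newInstance());
            } catch (ReflectiveOperationException | LinkageError e) {
                // A missing dependency typically surfaces as one of these.
                if (policy == OnLoadError.THROW) {
                    throw new IllegalStateException("Could not load " + name, e);
                }
                if (policy == OnLoadError.WARN) {
                    warnings.add("Could not load " + name + ": " + e);
                }
                // IGNORE: stay silent.
            }
        }
        // Extra check: nothing loaded at all almost always means a setup problem.
        if (loaded.isEmpty() && policy != OnLoadError.IGNORE) {
            String msg = "No parsers were loaded; are the parser jars on the classpath?";
            if (policy == OnLoadError.THROW) {
                throw new IllegalStateException(msg);
            }
            warnings.add(msg);
        }
        return loaded;
    }

    public static void main(String[] args) {
        List<String> warnings = new ArrayList<>();
        // java.util.ArrayList stands in for a parser that is present;
        // com.example.MissingParser stands in for one whose jar is absent.
        List<Object> parsers = loadParsers(
                Arrays.asList("java.util.ArrayList", "com.example.MissingParser"),
                OnLoadError.WARN, warnings);
        System.out.println("Loaded " + parsers.size() + " parser(s)");
        for (String w : warnings) {
            System.err.println("WARN: " + w);
        }
    }
}
```

With WARN as the default, a missing jar shows up in the log immediately,
while IGNORE keeps the quiet behaviour for people who want to turn the
check off.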
RE: more modular parser bundles
Posted by "Allison, Timothy B." <ta...@mitre.org>.
Woohoo!!!
So much to do...
Thank you!