Posted to dev@opennlp.apache.org by Jörn Kottmann <ko...@gmail.com> on 2011/05/05 14:32:17 UTC

OPENNLP-17 Custom feature gen config

Hi all,

https://issues.apache.org/jira/browse/OPENNLP-17

This issue has been open and under discussion for quite some time,
and I would like to finally try to reach some consensus here.

The issue proposes several ways to solve the problem of defining
a file which can be turned into a set of feature generator
objects.

These are all solutions which I discussed at length
with Tom Morton during the SourceForge days.

I think the JavaScript solution should not be implemented because
of the security problem it brings. Having JavaScript inside the models
makes it easy for someone to do something malicious on your machine,
e.g. delete files. To prevent this we would need some kind of sandboxing,
which brings more complexity and new issues.

The main disadvantage of dependency injection (e.g. Spring), in my eyes,
is that it adds a new dependency to the project. That is not nice, since
OpenNLP is a library which should come without dependencies. For example,
in a bigger project it is very annoying when every library brings in
different dependencies. And using OpenNLP should of course always be a
positive experience. Another disadvantage I see here is that the XML to
describe the feature generation is rather long compared to a custom
XML-based DSL.

The solution I am +1 for is a custom XML format which defines how the
feature generators are put together. The big advantage I see here over
dependency injection is that the XML looks nicer and is also shorter,
which makes it easier to discuss different feature generation setups,
e.g. on the mailing list.
This solution is more or less already implemented, and I would also
extend our documentation to explain how it works. Another concern raised
is that it might need more maintenance than the dependency injection
solution, but I think that is not really true in the long run, since
the DI library might also change and need updating to new APIs.
Coding directly against the Java standard library usually produces code
which is very stable, because the underlying APIs never change and are
very well tested.
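
To make the idea concrete, here is a minimal sketch of how a small XML
descriptor could be turned into feature generator objects using only the
XML parser that ships with the JDK. Everything in it is an assumption made
for illustration: the element names (generators, token, window), the
prevLength/nextLength attributes, and the FeatureGenerator interface are
hypothetical and are not necessarily what the actual implementation or
OPENNLP-17 uses.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class FeatureGenDescriptorSketch {

    /** Hypothetical interface for this sketch, not the OpenNLP API. */
    interface FeatureGenerator {
        void createFeatures(List<String> features, String[] tokens, int index);
    }

    /** Emits the token at the current position as a feature. */
    static class TokenGenerator implements FeatureGenerator {
        public void createFeatures(List<String> features, String[] tokens, int index) {
            features.add("w=" + tokens[index]);
        }
    }

    /** Applies a nested generator to the positions around the current token. */
    static class WindowGenerator implements FeatureGenerator {
        private final FeatureGenerator nested;
        private final int prev;
        private final int next;

        WindowGenerator(FeatureGenerator nested, int prev, int next) {
            this.nested = nested;
            this.prev = prev;
            this.next = next;
        }

        public void createFeatures(List<String> features, String[] tokens, int index) {
            for (int offset = -prev; offset <= next; offset++) {
                int pos = index + offset;
                if (pos < 0 || pos >= tokens.length) {
                    continue; // skip positions outside the sentence
                }
                List<String> inner = new ArrayList<>();
                nested.createFeatures(inner, tokens, pos);
                for (String f : inner) {
                    features.add(offset + ":" + f); // prefix with relative position
                }
            }
        }
    }

    /** Recursively maps one descriptor element to a generator object. */
    static FeatureGenerator build(Element e) {
        List<FeatureGenerator> children = new ArrayList<>();
        NodeList nodes = e.getChildNodes();
        for (int i = 0; i < nodes.getLength(); i++) {
            if (nodes.item(i) instanceof Element) {
                children.add(build((Element) nodes.item(i)));
            }
        }
        switch (e.getTagName()) {
            case "generators": // aggregate: run every child generator in order
                return (features, tokens, index) -> {
                    for (FeatureGenerator g : children) {
                        g.createFeatures(features, tokens, index);
                    }
                };
            case "token":
                return new TokenGenerator();
            case "window":
                return new WindowGenerator(children.get(0),
                        Integer.parseInt(e.getAttribute("prevLength")),
                        Integer.parseInt(e.getAttribute("nextLength")));
            default:
                throw new IllegalArgumentException("Unknown element: " + e.getTagName());
        }
    }

    /** Parses the descriptor with the JDK's built-in DOM parser. */
    static FeatureGenerator parse(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        return build(root);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<generators><token/>"
                + "<window prevLength=\"1\" nextLength=\"1\"><token/></window>"
                + "</generators>";
        List<String> features = new ArrayList<>();
        parse(xml).createFeatures(features, new String[] {"New", "York", "City"}, 1);
        System.out.println(features); // [w=York, -1:w=New, 0:w=York, 1:w=City]
    }
}
```

The descriptor in main is a single short line of XML, which is the kind of
snippet that is easy to paste into a mail when comparing feature generation
setups, and the code has no dependency beyond the JDK.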

If there are no objections from the other committers, I would like to
go ahead with the custom XML solution.

Jörn

Re: OPENNLP-17 Custom feature gen config

Posted by Jörn Kottmann <ko...@gmail.com>.

Jason commented on the Jira issue itself.

If there is no objection I will now go ahead and finish the work
and check it in.

I believe having this feature is very important, because without it it is
very difficult to change the feature generation or to support all sorts of
different language-dependent feature generation setups.

Jörn
