Posted to users@opennlp.apache.org by Thomas Zastrow <po...@thomas-zastrow.de> on 2013/10/29 16:54:16 UTC

License for NE model?

Dear all,

I have now created a named entity model for German. It is trained on
5,000 manually annotated sentences; its performance is not perfect, but
it is already usable. I will continue with more texts.

I used only texts from Wikipedia and Wikinews, so in my eyes it 
shouldn't be a problem to distribute the model. But I'm not sure which 
license would be a good choice: OpenNLP uses the Apache license, but 
Wikipedia is Creative Commons. On the other hand, because I have the
"raw" training data, it would be easy to train other NE detectors with
the data.

The OpenNLP page doesn't say anything about the licences of the models 
which can be found there already.

So, what do you think would be the best license for

a) a trained model

and

b) the raw data, which is entirely Wikipedia content?

Thanks in advance and best regards,

Tom


-- 
Dr. Thomas Zastrow
Riemerfeldring 7a

85748 Garching
Tel.: 0162 422 8029
www.thomas-zastrow.de

Re: License for NE model?

Posted by Thomas Zastrow <po...@thomas-zastrow.de>.

Dear all,

Thanks for the information.

On 30.10.2013 13:20, Jörn Kottmann wrote:
> On 10/30/2013 12:03 PM, Nils Reiter wrote:
>> I guess the question is whether a trained model is an “adaptation” of 
>> the work according to the license. If that’s the case you’re bound to 
>> using creative commons, I think.
>

I want to publish both: the binary model and the raw, manually annotated
texts. The latter is a derivative work of Wikipedia: you can still read
the articles, with just some annotations in between. So those files
will be under the original Wikipedia license.

> The model does not contain the original texts, it contains the words 
> and bigrams,
> but that is nothing the original author has a copyright on.
>

Hmm, that's the point: I know from other contexts that models trained
on treebanks also have to be under the same conditions as the original
treebank. So I'm not sure if I'm free to use another license for the
binary file. And I don't know about the other models on the OpenNLP
page: I used the German tokenizer and sentence-detector models,
together with the OpenNLP tools. At the very least, my binary model is
a mixture of CC, Apache License and whatever is used for the already
existing models.

>
> Any interest in contributing your work back to OpenNLP? It would really
> be a great start for us
> to finally have some annotated data as proper Open Source as well. The
> Wikipedia effort can probably
> easily be replicated for other languages.

Yes, of course. I built this model for my own hobby project, but I
always had in mind to give it away for free. I also implemented a
graphical user interface for manual NE annotation ... all the OpenNLP
tools are integrated, and it can now be seen as a generic graphical user
interface for OpenNLP. That tool is far from being perfect, but I think
I will publish a "beta of a pre-alpha version" in the next few days :-)

I also found out that the tokenizer and sentence model for German are 
... not the best ones. I don't know who did them, but they are lacking 
some very common features of German texts.

Last but not least, I'm working on some converters for the OpenNLP
formats, because I need the output to be in TCF. I still haven't found
the hook in the code, if there is one, where that would fit.

Best,

Tom


Re: License for NE model?

Posted by Jörn Kottmann <ko...@gmail.com>.

On 10/30/2013 12:03 PM, Nils Reiter wrote:
> I guess the question is whether a trained model is an “adaptation” of the work according to the license. If that’s the case you’re bound to using creative commons, I think.

The model does not contain the original texts; it contains the words and
bigrams, but that is nothing the original author has a copyright on.

It should be ok to license the model under a different license.
Do you intend to have a different license for the annotations as well?

Any interest in contributing your work back to OpenNLP? It would really
be a great start for us to finally have some annotated data as proper
Open Source as well. The Wikipedia effort can probably easily be
replicated for other languages.

Jörn

Re: License for NE model?

Posted by Nils Reiter <re...@cl.uni-heidelberg.de>.

Hi,

Doesn’t the Wikipedia/Creative Commons license specify exactly that you can only redistribute under the same or a similar license?

I guess the question is whether a trained model is an “adaptation” of the work according to the license. If that’s the case you’re bound to using creative commons, I think.

Best,
Nils



