Posted to users@opennlp.apache.org by Thomas Zastrow <po...@thomas-zastrow.de> on 2013/10/29 16:54:16 UTC
License for NE model?
Dear all,
I have now created a named entity model for German. It is trained on
5,000 manually annotated sentences. Its performance is not perfect, but
it is already usable. I will continue with more texts.
I used only texts from Wikipedia and Wikinews, so in my view it
shouldn't be a problem to distribute the model. But I'm not sure which
license would be a good choice: OpenNLP uses the Apache license, but
Wikipedia is Creative Commons. On the other hand, since I have the
"raw" training data, it would be easy to train other NE detectors with
it.
The OpenNLP page doesn't say anything about the licenses of the models
that are already available there.
So, what do you think would be the best license for
a) a trained model
and
b) the raw data, which is essentially Wikipedia content?
Thanks in advance and best regards,
Tom
--
Dr. Thomas Zastrow
Riemerfeldring 7a
85748 Garching
Tel.: 0162 422 8029
www.thomas-zastrow.de
Re: License for NE model?
Posted by Thomas Zastrow <po...@thomas-zastrow.de>.
Dear all,
Thanks for the information.
On 30.10.2013 13:20, Jörn Kottmann wrote:
> On 10/30/2013 12:03 PM, Nils Reiter wrote:
>> I guess the question is whether a trained model is an “adaptation” of
>> the work according to the license. If that’s the case you’re bound to
>> using creative commons, I think.
>
I want to publish both: the binary model and the raw, manually annotated
texts. The latter is a derivative work of Wikipedia: you can still read
the articles, with just some annotations in between. So those files
will be under the original Wikipedia license.
> The model does not contain the original texts, it contains the words
> and bigrams,
> but that is nothing the original author has a copyright on.
>
Hmm, that's the point: I know from other contexts that models trained
on treebanks also have to be under the same conditions as the original
treebank. So I'm not sure whether I'm free to use another license for
the binary file. And I don't know about the other models on the OpenNLP
page: I used the German tokenizer and sentence detector models together
with the OpenNLP tools. At the very least, my binary model is a mixture
of CC, the Apache License, and whatever is used for the already
existing models.
>
> Any interest to contribute your work back to OpenNLP? It would really
> be a great start for us
> to finally have some annotated data as proper Open Source as well. The
> wikipedia effort can probably
> easily be replicated for other languages.
Yes, of course. I built this model for my own hobby project, but I
always intended to release it freely. I also implemented a graphical
user interface for manual NE annotation ... all the OpenNLP tools are
integrated, so it can now be seen as a generic graphical user interface
for OpenNLP. That tool is far from being perfect, but I think I will
publish a "beta of a pre-alpha version" in the next few days :-)
I also found out that the tokenizer and sentence models for German are
... not the best ones. I don't know who created them, but they miss
some very common features of German texts.
Last but not least, I'm working on some converters for the OpenNLP
formats, because I need the output to be in TCF. I still haven't found
whether and where in the code that would hook in.
Best,
Tom
Re: License for NE model?
Posted by Jörn Kottmann <ko...@gmail.com>.
On 10/30/2013 12:03 PM, Nils Reiter wrote:
> I guess the question is whether a trained model is an “adaptation” of the work according to the license. If that’s the case you’re bound to using creative commons, I think.
The model does not contain the original texts; it contains the words and
bigrams, but that is nothing the original author has a copyright on.
It should be ok to license the model under a different license.
Do you intend to have a different license for the annotations as well?
Any interest in contributing your work back to OpenNLP? It would really
be a great start for us to finally have some annotated data as proper
open source as well. The Wikipedia effort can probably easily be
replicated for other languages.
Jörn
Re: License for NE model?
Posted by Nils Reiter <re...@cl.uni-heidelberg.de>.
Hi,
doesn’t the Wikipedia/Creative Commons license specify exactly that you can only redistribute under the same or a similar license?
I guess the question is whether a trained model is an “adaptation” of the work according to the license. If that’s the case, you’re bound to use Creative Commons, I think.
Best,
Nils