Posted to dev@opennlp.apache.org by Michael Schmitz <sc...@cs.washington.edu> on 2012/06/09 00:55:44 UTC

Host stock models in maven central

Hi, is there any interest in hosting the stock OpenNLP models in Maven
Central?  I know that OpenNLP intends for users to train models on
their particular corpus, but often it's useful to get started with the
stock models.

I'm developing a common interface to some NLP toolkits in Scala and
would like to include OpenNLP.  I would like to use OpenNLP and have
it use the stock models by default as a Maven dependency.  If I do this,
then I don't need to include the models with my artifact and I don't
need to keep the models in my git repository.  More importantly, users
can exclude the stock models if they wish.

What do you think?

Peace.  Michael

Re: Host stock models in maven central

Posted by Richard Eckart de Castilho <ec...@ukp.informatik.tu-darmstadt.de>.

On 10.06.2012 at 00:44, James Kosin wrote:
> 
> It is one of the things we are working on.  The problem is most if not
> all the models are currently trained on copyrighted material that
> restricts the usage of the resulting trained data to research purposes ONLY.
> We currently host the models on another site; due to this limitation and
> the licensing conflict that would result if we tried to host on Apache.
> 
> You are more than welcome to help, if you choose.

I'm working on the DKPro Core [1] project (UIMA-based NLP components). The project
integrates a growing number of different NLP tools into a common interoperable
framework. We've also started integrating OpenNLP now. We figured that our 
preferred API and way of using UIMA is sufficiently different from OpenNLP's
UIMA integration that we started doing our own. Well, so much for the
background.

We have a public Artifactory (Maven repository) up and running on which we
host our open-source artifacts that we cannot put on Maven Central for one
reason or another. We wouldn't mind hosting additional models as long as
redistribution is not explicitly prohibited.

Actually, we do already host several of the OpenNLP models [2] in that Maven
repository. We do not simply host the bin files, but wrap them up in JARs again
which makes it easier to add them as Maven dependencies and load them from the
classpath.
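
As a rough illustration of the classpath approach, a model wrapped in a JAR
can be read like any other classpath resource. The sketch below is plain
Java using only the standard library. The class name ClasspathModelLoader
and the resource path "models/en-token.bin" are hypothetical; the actual
layout inside the wrapped model JARs is not described in this message.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Optional;

// Hypothetical helper: reads a model binary that was packaged inside a JAR
// on the classpath. Nothing here is specific to OpenNLP or DKPro Core.
final class ClasspathModelLoader {

    // Returns the resource bytes, or Optional.empty() if the resource
    // is not present on the classpath.
    static Optional<byte[]> load(String resourcePath) {
        ClassLoader loader = ClasspathModelLoader.class.getClassLoader();
        try (InputStream in = loader.getResourceAsStream(resourcePath)) {
            if (in == null) {
                return Optional.empty();
            }
            return Optional.of(in.readAllBytes());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed path; replace with wherever the model JAR stores its bin file.
        String path = args.length > 0 ? args[0] : "models/en-token.bin";
        Optional<byte[]> model = load(path);
        System.out.println(model
                .map(bytes -> "loaded " + bytes.length + " bytes from " + path)
                .orElse("not found on classpath: " + path));
    }
}
```

With OpenNLP on the classpath, the same resource stream could instead be
passed directly to a model constructor such as TokenizerModel(InputStream).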

So if you are looking for a place to drop redistributable OpenNLP models
(research only is ok for us), feel free to drop me a note. The only thing
we ask for is some information regarding the license and redistributability,
so we can make sure redistribution is not explicitly prohibited.

Feel free to use the wrapped models we already have as Maven dependencies in
your own projects. The model JARs contain the bin and a bit of metadata. If
you like the wrapped models and need other models wrapped, just tell me.

-- Richard

[1] http://code.google.com/p/dkpro-core-asl/
[2] https://zoidberg.ukp.informatik.tu-darmstadt.de/artifactory/webapp/search/artifact?q=opennlp-model

-- 
------------------------------------------------------------------- 
Richard Eckart de Castilho
Technical Lead
Ubiquitous Knowledge Processing Lab (UKP-TUD) 
FB 20 Computer Science Department      
Technische Universität Darmstadt 
Hochschulstr. 10, D-64289 Darmstadt, Germany 
phone [+49] (0)6151 16-7477, fax -5455, room S2/02/B117
eckart@ukp.informatik.tu-darmstadt.de 
www.ukp.tu-darmstadt.de 
Web Research at TU Darmstadt (WeRC) www.werc.tu-darmstadt.de
-------------------------------------------------------------------

Re: Host stock models in maven central

Posted by James Kosin <ja...@gmail.com>.

On 8/8/2012 10:31 AM, Jason Baldridge wrote:
> Sorry if I missed something along the way -- who did the annotation of the
> Wikipedia data?
>
> BTW, the OANC will soon come out with their 3.0 release of MASC (the
> Manually Annotated Sub-Corpus), with about 800k tokens of English text
> (multiple domains, including twitter, blogs, transcribed spoken, and more)
> labeled with several different levels of analysis, including chunks (noun
> and verb), entities, tokens, POS tags, sentence boundaries, and logical
> forms.
>
> http://www.americannationalcorpus.org/MASC/Home.html
>
>
Jason,

It looks interesting, but right now they only provide annotations for
80K words of the data.  For the rest they have the data sets only.  :-(

But they provide a subset of 40K words in CoNLL 08 format.

With our architecture, the format doesn't matter much; it is just a
matter of writing a converter to extract the data we need.  It looks like
we could even train the tokenizer and sentence detector on the structure
provided.
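
A minimal sketch of such a converter, in Java with the standard library
only, might look like the following. It assumes a CoNLL-style layout (one
token per line, tab-separated columns, a blank line between sentences) and
emits one sentence per line with word_TAG tokens, which is the layout the
OpenNLP POS tagger trainer reads. The class name and the column indices are
illustrative assumptions; the real MASC files may need different columns.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical converter: CoNLL-style token lines -> "word_TAG word_TAG ..." lines.
final class ConllToPosTrainingData {

    // formCol and tagCol are zero-based column indices chosen by the caller.
    static List<String> convert(List<String> lines, int formCol, int tagCol) {
        List<String> sentences = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                // A blank line closes the current sentence, if any.
                if (current.length() > 0) {
                    sentences.add(current.toString());
                    current.setLength(0);
                }
                continue;
            }
            String[] cols = line.split("\t");
            if (cols.length <= Math.max(formCol, tagCol)) {
                continue; // skip lines that lack the requested columns
            }
            if (current.length() > 0) {
                current.append(' ');
            }
            current.append(cols[formCol]).append('_').append(cols[tagCol]);
        }
        // Flush the last sentence when the input does not end with a blank line.
        if (current.length() > 0) {
            sentences.add(current.toString());
        }
        return sentences;
    }
}
```

Passing the column indices in, rather than hard-coding them, keeps the same
code usable for the other formats the corpus is distributed in.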

Re: Host stock models in maven central

Posted by James Kosin <ja...@gmail.com>.

Jason & Jorn,

According to the web site, support will be back up in October.  The
CoNLL 2008 format looks promising, but any of the others would probably
work.  They seem to have problems with the Penn Treebank format; they
have several patches against it.

James

On 8/9/2012 3:38 AM, Jörn Kottmann wrote:
> Maybe we can then even distribute these models from Apache.
> But in any case we should implement format support for the corpus,
> so that training OpenNLP on it is easy.
>
> Jörn
>
> On 08/09/2012 03:45 AM, Jason Baldridge wrote:
>> There is a link to a pre-release of the MASC data that I have but am not
>> sure I can share. I believe they are planning to have a finalized
>> version
>> out in September.
>>
>> AFAIK, the MASC data is unencumbered -- Nancy Ide is very committed to
>> having truly open data and annotations. It would be great if the
>> community
>> can give back to the OANC with further annotations, tools, and such
>> -- some
>> of the annotation stuff being discussed here would could be great for
>> this.
>>
>> On Wed, Aug 8, 2012 at 7:47 PM, James Kosin <ja...@gmail.com>
>> wrote:
>>
>>> http://www.anc.org/
>>>
>>> ... but, this suggests the data they collect is only for research and
>>> education.
>>>
>>> On 8/8/2012 10:31 AM, Jason Baldridge wrote:
>>>> Sorry if I missed something along the way -- who did the annotation of
>>> the
>>>> Wikipedia data?
>>>>
>>>> BTW, the OANC will soon come out with their 3.0 release of MASC (the
>>>> Manually Annotated Sub-Corpus), with about 800k tokens of English text
>>>> (multiple domains, including twitter, blogs, transcribed spoken, and
>>> more)
>>>> labeled with several different levels of analysis, including chunks
>>>> (noun
>>>> and verb), entities, tokens, POS tags, sentence boundaries, and
>>>> logical
>>>> forms.
>>>>
>>>> http://www.americannationalcorpus.org/MASC/Home.html
>>>>
>>>> On Wed, Aug 8, 2012 at 2:47 AM, Jörn Kottmann <ko...@gmail.com>
>>> wrote:
>>>>> On 08/08/2012 06:16 AM, Michael Schmitz wrote:
>>>>>
>>>>>> Hi, here are some models trained on Wikipedia data.  They have
>>>>>> similar
>>>>>> performance.  Is this useful?
>>>>>>
>>>>> Yes, people who do not have access to our MUC based training
>>>>> data can just use the wiki data instead and combine it with their
>>>>> data.
>>>>>
>>>>> Thanks for sharing.
>>>>>
>>>>> Now all we need is a way to get label corrections from the
>>>>> community :-)
>>>>>
>>>>> Jörn
>>>>>
>>>>
>>>
>>
>

Re: Host stock models in maven central

Posted by Jörn Kottmann <ko...@gmail.com>.

Maybe we can then even distribute these models from Apache.
But in any case we should implement format support for the corpus,
so that training OpenNLP on it is easy.

Jörn

On 08/09/2012 03:45 AM, Jason Baldridge wrote:
> There is a link to a pre-release of the MASC data that I have but am not
> sure I can share. I believe they are planning to have a finalized version
> out in September.
>
> AFAIK, the MASC data is unencumbered -- Nancy Ide is very committed to
> having truly open data and annotations. It would be great if the community
> can give back to the OANC with further annotations, tools, and such -- some
> of the annotation stuff being discussed here would could be great for this.
>
> On Wed, Aug 8, 2012 at 7:47 PM, James Kosin <ja...@gmail.com> wrote:
>
>> http://www.anc.org/
>>
>> ... but, this suggests the data they collect is only for research and
>> education.
>>
>> On 8/8/2012 10:31 AM, Jason Baldridge wrote:
>>> Sorry if I missed something along the way -- who did the annotation of
>> the
>>> Wikipedia data?
>>>
>>> BTW, the OANC will soon come out with their 3.0 release of MASC (the
>>> Manually Annotated Sub-Corpus), with about 800k tokens of English text
>>> (multiple domains, including twitter, blogs, transcribed spoken, and
>> more)
>>> labeled with several different levels of analysis, including chunks (noun
>>> and verb), entities, tokens, POS tags, sentence boundaries, and logical
>>> forms.
>>>
>>> http://www.americannationalcorpus.org/MASC/Home.html
>>>
>>> On Wed, Aug 8, 2012 at 2:47 AM, Jörn Kottmann <ko...@gmail.com>
>> wrote:
>>>> On 08/08/2012 06:16 AM, Michael Schmitz wrote:
>>>>
>>>>> Hi, here are some models trained on Wikipedia data.  They have similar
>>>>> performance.  Is this useful?
>>>>>
>>>> Yes, people who do not have access to our MUC based training
>>>> data can just use the wiki data instead and combine it with their data.
>>>>
>>>> Thanks for sharing.
>>>>
>>>> Now all we need is a way to get label corrections from the community :-)
>>>>
>>>> Jörn
>>>>
>>>
>>
>

Re: Host stock models in maven central

Posted by Jason Baldridge <ja...@gmail.com>.

There is a link to a pre-release of the MASC data that I have but am not
sure I can share. I believe they are planning to have a finalized version
out in September.

AFAIK, the MASC data is unencumbered -- Nancy Ide is very committed to
having truly open data and annotations. It would be great if the community
can give back to the OANC with further annotations, tools, and such -- some
of the annotation stuff being discussed here could be great for this.

On Wed, Aug 8, 2012 at 7:47 PM, James Kosin <ja...@gmail.com> wrote:

>
> http://www.anc.org/
>
> ... but, this suggests the data they collect is only for research and
> education.
>
> On 8/8/2012 10:31 AM, Jason Baldridge wrote:
> > Sorry if I missed something along the way -- who did the annotation of
> the
> > Wikipedia data?
> >
> > BTW, the OANC will soon come out with their 3.0 release of MASC (the
> > Manually Annotated Sub-Corpus), with about 800k tokens of English text
> > (multiple domains, including twitter, blogs, transcribed spoken, and
> more)
> > labeled with several different levels of analysis, including chunks (noun
> > and verb), entities, tokens, POS tags, sentence boundaries, and logical
> > forms.
> >
> > http://www.americannationalcorpus.org/MASC/Home.html
> >
> > On Wed, Aug 8, 2012 at 2:47 AM, Jörn Kottmann <ko...@gmail.com>
> wrote:
> >
> >> On 08/08/2012 06:16 AM, Michael Schmitz wrote:
> >>
> >>> Hi, here are some models trained on Wikipedia data.  They have similar
> >>> performance.  Is this useful?
> >>>
> >> Yes, people who do not have access to our MUC based training
> >> data can just use the wiki data instead and combine it with their data.
> >>
> >> Thanks for sharing.
> >>
> >> Now all we need is a way to get label corrections from the community :-)
> >>
> >> Jörn
> >>
> >
> >
>
>


-- 
Jason Baldridge
Associate Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge

Re: Host stock models in maven central

Posted by James Kosin <ja...@gmail.com>.

http://www.anc.org/

... but, this suggests the data they collect is only for research and
education.

On 8/8/2012 10:31 AM, Jason Baldridge wrote:
> Sorry if I missed something along the way -- who did the annotation of the
> Wikipedia data?
>
> BTW, the OANC will soon come out with their 3.0 release of MASC (the
> Manually Annotated Sub-Corpus), with about 800k tokens of English text
> (multiple domains, including twitter, blogs, transcribed spoken, and more)
> labeled with several different levels of analysis, including chunks (noun
> and verb), entities, tokens, POS tags, sentence boundaries, and logical
> forms.
>
> http://www.americannationalcorpus.org/MASC/Home.html
>
> On Wed, Aug 8, 2012 at 2:47 AM, Jörn Kottmann <ko...@gmail.com> wrote:
>
>> On 08/08/2012 06:16 AM, Michael Schmitz wrote:
>>
>>> Hi, here are some models trained on Wikipedia data.  They have similar
>>> performance.  Is this useful?
>>>
>> Yes, people who do not have access to our MUC based training
>> data can just use the wiki data instead and combine it with their data.
>>
>> Thanks for sharing.
>>
>> Now all we need is a way to get label corrections from the community :-)
>>
>> Jörn
>>
>
>

Re: Host stock models in maven central

Posted by Jason Baldridge <ja...@gmail.com>.

Sorry if I missed something along the way -- who did the annotation of the
Wikipedia data?

BTW, the OANC will soon come out with their 3.0 release of MASC (the
Manually Annotated Sub-Corpus), with about 800k tokens of English text
(multiple domains, including twitter, blogs, transcribed spoken, and more)
labeled with several different levels of analysis, including chunks (noun
and verb), entities, tokens, POS tags, sentence boundaries, and logical
forms.

http://www.americannationalcorpus.org/MASC/Home.html

On Wed, Aug 8, 2012 at 2:47 AM, Jörn Kottmann <ko...@gmail.com> wrote:

> On 08/08/2012 06:16 AM, Michael Schmitz wrote:
>
>> Hi, here are some models trained on Wikipedia data.  They have similar
>> performance.  Is this useful?
>>
>
> Yes, people who do not have access to our MUC based training
> data can just use the wiki data instead and combine it with their data.
>
> Thanks for sharing.
>
> Now all we need is a way to get label corrections from the community :-)
>
> Jörn
>

-- 
Jason Baldridge
Associate Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge

Re: Host stock models in maven central

Posted by Jörn Kottmann <ko...@gmail.com>.

On 08/08/2012 06:16 AM, Michael Schmitz wrote:
> Hi, here are some models trained on Wikipedia data.  They have similar
> performance.  Is this useful?

Yes, people who do not have access to our MUC based training
data can just use the wiki data instead and combine it with their data.

Thanks for sharing.

Now all we need is a way to get label corrections from the community :-)

Jörn

Re: Host stock models in maven central

Posted by Michael Schmitz <sc...@cs.washington.edu>.

Hi, here are some models trained on Wikipedia data.  They have similar
performance.  Is this useful?

https://gist.github.com/3291931

Peace.  Michael


On Fri, Jun 29, 2012 at 7:43 PM, Michael Schmitz
<sc...@cs.washington.edu> wrote:

> Well, if I find time, I'll run the models on an Apache-license dataset
> and then train new models using the output.  I'm sure this would be
> safe from licensing issues and if we had any time, we could clean up
> the annotations.
>
> Peace.  Michael
>
>
> On Sun, Jun 24, 2012 at 6:52 PM, Benson Margulies <bi...@gmail.com>
> wrote:
> > On Sun, Jun 24, 2012 at 9:48 PM, James Kosin <ja...@gmail.com>
> wrote:
> >> Hi Michael,
> >>
> >> Sorry about the late response to this.
> >>
> >> Yes, it is however they also restrict the distribution of the models as
> >> well... I've already asked.  The license allows us to use for research
> >> purposes only and we are not allowed to redistribute the models.  I've
> >> already asked this to the person in charge of distributing the corpus.
> >>
> >> None of OpenNLP's models are based on this corpus as far as I know.  All
> >> the models are produced from different copyrights and limitations.
> >> Apache license however, doesn't allow for binary only distribution with
> >> no way of producing or reproducing from our own sources that must be
> >> licensed under the Apache license.  The best way we can do right now is
> >> to distribute the sources and binaries for the java classes and work on
> >> producing a corpus of our own from non-copyrighted text and distributed
> >> those sources and models in Apache under the licensing from Apache.
> >
> > Also note that nothing stops someone else from distributing binary
> > models outside of Apache. Anyone who wanted to pick up the corpora and
> > reach their own conclusion about the legitimacy of open distribution
> > of binary models could build these models and distribute them via
> > OSSRH to maven central. Just so long as they respect ASF trademark
> > policies in describing the models as, oh, 'useful with the Apache
> > OpenNLP software library'.
> >
> >
> >
> >>
> >> James
> >>
> >> On 6/12/2012 12:37 PM, Michael Schmitz wrote:
> >>> Hi James, is this the contract?
> >>>
> >>> http://trec.nist.gov/data/reuters/org_appl_reuters_v4.html
> >>>
> >>> If so, I think you are free to license your derived models however you
> >>> please although you may not redistribute the training data.
> >>>
> >>> What models does the Reuters contract apply to?
> >>>
> >>> Peace.  Michael
> >>>
> >>>
> >>> On Mon, Jun 11, 2012 at 7:23 PM, James Kosin <ja...@gmail.com>
> wrote:
> >>>> Michael,
> >>>>
> >>>> I only have the contract for the Reuters corpus I use and it
> >>>> specifically prohibits use for anything other than educational or
> >>>> research wise.  Commercial applications violate the copyright and
> >>>> contract terms.  I'm sure many of the others are similar.  This
> includes
> >>>> any trained models.
> >>>>
> >>>> James
> >>>>
> >>>> On 6/11/2012 1:45 PM, Michael Schmitz wrote:
> >>>>> Are you sure the copyright applies to your trained model?  Do you
> have
> >>>>> any information about the corpuses you used to train the models?
> >>>>>
> >>>>> Peace.  Michael
> >>>>>
> >>>>>
> >>>>> On Sat, Jun 9, 2012 at 3:44 PM, James Kosin <ja...@gmail.com>
> wrote:
> >>>>>> Michael,
> >>>>>>
> >>>>>> It is one of the things we are working on.  The problem is most if
> not
> >>>>>> all the models are currently trained on copyrighted material that
> >>>>>> restricts the usage of the resulting trained data to research
> purposes ONLY.
> >>>>>> We currently host the models on another site; due to this
> limitation and
> >>>>>> the licensing conflict that would result if we tried to host on
> Apache.
> >>>>>>
> >>>>>> You are more than welcome to help, if you choose.
> >>>>>>
> >>>>>> James
> >>>>>>
> >>>>>> On 6/8/2012 6:55 PM, Michael Schmitz wrote:
> >>>>>>> Hi, is there any interest in hosting the stock OpenNLP models in
> Maven
> >>>>>>> Central?  I know that OpenNLP intends for users to train models on
> >>>>>>> their particular corpus, but often it's useful to get started with
> the
> >>>>>>> stock models.
> >>>>>>>
> >>>>>>> I'm developing a common interface to some NLP toolkits in Scala and
> >>>>>>> would like to include OpenNLP.  I would like to use OpenNLP and
> have
> >>>>>>> use the stock models by default as a maven dependency.  If I do
> this,
> >>>>>>> then I don't need to include the models with my artifact and I
> don't
> >>>>>>> need to keep the models in my git repository.  More importantly,
> users
> >>>>>>> can exclude the stock models if they wish.
> >>>>>>>
> >>>>>>> What do you think?
> >>>>>>>
> >>>>>>> Peace.  Michael
> >>>>
> >>
> >>
>

Re: Host stock models in maven central

Posted by Michael Schmitz <sc...@cs.washington.edu>.

Well, if I find time, I'll run the models on an Apache-license dataset
and then train new models using the output.  I'm sure this would be
safe from licensing issues and if we had any time, we could clean up
the annotations.

Peace.  Michael


On Sun, Jun 24, 2012 at 6:52 PM, Benson Margulies <bi...@gmail.com> wrote:
> On Sun, Jun 24, 2012 at 9:48 PM, James Kosin <ja...@gmail.com> wrote:
>> Hi Michael,
>>
>> Sorry about the late response to this.
>>
>> Yes, it is however they also restrict the distribution of the models as
>> well... I've already asked.  The license allows us to use for research
>> purposes only and we are not allowed to redistribute the models.  I've
>> already asked this to the person in charge of distributing the corpus.
>>
>> None of OpenNLP's models are based on this corpus as far as I know.  All
>> the models are produced from different copyrights and limitations.
>> Apache license however, doesn't allow for binary only distribution with
>> no way of producing or reproducing from our own sources that must be
>> licensed under the Apache license.  The best way we can do right now is
>> to distribute the sources and binaries for the java classes and work on
>> producing a corpus of our own from non-copyrighted text and distributed
>> those sources and models in Apache under the licensing from Apache.
>
> Also note that nothing stops someone else from distributing binary
> models outside of Apache. Anyone who wanted to pick up the corpora and
> reach their own conclusion about the legitimacy of open distribution
> of binary models could build these models and distribute them via
> OSSRH to maven central. Just so long as they respect ASF trademark
> policies in describing the models as, oh, 'useful with the Apache
> OpenNLP software library'.
>
>
>
>>
>> James
>>
>> On 6/12/2012 12:37 PM, Michael Schmitz wrote:
>>> Hi James, is this the contract?
>>>
>>> http://trec.nist.gov/data/reuters/org_appl_reuters_v4.html
>>>
>>> If so, I think you are free to license your derived models however you
>>> please although you may not redistribute the training data.
>>>
>>> What models does the Reuters contract apply to?
>>>
>>> Peace.  Michael
>>>
>>>
>>> On Mon, Jun 11, 2012 at 7:23 PM, James Kosin <ja...@gmail.com> wrote:
>>>> Michael,
>>>>
>>>> I only have the contract for the Reuters corpus I use and it
>>>> specifically prohibits use for anything other than educational or
>>>> research wise.  Commercial applications violate the copyright and
>>>> contract terms.  I'm sure many of the others are similar.  This includes
>>>> any trained models.
>>>>
>>>> James
>>>>
>>>> On 6/11/2012 1:45 PM, Michael Schmitz wrote:
>>>>> Are you sure the copyright applies to your trained model?  Do you have
>>>>> any information about the corpuses you used to train the models?
>>>>>
>>>>> Peace.  Michael
>>>>>
>>>>>
>>>>> On Sat, Jun 9, 2012 at 3:44 PM, James Kosin <ja...@gmail.com> wrote:
>>>>>> Michael,
>>>>>>
>>>>>> It is one of the things we are working on.  The problem is most if not
>>>>>> all the models are currently trained on copyrighted material that
>>>>>> restricts the usage of the resulting trained data to research purposes ONLY.
>>>>>> We currently host the models on another site; due to this limitation and
>>>>>> the licensing conflict that would result if we tried to host on Apache.
>>>>>>
>>>>>> You are more than welcome to help, if you choose.
>>>>>>
>>>>>> James
>>>>>>
>>>>>> On 6/8/2012 6:55 PM, Michael Schmitz wrote:
>>>>>>> Hi, is there any interest in hosting the stock OpenNLP models in Maven
>>>>>>> Central?  I know that OpenNLP intends for users to train models on
>>>>>>> their particular corpus, but often it's useful to get started with the
>>>>>>> stock models.
>>>>>>>
>>>>>>> I'm developing a common interface to some NLP toolkits in Scala and
>>>>>>> would like to include OpenNLP.  I would like to use OpenNLP and have
>>>>>>> use the stock models by default as a maven dependency.  If I do this,
>>>>>>> then I don't need to include the models with my artifact and I don't
>>>>>>> need to keep the models in my git repository.  More importantly, users
>>>>>>> can exclude the stock models if they wish.
>>>>>>>
>>>>>>> What do you think?
>>>>>>>
>>>>>>> Peace.  Michael
>>>>
>>
>>

Re: Host stock models in maven central

Posted by Benson Margulies <bi...@gmail.com>.

On Sun, Jun 24, 2012 at 9:48 PM, James Kosin <ja...@gmail.com> wrote:
> Hi Michael,
>
> Sorry about the late response to this.
>
> Yes, it is however they also restrict the distribution of the models as
> well... I've already asked.  The license allows us to use for research
> purposes only and we are not allowed to redistribute the models.  I've
> already asked this to the person in charge of distributing the corpus.
>
> None of OpenNLP's models are based on this corpus as far as I know.  All
> the models are produced from different copyrights and limitations.
> Apache license however, doesn't allow for binary only distribution with
> no way of producing or reproducing from our own sources that must be
> licensed under the Apache license.  The best way we can do right now is
> to distribute the sources and binaries for the java classes and work on
> producing a corpus of our own from non-copyrighted text and distributed
> those sources and models in Apache under the licensing from Apache.

Also note that nothing stops someone else from distributing binary
models outside of Apache. Anyone who wanted to pick up the corpora and
reach their own conclusion about the legitimacy of open distribution
of binary models could build these models and distribute them via
OSSRH to maven central. Just so long as they respect ASF trademark
policies in describing the models as, oh, 'useful with the Apache
OpenNLP software library'.



>
> James
>
> On 6/12/2012 12:37 PM, Michael Schmitz wrote:
>> Hi James, is this the contract?
>>
>> http://trec.nist.gov/data/reuters/org_appl_reuters_v4.html
>>
>> If so, I think you are free to license your derived models however you
>> please although you may not redistribute the training data.
>>
>> What models does the Reuters contract apply to?
>>
>> Peace.  Michael
>>
>>
>> On Mon, Jun 11, 2012 at 7:23 PM, James Kosin <ja...@gmail.com> wrote:
>>> Michael,
>>>
>>> I only have the contract for the Reuters corpus I use and it
>>> specifically prohibits use for anything other than educational or
>>> research wise.  Commercial applications violate the copyright and
>>> contract terms.  I'm sure many of the others are similar.  This includes
>>> any trained models.
>>>
>>> James
>>>
>>> On 6/11/2012 1:45 PM, Michael Schmitz wrote:
>>>> Are you sure the copyright applies to your trained model?  Do you have
>>>> any information about the corpuses you used to train the models?
>>>>
>>>> Peace.  Michael
>>>>
>>>>
>>>> On Sat, Jun 9, 2012 at 3:44 PM, James Kosin <ja...@gmail.com> wrote:
>>>>> Michael,
>>>>>
>>>>> It is one of the things we are working on.  The problem is most if not
>>>>> all the models are currently trained on copyrighted material that
>>>>> restricts the usage of the resulting trained data to research purposes ONLY.
>>>>> We currently host the models on another site; due to this limitation and
>>>>> the licensing conflict that would result if we tried to host on Apache.
>>>>>
>>>>> You are more than welcome to help, if you choose.
>>>>>
>>>>> James
>>>>>
>>>>> On 6/8/2012 6:55 PM, Michael Schmitz wrote:
>>>>>> Hi, is there any interest in hosting the stock OpenNLP models in Maven
>>>>>> Central?  I know that OpenNLP intends for users to train models on
>>>>>> their particular corpus, but often it's useful to get started with the
>>>>>> stock models.
>>>>>>
>>>>>> I'm developing a common interface to some NLP toolkits in Scala and
>>>>>> would like to include OpenNLP.  I would like to use OpenNLP and have
>>>>>> use the stock models by default as a maven dependency.  If I do this,
>>>>>> then I don't need to include the models with my artifact and I don't
>>>>>> need to keep the models in my git repository.  More importantly, users
>>>>>> can exclude the stock models if they wish.
>>>>>>
>>>>>> What do you think?
>>>>>>
>>>>>> Peace.  Michael
>>>
>
>

Re: Host stock models in maven central

Posted by James Kosin <ja...@gmail.com>.

Hi Michael,

Sorry about the late response to this.

Yes, it is; however, they also restrict the distribution of the models.
The license allows us to use the corpus for research purposes only, and
we are not allowed to redistribute the models.  I've already asked the
person in charge of distributing the corpus about this.

None of OpenNLP's models are based on this corpus as far as I know.  All
the models are produced from corpora with different copyrights and
limitations.  The Apache license, however, doesn't allow binary-only
distribution with no way of producing or reproducing the binaries from
our own sources, which must be licensed under the Apache license.  The
best we can do right now is to distribute the sources and binaries for
the Java classes and work on producing a corpus of our own from
non-copyrighted text, then distribute those sources and models from
Apache under the Apache license.

James

On 6/12/2012 12:37 PM, Michael Schmitz wrote:
> Hi James, is this the contract?
>
> http://trec.nist.gov/data/reuters/org_appl_reuters_v4.html
>
> If so, I think you are free to license your derived models however you
> please although you may not redistribute the training data.
>
> What models does the Reuters contract apply to?
>
> Peace.  Michael
>
>
> On Mon, Jun 11, 2012 at 7:23 PM, James Kosin <ja...@gmail.com> wrote:
>> Michael,
>>
>> I only have the contract for the Reuters corpus I use and it
>> specifically prohibits use for anything other than educational or
>> research wise.  Commercial applications violate the copyright and
>> contract terms.  I'm sure many of the others are similar.  This includes
>> any trained models.
>>
>> James
>>
>> On 6/11/2012 1:45 PM, Michael Schmitz wrote:
>>> Are you sure the copyright applies to your trained model?  Do you have
>>> any information about the corpuses you used to train the models?
>>>
>>> Peace.  Michael
>>>
>>>
>>> On Sat, Jun 9, 2012 at 3:44 PM, James Kosin <ja...@gmail.com> wrote:
>>>> Michael,
>>>>
>>>> It is one of the things we are working on.  The problem is most if not
>>>> all the models are currently trained on copyrighted material that
>>>> restricts the usage of the resulting trained data to research purposes ONLY.
>>>> We currently host the models on another site; due to this limitation and
>>>> the licensing conflict that would result if we tried to host on Apache.
>>>>
>>>> You are more than welcome to help, if you choose.
>>>>
>>>> James
>>>>
>>>> On 6/8/2012 6:55 PM, Michael Schmitz wrote:
>>>>> Hi, is there any interest in hosting the stock OpenNLP models in Maven
>>>>> Central?  I know that OpenNLP intends for users to train models on
>>>>> their particular corpus, but often it's useful to get started with the
>>>>> stock models.
>>>>>
>>>>> I'm developing a common interface to some NLP toolkits in Scala and
>>>>> would like to include OpenNLP.  I would like to use OpenNLP and have
>>>>> use the stock models by default as a maven dependency.  If I do this,
>>>>> then I don't need to include the models with my artifact and I don't
>>>>> need to keep the models in my git repository.  More importantly, users
>>>>> can exclude the stock models if they wish.
>>>>>
>>>>> What do you think?
>>>>>
>>>>> Peace.  Michael
>>

Re: Host stock models in maven central

Posted by Michael Schmitz <sc...@cs.washington.edu>.

Hi James, is this the contract?

http://trec.nist.gov/data/reuters/org_appl_reuters_v4.html

If so, I think you are free to license your derived models however you
please although you may not redistribute the training data.

What models does the Reuters contract apply to?

Peace.  Michael


On Mon, Jun 11, 2012 at 7:23 PM, James Kosin <ja...@gmail.com> wrote:
> Michael,
>
> I only have the contract for the Reuters corpus I use and it
> specifically prohibits use for anything other than educational or
> research wise.  Commercial applications violate the copyright and
> contract terms.  I'm sure many of the others are similar.  This includes
> any trained models.
>
> James
>
> On 6/11/2012 1:45 PM, Michael Schmitz wrote:
>> Are you sure the copyright applies to your trained model?  Do you have
>> any information about the corpuses you used to train the models?
>>
>> Peace.  Michael
>>
>>
>> On Sat, Jun 9, 2012 at 3:44 PM, James Kosin <ja...@gmail.com> wrote:
>>> Michael,
>>>
>>> It is one of the things we are working on.  The problem is most if not
>>> all the models are currently trained on copyrighted material that
>>> restricts the usage of the resulting trained data to research purposes ONLY.
>>> We currently host the models on another site because of this limitation
>>> and the licensing conflict that would result if we tried to host them on
>>> Apache.
>>>
>>> You are more than welcome to help, if you choose.
>>>
>>> James
>>>
>>> On 6/8/2012 6:55 PM, Michael Schmitz wrote:
>>>> Hi, is there any interest in hosting the stock OpenNLP models in Maven
>>>> Central?  I know that OpenNLP intends for users to train models on
>>>> their particular corpus, but often it's useful to get started with the
>>>> stock models.
>>>>
>>>> I'm developing a common interface to some NLP toolkits in Scala and
>>>> would like to include OpenNLP.  I would like to use OpenNLP and have it
>>>> use the stock models by default as a Maven dependency.  If I do this,
>>>> then I don't need to include the models with my artifact and I don't
>>>> need to keep the models in my git repository.  More importantly, users
>>>> can exclude the stock models if they wish.
>>>>
>>>> What do you think?
>>>>
>>>> Peace.  Michael
>>>
>
>

Re: Host stock models in maven central

Posted by James Kosin <ja...@gmail.com>.

Michael,

I only have the contract for the Reuters corpus I use and it
specifically prohibits use for anything other than educational or
research purposes.  Commercial applications violate the copyright and
contract terms.  I'm sure many of the others are similar.  This includes
any trained models.

James

On 6/11/2012 1:45 PM, Michael Schmitz wrote:
> Are you sure the copyright applies to your trained model?  Do you have
> any information about the corpuses you used to train the models?
>
> Peace.  Michael
>
>
> On Sat, Jun 9, 2012 at 3:44 PM, James Kosin <ja...@gmail.com> wrote:
>> Michael,
>>
>> It is one of the things we are working on.  The problem is most if not
>> all the models are currently trained on copyrighted material that
>> restricts the usage of the resulting trained data to research purposes ONLY.
>> We currently host the models on another site because of this limitation
>> and the licensing conflict that would result if we tried to host them on
>> Apache.
>>
>> You are more than welcome to help, if you choose.
>>
>> James
>>
>> On 6/8/2012 6:55 PM, Michael Schmitz wrote:
>>> Hi, is there any interest in hosting the stock OpenNLP models in Maven
>>> Central?  I know that OpenNLP intends for users to train models on
>>> their particular corpus, but often it's useful to get started with the
>>> stock models.
>>>
>>> I'm developing a common interface to some NLP toolkits in Scala and
>>> would like to include OpenNLP.  I would like to use OpenNLP and have
>>> use the stock models by default as a maven dependency.  If I do this,
>>> then I don't need to include the models with my artifact and I don't
>>> need to keep the models in my git repository.  More importantly, users
>>> can exclude the stock models if they wish.
>>>
>>> What do you think?
>>>
>>> Peace.  Michael
>>

Re: Host stock models in maven central

Posted by Michael Schmitz <sc...@cs.washington.edu>.

Are you sure the copyright applies to your trained model?  Do you have
any information about the corpuses you used to train the models?

Peace.  Michael


On Sat, Jun 9, 2012 at 3:44 PM, James Kosin <ja...@gmail.com> wrote:
> Michael,
>
> It is one of the things we are working on.  The problem is most if not
> all the models are currently trained on copyrighted material that
> restricts the usage of the resulting trained data to research purposes ONLY.
> We currently host the models on another site because of this limitation
> and the licensing conflict that would result if we tried to host them on
> Apache.
>
> You are more than welcome to help, if you choose.
>
> James
>
> On 6/8/2012 6:55 PM, Michael Schmitz wrote:
>> Hi, is there any interest in hosting the stock OpenNLP models in Maven
>> Central?  I know that OpenNLP intends for users to train models on
>> their particular corpus, but often it's useful to get started with the
>> stock models.
>>
>> I'm developing a common interface to some NLP toolkits in Scala and
>> would like to include OpenNLP.  I would like to use OpenNLP and have it
>> use the stock models by default as a Maven dependency.  If I do this,
>> then I don't need to include the models with my artifact and I don't
>> need to keep the models in my git repository.  More importantly, users
>> can exclude the stock models if they wish.
>>
>> What do you think?
>>
>> Peace.  Michael
>
>

Re: Host stock models in maven central

Posted by James Kosin <ja...@gmail.com>.

Michael,

It is one of the things we are working on.  The problem is most if not
all the models are currently trained on copyrighted material that
restricts the usage of the resulting trained data to research purposes ONLY.
We currently host the models on another site because of this limitation
and the licensing conflict that would result if we tried to host them on
Apache.

You are more than welcome to help, if you choose.

James

On 6/8/2012 6:55 PM, Michael Schmitz wrote:
> Hi, is there any interest in hosting the stock OpenNLP models in Maven
> Central?  I know that OpenNLP intends for users to train models on
> their particular corpus, but often it's useful to get started with the
> stock models.
>
> I'm developing a common interface to some NLP toolkits in Scala and
> would like to include OpenNLP.  I would like to use OpenNLP and have it
> use the stock models by default as a Maven dependency.  If I do this,
> then I don't need to include the models with my artifact and I don't
> need to keep the models in my git repository.  More importantly, users
> can exclude the stock models if they wish.
>
> What do you think?
>
> Peace.  Michael