Posted to dev@ctakes.apache.org by "Masanz, James J." <Ma...@mayo.edu> on 2013/01/22 04:00:15 UTC

RE: [DISCUSS] What should we do with cTAKES resources?

Jörn, 

Today Benson wrote the following in this post to incubator http://s.apache.org/Gz5
"I fear that cTakes needs to have an interaction with LEGAL to adopt the SpamAssassin model, since, from a strict constructionist perspective, the source of the models is precisely what you cannot release."

Is he just unaware of some discussion you already had with LEGAL for OpenNLP? I ask because in the discussion below you indicated it would be OK to release models at Apache without releasing the data the models were built from. Is there some previous post we can point to, or should I open a discussion with LEGAL about cTAKES models?


-- James Masanz

> -----Original Message-----
> From: ctakes-dev-return-811-Masanz.James=mayo.edu@incubator.apache.org
> [mailto:ctakes-dev-return-811-Masanz.James=mayo.edu@incubator.apache.org]
> On Behalf Of Jörn Kottmann
> Sent: Monday, November 05, 2012 6:41 AM
> To: ctakes-dev@incubator.apache.org
> Subject: Re: [DISCUSS] What should we do with cTAKES resources?
> 
> In my opinion we should release what we can from here at Apache and only
> the resources which have an incompatible license need to be handled
> differently, e.g. external site.
> 
> Models which are trained on private clinical data can be released as long
> as the original creator decides to license them under AL 2.0. If that is
> done by a committer it should be fine to just check them in or put them on
> the website.
> 
> The wikipedia license is compatible and an index of it as well, but we
> probably need to have attribution for it in a NOTICE file, and maybe
> include the license in the LICENSE file.
> 
> Jörn
> 
> On 11/02/2012 10:46 PM, Chen, Pei wrote:
> > I think we postponed this topic previously and since the ASF code seems
> to be in decent shape now, I think it's time to revisit this discussion
> for the longer term.
> > Currently, we have the below resources bundled with our source code
> > and distribution
> >
> > -          UMLS dictionaries (hsqldb format and in lucene indexes)
> >
> > -          Models (which were okay to be released as open source) that
> have been trained from various clinical data
> >
> > -          Wikipedia index
> >
> > What are our options, given that ASF source code, binaries, models, and
> > dependencies all need to be compliant with ASL 2.0
> > (http://www.apache.org/legal/3party.html)?
> >
> > 1)      Leave things as they are, but we need to confirm with the
> sources and also will probably need to seek approval from Apache Legal for
> each of the resources
> >
> > 2)      Host the resources externally such as SourceForge similar to
> OpenNLP models (http://opennlp.sourceforge.net/models-1.5/)
> >
> > a.       Single zip per release for users to download?
> >
> > Option 2 seems the least painful in terms of compliance.
> > Since 3.0.0-incubating, each resource has a fully qualified name/path
> and is read from the classpath so it should be fairly easy if we decided
> to pull it in from external sources.
> >
> > --Pei
> >
> >


RE: [DISCUSS] What should we do with cTAKES resources?

Posted by "Masanz, James J." <Ma...@mayo.edu>.
Moving those to ctakes-resources on SourceForge sounds like the way to go to me.
I was hoping to take a stab at it tomorrow but that is looking unlikely.

I am hoping that, to keep the install process for our end users relatively simple, we can still have a convenience binary with the resources (jars, models, dictionaries) except the UMLS ones (which need to be separate for licensing).
Otherwise I will be greatly concerned about the step back we are taking from an end-user (non-programmer) install perspective.

- James

> -----Original Message-----
> From: ctakes-dev-return-1080-Masanz.James=mayo.edu@incubator.apache.org
> [mailto:ctakes-dev-return-1080-Masanz.James=mayo.edu@incubator.apache.org]
> On Behalf Of Chen, Pei
> Sent: Tuesday, January 22, 2013 8:03 AM
> To: <ct...@incubator.apache.org>
> Cc: ctakes-dev@incubator.apache.org
> Subject: Re: [DISCUSS] What should we do with cTAKES resources?
> 
> James,
> I was under the impression that we could include the models, but it sounds
> like that is not the case. We can move every single bin/model to ctakes-
> resources on SourceForge and do an mvn deploy to push it to Maven Central,
> like what we did for umls/lvg. I can take a stab at it later this week if
> no one gets to it (and if there's an agreement).
> 
> 
> Sent from my iPhone
> 
> On Jan 22, 2013, at 5:35 AM, "Jörn Kottmann" <ko...@gmail.com> wrote:
> 
> > On 01/22/2013 04:00 AM, Masanz, James J. wrote:
> >> Jörn,
> >>
> >> Today Benson wrote the following in this post to incubator
> >> http://s.apache.org/Gz5 "I fear that cTakes needs to have an
> interaction with LEGAL to adopt the SpamAssassin model, since, from a
> strict constructionist perspective, the source of the models is precisely
> what you cannot release."
> >>
> >> Is he just unaware of some discussion you already had with LEGAL for
> >> OpenNLP? I ask because in the discussion below you indicated it
> >> would be OK to release models at Apache without releasing the data
> >> the models were built from. Is there some previous post we can point
> >> to, or should I open a discussion with LEGAL about cTAKES models?
> >
> > I was under the assumption that it is ok to just release the model
> > and not the training data under AL 2.0 here at Apache. Over at UIMA we
> > had a similar discussion for the French POS Tagger (UIMA-2146). There the
> concern was that it's very cumbersome to train again on the data, but not
> that it can't be released.
> >
> > To circumvent this particular issue it should be possible to release
> > the models outside of Apache and then just redistribute them as a class A
> dependency in the cTAKES binary distribution.
> >
> > Jörn

RE: [DISCUSS] What should we do with cTAKES resources?

Posted by "Savova, Guergana" <Gu...@childrens.harvard.edu>.
Yes, that was my understanding as well (that we can include the models).  I am not fully understanding where the problem is.
--Guergana

-----Original Message-----
From: Chen, Pei [mailto:Pei.Chen@childrens.harvard.edu] 
Sent: Tuesday, January 22, 2013 9:03 AM
To: <ct...@incubator.apache.org>
Cc: ctakes-dev@incubator.apache.org
Subject: Re: [DISCUSS] What should we do with cTAKES resources?

James,
I was under the impression that we could include the models, but it sounds like that is not the case. We can move every single bin/model to ctakes-resources on SourceForge and do an mvn deploy to push it to Maven Central, like what we did for umls/lvg. I can take a stab at it later this week if no one gets to it (and if there's an agreement).


Sent from my iPhone

On Jan 22, 2013, at 5:35 AM, "Jörn Kottmann" <ko...@gmail.com> wrote:

> On 01/22/2013 04:00 AM, Masanz, James J. wrote:
>> Jörn,
>> 
>> Today Benson wrote the following in this post to incubator 
>> http://s.apache.org/Gz5 "I fear that cTakes needs to have an interaction with LEGAL to adopt the SpamAssassin model, since, from a strict constructionist perspective, the source of the models is precisely what you cannot release."
>> 
>> Is he just unaware of some discussion you already had with LEGAL for
>> OpenNLP? I ask because in the discussion below you indicated it
>> would be OK to release models at Apache without releasing the data
>> the models were built from. Is there some previous post we can point
>> to, or should I open a discussion with LEGAL about cTAKES models?
> 
> I was under the assumption that it is ok to just release the model
> and not the training data under AL 2.0 here at Apache. Over at UIMA we
> had a similar discussion for the French POS Tagger (UIMA-2146). There the concern was that it's very cumbersome to train again on the data, but not that it can't be released.
>
> To circumvent this particular issue it should be possible to release
> the models outside of Apache and then just redistribute them as a class A dependency in the cTAKES binary distribution.
> 
> Jörn

Re: [DISCUSS] What should we do with cTAKES resources?

Posted by "Chen, Pei" <Pe...@childrens.harvard.edu>.
James,
I was under the impression that we could include the models, but it sounds like that is not the case. We can move every single bin/model to ctakes-resources on SourceForge and do an mvn deploy to push it to Maven Central, like what we did for umls/lvg. I can take a stab at it later this week if no one gets to it (and if there's an agreement).


Sent from my iPhone

On Jan 22, 2013, at 5:35 AM, "Jörn Kottmann" <ko...@gmail.com> wrote:

> On 01/22/2013 04:00 AM, Masanz, James J. wrote:
>> Jörn,
>> 
>> Today Benson wrote the following in this post to incubator http://s.apache.org/Gz5
>> "I fear that cTakes needs to have an interaction with LEGAL to adopt the SpamAssassin model, since, from a strict constructionist perspective, the source of the models is precisely what you cannot release."
>> 
>> Is he just unaware of some discussion you already had with LEGAL for OpenNLP? I ask because in the discussion below you indicated it would be OK to release models at Apache without releasing the data the models were built from. Is there some previous post we can point to, or should I open a discussion with LEGAL about cTAKES models?
> 
> I was under the assumption that it is ok to just release the model and not the training data under AL 2.0 here at Apache.
> Over at UIMA we had a similar discussion for the French POS Tagger (UIMA-2146). There the concern was that it's very cumbersome
> to train again on the data, but not that it can't be released.
>
> To circumvent this particular issue it should be possible to release the models outside of Apache and then just redistribute
> them as a class A dependency in the cTAKES binary distribution.
> 
> Jörn

Re: [DISCUSS] What should we do with cTAKES resources?

Posted by Jörn Kottmann <ko...@gmail.com>.
On 01/22/2013 04:00 AM, Masanz, James J. wrote:
> Jörn,
>
> Today Benson wrote the following in this post to incubator http://s.apache.org/Gz5
> "I fear that cTakes needs to have an interaction with LEGAL to adopt the SpamAssassin model, since, from a strict constructionist perspective, the source of the models is precisely what you cannot release."
>
> Is he just unaware of some discussion you already had with LEGAL for OpenNLP? I ask because in the discussion below you indicated it would be OK to release models at Apache without releasing the data the models were built from. Is there some previous post we can point to, or should I open a discussion with LEGAL about cTAKES models?
>
>

I was under the assumption that it is ok to just release the model and
not the training data under AL 2.0 here at Apache.
Over at UIMA we had a similar discussion for the French POS Tagger
(UIMA-2146). There the concern was that it's very cumbersome
to train again on the data, but not that it can't be released.

To circumvent this particular issue it should be possible to release the
models outside of Apache and then just redistribute
them as a class A dependency in the cTAKES binary distribution.

Jörn