Posted to dev@ctakes.apache.org by "Chen, Pei" <Pe...@childrens.harvard.edu> on 2013/03/15 16:39:12 UTC

[DISCUSS] Where should cTAKES models live?

Just to recap:
We sidestepped this issue in the 3.0 release by removing the models from the src and bin ASF distributions. The Incubator community had raised concerns, so to keep things moving I removed the models from the 3.0 branch and left trunk as-is pending this discussion. I think this question is better answered by the cTAKES community as a TLP than by the IPMC.

As our mentors pointed out in the discussion [1], the ASF is not a top-down organization, and we will most likely not get an official answer from anyone "above". Once cTAKES is a TLP, I believe it will be up to the cTAKES community to decide what "*individuals* as part of a PMC (who are the ones that *release* the software) are, and are not able to do, and what risk they are and are not able to (implied) take on, and perform, within their community".

So the question is: What should we do with the model files?  Some options include:

1) Leave them in SourceForge/Maven Central. Maven can download them and include them in the convenience binaries in the ctakes-distribution project. We did this quickly for 3.0, but it needs to be improved if we go with this approach. For example: [2]

2) Leave them in the ASF repo, but in separate modules/projects.

3) Keep them in their respective ASF modules under /src/main/resources.

I think it's nice to keep these fairly large (~1GB), static resource files separate from the source code (either option 1 or 2). Option 1 will require a little more work from the committers/release managers, but it will definitely avoid any licensing issues/concerns.

[1] http://markmail.org/search/+list:org.apache.incubator.general#query:list%3Aorg.apache.incubator.general%20from%3A%22Mattmann%2C%20Chris%20A%20(388J)%22+page:6+mid:nzquchvuvgkije3n+state:results
[2] https://oss.sonatype.org/index.html#nexus-search;quick~ctakesresources :
<dependency>
  <groupId>net.sourceforge.ctakesresources</groupId>
  <artifactId>ctakes-resources-assertion</artifactId>
  <version>3.1.1-SNAPSHOT</version>
</dependency>
<dependency>
  <groupId>net.sourceforge.ctakesresources</groupId>
  <artifactId>ctakes-resources-core</artifactId>
  <version>3.1.1-SNAPSHOT</version>
</dependency>

Re: [DISCUSS] Where should cTAKES models live?

Posted by Andy McMurry <mc...@gmail.com>.
+1 vote for Option #2
 "models in ASF repo in a separate module" 




RE: [DISCUSS] Where should cTAKES models live?

Posted by "Chen, Pei" <Pe...@childrens.harvard.edu>.
This has been done in trunk in r1463641.
Feel free to give it a whirl...

> -----Original Message-----
> From: ksarma@gmail.com [mailto:ksarma@gmail.com] On Behalf Of Karthik
> Sarma
> Sent: Wednesday, April 03, 2013 7:54 AM
> To: dev@ctakes.apache.org
> Subject: Re: [DISCUSS] Where should cTAKES models live?
> 
> I like b as well

Re: [DISCUSS] Where should cTAKES models live?

Posted by Karthik Sarma <ks...@ksarma.com>.
I like b as well





--
Karthik Sarma
UCLA Medical Scientist Training Program Class of 20??
Member, UCLA Medical Imaging & Informatics Lab
Member, CA Delegation to the House of Delegates of the American Medical
Association
ksarma@ksarma.com
gchat: ksarma@gmail.com
linkedin: www.linkedin.com/in/ksarma


On Fri, Mar 29, 2013 at 8:58 AM, Masanz, James J. <Ma...@mayo.edu> wrote:

> I agree about (b).

RE: [DISCUSS] Where should cTAKES models live?

Posted by "Masanz, James J." <Ma...@mayo.edu>.
I agree about (b).


> -----Original Message-----
> From: dev-return-1411-Masanz.James=mayo.edu@ctakes.apache.org
> [mailto:dev-return-1411-Masanz.James=mayo.edu@ctakes.apache.org]
> On Behalf Of Steven Bethard
> Sent: Friday, March 29, 2013 8:27 AM
> To: dev@ctakes.apache.org
> Subject: Re: [DISCUSS] Where should cTAKES models live?
> 
> On Mar 29, 2013, at 7:09 AM, "Chen, Pei" <Pe...@childrens.harvard.edu>
> wrote:
> > It looks like the general consensus is for # 2)  Leave them in the ASF
> > repo, but as separate modules/project(s).
> > Which means we (the community) will take on the risk (security, ip,
> > license, etc.) and responsibility for the models that we commit.
> > I'll take a stab at this today...
> > Does anyone think it's worthwhile to (a) lump them all together and call
> > it a ctakes-resources project/model for pragmatic reasons?  Otherwise (b),
> > we'll have a resource module for each such as ctakes-core-res,
> > ctakes-pos-tagger-res, etc.?
> 
> I prefer (b). I know that means a lot more projects, but if I only want
> to, say, run the ctakes-temporal models, it would be a pity if I had to
> pull in the whole UMLS distribution at the same time.
> 
> Steve


Re: [DISCUSS] Where should cTAKES models live?

Posted by Steven Bethard <st...@Colorado.EDU>.
On Mar 29, 2013, at 7:09 AM, "Chen, Pei" <Pe...@childrens.harvard.edu> wrote:
> It looks like the general consensus is for # 2)  Leave them in the ASF repo, but as separate modules/project(s).
> Which means we (the community) will take on the risk (security, ip, license, etc.) and responsibility for the models that we commit.
> I'll take a stab at this today...  
> Does anyone think it's worthwhile to (a) lump them all together and call it a ctakes-resources project/model for pragmatic reasons?  Otherwise (b), we'll have a resource module for each such as ctakes-core-res, ctakes-pos-tagger-res, etc.?

I prefer (b). I know that means a lot more projects, but if I only want to, say, run the ctakes-temporal models, it would be a pity if I had to pull in the whole UMLS distribution at the same time.
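
For illustration only, a downstream pom.xml under (b) might then ask for just the models it needs. The artifactId ctakes-temporal-res and the ctakes.version property are hypothetical (they simply follow the -res naming suggested above), and org.apache.ctakes is assumed as the groupId:

```xml
<!-- Hypothetical sketch: pull in only the temporal models.
     ctakes-temporal-res is an assumed name, and ctakes.version is a
     placeholder property the consuming project would have to define. -->
<dependency>
  <groupId>org.apache.ctakes</groupId>
  <artifactId>ctakes-temporal-res</artifactId>
  <version>${ctakes.version}</version>
</dependency>
```

With one resource module per component, nothing here would drag in the UMLS resources.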

Steve



RE: [DISCUSS] Where should cTAKES models live?

Posted by "Chen, Pei" <Pe...@childrens.harvard.edu>.
It looks like the general consensus is for option 2: leave them in the ASF repo, but as separate modules/project(s).
This means we (the community) will take on the risk (security, IP, license, etc.) and responsibility for the models that we commit.
I'll take a stab at this today...
Does anyone think it's worthwhile to (a) lump them all together into a single ctakes-resources project/module for pragmatic reasons? Otherwise, with (b), we'll have a resource module for each, such as ctakes-core-res, ctakes-pos-tagger-res, etc.
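
As a rough sketch of what (b) might look like in the parent pom.xml, each resource module would sit next to its code module. The -res names are the ones floated above; the rest of the layout is an assumption, not a description of what is in trunk:

```xml
<!-- Hypothetical fragment of the parent pom.xml under option (b).
     The module list is illustrative and may not match trunk. -->
<modules>
  <!-- code modules -->
  <module>ctakes-core</module>
  <module>ctakes-pos-tagger</module>
  <!-- matching resource-only modules that hold the models -->
  <module>ctakes-core-res</module>
  <module>ctakes-pos-tagger-res</module>
</modules>
```

Under (a), the resource-only entries would instead collapse into a single ctakes-resources module.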

--Pei


> -----Original Message-----
> From: ksarma@gmail.com [mailto:ksarma@gmail.com] On Behalf Of Karthik
> Sarma
> Sent: Tuesday, March 19, 2013 1:35 PM
> To: cTAKES Developer List
> Subject: Re: [DISCUSS] Where should cTAKES models live?
> 
> I concur. +1 for option 2 -- I do not really see any advantages that option
> 3 could have over option 2, as the difference should be largely transparent to
> users (and even developers)

Re: [DISCUSS] Where should cTAKES models live?

Posted by Karthik Sarma <ks...@ksarma.com>.
I concur. +1 for option 2 -- I do not really see any advantages that option
3 could have over option 2, as the difference should be largely transparent
to users (and even developers).





--
Karthik Sarma
UCLA Medical Scientist Training Program Class of 20??
Member, UCLA Medical Imaging & Informatics Lab
Member, CA Delegation to the House of Delegates of the American Medical
Association
ksarma@ksarma.com
gchat: ksarma@gmail.com
linkedin: www.linkedin.com/in/ksarma


On Tue, Mar 19, 2013 at 9:49 AM, Masanz, James J. <Ma...@mayo.edu> wrote:

> I also am +1 for option 2.
>
> #3 is my least favorite, because of the download time for some of the
> models, both for cases like Steve mentioned but also for cases of wanting
> to check out a fresh copy of the code and not wanting to wait to check out
> the models again
>
> -- James

RE: [DISCUSS] Where should cTAKES models live?

Posted by "Masanz, James J." <Ma...@mayo.edu>.
I also am +1 for option 2.

Option 3 is my least favorite because of the download time for some of the models, both in cases like the one Steve mentioned and when you want to check out a fresh copy of the code without waiting to check out the models again.

-- James


> -----Original Message-----
> From: ctakes-dev-return-1378-Masanz.James=mayo.edu@incubator.apache.org
> [mailto:ctakes-dev-return-1378-Masanz.James=mayo.edu@incubator.apache.org]
> On Behalf Of Steven Bethard
> Sent: Friday, March 15, 2013 1:06 PM
> To: ctakes-dev@incubator.apache.org
> Subject: Re: [DISCUSS] Where should cTAKES models live?
> 
> On Mar 15, 2013, at 4:39 PM, "Chen, Pei" <Pe...@childrens.harvard.edu>
> wrote:
> > So the question is: What should we do with the model files?  Some
> > options include:
> >
> > 1)      Leave them in SourceForge/Maven Central.  Maven can download and
> > include them in the convenience binaries in the ctakes-distribution
> > project. Something we did quickly for 3.0, but needs to be improved if we
> > go with this approach.  For example: [2]
> >
> > 2)      Leave them in the ASF repo, but separate modules/projects.
> >
> > 3)      Keep them in the same respective ASF modules under
> > /src/main/resources
> >
> > I think it's nice to keep these fairly large (~1GB) and static resource
> > files separate from the source code (Either option 1 or 2).  Also, option
> > 1 will require a little more work by the committers/release managers but
> > will definitely avoid any licensing issues/concerns.
> 
> I'd definitely vote for (2). That makes releases much easier than if you
> have to coordinate between the ASF and Sourceforge repositories, but also
> allows people to depend on the code in a module without also pulling in
> all the models as well. (This would make a lot of sense even now, for
> example, in ctakes-temporal which depends on ctakes-relation-extractor
> only for the relation extraction framework and not for the location_of and
> degree_of models.)
> 
> Steve

Re: [DISCUSS] Where should cTAKES models live?

Posted by Steven Bethard <st...@Colorado.EDU>.
On Mar 15, 2013, at 4:39 PM, "Chen, Pei" <Pe...@childrens.harvard.edu> wrote:
> So the question is: What should we do with the model files?  Some options include:
> 
> 1)      Leave them in SourceForge/Maven Central.  Maven can download and include them in the convenience binaries in the ctakes-distribution project. Something we did quickly for 3.0, but needs to be improved if we go with this approach.  For example: [2]
> 
> 2)      Leave them in the ASF repo, but separate modules/projects.
> 
> 3)      Keep them in the same respective ASF modules under /src/main/resources
> 
> I think it's nice to keep these fairly large (~1GB) and static resource files separate from the source code (Either option 1 or 2).  Also, option 1 will require a little more work by the committers/release managers but will definitely avoid any licensing issues/concerns.

I'd definitely vote for (2). It makes releases much easier than having to coordinate between the ASF and SourceForge repositories, and it also allows people to depend on the code in a module without pulling in all of its models. (This would make a lot of sense even now: for example, ctakes-temporal depends on ctakes-relation-extractor only for the relation extraction framework, not for the location_of and degree_of models.)
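
A minimal sketch of how that could look in the ctakes-temporal pom.xml under option 2, assuming the models move into a separate module (called ctakes-relation-extractor-res here purely for illustration) and that both modules are built together so project.version applies:

```xml
<dependencies>
  <!-- Code only: the relation extraction framework. -->
  <dependency>
    <groupId>org.apache.ctakes</groupId>
    <artifactId>ctakes-relation-extractor</artifactId>
    <version>${project.version}</version>
  </dependency>
  <!-- Deliberately no dependency on the hypothetical
       ctakes-relation-extractor-res module, so the location_of and
       degree_of models are not pulled in. -->
</dependencies>
```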

Steve