Posted to dev@ctakes.apache.org by "Masanz, James J." <Ma...@mayo.edu> on 2013/01/25 08:55:55 UTC

[DISCUSS] no binary release of cTAKES here at Apache? FW: [VOTE] Apache cTAKES 3.0.0-incubating RC5 release

I posted on general@incubator that:

> One goal is to have a binary that contains all resources, 
> which can be used to install cTAKES on a system that does
> not have an internet connection.
> For now we can focus on a first Apache release that 
> doesn't meet that goal, while pursuing the question with legal.
> If legal says we can't have that kind of binary here, 
> then in the future we can consider
> if we will host such a binary on a different site.

http://s.apache.org/bgp

Another motivation for this email is a post by Benson (below) to general@incubator, where he writes "It's not the mission of the ASF to create complete, end-user-friendly, software products".

I suggest that we, or whoever among us is interested in such a thing, host an easy-to-install *binary* somewhere other than apache.org. It would include cTAKES plus the models and jars, and would be a single download with a simple unzip (built off Apache cTAKES 3.0.0-incubating, once it is released).

This binary would probably be released shortly after each Apache cTAKES release, so it could be built from the officially released Apache cTAKES source.

From my understanding, we cannot have models in SVN here if they were built from data that is not available to the community since the models are not "source". That's based on this specific comment within LEGAL-157: 
https://issues.apache.org/jira/browse/LEGAL-157?focusedCommentId=13561092&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13561092

We also cannot have other compiled jars in our SVN here at apache.org, and therefore they cannot be in our source release. We are working on addressing that.

For people checking out code from SVN and using maven, those are not such big issues since maven will fetch the dependencies once we finish updating the POMs etc.

If we want to allow people to download a single binary and get the cTAKES code and the models, it sounds like we either need to 
1) write something that would download the models for the users 
2) or host the binaries elsewhere 
(or require users to download things separately and put them together).

I strongly dislike option 1, so I will focus on option 2 in this email, as that will be more than enough for one email anyway ;)

For people to host such an all-inclusive binary elsewhere, those people would need to choose a name.
We could create a logo for their use, something like "Apache cTAKES inside" or  "Powered by Apache cTAKES" (see http://www.apache.org/foundation/marks/pmcs.html#poweredby) and make it clear the binary is not being released directly by Apache http://s.apache.org/BAj

I suggest that we wouldn't need to create a convenience binary here at Apache - one less thing to test and document.

This would bring up several questions though, which I'm guessing we don't want to get into here in great detail since it is really about something that is not to be released directly from Apache.
 - what to call the binary (we would not simply be able to call it "Apache cTAKES")
 - where to host the binary (I'd suggest the ohnlp sourceforge project, where previous versions of cTAKES live)
 - we would need a place to hold the documentation for this binary. I am assuming we could not host it at apache.org, but we would need either to have that confirmed here or to create a LEGAL Jira issue to get that confirmation.
 - where would we tell people to go to post questions about the binary? 
 - where would the build of the binary take place 

I suggest taking those questions offline unless someone tells me those things are indeed OK to discuss here.

My main point to discuss here is whether there is enough value in providing a convenience binary of Apache cTAKES here at apache.org (which would not contain the models) for us to create and support it here, or whether we skip creating a binary here at apache.org and only create source packages here.

I am not trying to splinter the group here. I would hope anyone involved in producing the binary would be involved here with Apache cTAKES too. But there might be people involved in Apache cTAKES that aren't interested in the details of how a binary is produced or what it looks like, or even if it is produced.

-- James

> -----Original Message-----
> From: general-return-39392-Masanz.James=mayo.edu@incubator.apache.org
> [mailto:general-return-39392-Masanz.James=mayo.edu@incubator.apache.org]
> On Behalf Of Benson Margulies
> Sent: Thursday, January 24, 2013 9:23 PM
> To: general@incubator.apache.org
> Subject: Re: [VOTE] Apache cTAKES 3.0.0-incubating RC5 release
> 
> It's unfortunate to have this conversation in parallel here and on
> https://issues.apache.org/jira/browse/LEGAL-157.
> 
> Also, this thread is a combo of the discussion of ordinary jars-of-classes
> (where I'd forgotten the policy) and the much more tangled question of
> models, which is what the JIRA is wrestling with.
> 
> To answer Ted, I think that Roy might write something like:
> 
> "It's not the mission of the ASF to create complete, end-user-friendly,
> software products. It's our mission to create open source code. If someone
> else wants to build up an end-user-friendly aggregation of ASF code and
> models from bombs of whatever, that's great, and we encourage them."
> 
> On Thu, Jan 24, 2013 at 8:19 PM, Branko Čibej <br...@apache.org> wrote:
> > On 25.01.2013 01:50, Ted Dunning wrote:
> >> On Fri, Jan 25, 2013 at 7:37 AM, Branko Čibej <br...@apache.org> wrote:
> >>
> >>> On 21.01.2013 21:08, Benson Margulies wrote:
> >>> ...>>
> >>>>> I am referring to this discussion  http://s.apache.org/MUZ
> >>>> Well, that's clear enough, even if it is a typical example of how our
> >>>> founders yell at us but we have no mechanism to channel those yells
> >>>> into concise, unambiguous, documentation.
> >>> Perhaps off-topic ... but I fail to see how "source release" is
> >>> ambiguous or not concise.
> >>>
> >>> Unless the Java world has a different definition of "source code"
> >>> than us stuck-in-the-mud plodders, and it's only considered binary
> >>> once it's been JIT-compiled. :)
> >>>
> >>
> >> It isn't necessarily ambiguous when applied to code, but there is a
> >> different case when applied to models  or parameter settings.
> >>
> >> For instance, commons math has polynomial coefficients embedded in
> >> code that approximate certain functions.  These are the results of
> >> computations done using other systems and the source code and the
> >> data used in those other computations are not included in the
> >> released code, only the parameter values are.
> >>
> >> This same sort of thing applies here except that the model in
> >> question has a much larger set of values and is being packaged in a
> >> binary, inspectable format.  Would your opinion change if the model
> >> were expressed in a textual model?  Would it matter that the textual
> >> model is too large and obtuse to usefully inspect?
> >
> > In cases like this one, it would seem reasonable for the source code
> > to refer to those models and computations, which presumably anyone can
> > then reproduce to their own satisfaction. This is unlike compiled code
> > in that compilation results are notoriously hard to reproduce exactly,
> > because they depend on many factors that are usually hard to document,
> > let alone reproduce. I'd expect a mathematical model, no matter how
> > large, does not suffer from such ambiguities (and shut up, Gödel).
> >
> > However, that's beside the point, because ...
> >
> >> What about a hypothetical case where the model is derived from the
> >> explosion of a nuclear bomb?  Would the release of the numbers
> >> require the inclusion of a suitable bomb design so that everybody
> >> could replicate the derivation?
> >
> > ... the issue is not about exposing all the knowledge that goes
> > into writing the code, but to expose the code itself so that it can be
> > reviewed for, e.g., back-doors and other security issues. Neither of
> > your examples is relevant.
> >
> > -- Brane
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> > For additional commands, e-mail: general-help@incubator.apache.org
> >
> 


RE: [DISCUSS] no binary release of cTAKES here at Apache? FW: [VOTE] Apache cTAKES 3.0.0-incubating RC5 release

Posted by "Masanz, James J." <Ma...@mayo.edu>.
> -----Original Message-----
> From: ctakes-dev-return-1108-Masanz.James=mayo.edu@incubator.apache.org
> [mailto:ctakes-dev-return-1108-Masanz.James=mayo.edu@incubator.apache.org]
> On Behalf Of Jukka Zitting
> Sent: Friday, January 25, 2013 3:00 AM
> To: ctakes-dev
> Subject: Re: [DISCUSS] no binary release of cTAKES here at Apache? FW:
> [VOTE] Apache cTAKES 3.0.0-incubating RC5 release
> 
> Hi,
> 
> On Fri, Jan 25, 2013 at 10:10 AM, Mattmann, Chris A (388J)
> <ch...@jpl.nasa.gov> wrote:
> > Just to clarify -- that's Benson, talking for Roy. :) I realize that
> > this has got all skitzo lately, but just pointing out that this is far
> > from doctrine. Apache OpenOffice is a prime counter example to his
> > point and I just made that point myself.
> 
> +1 Apache hosts various different kinds of binaries as a convenience
> for users. The main requirements for such binaries are that they're
> properly documented (licensing metadata, etc.) and signed, only contain
> components that meet the Apache licensing policies and come from a trusted
> member of the community (as there's no easy way to verify the contents of
> a binary).
> 
> So far the only reason I've seen that could preclude cTAKES from making
> such binaries available on www.apache.org/dist is the licensing status of
> some of the contained models. Having resolved LEGAL issues on those would
> take care of that concern.

For documentation's sake, I'll mention here that 
https://issues.apache.org/jira/browse/LEGAL-157 is open.

-- James Masanz

> 
> BR,
> 
> Jukka Zitting

Re: [DISCUSS] no binary release of cTAKES here at Apache? FW: [VOTE] Apache cTAKES 3.0.0-incubating RC5 release

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Fri, Jan 25, 2013 at 10:10 AM, Mattmann, Chris A (388J)
<ch...@jpl.nasa.gov> wrote:
> Just to clarify -- that's Benson, talking for Roy. :) I realize that this
> has got all skitzo lately, but just pointing out that this is far from
> doctrine. Apache OpenOffice is a prime counter example to his point and I
> just made that point myself.

+1 Apache hosts various different kinds of binaries as a convenience
for users. The main requirements for such binaries are that they're
properly documented (licensing metadata, etc.) and signed, only
contain components that meet the Apache licensing policies and come
from a trusted member of the community (as there's no easy way to
verify the contents of a binary).

So far the only reason I've seen that could preclude cTAKES from
making such binaries available on www.apache.org/dist is the licensing
status of some of the contained models. Having resolved LEGAL issues on
those would take care of that concern.

BR,

Jukka Zitting

RE: [DISCUSS] no binary release of cTAKES here at Apache? FW: [VOTE] Apache cTAKES 3.0.0-incubating RC5 release

Posted by "Masanz, James J." <Ma...@mayo.edu>.
> -----Original Message-----
> From: ctakes-dev-return-1113-Masanz.James=mayo.edu@incubator.apache.org
> [mailto:ctakes-dev-return-1113-Masanz.James=mayo.edu@incubator.apache.org]
> On Behalf Of Chen, Pei
> Sent: Friday, January 25, 2013 9:42 AM
> To: ctakes-dev@incubator.apache.org
> Subject: RE: [DISCUSS] no binary release of cTAKES here at Apache? FW:
> [VOTE] Apache cTAKES 3.0.0-incubating RC5 release
> 
> Based on the ongoing discussions,
> Could I suggest we cancel the VOTE on RC5 and create an RC6?

+1  to that

> RC6 will be an extremely conservative-

Again +1 to that

> - No resources (models) included in src/main/java
> - No resources (models) included in the -bin.tar.gz
> - Move all of the models and resources to a ctakes-models projects within
> the ctakes-resources on sourceforge (currently used by the UMLS resources
> already).
> - Update the pom.xml's to download those for developers via maven.
> - End-Users will have to download and unzip a ctakes-resources.zip which
> contains all of the models and resources (including UMLS).

All sound good as a compromise for this release to me.

> I believe this is just a temporary measure (at least a decent compromise)
> until we get clarity on some of these items.
> We can create subsequent releases afterwards such as a single -bin.tar.gz
> that includes the models just like any other 3rd party lib, 

The models are dependent on the outcome of
https://issues.apache.org/jira/browse/LEGAL-157


Regards, 
James Masanz

> and then possibly including it in src as well.
> 
> I do not think this is a "end user friendly" issue, IMHO, it just doesn't
> makes sense to separate out parts of software that are an intricate part
> of the software and are always required to function properly such as
> icons, gifs, jpgs, or statistical models in this case (which have been
> approved to be released under ASL 2.0 terms by their contributors).
> 
> --Pei
> 
> 
> > -----Original Message-----
> > From: Masanz, James J. [mailto:Masanz.James@mayo.edu]
> > Sent: Friday, January 25, 2013 10:11 AM
> > To: 'ctakes-dev@incubator.apache.org'
> > Subject: RE: [DISCUSS] no binary release of cTAKES here at Apache? FW:
> > [VOTE] Apache cTAKES 3.0.0-incubating RC5 release
> >
> >
> > > -----Original Message-----
> > > From:
> > > ctakes-dev-return-1106-Masanz.James=mayo.edu@incubator.apache.org
> > > > [mailto:ctakes-dev-return-1106-Masanz.James=mayo.edu@incubator.apache.org]
> > > On Behalf Of Mattmann, Chris A (388J)
> > > Sent: Friday, January 25, 2013 2:10 AM
> > > To: ctakes-dev@incubator.apache.org
> > > Subject: Re: [DISCUSS] no binary release of cTAKES here at Apache? FW:
> > > [VOTE] Apache cTAKES 3.0.0-incubating RC5 release
> > >
> > > Hey James,
> > >
> > > > On 1/24/13 11:55 PM, "Masanz, James J." <Ma...@mayo.edu> wrote:
> > >
> > > >I posted on general@incubator that:
> > > >
> > > >> One goal is to have a binary that contains all resources, which
> > > >> can be used to install cTAKES on a system that does not have an
> > > >> internet connection.
> > > >> For now we can focus on a first Apache release that doesn't meet
> > > >> that goal, while pursuing the question with legal.
> > > > >> If legal says we can't have that kind of binary here, then in
> > > >> the future we can consider if we will host such a binary on a
> > > >> different site.
> > > >
> > > >http://s.apache.org/bgp
> > > >
> > > >Another motivation for this email is a post by Benson (below) to
> > > >general@incubator, where he writes "It's not the mission of the ASF
> > > >to create complete, end-user-friendly, software products".
> > >
> > > Just to clarify -- that's Benson, talking for Roy. :) I realize that
> > > this has got all skitzo lately, but just pointing out that this is
> > > far from doctrine. Apache OpenOffice is a prime counter example to
> > > his point and I just made that point myself.
> > >
> > > >
> > > >I suggest we, or whoever among us are interested in such a thing,
> > > >host an easy-to-install *binary* that includes cTAKES plus the
> > > >models and jars, somewhere other than apache.org, that would be a
> > > >single download with a simple unzip (and would be built off Apache
> > > >cTAKES 3.0.0-incubating, once it is released).
> > >
> > > If it comes to this, I'd recommend hosting it at
> > > http://apache-extras.org/ which is Google Code, but branded with
> > > > Apache through a special ComDev agreement set up. Products developed
> > > > there are said to have an "affinity"
> > > towards particular Apache products, but not be those Apache products.
> > > Apache Extras != Apache, but still is an option for those parts.
> > >
> > > >
> > > >This binary would probably be released shortly after each Apache
> > > >cTAKES release, so it could be built from the officially released
> > > >Apache cTAKES source.
> > >
> > > Yep. I don't think the battle is over there yet though -- I liked
> > > your suggestion however -- let's just roll a source release, and try
> > > to push the convenience binaries as needed.
> > >
> > > >
> > > >From my understanding, we cannot have models in SVN here if they
> > > >were built from data that is not available to the community since
> > > >the models are not "source". That's based on this specific comment
> > > > >within LEGAL-157:
> > > > >https://issues.apache.org/jira/browse/LEGAL-157?focusedCommentId=13561092&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13561092
> > >
> > > That's Benson's opinion, note Roy hasn't replied to him. I don't
> > > read Roy's reading on the subject to be that we can't include those
> > > intermediate outputs? Do you?
> >
> > > Yes, that's the way I am reading Roy's post - that it can't include
> > models (intermediate outputs) because the source for those
> > intermediate outputs is not being included.
> >
> > > >We also cannot have other compiled jars in our SVN here at
> > > >apache.org, and therefore cannot be in our source release, which we
> > > >are working on addressing
> > >
> > > That's not recommended, but also not an absolute blocker and can be
> > > improved incrementally. Prior versions of Apache Lucene (and
> > > > anything built from Ant) had this issue and those releases shipped
> > > > just fine.
> >
> > That's great to know. Thanks.
> >
> > > >
> > > >For people checking out code from SVN and using maven, those are
> > > >not such big issues since maven will fetch the dependencies once we
> > > >finish updating the POMs etc.
> > > >
> > > >If we want to allow people to download a single binary and get the
> > > >cTAKES code and the models, it sounds like we either need to
> > > >1) write something that would download the models for the users
> > > >2) or host the binaries elsewhere
> > > > >(or require users to download things separately and put them
> > > > >together).
> > >
> > > I would highly suggest #1 to avoid fragmentation.
> > >
> > > >
> > > >I strongly dislike option 1, so I will focus on option 2 in this
> > > >email, as that will be more than enough for one email any way ;)
> > >
> > > Why don't you like option #1? Just curious.
> >
> > Two reasons - a goal is to have an install that is as simple as
> > possible to reduce barriers for (very busy) people to give cTAKES a
> > try. (There will be times when downloading models of 100s of MB will
> > fail for one reason or another on the first attempt.)
> >
> > And secondly, the personal experience I've had with writing
> > (commercial) install code, which very often turned into a vastly more
> > difficult and time consuming (testing-wise) task than people would
> > allow for, and also resulted in more enduser questions than
> > anticipated. Which leads to an admittedly personal bias against such
> > > things, if they can be avoided. But I mentioned #1 because I know my
> > > views on #2 are partially a personal bias.
> >
> > > >For people to host such an all-inclusive binary elsewhere, those
> > > >people would need to choose a name.
> > > >We could create a logo for their use, something like "Apache cTAKES
> > > >inside" or  "Powered by Apache cTAKES" (see
> > > > >http://www.apache.org/foundation/marks/pmcs.html#poweredby) and make
> > > > >it clear the binary is not being released directly by Apache
> > > >http://s.apache.org/BAj
> > > >
> > > >I suggest that we wouldn't need to create a convenience binary here
> > > >at Apache - one less thing to test and document.
> > > >
> > > >This would bring up several questions though, which I'm guessing we
> > > >don't want to get into here in great detail since it is really
> > > >about something that is not to be released directly from Apache.
> > > > - what to call the binary (we would not simply be able to call it
> > > >"Apache cTAKES")
> > > > - where to host the binary (I'd suggest the ohnlp sourceforge
> > > >project, where previous versions of cTAKES live)
> > > > - we would need a place to hold the documentation for this binary.
> > > >I am assuming we could not host it as apache.org, but we would need
> > > > >that either confirmed here or create a legal Jira to get that
> > > > >confirmation.
> > > > > - where would we tell people to go to post questions about the
> > > > >binary?
> > > > - where would the build of the binary take place
> > > >
> > > >I suggest taking those questions offline unless someone tells me
> > > >those things are indeed OK to discuss here.
> > > >
> > > >My main point to discuss here is whether there is enough value in
> > > >providing a convenience binary of Apache cTAKES here at apache.org
> > > >(which would not contain the models) for us to create and support
> > > >it here, or if we skip creating binary here at apache.org and only
> > > >create source packages here.
> > > >
> > > >I am not trying to splinter the group here. I would hope anyone
> > > >involved in producing the binary would be involved here with Apache
> > > > >cTAKES too.
> > > >But there might be people involved in Apache cTAKES that aren't
> > > >interested in the details of how a binary is produced or what it
> > > >looks like, or even if it is produced.
> > >
> > > > That's a possibility but brings with it a whole horde of other legal
> > > > mumbo jumbo (and trademarks@) that trust me you don't want to go
> > > > down (yet).
> > > Maybe ever :)
> > >
> > > Try and focus on #1 -- I bet it's achievable without all the
> > > convenience binaries part. Would that work for the community?
> >
> > We have previously (before Apache) received lots of positive end user
> > feedback about what an improvement providing an all-inclusive binary
> > was for them.
> > Not providing it is a step backward for us.
> >
> > > Cheers,
> > > Chris
> > > >
> > > >-- James
> > > >
> >
> > -- James
> >
> > > >> -----Original Message-----
> > > >> From:
> > > >> general-return-39392-Masanz.James=mayo.edu@incubator.apache.org
> > > > >> [mailto:general-return-39392-Masanz.James=mayo.edu@incubator.apache.org]
> > > >> On Behalf Of Benson Margulies
> > > >> Sent: Thursday, January 24, 2013 9:23 PM
> > > >> To: general@incubator.apache.org
> > > >> Subject: Re: [VOTE] Apache cTAKES 3.0.0-incubating RC5 release
> > > >>
> > > >> It's unfortunate to have this conversation in parallel here and
> > > >> on https://issues.apache.org/jira/browse/LEGAL-157.
> > > >>
> > > >> Also, this thread is a combo of the discussion of ordinary
> > > > >> jars-of-classes (where I'd forgotten the policy) and the much
> > > > >> more tangled question of models, which is what the JIRA is
> > > > >> wrestling with.
> > > >>
> > > >> To answer Ted, I think that Roy might write something like:
> > > >>
> > > >> "It's not the mission of the ASF to create complete,
> > > > >> end-user-friendly, software products. It's our mission to create
> > > > >> open source code. If someone else wants to build up an
> > > > >> end-user-friendly aggregation of ASF code and models from bombs
> > > > >> of whatever, that's great, and we encourage them."
> > > >>
> > > > >> On Thu, Jan 24, 2013 at 8:19 PM, Branko Čibej <br...@apache.org> wrote:
> > > > >> > On 25.01.2013 01:50, Ted Dunning wrote:
> > > > >> >> On Fri, Jan 25, 2013 at 7:37 AM, Branko Čibej <br...@apache.org> wrote:
> > > >> >>
> > > >> >>> On 21.01.2013 21:08, Benson Margulies wrote:
> > > >> >>> ...>>
> > > >> >>>>> I am referring to this discussion  http://s.apache.org/MUZ
> > > > >> >>>> Well, that's clear enough, even if it is a typical example of
> > > > >> >>>> how our founders yell at us but we have no mechanism to
> > > > >> >>>> channel those yells into concise, unambiguous, documentation.
> > > > >> >>> Perhaps off-topic ... but I fail to see how "source release"
> > > >> >>> is ambiguous or not concise.
> > > >> >>>
> > > > >> >>> Unless the Java world has a different definition of "source code"
> > > >> >>> than us stuck-in-the-mud plodders, and it's only considered
> > > >> >>> binary once it's been JIT-compiled. :)
> > > >> >>>
> > > >> >>
> > > >> >> It isn't necessarily ambiguous when applied to code, but there
> > > > >> >> is a different case when applied to models or parameter
> > > > >> >> settings.
> > > >> >>
> > > > >> >> For instance, commons math has polynomial coefficients
> > > > >> >> embedded in code that approximate certain functions.  These
> > > > >> >> are the results of computations done using other systems and
> > > > >> >> the source code and the data used in those other computations
> > > > >> >> are not included in the released code, only the parameter values
> > > > >> >> are.
> > > >> >>
> > > >> >> This same sort of thing applies here except that the model in
> > > >> >> question has a much larger set of values and is being packaged
> > > >> >> in a binary, inspectable format.  Would your opinion change if
> > > >> >> the model were expressed in a textual model?  Would it matter
> > > > >> >> that the textual model is too large and obtuse to usefully
> > > > >> >> inspect?
> > > >> >
> > > >> > In cases like this one, it would seem reasonable for the source
> > > >> > code to refer to those models and computations, which
> > > >> > presumably anyone can then reproduce to their own satisfaction.
> > > >> > This is unlike compiled code in that compilation results are
> > > >> > notoriously hard to reproduce exactly, because they depend on
> > > >> > many factors that are usually hard to document, let alone
> > > >> > reproduce. I'd expect a mathematical model, no matter how
> > > > >> > large, does not suffer from such ambiguities (and shut up,
> > > > >> > Gödel).
> > > >> >
> > > >> > However, that's beside the point, because ...
> > > >> >
> > > >> >> What about a hypothetical case where the model is derived from
> > > >> >> the explosion of a nuclear bomb?  Would the release of the
> > > >> >> numbers require the inclusion of a suitable bomb design so
> > > >> >> that everybody could replicate the derivation?
> > > >> >
> > > > >> > ... the issue is not about exposing all the knowledge that
> > > > >> > goes into writing the code, but to expose the code itself so
> > > > >> > that it can be reviewed for, e.g., back-doors and other security
> > > > >> > issues.
> > > >> > Neither of your examples is relevant.
> > > >> >
> > > >> > -- Brane
> > > >> >
> >


RE: [DISCUSS] no binary release of cTAKES here at Apache? FW: [VOTE] Apache cTAKES 3.0.0-incubating RC5 release

Posted by "Chen, Pei" <Pe...@childrens.harvard.edu>.
Based on the ongoing discussions,
Could I suggest we cancel the VOTE on RC5 and create an RC6?
RC6 will be extremely conservative:
- No resources (models) included in src/main/java
- No resources (models) included in the -bin.tar.gz
- Move all of the models and resources to a ctakes-models project within the ctakes-resources project on sourceforge (already used for the UMLS resources).
- Update the pom.xml files to download those for developers via maven.
- End-Users will have to download and unzip a ctakes-resources.zip which contains all of the models and resources (including UMLS).
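
As a rough illustration of that last step (not part of the release plan itself), the unzip that end users would perform could be scripted along these lines. This is only a minimal Java sketch under stated assumptions: the class name ResourceUnpacker and its unpack method are hypothetical, and the layout of ctakes-resources.zip is not specified in this thread, so the code simply extracts every entry and rejects any entry that would land outside the target directory.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not an existing cTAKES class.
final class ResourceUnpacker {
    private ResourceUnpacker() {}

    /**
     * Extracts every entry of the zip file at {@code archive} beneath
     * {@code targetDir} and returns the number of regular files written.
     */
    static int unpack(Path archive, Path targetDir) throws IOException {
        Path root = targetDir.toAbsolutePath().normalize();
        Files.createDirectories(root);
        int files = 0;
        try (InputStream in = Files.newInputStream(archive);
             ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                Path out = root.resolve(entry.getName()).normalize();
                // Refuse entries such as "../x" that would escape the target directory.
                if (!out.startsWith(root)) {
                    throw new IOException("Entry escapes target directory: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(out);
                } else {
                    Files.createDirectories(out.getParent());
                    // Copies only the bytes of the current entry.
                    Files.copy(zip, out, StandardCopyOption.REPLACE_EXISTING);
                    files++;
                }
            }
        }
        return files;
    }
}
```

A caller would pass the path of the downloaded ctakes-resources.zip and the directory where cTAKES was unpacked; both locations are placeholders here, since the thread does not fix them.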

I believe this is just a temporary measure (at least a decent compromise) until we get clarity on some of these items.
We can create subsequent releases afterwards such as a single -bin.tar.gz that includes the models just like any other 3rd party lib, and then possibly including it in src as well.

I do not think this is an "end user friendly" issue. IMHO, it just doesn't make sense to separate out parts of the software that are an integral part of it and are always required for it to function properly, such as icons, gifs, jpgs, or statistical models in this case (which have been approved to be released under ASL 2.0 terms by their contributors).

--Pei


> -----Original Message-----
> From: Masanz, James J. [mailto:Masanz.James@mayo.edu]
> Sent: Friday, January 25, 2013 10:11 AM
> To: 'ctakes-dev@incubator.apache.org'
> Subject: RE: [DISCUSS] no binary release of cTAKES here at Apache? FW:
> [VOTE] Apache cTAKES 3.0.0-incubating RC5 release
> 
> 
> > -----Original Message-----
> > From:
> > ctakes-dev-return-1106-Masanz.James=mayo.edu@incubator.apache.org
> > [mailto:ctakes-dev-return-1106-Masanz.James=mayo.edu@incubator.apache.org]
> > On Behalf Of Mattmann, Chris A (388J)
> > Sent: Friday, January 25, 2013 2:10 AM
> > To: ctakes-dev@incubator.apache.org
> > Subject: Re: [DISCUSS] no binary release of cTAKES here at Apache? FW:
> > [VOTE] Apache cTAKES 3.0.0-incubating RC5 release
> >
> > Hey James,
> >
> > On 1/24/13 11:55 PM, "Masanz, James J." <Ma...@mayo.edu> wrote:
> >
> > >I posted on general@incubator that:
> > >
> > >> One goal is to have a binary that contains all resources, which can
> > >> be used to install cTAKES on a system that does not have an
> > >> internet connection.
> > >> For now we can focus on a first Apache release that doesn't meet
> > >> that goal, while pursuing the question with legal.
> > >> If legal says we can't have that kind of binary here, then in
> > >> the future we can consider if we will host such a binary on a
> > >> different site.
> > >
> > >http://s.apache.org/bgp
> > >
> > >Another motivation for this email is a post by Benson (below) to
> > >general@incubator, where he writes "It's not the mission of the ASF
> > >to create complete, end-user-friendly, software products".
> >
> > Just to clarify -- that's Benson, talking for Roy. :) I realize that
> > this has got all skitzo lately, but just pointing out that this is far
> > from doctrine. Apache OpenOffice is a prime counter example to his
> > point and I just made that point myself.
> >
> > >
> > >I suggest we, or whoever among us are interested in such a thing,
> > >host an easy-to-install *binary* that includes cTAKES plus the models
> > >and jars, somewhere other than apache.org, that would be a single
> > >download with a simple unzip (and would be built off Apache cTAKES
> > >3.0.0-incubating, once it is released).
> >
> > If it comes to this, I'd recommend hosting it at
> > http://apache-extras.org/ which is Google Code, but branded with
> > Apache through a special ComDev agreement set up. Products developed
> > there are said to have an "affinity"
> > towards particular Apache products, but not be those Apache products.
> > Apache Extras != Apache, but still is an option for those parts.
> >
> > >
> > >This binary would probably be released shortly after each Apache
> > >cTAKES release, so it could be built from the officially released
> > >Apache cTAKES source.
> >
> > Yep. I don't think the battle is over there yet though -- I liked your
> > suggestion however -- let's just roll a source release, and try to
> > push the convenience binaries as needed.
> >
> > >
> > >From my understanding, we cannot have models in SVN here if they were
> > >built from data that is not available to the community since the
> > >models are not "source". That's based on this specific comment within
> > >LEGAL-157:
> > >https://issues.apache.org/jira/browse/LEGAL-157?focusedCommentId=13561092&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13561092
> >
> > That's Benson's opinion, note Roy hasn't replied to him. I don't read
> > Roy's reading on the subject to be that we can't include those
> > intermediate outputs? Do you?
> 
> Yes, that's the way I am reading Roy's post - that it can't include models
> (intermediate outputs) because the source for those intermediate outputs is
> not being included.
> 
> > >We also cannot have other compiled jars in our SVN here at
> > >apache.org, and therefore cannot be in our source release, which we
> > >are working on addressing
> >
> > That's not recommended, but also not an absolute blocker and can be
> > improved incrementally. Prior versions of Apache Lucene (and anything
> > built from Ant) had this issue and those releases shipped just fine.
> 
> That's great to know. Thanks.
> 
> > >
> > >For people checking out code from SVN and using maven, those are not
> > >such big issues since maven will fetch the dependencies once we
> > >finish updating the POMs etc.
> > >
> > >If we want to allow people to download a single binary and get the
> > >cTAKES code and the models, it sounds like we either need to
> > >1) write something that would download the models for the users
> > >2) or host the binaries elsewhere
> > >(or require users to download things separately and put them together).
> >
> > I would highly suggest #1 to avoid fragmentation.
> >
> > >
> > >I strongly dislike option 1, so I will focus on option 2 in this
> > > >email, as that will be more than enough for one email anyway ;)
> >
> > Why don't you like option #1? Just curious.
> 
> Two reasons. First, a goal is to have an install that is as simple as possible,
> to reduce barriers for (very busy) people to give cTAKES a try. (There will be
> times when downloading models of 100s of MB will fail for one reason or another
> on the first attempt.)
> 
> Second, in my personal experience writing (commercial) install code, it very
> often turned into a vastly more difficult and time-consuming (testing-wise)
> task than people would allow for, and it also resulted in more end-user
> questions than anticipated. That leads to an admittedly personal bias against
> such things, if they can be avoided. But I mentioned #1 because I know my views
> on #2 are partially a personal bias.
> 
> > >For people to host such an all-inclusive binary elsewhere, those
> > >people would need to choose a name.
> > >We could create a logo for their use, something like "Apache cTAKES
> > >inside" or  "Powered by Apache cTAKES" (see
> > > >http://www.apache.org/foundation/marks/pmcs.html#poweredby) and make
> > >it clear the binary is not being released directly by Apache
> > >http://s.apache.org/BAj
> > >
> > >I suggest that we wouldn't need to create a convenience binary here
> > >at Apache - one less thing to test and document.
> > >
> > >This would bring up several questions though, which I'm guessing we
> > >don't want to get into here in great detail since it is really about
> > >something that is not to be released directly from Apache.
> > > - what to call the binary (we would not simply be able to call it
> > >"Apache cTAKES")
> > > - where to host the binary (I'd suggest the ohnlp sourceforge
> > >project, where previous versions of cTAKES live)
> > > > - we would need a place to hold the documentation for this binary. I
> > > >am assuming we could not host it at apache.org, but we would need
> > > >that either confirmed here, or a legal Jira created to get that confirmation.
> > > - where would we tell people to go to post questions about the binary?
> > > - where would the build of the binary take place
> > >
> > >I suggest taking those questions offline unless someone tells me
> > >those things are indeed OK to discuss here.
> > >
> > >My main point to discuss here is whether there is enough value in
> > >providing a convenience binary of Apache cTAKES here at apache.org
> > >(which would not contain the models) for us to create and support it
> > > >here, or if we skip creating a binary here at apache.org and only
> > >create source packages here.
> > >
> > >I am not trying to splinter the group here. I would hope anyone
> > > >involved in producing the binary would be involved here with Apache
> > > >cTAKES too.
> > >But there might be people involved in Apache cTAKES that aren't
> > >interested in the details of how a binary is produced or what it
> > >looks like, or even if it is produced.
> >
> > > That's a possibility but brings with it a whole horde of other legal
> > > mumbo jumbo (and trademarks@) that trust me you don't want to go
> > > down (yet). Maybe ever :)
> >
> > Try and focus on #1 -- I bet it's achievable without all the
> > convenience binaries part. Would that work for the community?
> 
> We have previously (before Apache) received lots of positive end user
> feedback about what an improvement providing an all-inclusive binary was
> for them.
> Not providing it is a step backward for us.
> 
> > Cheers,
> > Chris
> > >
> > >-- James
> > >
> 
> -- James
> 
> > >> -----Original Message-----
> > >> From: general-return-39392-Masanz.James=mayo.edu@incubator.apache.org
> > >> [mailto:general-return-39392-Masanz.James=mayo.edu@incubator.apache.org]
> > >> On Behalf Of Benson Margulies
> > >> Sent: Thursday, January 24, 2013 9:23 PM
> > >> To: general@incubator.apache.org
> > >> Subject: Re: [VOTE] Apache cTAKES 3.0.0-incubating RC5 release
> > >>
> > >> It's unfortunate to have this conversation in parallel here and on
> > >> https://issues.apache.org/jira/browse/LEGAL-157.
> > >>
> > >> Also, this thread is a combo of the discussion of ordinary
> > >>jars-of-classes  (where I'd forgotten the policy) and the much more
> > >>tangled question of  models, which is what the JIRA is wrestling with.
> > >>
> > >> To answer Ted, I think that Roy might write something like:
> > >>
> > >> "It's not the mission of the ASF to create complete,
> > >>end-user-friendly,  software products. It's our mission to create
> > >>open source code. If someone  else wants to build up an
> > >>end-user-friendly aggregation of ASF code and  models from bombs of
> > >>whatever, that's great, and we encourage them."
> > >>
> > >> On Thu, Jan 24, 2013 at 8:19 PM, Branko Čibej <br...@apache.org> wrote:
> > >> > On 25.01.2013 01:50, Ted Dunning wrote:
> > >> >> On Fri, Jan 25, 2013 at 7:37 AM, Branko Čibej <br...@apache.org> wrote:
> > >> >>
> > >> >>> On 21.01.2013 21:08, Benson Margulies wrote:
> > >> >>> ...>>
> > >> >>>>> I am referring to this discussion  http://s.apache.org/MUZ
> > >> >>>> Well, that's clear enough, even if it is a typical example of
> > >> >>>> how our founders yell at us but we have no mechanism to
> > >> >>>> channel those yells into concise, unambiguous, documentation.
> > >> >>> Perhaps off-topic ... but I fail to see how "source release"
> > >> >>> is ambiguous or not concise.
> > >> >>>
> > >> >>> Unless the Java world has a different definition of "source code"
> > >> >>> than us stuck-in-the-mud plodders, and it's only considered
> > >> >>> binary once it's been JIT-compiled. :)
> > >> >>>
> > >> >>
> > >> >> It isn't necessarily ambiguous when applied to code, but there
> > >> >> is a different case when applied to models  or parameter settings.
> > >> >>
> > >> >> For instance, Commons Math has polynomial coefficients embedded
> > >> >> in code that approximate certain functions.  These are the
> > >> >> results of computations done using other systems and the source
> > >> >> code and the data used in those other computations are not
> > >> >> included in the released code, only the parameter values are.
> > >> >>
> > >> >> This same sort of thing applies here except that the model in
> > >> >> question has a much larger set of values and is being packaged
> > >> >> in a binary, inspectable format.  Would your opinion change if
> > >> >> the model were expressed in a textual model?  Would it matter
> > >> >> that the textual model is too large and obtuse to usefully inspect?
> > >> >
> > >> > In cases like this one, it would seem reasonable for the source
> > >> > code to refer to those models and computations, which presumably
> > >> > anyone can then reproduce to their own satisfaction. This is
> > >> > unlike compiled code in that compilation results are notoriously
> > >> > hard to reproduce exactly, because they depend on many factors
> > >> > that are usually hard to document, let alone reproduce. I'd
> > >> > expect a mathematical model, no matter how large, does not suffer
> > >> > from such ambiguities (and shut up, Gödel).
> > >> >
> > >> > However, that's beside the point, because ...
> > >> >
> > >> >> What about a hypothetical case where the model is derived from
> > >> >> the explosion of a nuclear bomb?  Would the release of the
> > >> >> numbers require the inclusion of a suitable bomb design so that
> > >> >> everybody could replicate the derivation?
> > >> >
> > >> > ... the issue is not about exposing all the knowledge that
> > >> > goes into writing the code, but to expose the code itself so that
> > >> > it can be reviewed for, e.g., back-doors and other security issues.
> > >> > Neither of your examples is relevant.
> > >> >
> > >> > -- Brane
> > >> >
> 


RE: [DISCUSS] no binary release of cTAKES here at Apache? FW: [VOTE] Apache cTAKES 3.0.0-incubating RC5 release

Posted by "Chen, Pei" <Pe...@childrens.harvard.edu>.
There shouldn't be a technical issue with adding it to the bin dist, pending the results of LEGAL-157.

It's also available in Maven Central with auto-download in mind.  But there are still some technical issues, such as models inside jars within a dependency jar and Lucene reading within a dependency jar; I could not get all of the different Eclipse IDE plugins to properly unpack it.

--Pei

> -----Original Message-----
> From: Mattmann, Chris A (388J) [mailto:chris.a.mattmann@jpl.nasa.gov]
> Sent: Tuesday, January 29, 2013 7:30 PM
> To: ctakes-dev@incubator.apache.org
> Subject: Re: [DISCUSS] no binary release of cTAKES here at Apache? FW:
> [VOTE] Apache cTAKES 3.0.0-incubating RC5 release
> 
> Hey Pei, and James,
> 
> One option might be to use the Maven Antrun plugin and add that as a last
> step of the install phase, or before assembly. That way it's done for the
> user "automatically", and it still looks like this happened through
> Maven?
> 
> Just a thought.
> 
> Cheers,
> Chris
> 
> 
> On 1/29/13 7:05 PM, "Chen, Pei" <Pe...@childrens.harvard.edu> wrote:
> 
> >James,
> >I removed the models and bins from src from the 3.0.0 branch.  Before
> >we create RC6 (I'll try and do it by this Fri), would you or anyone
> >else mind verifying?
> >There will be one additional step for developers and users:
> >One would need to download and unpack the resources.zip and add it to
> >their classpath.
> >https://sourceforge.net/projects/ctakesresources/files/ctakes-resources-3.0.1.zip
> >
> >-Even though it's uploaded to maven central as well, I think it's
> >easier to ask users to just unpack and add it to their classpath.
> >-Everything is in a single zip (umls,lvg,models) for simplicity.  Users
> >can always just ignore what they do not need.
> >
> >--Pei
> >
> >> -----Original Message-----
> >> From: Masanz, James J. [mailto:Masanz.James@mayo.edu]
> >> Sent: Saturday, January 26, 2013 1:41 AM
> >> To: ctakes-dev@incubator.apache.org
> >> Subject: RE: [DISCUSS] no binary release of cTAKES here at Apache? FW:
> >> [VOTE] Apache cTAKES 3.0.0-incubating RC5 release
> >>
> >>
> >> thanks, that was not clear to me.  It's good news.
> >>
> >> -- James
> >> ________________________________________
> >> From: ctakes-dev-return-1119-Masanz.James=mayo.edu@incubator.apache.org
> >> [ctakes-dev-return-1119-Masanz.James=mayo.edu@incubator.apache.org]
> >> on behalf of Jukka Zitting [jukka.zitting@gmail.com]
> >> Sent: Saturday, January 26, 2013 12:37 AM
> >> To: ctakes-dev
> >> Subject: Re: [DISCUSS] no binary release of cTAKES here at Apache? FW:
> >> [VOTE] Apache cTAKES 3.0.0-incubating RC5 release
> >>
> >> Hi,
> >>
> >> On Fri, Jan 25, 2013 at 5:09 PM, Masanz, James J.
> >> <Ma...@mayo.edu> wrote:
> >> > Yes, that's the way I'm reading Roy's post - that it can't include
> >> > models (intermediate outputs) because the source for those
> >> > intermediate outputs is not being included.
> >>
> >> Note that this only applies to the *source* release, not to any
> >> pre-compiled binaries you may want to ship along with a release.
> >>
> >> BR,
> >>
> >> Jukka Zitting


Re: [DISCUSS] no binary release of cTAKES here at Apache? FW: [VOTE] Apache cTAKES 3.0.0-incubating RC5 release

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Hey Pei, and James,

One option might be to use the Maven Antrun plugin and add that as a last
step of the install phase, or before assembly. That way it's done for the
user "automatically", and it still looks like this happened through
Maven?

Just a thought.

Cheers,
Chris


On 1/29/13 7:05 PM, "Chen, Pei" <Pe...@childrens.harvard.edu> wrote:

>James,
>I removed the models and bins from src from the 3.0.0 branch.  Before we
>create RC6 (I'll try and do it by this Fri), would you or anyone else
>mind verifying?
>There will be one additional step for developers and users:
>One would need to download and unpack the resources.zip and add it to
>their classpath.
>https://sourceforge.net/projects/ctakesresources/files/ctakes-resources-3.0.1.zip
>
>-Even though it's uploaded to maven central as well, I think it's easier
>to ask users to just unpack and add it to their classpath.
>-Everything is in a single zip (umls,lvg,models) for simplicity.  Users
>can always just ignore what they do not need.
>
>--Pei
>
>> -----Original Message-----
>> From: Masanz, James J. [mailto:Masanz.James@mayo.edu]
>> Sent: Saturday, January 26, 2013 1:41 AM
>> To: ctakes-dev@incubator.apache.org
>> Subject: RE: [DISCUSS] no binary release of cTAKES here at Apache? FW:
>> [VOTE] Apache cTAKES 3.0.0-incubating RC5 release
>> 
>> 
>> thanks, that was not clear to me.  It's good news.
>> 
>> -- James
>> ________________________________________
>> From: ctakes-dev-return-1119-Masanz.James=mayo.edu@incubator.apache.org
>> [ctakes-dev-return-1119-Masanz.James=mayo.edu@incubator.apache.org]
>> on behalf of Jukka Zitting [jukka.zitting@gmail.com]
>> Sent: Saturday, January 26, 2013 12:37 AM
>> To: ctakes-dev
>> Subject: Re: [DISCUSS] no binary release of cTAKES here at Apache? FW:
>> [VOTE] Apache cTAKES 3.0.0-incubating RC5 release
>> 
>> Hi,
>> 
>> On Fri, Jan 25, 2013 at 5:09 PM, Masanz, James J.
>> <Ma...@mayo.edu> wrote:
>> > Yes, that's the way I'm reading Roy's post - that it can't include
>> > models (intermediate outputs) because the source for those
>> > intermediate outputs is not being included.
>> 
>> Note that this only applies to the *source* release, not to any
>> pre-compiled binaries you may want to ship along with a release.
>> 
>> BR,
>> 
>> Jukka Zitting


RE: [DISCUSS] no binary release of cTAKES here at Apache? FW: [VOTE] Apache cTAKES 3.0.0-incubating RC5 release

Posted by "Chen, Pei" <Pe...@childrens.harvard.edu>.
FYI:
About to create an RC6 from the 3.0.0 branch.  The main changes are:
- Removed the binaries/models from source and bin dist.  Users will have to download the ctakes-resources.zip from SourceForge for now.
- Sub LICENSE/NOTICE files consolidated into the root.
- Jars removed from ctakes-*/src/lib.  [A few still exist, but they will be removed in a future release, as I understand this should not be a show stopper.]

I was planning to call the VOTE in both incubator and dev at the same time.
Please let us know if you have concerns...

--Pei

> -----Original Message-----
> From: Masanz, James J. [mailto:Masanz.James@mayo.edu]
> Sent: Tuesday, January 29, 2013 8:02 PM
> To: ctakes-dev@incubator.apache.org
> Subject: RE: [DISCUSS] no binary release of cTAKES here at Apache? FW:
> [VOTE] Apache cTAKES 3.0.0-incubating RC5 release
> 
> Pei,
> I'll do that after I remove the lib/*.jar files from the 3.0.0 branch.
> -- James
> ________________________________________
> From: ctakes-dev-return-1122-Masanz.James=mayo.edu@incubator.apache.org
> [ctakes-dev-return-1122-Masanz.James=mayo.edu@incubator.apache.org]
> on behalf of Chen, Pei [Pei.Chen@childrens.harvard.edu]
> Sent: Tuesday, January 29, 2013 6:05 PM
> To: ctakes-dev@incubator.apache.org
> Subject: RE: [DISCUSS] no binary release of cTAKES here at Apache? FW:
> [VOTE] Apache cTAKES 3.0.0-incubating RC5 release
> 
> James,
> I removed the models and bins from src from the 3.0.0 branch.  Before we
> create RC6 (I'll try and do it by this Fri), would you or anyone else mind
> verifying?
> There will be one additional step for developers and users:
> One would need to download and unpack the resources.zip and add it to
> their classpath.
> https://sourceforge.net/projects/ctakesresources/files/ctakes-resources-3.0.1.zip
> 
> -Even though it's uploaded to maven central as well, I think it's easier to ask
> users to just unpack and add it to their classpath.
> -Everything is in a single zip (umls,lvg,models) for simplicity.  Users can always
> just ignore what they do not need.
> 
> --Pei
> 
> > -----Original Message-----
> > From: Masanz, James J. [mailto:Masanz.James@mayo.edu]
> > Sent: Saturday, January 26, 2013 1:41 AM
> > To: ctakes-dev@incubator.apache.org
> > Subject: RE: [DISCUSS] no binary release of cTAKES here at Apache? FW:
> > [VOTE] Apache cTAKES 3.0.0-incubating RC5 release
> >
> >
> > thanks, that was not clear to me.  It's good news.
> >
> > -- James
> > ________________________________________
> > From: ctakes-dev-return-1119-Masanz.James=mayo.edu@incubator.apache.org
> > [ctakes-dev-return-1119-Masanz.James=mayo.edu@incubator.apache.org]
> > on behalf of Jukka Zitting [jukka.zitting@gmail.com]
> > Sent: Saturday, January 26, 2013 12:37 AM
> > To: ctakes-dev
> > Subject: Re: [DISCUSS] no binary release of cTAKES here at Apache? FW:
> > [VOTE] Apache cTAKES 3.0.0-incubating RC5 release
> >
> > Hi,
> >
> > On Fri, Jan 25, 2013 at 5:09 PM, Masanz, James J.
> > <Ma...@mayo.edu> wrote:
> > > Yes, that's the way I'm reading Roy's post - that it can't include
> > > models (intermediate outputs) because the source for those
> > > intermediate outputs is not being included.
> >
> > Note that this only applies to the *source* release, not to any
> > pre-compiled binaries you may want to ship along with a release.
> >
> > BR,
> >
> > Jukka Zitting

RE: [DISCUSS] no binary release of cTAKES here at Apache? FW: [VOTE] Apache cTAKES 3.0.0-incubating RC5 release

Posted by "Masanz, James J." <Ma...@mayo.edu>.
Pei,
I'll do that after I remove the lib/*.jar files from the 3.0.0 branch.
-- James
________________________________________
From: ctakes-dev-return-1122-Masanz.James=mayo.edu@incubator.apache.org [ctakes-dev-return-1122-Masanz.James=mayo.edu@incubator.apache.org] on behalf of Chen, Pei [Pei.Chen@childrens.harvard.edu]
Sent: Tuesday, January 29, 2013 6:05 PM
To: ctakes-dev@incubator.apache.org
Subject: RE: [DISCUSS] no binary release of cTAKES here at Apache? FW: [VOTE] Apache cTAKES 3.0.0-incubating RC5 release

James,
I removed the models and bins from src from the 3.0.0 branch.  Before we create RC6 (I'll try and do it by this Fri), would you or anyone else mind verifying?
There will be one additional step for developers and users:
One would need to download and unpack the resources.zip and add it to their classpath.
https://sourceforge.net/projects/ctakesresources/files/ctakes-resources-3.0.1.zip

-Even though it's uploaded to maven central as well, I think it's easier to ask users to just unpack and add it to their classpath.
-Everything is in a single zip (umls,lvg,models) for simplicity.  Users can always just ignore what they do not need.

--Pei

> -----Original Message-----
> From: Masanz, James J. [mailto:Masanz.James@mayo.edu]
> Sent: Saturday, January 26, 2013 1:41 AM
> To: ctakes-dev@incubator.apache.org
> Subject: RE: [DISCUSS] no binary release of cTAKES here at Apache? FW:
> [VOTE] Apache cTAKES 3.0.0-incubating RC5 release
>
>
> thanks, that was not clear to me.  It's good news.
>
> -- James
> ________________________________________
> From: ctakes-dev-return-1119-Masanz.James=mayo.edu@incubator.apache.org
> [ctakes-dev-return-1119-Masanz.James=mayo.edu@incubator.apache.org]
> on behalf of Jukka Zitting [jukka.zitting@gmail.com]
> Sent: Saturday, January 26, 2013 12:37 AM
> To: ctakes-dev
> Subject: Re: [DISCUSS] no binary release of cTAKES here at Apache? FW:
> [VOTE] Apache cTAKES 3.0.0-incubating RC5 release
>
> Hi,
>
> On Fri, Jan 25, 2013 at 5:09 PM, Masanz, James J.
> <Ma...@mayo.edu> wrote:
> > Yes, that's the way I'm reading Roy's post - that it can't include
> > models (intermediate outputs) because the source for those intermediate
> > outputs is not being included.
>
> Note that this only applies to the *source* release, not to any pre-compiled
> binaries you may want to ship along with a release.
>
> BR,
>
> Jukka Zitting

RE: [DISCUSS] no binary release of cTAKES here at Apache? FW: [VOTE] Apache cTAKES 3.0.0-incubating RC5 release

Posted by "Chen, Pei" <Pe...@childrens.harvard.edu>.
Branch at: 
http://svn.apache.org/repos/asf/incubator/ctakes/branches/ctakes-3.0.0-incubating/
Resources zip at:
https://sourceforge.net/projects/ctakesresources/files/


> -----Original Message-----
> From: Chen, Pei [mailto:Pei.Chen@childrens.harvard.edu]
> Sent: Tuesday, January 29, 2013 7:06 PM
> To: ctakes-dev@incubator.apache.org
> Subject: RE: [DISCUSS] no binary release of cTAKES here at Apache? FW:
> [VOTE] Apache cTAKES 3.0.0-incubating RC5 release
> 
> James,
> I removed the models and bins from src from the 3.0.0 branch.  Before we
> create RC6 (I'll try and do it by this Fri), would you or anyone else mind
> verifying?
> There will be one additional step for developers and users:
> One would need to download and unpack the resources.zip and add it to
> their classpath.
> https://sourceforge.net/projects/ctakesresources/files/ctakes-resources-3.0.1.zip
> 
> -Even though it's uploaded to maven central as well, I think it's easier to ask
> users to just unpack and add it to their classpath.
> -Everything is in a single zip (umls,lvg,models) for simplicity.  Users can always
> just ignore what they do not need.
> 
> --Pei
> 
> > -----Original Message-----
> > From: Masanz, James J. [mailto:Masanz.James@mayo.edu]
> > Sent: Saturday, January 26, 2013 1:41 AM
> > To: ctakes-dev@incubator.apache.org
> > Subject: RE: [DISCUSS] no binary release of cTAKES here at Apache? FW:
> > [VOTE] Apache cTAKES 3.0.0-incubating RC5 release
> >
> >
> > thanks, that was not clear to me.  It's good news.
> >
> > -- James
> > ________________________________________
> > From: ctakes-dev-return-1119-Masanz.James=mayo.edu@incubator.apache.org
> > [ctakes-dev-return-1119-Masanz.James=mayo.edu@incubator.apache.org]
> > on behalf of Jukka Zitting [jukka.zitting@gmail.com]
> > Sent: Saturday, January 26, 2013 12:37 AM
> > To: ctakes-dev
> > Subject: Re: [DISCUSS] no binary release of cTAKES here at Apache? FW:
> > [VOTE] Apache cTAKES 3.0.0-incubating RC5 release
> >
> > Hi,
> >
> > On Fri, Jan 25, 2013 at 5:09 PM, Masanz, James J.
> > <Ma...@mayo.edu> wrote:
> > > Yes, that's the way I'm reading Roy's post - that it can't include
> > > models (intermediate outputs) because the source for those
> > > intermediate outputs is not being included.
> >
> > Note that this only applies to the *source* release, not to any
> > pre-compiled binaries you may want to ship along with a release.
> >
> > BR,
> >
> > Jukka Zitting

RE: [DISCUSS] no binary release of cTAKES here at Apache? FW: [VOTE] Apache cTAKES 3.0.0-incubating RC5 release

Posted by "Chen, Pei" <Pe...@childrens.harvard.edu>.
James,
I removed the models and bins from src from the 3.0.0 branch.  Before we create RC6 (I'll try and do it by this Fri), would you or anyone else mind verifying?
There will be one additional step for developers and users:
One would need to download and unpack the resources.zip and add it to their classpath.
https://sourceforge.net/projects/ctakesresources/files/ctakes-resources-3.0.1.zip

-Even though it's uploaded to maven central as well, I think it's easier to ask users to just unpack and add it to their classpath.
-Everything is in a single zip (umls,lvg,models) for simplicity.  Users can always just ignore what they do not need.

--Pei
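
The extra step described above (unpack the resources zip, then add it to the classpath) could be scripted roughly as follows. This is only a sketch in Python, not part of cTAKES: the function names `unpack_resources` and `classpath_with` are hypothetical, and it assumes the unpacked directory itself is what goes on the classpath, which the message suggests but does not spell out.

```python
import os
import zipfile
from pathlib import Path


def unpack_resources(zip_path, dest_dir):
    """Extract the archive into dest_dir and return dest_dir as an absolute Path."""
    dest = Path(dest_dir).resolve()
    with zipfile.ZipFile(zip_path) as archive:
        # testzip() returns the name of the first corrupt entry, or None.
        bad_entry = archive.testzip()
        if bad_entry is not None:
            raise ValueError(f"corrupt entry in archive: {bad_entry}")
        archive.extractall(dest)
    return dest


def classpath_with(resources_dir, existing=""):
    """Return a classpath string with resources_dir placed first."""
    parts = [str(resources_dir)]
    if existing:
        parts.append(existing)
    # os.pathsep is ':' on Unix-like systems and ';' on Windows.
    return os.pathsep.join(parts)
```

The resulting string could then be passed to `java -cp`; checking the archive before extraction is meant to catch a truncated download early rather than at runtime.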

> -----Original Message-----
> From: Masanz, James J. [mailto:Masanz.James@mayo.edu]
> Sent: Saturday, January 26, 2013 1:41 AM
> To: ctakes-dev@incubator.apache.org
> Subject: RE: [DISCUSS] no binary release of cTAKES here at Apache? FW:
> [VOTE] Apache cTAKES 3.0.0-incubating RC5 release
> 
> 
> thanks, that was not clear to me.  It's good news.
> 
> -- James
> ________________________________________
> From: ctakes-dev-return-1119-Masanz.James=mayo.edu@incubator.apache.org
> [ctakes-dev-return-1119-Masanz.James=mayo.edu@incubator.apache.org]
> on behalf of Jukka Zitting [jukka.zitting@gmail.com]
> Sent: Saturday, January 26, 2013 12:37 AM
> To: ctakes-dev
> Subject: Re: [DISCUSS] no binary release of cTAKES here at Apache? FW:
> [VOTE] Apache cTAKES 3.0.0-incubating RC5 release
> 
> Hi,
> 
> On Fri, Jan 25, 2013 at 5:09 PM, Masanz, James J.
> <Ma...@mayo.edu> wrote:
> > Yes, that's the way I'm reading Roy's post - that it can't include
> > models (intermediate outputs) because the source for those intermediate
> > outputs is not being included.
> 
> Note that this only applies to the *source* release, not to any pre-compiled
> binaries you may want to ship along with a release.
> 
> BR,
> 
> Jukka Zitting

RE: [DISCUSS] no binary release of cTAKES here at Apache? FW: [VOTE] Apache cTAKES 3.0.0-incubating RC5 release

Posted by "Masanz, James J." <Ma...@mayo.edu>.
thanks, that was not clear to me.  It's good news.

-- James
________________________________________
From: ctakes-dev-return-1119-Masanz.James=mayo.edu@incubator.apache.org [ctakes-dev-return-1119-Masanz.James=mayo.edu@incubator.apache.org] on behalf of Jukka Zitting [jukka.zitting@gmail.com]
Sent: Saturday, January 26, 2013 12:37 AM
To: ctakes-dev
Subject: Re: [DISCUSS] no binary release of cTAKES here at Apache? FW: [VOTE] Apache cTAKES 3.0.0-incubating RC5 release

Hi,

On Fri, Jan 25, 2013 at 5:09 PM, Masanz, James J. <Ma...@mayo.edu> wrote:
> Yes, that's the way I'm reading Roy's post - that it can't include models (intermediate outputs)
> because the source for those intermediate outputs is not being included.

Note that this only applies to the *source* release, not to any
pre-compiled binaries you may want to ship along with a release.

BR,

Jukka Zitting

Re: [DISCUSS] no binary release of cTAKES here at Apache? FW: [VOTE] Apache cTAKES 3.0.0-incubating RC5 release

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Fri, Jan 25, 2013 at 5:09 PM, Masanz, James J. <Ma...@mayo.edu> wrote:
> Yes, that's the way I'm reading Roy's post - that it can't include models (intermediate outputs)
> because the source for those intermediate outputs is not being included.

Note that this only applies to the *source* release, not to any
pre-compiled binaries you may want to ship along with a release.

BR,

Jukka Zitting

RE: [DISCUSS] no binary release of cTAKES here at Apache? FW: [VOTE] Apache cTAKES 3.0.0-incubating RC5 release

Posted by "Bleeker, Troy C." <Bl...@mayo.edu>.
I'd like to weigh in on the importance of a binary to the user, based on this statement:
> ... just pointing out that this is far 
> from doctrine. Apache OpenOffice is a prime counter example to his 
> point and I just made that point myself.
If there is any way cTAKES can distribute a binary, we should work through the legal stuff to do so.

Secondly, end users give up easily. If the binary is not co-located with the source, the documentation is typically split as well, and that's the issue. cTAKES already has two documentation locations: the Apache incubator site and the Confluence doc. Adding another location for the binary and another for the binary doc is too much for end users, let alone for our sanity while the cTAKES community tries to keep it all in sync.

Thanks
Troy
-----Original Message-----
From: ctakes-dev-return-1110-Bleeker.Troy=mayo.edu@incubator.apache.org [mailto:ctakes-dev-return-1110-Bleeker.Troy=mayo.edu@incubator.apache.org] On Behalf Of Masanz, James J.
Sent: Friday, January 25, 2013 9:10 AM
To: 'ctakes-dev@incubator.apache.org'
Subject: RE: [DISCUSS] no binary release of cTAKES here at Apache? FW: [VOTE] Apache cTAKES 3.0.0-incubating RC5 release


> -----Original Message-----
> From: ctakes-dev-return-1106-Masanz.James=mayo.edu@incubator.apache.org
> [mailto:ctakes-dev-return-1106-Masanz.James=mayo.edu@incubator.apache.org]
> On Behalf Of Mattmann, Chris A (388J)
> Sent: Friday, January 25, 2013 2:10 AM
> To: ctakes-dev@incubator.apache.org
> Subject: Re: [DISCUSS] no binary release of cTAKES here at Apache? FW:
> [VOTE] Apache cTAKES 3.0.0-incubating RC5 release
> 
> Hey James,
> 
> On 1/24/13 11:55 PM, "Masanz, James J." <Ma...@mayo.edu> wrote:
> 
> >I posted on general@incubator that:
> >
> >> One goal is to have a binary that contains all resources, which can 
> >> be used to install cTAKES on a system that does not have an 
> >> internet connection.
> >> For now we can focus on a first Apache release that doesn't meet 
> >> that goal, while pursuing the question with legal.
> >> If legal says we can't have that kind of binary here, then in 
> >> the future we can consider if we will host such a binary on a 
> >> different site.
> >
> >http://s.apache.org/bgp
> >
> >Another motivation for this email is a post by Benson (below) to 
> >general@incubator, where he writes "It's not the mission of the ASF 
> >to create complete, end-user-friendly, software products".
> 
> Just to clarify -- that's Benson, talking for Roy. :) I realize that 
> this has got all skitzo lately, but just pointing out that this is far 
> from doctrine. Apache OpenOffice is a prime counter example to his 
> point and I just made that point myself.
> 
> >
> >I suggest we, or whoever among us are interested in such a thing, 
> >host an easy-to-install *binary* that includes cTAKES plus the models 
> >and jars, somewhere other than apache.org, that would be a single 
> >download with a simple unzip (and would be built off Apache cTAKES 
> >3.0.0-incubating, once it is released).
> 
> If it comes to this, I'd recommend hosting it at 
> http://apache-extras.org/ which is Google Code, but branded with 
> Apache through a special ComDev agreement set up. Products developed
> there are said to have an "affinity" towards particular Apache
> products, but not be those Apache products.
> Apache Extras != Apache, but still is an option for those parts.
> 
> >
> >This binary would probably be released shortly after each Apache 
> >cTAKES release, so it could be built from the officially released 
> >Apache cTAKES source.
> 
> Yep. I don't think the battle is over there yet though -- I liked your 
> suggestion however -- let's just roll a source release, and try to 
> push the convenience binaries as needed.
> 
> >
> >From my understanding, we cannot have models in SVN here if they were 
> >built from data that is not available to the community since the 
> >models are not "source". That's based on this specific comment within LEGAL-157:
> >https://issues.apache.org/jira/browse/LEGAL-157?focusedCommentId=13561092&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13561092
> 
> That's Benson's opinion, note Roy hasn't replied to him. I don't read 
> Roy's reading on the subject to be that we can't include those 
> intermediate outputs? Do you?

Yes, that's the way I'm reading Roy's post - that it can't include models (intermediate outputs) because the source for those intermediate outputs is not being included.

> >We also cannot have other compiled jars in our SVN here at 
> >apache.org, and therefore cannot be in our source release, which we 
> >are working on addressing
> 
> That's not recommended, but also not an absolute blocker and can be 
> improved incrementally. Prior versions of Apache Lucene (and anything 
> built from Ant) had this issue and those releases shipped just fine.

That's great to know. Thanks.
 
> >
> >For people checking out code from SVN and using maven, those are not 
> >such big issues since maven will fetch the dependencies once we 
> >finish updating the POMs etc.
> >
> >If we want to allow people to download a single binary and get the 
> >cTAKES code and the models, it sounds like we either need to
> >1) write something that would download the models for the users
> >2) or host the binaries elsewhere
> >(or require users to download things separately and put them together).
> 
> I would highly suggest #1 to avoid fragmentation.
> 
> >
> >I strongly dislike option 1, so I will focus on option 2 in this 
> >email, as that will be more than enough for one email any way ;)
> 
> Why don't you like option #1? Just curious.

Two reasons. First, a goal is to have an install that is as simple as possible, to reduce barriers for (very busy) people to give cTAKES a try. (There will be times when downloading models of 100s of MB will fail for one reason or another on the first attempt.)

Secondly, my personal experience with writing (commercial) install code is that it very often turned into a vastly more difficult and time-consuming (testing-wise) task than people would allow for, and also resulted in more end-user questions than anticipated. That leads to an admittedly personal bias against such things, if they can be avoided. But I mentioned #1 because I know my views on #2 are partially a personal bias.

> >For people to host such an all-inclusive binary elsewhere, those 
> >people would need to choose a name.
> >We could create a logo for their use, something like "Apache cTAKES 
> >inside" or  "Powered by Apache cTAKES" (see
> >http://www.apache.org/foundation/marks/pmcs.html#poweredby) and make 
> >it clear the binary is not being released directly by Apache 
> >http://s.apache.org/BAj
> >
> >I suggest that we wouldn't need to create a convenience binary here 
> >at Apache - one less thing to test and document.
> >
> >This would bring up several questions though, which I'm guessing we 
> >don't want to get into here in great detail since it is really about 
> >something that is not to be released directly from Apache.
> > - what to call the binary (we would not simply be able to call it 
> >"Apache cTAKES")
> > - where to host the binary (I'd suggest the ohnlp sourceforge 
> >project, where previous versions of cTAKES live)
> > - we would need a place to hold the documentation for this binary. I 
> >am assuming we could not host it as apache.org, but we would need 
> >that either confirmed here or create a legal Jira to get that confirmation.
> > - where would we tell people to go to post questions about the binary?
> > - where would the build of the binary take place
> >
> >I suggest taking those questions offline unless someone tells me 
> >those things are indeed OK to discuss here.
> >
> >My main point to discuss here is whether there is enough value in 
> >providing a convenience binary of Apache cTAKES here at apache.org 
> >(which would not contain the models) for us to create and support it 
> >here, or if we skip creating binary here at apache.org and only 
> >create source packages here.
> >
> >I am not trying to splinter the group here. I would hope anyone 
> >involved in producing the binary would be involved here with Apache
> >cTAKES too.
> >But there might be people involved in Apache cTAKES that aren't 
> >interested in the details of how a binary is produced or what it 
> >looks like, or even if it is produced.
> 
> That's a possibility but brings with it a whole horde of other legal 
> mumbo jumbo (and trademarks@) that trust me you don't want to go down (yet).
> Maybe ever :)
> 
> Try and focus on #1 -- I bet it's achievable without all the 
> convenience binaries part. Would that work for the community?

We have previously (before Apache) received lots of positive end user feedback about what an improvement providing an all-inclusive binary was for them. 
Not providing it is a step backward for us. 

> Cheers,
> Chris
> >
> >-- James
> >

-- James
 
> >> -----Original Message-----
> >> From: general-return-39392-Masanz.James=mayo.edu@incubator.apache.org
> >> [mailto:general-return-39392-Masanz.James=mayo.edu@incubator.apache.org]
> >> On Behalf Of Benson Margulies
> >> Sent: Thursday, January 24, 2013 9:23 PM
> >> To: general@incubator.apache.org
> >> Subject: Re: [VOTE] Apache cTAKES 3.0.0-incubating RC5 release
> >>
> >> It's unfortunate to have this conversation in parallel here and on 
> >> https://issues.apache.org/jira/browse/LEGAL-157.
> >>
> >> Also, this thread is a combo of the discussion of ordinary 
> >>jars-of-classes  (where I'd forgotten the policy) and the much more 
> >>tangled question of  models, which is what the JIRA is wrestling with.
> >>
> >> To answer Ted, I think that Roy might write something like:
> >>
> >> "It's not the mission of the ASF to create complete, 
> >>end-user-friendly,  software products. It's our mission to create 
> >>open source code. If someone  else wants to build up an 
> >>end-user-friendly aggregation of ASF code and  models from bombs of 
> >>whatever, that's great, and we encourage them."
> >>
> >> On Thu, Jan 24, 2013 at 8:19 PM, Branko Čibej <br...@apache.org> wrote:
> >> > On 25.01.2013 01:50, Ted Dunning wrote:
> >> >> On Fri, Jan 25, 2013 at 7:37 AM, Branko Čibej <br...@apache.org>
> >>wrote:
> >> >>
> >> >>> On 21.01.2013 21:08, Benson Margulies wrote:
> >> >>> ...>>
> >> >>>>> I am referring to this discussion  http://s.apache.org/MUZ
> >> >>>> Well, that's clear enough, even if it is a typical example of 
> >> >>>> how our founders yell at us but we have no mechanism to 
> >> >>>> channel those yells into concise, unambiguous, documentation.
> >> >>> Perhaps off-topic ... but I fail to see how "source release" 
> >> >>> is ambiguous or not concise.
> >> >>>
> >> >>> Unless the Java world has a different definition of "source code"
> >> >>> than us stuck-in-the-mud plodders, and it's only considered 
> >> >>> binary once it's been JIT-compiled. :)
> >> >>>
> >> >>
> >> >> It isn't necessarily ambiguous when applied to code, but there 
> >> >> is a different case when applied to models  or parameter settings.
> >> >>
> >> >> For instance, Commons Math has polynomial coefficients embedded 
> >> >> in code that approximate certain functions.  These are the 
> >> >> results of computations done using other systems and the source 
> >> >> code and the data used in those other computations are not 
> >> >> included in the released code, only the parameter values are.
> >> >>
> >> >> This same sort of thing applies here except that the model in 
> >> >> question has a much larger set of values and is being packaged 
> >> >> in a binary, inspectable format.  Would your opinion change if 
> >> >> the model were expressed in a textual model?  Would it matter 
> >> >> that the textual model is too large and obtuse to usefully inspect?
> >> >
> >> > In cases like this one, it would seem reasonable for the source 
> >> > code to refer to those models and computations, which presumably 
> >> > anyone can then reproduce to their own satisfaction. This is 
> >> > unlike compiled code in that compilation results are notoriously 
> >> > hard to reproduce exactly, because they depend on many factors 
> >> > that are usually hard to document, let alone reproduce. I'd 
> >> > expect a mathematical model, no matter how large, does not suffer 
> >> > from such ambiguities (and shut up, Gödel).
> >> >
> >> > However, that's beside the point, because ...
> >> >
> >> >> What about a hypothetical case where the model is derived from 
> >> >> the explosion of a nuclear bomb?  Would the release of the 
> >> >> numbers require the inclusion of a suitable bomb design so that 
> >> >> everybody could replicate the derivation?
> >> >
> >> > ... the issue is not about exposing all the knowledge that 
> >> > goes into writing the code, but to expose the code itself so that 
> >> > it can be reviewed for, e.g., back-doors and other security issues.
> >> > Neither of your examples is relevant.
> >> >
> >> > -- Brane
> >> >



RE: [DISCUSS] no binary release of cTAKES here at Apache? FW: [VOTE] Apache cTAKES 3.0.0-incubating RC5 release

Posted by "Masanz, James J." <Ma...@mayo.edu>.
> -----Original Message-----
> From: ctakes-dev-return-1106-Masanz.James=mayo.edu@incubator.apache.org
> [mailto:ctakes-dev-return-1106-Masanz.James=mayo.edu@incubator.apache.org]
> On Behalf Of Mattmann, Chris A (388J)
> Sent: Friday, January 25, 2013 2:10 AM
> To: ctakes-dev@incubator.apache.org
> Subject: Re: [DISCUSS] no binary release of cTAKES here at Apache? FW:
> [VOTE] Apache cTAKES 3.0.0-incubating RC5 release
> 
> Hey James,
> 
> On 1/24/13 11:55 PM, "Masanz, James J." <Ma...@mayo.edu> wrote:
> 
> >I posted on general@incubator that:
> >
> >> One goal is to have a binary that contains all resources, which can
> >> be used to install cTAKES on a system that does not have an internet
> >> connection.
> >> For now we can focus on a first Apache release that doesn't meet that
> >> goal, while pursuing the question with legal.
> >> If legal says we can't have that kind of binary here, then in the
> >> future we can consider if we will host such a binary on a different
> >> site.
> >
> >http://s.apache.org/bgp
> >
> >Another motivation for this email is a post by Benson (below) to
> >general@incubator, where he writes "It's not the mission of the ASF to
> >create complete, end-user-friendly, software products".
> 
> Just to clarify -- that's Benson, talking for Roy. :) I realize that this
> has got all skitzo lately, but just pointing out that this is far from
> doctrine. Apache OpenOffice is a prime counter example to his point and I
> just made that point myself.
> 
> >
> >I suggest we, or whoever among us are interested in such a thing, host
> >an easy-to-install *binary* that includes cTAKES plus the models and
> >jars, somewhere other than apache.org, that would be a single download
> >with a simple unzip (and would be built off Apache cTAKES
> >3.0.0-incubating, once it is released).
> 
> If it comes to this, I'd recommend hosting it at http://apache-extras.org/
> which is Google Code, but branded with Apache through a special ComDev
> agreement set up. Products developed there are said to have an "affinity"
> towards particular Apache products, but not be those Apache products.
> Apache Extras != Apache, but still is an option for those parts.
> 
> >
> >This binary would probably be released shortly after each Apache cTAKES
> >release, so it could be built from the officially released Apache
> >cTAKES source.
> 
> Yep. I don't think the battle is over there yet though -- I liked your
> suggestion however -- let's just roll a source release, and try to push
> the convenience binaries as needed.
> 
> >
> >From my understanding, we cannot have models in SVN here if they were
> >built from data that is not available to the community since the models
> >are not "source". That's based on this specific comment within LEGAL-157:
> >https://issues.apache.org/jira/browse/LEGAL-157?focusedCommentId=13561092&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13561092
> 
> That's Benson's opinion, note Roy hasn't replied to him. I don't read
> Roy's reading on the subject to be that we can't include those
> intermediate outputs? Do you?

Yes, that's the way I am reading Roy's post - that it can't include models (intermediate outputs) because the source for those intermediate outputs is not being included.

> >We also cannot have other compiled jars in our SVN here at apache.org,
> >and therefore cannot be in our source release, which we are working on
> >addressing
> 
> That's not recommended, but also not an absolute blocker and can be
> improved incrementally. Prior versions of Apache Lucene (and anything
> built from Ant) had this issue and those releases shipped just fine.

That's great to know. Thanks.
 
> >
> >For people checking out code from SVN and using maven, those are not
> >such big issues since maven will fetch the dependencies once we finish
> >updating the POMs etc.
> >
> >If we want to allow people to download a single binary and get the
> >cTAKES code and the models, it sounds like we either need to
> >1) write something that would download the models for the users
> >2) or host the binaries elsewhere
> >(or require users to download things separately and put them together).
> 
> I would highly suggest #1 to avoid fragmentation.
> 
> >
> >I strongly dislike option 1, so I will focus on option 2 in this email,
> >as that will be more than enough for one email any way ;)
> 
> Why don't you like option #1? Just curious.

Two reasons. First, a goal is to have an install that is as simple as possible, to reduce barriers for (very busy) people to give cTAKES a try. (There will be times when downloading models of 100s of MB will fail for one reason or another on the first attempt.)

Secondly, my personal experience with writing (commercial) install code is that it very often turned into a vastly more difficult and time-consuming (testing-wise) task than people would allow for, and also resulted in more end-user questions than anticipated. That leads to an admittedly personal bias against such things, if they can be avoided. But I mentioned #1 because I know my views on #2 are partially a personal bias.

> >For people to host such an all-inclusive binary elsewhere, those people
> >would need to choose a name.
> >We could create a logo for their use, something like "Apache cTAKES
> >inside" or  "Powered by Apache cTAKES" (see
> >http://www.apache.org/foundation/marks/pmcs.html#poweredby) and make it
> >clear the binary is not being released directly by Apache
> >http://s.apache.org/BAj
> >
> >I suggest that we wouldn't need to create a convenience binary here at
> >Apache - one less thing to test and document.
> >
> >This would bring up several questions though, which I'm guessing we
> >don't want to get into here in great detail since it is really about
> >something that is not to be released directly from Apache.
> > - what to call the binary (we would not simply be able to call it
> >"Apache cTAKES")
> > - where to host the binary (I'd suggest the ohnlp sourceforge project,
> >where previous versions of cTAKES live)
> > - we would need a place to hold the documentation for this binary. I
> >am assuming we could not host it as apache.org, but we would need that
> >either confirmed here or create a legal Jira to get that confirmation.
> > - where would we tell people to go to post questions about the binary?
> > - where would the build of the binary take place
> >
> >I suggest taking those questions offline unless someone tells me those
> >things are indeed OK to discuss here.
> >
> >My main point to discuss here is whether there is enough value in
> >providing a convenience binary of Apache cTAKES here at apache.org
> >(which would not contain the models) for us to create and support it
> >here, or if we skip creating binary here at apache.org and only create
> >source packages here.
> >
> >I am not trying to splinter the group here. I would hope anyone
> >involved in producing the binary would be involved here with Apache
> >cTAKES too.
> >But there might be people involved in Apache cTAKES that aren't
> >interested in the details of how a binary is produced or what it looks
> >like, or even if it is produced.
> 
> That's a possibility but brings with it a whole horde of other legal mumbo
> jumbo (and trademarks@) that trust me you don't want to go down (yet).
> Maybe ever :)
> 
> Try and focus on #1 -- I bet it's achievable without all the convenience
> binaries part. Would that work for the community?

We have previously (before Apache) received lots of positive end user feedback about what an improvement providing an all-inclusive binary was for them. 
Not providing it is a step backward for us. 

> Cheers,
> Chris
> >
> >-- James
> >

-- James
 
> >> -----Original Message-----
> >> From: general-return-39392-Masanz.James=mayo.edu@incubator.apache.org
> >> [mailto:general-return-39392-Masanz.James=mayo.edu@incubator.apache.org]
> >> On Behalf Of Benson Margulies
> >> Sent: Thursday, January 24, 2013 9:23 PM
> >> To: general@incubator.apache.org
> >> Subject: Re: [VOTE] Apache cTAKES 3.0.0-incubating RC5 release
> >>
> >> It's unfortunate to have this conversation in parallel here and on
> >> https://issues.apache.org/jira/browse/LEGAL-157.
> >>
> >> Also, this thread is a combo of the discussion of ordinary
> >>jars-of-classes  (where I'd forgotten the policy) and the much more
> >>tangled question of  models, which is what the JIRA is wrestling with.
> >>
> >> To answer Ted, I think that Roy might write something like:
> >>
> >> "It's not the mission of the ASF to create complete,
> >>end-user-friendly,  software products. It's our mission to create open
> >>source code. If someone  else wants to build up an end-user-friendly
> >>aggregation of ASF code and  models from bombs of whatever, that's
> >>great, and we encourage them."
> >>
> >> On Thu, Jan 24, 2013 at 8:19 PM, Branko Čibej <br...@apache.org> wrote:
> >> > On 25.01.2013 01:50, Ted Dunning wrote:
> >> >> On Fri, Jan 25, 2013 at 7:37 AM, Branko Čibej <br...@apache.org>
> >>wrote:
> >> >>
> >> >>> On 21.01.2013 21:08, Benson Margulies wrote:
> >> >>> ...>>
> >> >>>>> I am referring to this discussion  http://s.apache.org/MUZ
> >> >>>> Well, that's clear enough, even if it is a typical example of how
> >> >>>> our founders yell at us but we have no mechanism to channel
> >> >>>> those yells into concise, unambiguous, documentation.
> >> >>> Perhaps off-topic ... but I fail to see how "source release" is
> >> >>> ambiguous or not concise.
> >> >>>
> >> >>> Unless the Java world has a different definition of "source code"
> >> >>> than us stuck-in-the-mud plodders, and it's only considered
> >> >>> binary once it's been JIT-compiled. :)
> >> >>>
> >> >>
> >> >> It isn't necessarily ambiguous when applied to code, but there is
> >> >> a different case when applied to models  or parameter settings.
> >> >>
> >> >> For instance, Commons Math has polynomial coefficients embedded
> >> >> in code that approximate certain functions.  These are the results
> >> >> of computations done using other systems and the source code and
> >> >> the data used in those other computations are not included in the
> >> >> released code, only the parameter values are.
> >> >>
> >> >> This same sort of thing applies here except that the model in
> >> >> question has a much larger set of values and is being packaged in
> >> >> a binary, inspectable format.  Would your opinion change if the
> >> >> model were expressed in a textual model?  Would it matter that the
> >> >> textual model is too large and obtuse to usefully inspect?
> >> >
> >> > In cases like this one, it would seem reasonable for the source
> >> > code to refer to those models and computations, which presumably
> >> > anyone can then reproduce to their own satisfaction. This is unlike
> >> > compiled code in that compilation results are notoriously hard to
> >> > reproduce exactly, because they depend on many factors that are
> >> > usually hard to document, let alone reproduce. I'd expect a
> >> > mathematical model, no matter how large, does not suffer from such
> >> > ambiguities (and shut up, Gödel).
> >> >
> >> > However, that's beside the point, because ...
> >> >
> >> >> What about a hypothetical case where the model is derived from the
> >> >> explosion of a nuclear bomb?  Would the release of the numbers
> >> >> require the inclusion of a suitable bomb design so that everybody
> >> >> could replicate the derivation?
> >> >
> >> > ... the issue is not about exposing all the knowledge that goes
> >> > into writing the code, but to expose the code itself so that it can
> >> > be reviewed for, e.g., back-doors and other security issues.
> >> > Neither of your examples is relevant.
> >> >
> >> > -- Brane
> >> >



Re: [DISCUSS] no binary release of cTAKES here at Apache? FW: [VOTE] Apache cTAKES 3.0.0-incubating RC5 release

Posted by Jörn Kottmann <ko...@gmail.com>.
On 01/25/2013 09:10 AM, Mattmann, Chris A (388J) wrote:
>> >For people checking out code from SVN and using maven, those are not such
>> >big issues since maven will fetch the dependencies once we finish
>> >updating the POMs etc.
>> >
>> >If we want to allow people to download a single binary and get the cTAKES
>> >code and the models, it sounds like we either need to
>> >1) write something that would download the models for the users
>> >2) or host the binaries elsewhere
>> >(or require users to download things separately and put them together).
> I would highly suggest #1 to avoid fragmentation.
>

+1 for option #1. Having a download script, and maybe a pointer in the
README to where to get the models from, is still user-friendly and maybe
also makes it easier for people to understand that there can be different
models.

Jörn
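
For illustration only, here is a minimal sketch of what such a download script for option #1 might look like. Nothing in it comes from cTAKES itself: the URL, the function names, and the retry and checksum parameters are all assumptions, and the thread does not say where the models would be hosted. It retries a few times, since large downloads can fail on the first attempt, and checks a SHA-256 digest before writing the file.

```python
import hashlib
import time
import urllib.request
from pathlib import Path
from typing import Callable

# Hypothetical location; the thread does not say where models would be hosted.
MODEL_URL = "https://example.org/ctakes-models.zip"


def http_fetch(url: str) -> bytes:
    """Default fetcher: read the whole response body into memory."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def download_model(
    url: str,
    dest: Path,
    expected_sha256: str,
    fetch: Callable[[str], bytes] = http_fetch,
    attempts: int = 3,
    delay_seconds: float = 2.0,
) -> Path:
    """Download url to dest, retrying on I/O errors or a checksum mismatch."""
    # Skip the download if a file with the expected digest is already present.
    if dest.exists() and hashlib.sha256(dest.read_bytes()).hexdigest() == expected_sha256:
        return dest

    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            data = fetch(url)
            if hashlib.sha256(data).hexdigest() != expected_sha256:
                raise ValueError(f"checksum mismatch on attempt {attempt}")
            # Only write the file once the content has been verified.
            dest.parent.mkdir(parents=True, exist_ok=True)
            dest.write_bytes(data)
            return dest
        except (OSError, ValueError) as error:
            last_error = error
            if attempt < attempts:
                time.sleep(delay_seconds)
    raise RuntimeError(f"could not download {url} after {attempts} attempts") from last_error
```

A caller would pass the published digest of the model archive, for example download_model(MODEL_URL, Path("resources/models.zip"), expected_sha256=...). The fetch parameter exists only so the retry logic can be exercised without a network connection. A real script would likely stream models of this size to disk rather than hold them in memory.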

Re: [DISCUSS] no binary release of cTAKES here at Apache? FW: [VOTE] Apache cTAKES 3.0.0-incubating RC5 release

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Hey James,

On 1/24/13 11:55 PM, "Masanz, James J." <Ma...@mayo.edu> wrote:

>I posted on general@incubator that:
>
>> One goal is to have a binary that contains all resources,
>> which can be used to install cTAKES on a system that does
>> not have an internet connection.
>> For now we can focus on a first Apache release that
>> doesn't meet that goal, while pursuing the question with legal.
>> If legal says we can't have that kind of binary here,
>> then in the future we can consider
>> if we will host such a binary on a different site.
>
>http://s.apache.org/bgp
>
>Another motivation for this email is a post by Benson (below) to
>general@incubator, where he writes "It's not the mission of the ASF to
>create complete, end-user-friendly, software products".

Just to clarify -- that's Benson, talking for Roy. :) I realize that this
has got all skitzo lately, but just pointing out that this is far from
doctrine. Apache OpenOffice is a prime counter example to his point and I
just made that point myself.

>
>I suggest we, or whoever among us are interested in such a thing, host an
>easy-to-install *binary* that includes cTAKES plus the models and jars,
>somewhere other than apache.org, that would be a single download with a
>simple unzip (and would be built off Apache cTAKES 3.0.0-incubating, once
>it is released).

If it comes to this, I'd recommend hosting it at http://apache-extras.org/
which is Google Code, but branded with Apache through a special ComDev
agreement set up. Products developed there are said to have an "affinity"
towards particular Apache products, but not be those Apache products.
Apache Extras != Apache, but still is an option for those parts.

>
>This binary would probably be released shortly after each Apache cTAKES
>release, so it could be built from the officially released Apache cTAKES
>source.

Yep. I don't think the battle is over there yet though -- I liked your
suggestion however -- let's just roll a source release, and try to push
the convenience binaries as needed.

>
>From my understanding, we cannot have models in SVN here if they were
>built from data that is not available to the community since the models
>are not "source". That's based on this specific comment within LEGAL-157:
>https://issues.apache.org/jira/browse/LEGAL-157?focusedCommentId=13561092&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13561092

That's Benson's opinion, note Roy hasn't replied to him. I don't read
Roy's reading on the subject to be that we can't include those
intermediate outputs? Do you?

>
>We also cannot have other compiled jars in our SVN here at apache.org,
>and therefore cannot be in our source release, which we are working on
>addressing

That's not recommended, but also not an absolute blocker and can be
improved incrementally. Prior versions of Apache Lucene (and anything
built from Ant) had this issue and those releases shipped just fine.

>
>For people checking out code from SVN and using maven, those are not such
>big issues since maven will fetch the dependencies once we finish
>updating the POMs etc.
>
>If we want to allow people to download a single binary and get the cTAKES
>code and the models, it sounds like we either need to
>1) write something that would download the models for the users
>2) or host the binaries elsewhere
>(or require users to download things separately and put them together).

I would highly suggest #1 to avoid fragmentation.

>
>I strongly dislike option 1, so I will focus on option 2 in this email,
>as that will be more than enough for one email any way ;)

Why don't you like option #1? Just curious.

>
>For people to host such an all-inclusive binary elsewhere, those people
>would need to choose a name.
>We could create a logo for their use, something like "Apache cTAKES
>inside" or  "Powered by Apache cTAKES" (see
>http://www.apache.org/foundation/marks/pmcs.html#poweredby) and make it
>clear the binary is not being released directly by Apache
>http://s.apache.org/BAj
>
>I suggest that we wouldn't need to create a convenience binary here at
>Apache - one less thing to test and document.
>
>This would bring up several questions though, which I'm guessing we don't
>want to get into here in great detail since it is really about something
>that is not to be released directly from Apache.
> - what to call the binary (we would not simply be able to call it
>"Apache cTAKES")
> - where to host the binary (I'd suggest the ohnlp sourceforge project,
>where previous versions of cTAKES live)
> - we would need a place to hold the documentation for this binary. I am
>assuming we could not host it as apache.org, but we would need that
>either confirmed here or create a legal Jira to get that confirmation.
> - where would we tell people to go to post questions about the binary?
> - where would the build of the binary take place
>
>I suggest taking those questions offline unless someone tells me those
>things are indeed OK to discuss here.
>
>My main point to discuss here is whether there is enough value in
>providing a convenience binary of Apache cTAKES here at apache.org (which
>would not contain the models) for us to create and support it here, or if
>we skip creating binary here at apache.org and only create source
>packages here.
>
>I am not trying to splinter the group here. I would hope anyone involved
>in producing the binary would be involved here with Apache cTAKES too.
>But there might be people involved in Apache cTAKES that aren't
>interested in the details of how a binary is produced or what it looks
>like, or even if it is produced.

That's a possibility but brings with it a whole horde of other legal mumbo
jumbo (and trademarks@) that trust me you don't want to go down (yet).
Maybe ever :) 

Try and focus on #1 -- I bet it's achievable without all the convenience
binaries part. Would that work for the community?

Cheers,
Chris

>
>-- James
>
>> -----Original Message-----
>> From: general-return-39392-Masanz.James=mayo.edu@incubator.apache.org
>> [mailto:general-return-39392-Masanz.James=mayo.edu@incubator.apache.org]
>> On Behalf Of Benson Margulies
>> Sent: Thursday, January 24, 2013 9:23 PM
>> To: general@incubator.apache.org
>> Subject: Re: [VOTE] Apache cTAKES 3.0.0-incubating RC5 release
>> 
>> It's unfortunate to have this conversation in parallel here and on
>> https://issues.apache.org/jira/browse/LEGAL-157.
>> 
>> Also, this thread is a combo of the discussion of ordinary
>>jars-of-classes
>> (where I'd forgotten the policy) and the much more tangled question of
>> models, which is what the JIRA is wrestling with.
>> 
>> To answer Ted, I think that Roy might write something like:
>> 
>> "It's not the mission of the ASF to create complete, end-user-friendly,
>> software products. It's our mission to create open source code. If
>>someone
>> else wants to build up an end-user-friendly aggregation of ASF code and
>> models from bombs of whatever, that's great, and we encourage them."
>> 
>> On Thu, Jan 24, 2013 at 8:19 PM, Branko Čibej <br...@apache.org> wrote:
>> > On 25.01.2013 01:50, Ted Dunning wrote:
>> >> On Fri, Jan 25, 2013 at 7:37 AM, Branko Čibej <br...@apache.org>
>>wrote:
>> >>
>> >>> On 21.01.2013 21:08, Benson Margulies wrote:
>> >>> ...>>
>> >>>>> I am referring to this discussion  http://s.apache.org/MUZ
>> >>>> Well, that's clear enough, even if it is a typical example of how our
>> >>>> founders yell at us but we have no mechanism to channel those yells
>> >>>> into concise, unambiguous, documentation.
>> >>> Perhaps off-topic ... but I fail to see how "source release" is
>> >>> ambiguous or not concise.
>> >>>
>> >>> Unless the Java world has a different definition of "source code"
>> >>> than us stuck-in-the-mud plodders, and it's only considered binary
>> >>> once it's been JIT-compiled. :)
>> >>>
>> >>
>> >> It isn't necessarily ambiguous when applied to code, but there is a
>> >> different case when applied to models  or parameter settings.
>> >>
>> >> For instance, Commons Math has polynomial coefficients embedded in
>> >> code that approximate certain functions.  These are the results of
>> >> computations done using other systems and the source code and the
>> >> data used in those other computations are not included in the
>> >> released code, only the parameter values are.
>> >>
>> >> This same sort of thing applies here except that the model in
>> >> question has a much larger set of values and is being packaged in a
>> >> binary, inspectable format.  Would your opinion change if the model
>> >> were expressed in a textual model?  Would it matter that the textual
>> >> model is too large and obtuse to usefully inspect?
>> >
>> > In cases like this one, it would seem reasonable for the source code
>> > to refer to those models and computations, which presumably anyone can
>> > then reproduce to their own satisfaction. This is unlike compiled code
>> > in that compilation results are notoriously hard to reproduce exactly,
>> > because they depend on many factors that are usually hard to document,
>> > let alone reproduce. I'd expect a mathematical model, no matter how
>> > large, does not suffer from such ambiguities (and shut up, Gödel).
>> >
>> > However, that's beside the point, because ...
>> >
>> >> What about a hypothetical case where the model is derived from the
>> >> explosion of a nuclear bomb?  Would the release of the numbers
>> >> require the inclusion of a suitable bomb design so that everybody
>> >> could replicate the derivation?
>> >
>> > ... the issue is not about exposing all the knowledge that goes
>> > into writing the code, but to expose the code itself so that it can be
>> > reviewed for, e.g., back-doors and other security issues. Neither of
>> > your examples is relevant.
>> >
>> > -- Brane
>> >
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
>> > For additional commands, e-mail: general-help@incubator.apache.org
>> >
>> 
>