Posted to dev@harmony.apache.org by "Geir Magnusson Jr." <ge...@apache.org> on 2005/11/14 06:11:57 UTC

[legal] Proposed changes for the Bulk Contributor Questionnaire

As promised, here are the proposed changes to our Bulk Contributor  
Questionnaire.  The purpose here is to enable contributions for which  
ACQs are not available for all authors of the contribution.  This  
could be for many legitimate reasons, and we should do everything to  
liberalize our contribution process where appropriate.

Some ideas behind the changes below :

0) We want to enable the acceptance of code created in the past.

1) We want to make people *think* about what they are contributing.

2) We want to make them *examine* their contribution, and do it in  
ways that we think will help them think about the provenance.

3) We don't want to provide a loophole for contributors such that  
code can be created in parallel with Harmony w/o our strict rules  
about ACQ-ed contributors.

4) This is subjective - we still have our human intuition to rely on,  
and can reject contributions if things don't 'feel' right.


I have a list of keywords to propose for the "keyword scan" question,  
so that we can enable people to do a better job of examining and  
thinking about what they are contributing.  Further, I visited the  
offices of BlackDuck Software last week to get a feel for their  
product, and talk to them about how they can work with us, and we  
with them.  It was a good visit - I really got a good first  
impression - and I will work to help them engage with us here if I  
can. :)  [ My goal with them is to get a copy of their software  
working on our infrastructure so we - the project - can use it to  
scan contributions as well as continually scan our ongoing work...]

Anyway, below is my proposal for changing the BCC.  I have the old  
version there as well for comparison.

Comments welcome.

geir



-----------



         The Apache Software Foundation
            Apache Harmony Project
         Bulk Contribution Checklist
                v 1.0 20051114

The Apache Harmony project is dedicated to producing a codebase that
has clear IP pedigree and protects the IP rights of others.  As part
of this effort, we ask the following questions of all contributions
of software that has been created outside of the project.  Our goal is
to provide clear and consistent oversight of the project codebase, as
well as encourage our contributors to carefully examine their
contributions before bringing them to the project.

Please Note : This document and your answers are considered public
information, and shall be part of the Apache Harmony project public
records.


Part I :  Identification

    Please provide the following information

       Name : ___________________________________________
     E-mail : ___________________________________________

     Mailing address :
         ___________________________________________
         ___________________________________________
         ___________________________________________
         ___________________________________________

      Employer :  ___________________________________________


Part II : Description

     Please describe the contribution :







<old_part_III>
Part III :  Statement of Origination

      Have you personally written all of the code or other material
      that you are intending to contribute to this project?

       [ ] Yes    [ ] No

      If not, you need to satisfy both a) and b) below.

      a)  All of the other authors are Authorized Contributors for
          the component.
          Please list the other authors :





      b)  You have a written agreement with those who wrote the material
          that either gives you ownership of the material or otherwise
          provides you sufficient rights to submit this material to the
          project on their behalf. Please provide the details of this
          agreement:


</old_part_III>

<new_part_III>

Part III :  Statement of Origination

a) Have you personally written all of the code or other material
    that you are intending to contribute to this project, and if so,
    are you an Authorized Contributor for all parts of the contribution?

   [ ] Yes
   [ ] No

   If "yes", you're done with Part III, skip to Part IV
   If "no" please continue with the rest of Part III

b) Have you verified the development history of the code to
    identify ALL of the authors?

    Please list the other authors:


c) Do you have a written agreement with all of the authors that
    either gives you ownership of the material or otherwise provides
    you sufficient rights to submit this material to the project
    on their behalf?

    Please provide the details of this agreement:


d) Are all of the authors Authorized Contributors for the part of
    the contribution written/created by each author?

   [ ] Yes – if "yes", you're done with Part III, skip to Part IV.
   [ ] No – if "no", please continue with the rest of Part III


e) Was the code written prior to May 2005 (when the Harmony Project
    was initiated)?

   [ ] Yes
   [ ] No

   (i)  If No, you must provide Authorized Contributor Questionnaires
        for the authors of the code created after May 2005 such that
        those authors  are classified as Authorized Contributors for
        the portions of the contribution  written by them
        after May 2005.

f) Did any of the authors of the code have access to third
    party implementations of similar technology while developing the
    contribution?

   [ ] Yes
   [ ] No

   If "yes", please give details below :




g) Was the code developed in accordance with a  development
    process which was designed to prevent unauthorized inclusion
    of third party  intellectual property rights into the code?
    (e.g., does the process require that developers not have
    concurrent access to third party implementations of similar
    technology during development?)

   [ ] Yes
   [ ] No

   If "no", the code isn't eligible for the Harmony Project.

   If "yes", please provide short description of the process,
   focusing on protections related to third party intellectual
   property :





h)  Did you follow the directions at
     http://harmony.apache.org/keyword_scan
     (a scan for keywords that will help identify code pedigree) and
     review the results?  Did your review confirm the history of the code?

     [  ]  Yes
     [  ]  No

     If "no", please explain.



Note : The Apache Harmony project generally performs additional
scans of bulk contributions to help confirm code pedigree.  For
example, the contribution may be compared against known proprietary
implementations of similar technology using a service such as that
offered by Black Duck or XXXXXXXXXX.  Prior to submitting the
contribution, we strongly encourage you to use one of the many
third-party services available to verify that the contribution will
be acceptable.
</new_part_III>




Part IV : Checklist

   [ ] Contribution is licensed under the Apache License v2.0

   [ ] Software Grant or Corporate Contributor License Agreement and
       Software Grant executed and submitted


  Signature : ___________________________________________
Print Name : ___________________________________________
       Date : ___________________________________________



v1.1 20051114


-- 
Geir Magnusson Jr                                  +1-203-665-6437
geirm@apache.org

Re: [legal] Proposed changes for the Bulk Contributor Questionnaire

Posted by Leo Simons <ma...@leosimons.com>.

On Tue, Nov 15, 2005 at 02:58:55PM +0000, Zoe Slattery wrote:
> I like the idea of Apache owning the IP scanning tools. It's easy to write 
> keyword scanners (not much more complicated than grep). I have 
> a few lines of perl that do basic keyword scanning - I'd be happy to put 
> these in JIRA if it would be useful. 

I just love how perl solves so many of the world's problems. Yes please. Make
sure to fill out the relevant questionnaire ;-)

LSD

Re: [legal] Proposed changes for the Bulk Contributor Questionnaire

Posted by Zoe Slattery <zo...@uk.ibm.com>.

Zoë Slattery
IBM

Tim Ellison <t....@gmail.com> wrote on 15/11/2005 11:53:44:

> Geir Magnusson Jr. wrote:
> > I'm sorry, but I don't understand the issue here.  I'm proposing that
> > 
> > a) We suggest to people that are about to contribute to us to do some
> > careful inspection before they do that.  The assumption here is that
> > people are well-meaning but sometimes make mistakes or are lazy, and
> > we want them to think before they contribute.  A keyword scanner (which
> > is a glorified "grep") is a great way to find things that you weren't
> > aware were there, such as who authors were (if there are author tags),
> > what copyright claims are listed in the files, etc.  There's nothing
> > inherently evil about it.  It doesn't matter what SCO or anyone else
> > did with a keyword scanner - we're trying to have it used to protect
> > ourselves and just as importantly, other copyright holders like Sun.
> 
> The keyword scan would be another tool in the Harmony IP-cleanliness
> toolkit, alongside the Contributor Questionnaire and Bulk Contribution
> Policy.  I'd like to see such a tool used not only on incoming bulk
> contributions but also used regularly on the day-to-day developed code
> base in svn.

I like the idea of Apache owning the IP scanning tools. It's easy to write
keyword scanners (not much more complicated than grep). I have
a few lines of perl that do basic keyword scanning - I'd be happy to put
these in JIRA if it would be useful.

> 
> Such tools and processes will never be perfect, and can only provide
> assistance with limited aspects (copyright/trademark) of the
> IP-cleanliness goal; however, it does set the tone for the project --
> that we care about such things for the Harmony code, and that we respect
> the IP rights of code outside Harmony to not be misappropriated into
> Harmony.
> 
> That said, I agree with Leo that naming BlackDuck as the provider of
> such cleanliness checks limits the Bulk Contribution Policy in a manner
> that is unnecessary.  The PPMC should be in a position to decide
> whether the actual checks performed by a contributor are sufficient or
> whether they think further checks are required.
> 
> > b) We use a tool internally to check code for which the contributor
> > can't provide our ACQ for each author.  Ok, the tool isn't open source,
> > but I don't know of any options, and we need something like this
> > *now*.  I'd love to see us create a toolsuite like this (because one of
> > my goals is to work out a process that we can share with the rest of
> > the ASF....), but we don't have the luxury of time to do it.
> 
> I have no experience of using BlackDuck, and no reason to believe they
> are anything other than a fine bunch of people.  IMHO we will be more
> successful by informing people of the risks and adopting good working
> practices rather than looking for the biggest stick to hit offenders (I
> know that you are not advocating that approach!).
> 
> So my constructive suggestion is to keep the extra questions in the
> questionnaire, but remove the single sentence:
>   "For example, the contribution may be compared against known
>    proprietary implementations of similar technology using a
>    service such as that offered by Black Duck or XXXXXXXXXX."
> 
> maybe replacing it with a reference to current best practice.
> 
> 
> Regards,
> Tim
> 
> 
> -- 
> 
> Tim Ellison (t.p.ellison@gmail.com)
> IBM Java technology centre, UK.

Re: [legal] Proposed changes for the Bulk Contributor Questionnaire

Posted by "Geir Magnusson Jr." <ge...@apache.org>.

On Nov 15, 2005, at 6:53 AM, Tim Ellison wrote:

> Geir Magnusson Jr. wrote:
>
>> I'm sorry, but I don't understand the issue here.  I'm proposing that
>>
>> a) We suggest to people that are about to contribute to us to do some
>> careful inspection before they do that.  The assumption here is that
>> people are well-meaning but sometimes make mistakes or are lazy, and
>> we want them to think before they contribute.  A keyword scanner (which
>> is a glorified "grep") is a great way to find things that you weren't
>> aware were there, such as who authors were (if there are author tags),
>> what copyright claims are listed in the files, etc.  There's nothing
>> inherently evil about it.  It doesn't matter what SCO or anyone else
>> did with a keyword scanner - we're trying to have it used to protect
>> ourselves and just as importantly, other copyright holders like Sun.
>>
>
> The keyword scan would be another tool in the Harmony IP-cleanliness
> toolkit, alongside the Contributor Questionnaire and Bulk Contribution
> Policy.  I'd like to see such a tool used not only on incoming bulk
> contributions but also used regularly on the day-to-day developed code
> base in svn.

Exactly.

>
> Such tools and processes will never be perfect, and can only provide
> assistance with limited aspects (copyright/trademark) of the
> IP-cleanliness goal; however, it does set the tone for the project --
> that we care about such things for the Harmony code, and that we respect
> the IP rights of code outside Harmony to not be misappropriated into
> Harmony.
>
> That said, I agree with Leo that naming BlackDuck as the provider of
> such cleanliness checks limits the Bulk Contribution Policy in a manner
> that is unnecessary.  The PPMC should be in a position to decide
> whether the actual checks performed by a contributor are sufficient or
> whether they think further checks are required.

We used the phrase "such as" to give people the idea.  We don't want  
to endorse or promote any such technology or company as part of our  
governance process (of course), so it was never meant that we'd have  
specific endorsements in our guidelines for contributors.  The  
phrasing as is was to illustrate and trigger discussion.

However, the key issue is what we do in the project.  I think that we  
should have a baseline set of checks though, as that makes our IP  
pedigree that much simpler and cleaner....

>
>
>> b) We use a tool internally to check code for which the contributor
>> can't provide our ACQ for each author.  Ok, the tool isn't open source,
>> but I don't know of any options, and we need something like this
>> *now*.  I'd love to see us create a toolsuite like this (because one of
>> my goals is to work out a process that we can share with the rest of
>> the ASF....), but we don't have the luxury of time to do it.
>>
>
> I have no experience of using BlackDuck, and no reason to believe they
> are anything other than a fine bunch of people.  IMHO we will be more
> successful by informing people of the risks and adopting good working
> practices rather than looking for the biggest stick to hit offenders (I
> know that you are not advocating that approach!).
>
> So my constructive suggestion is to keep the extra questions in the
> questionnaire, but remove the single sentence:
>   "For example, the contribution may be compared against known
>    proprietary implementations of similar technology using a
>    service such as that offered by Black Duck or XXXXXXXXXX."
>
> maybe replacing it with a reference to current best practice.
>

Yep

geir

-- 
Geir Magnusson Jr                                  +1-203-665-6437
geirm@apache.org

Re: [legal] Proposed changes for the Bulk Contributor Questionnaire

Posted by Tim Ellison <t....@gmail.com>.

Geir Magnusson Jr. wrote:
> I'm sorry, but I don't understand the issue here.  I'm proposing that
> 
> a) We suggest to people that are about to contribute to us to do some
> careful inspection before they do that.  The assumption here is that
> people are well-meaning but sometimes make mistakes or are lazy, and
> we want them to think before they contribute.  A keyword scanner (which
> is a glorified "grep") is a great way to find things that you weren't
> aware were there, such as who authors were (if there are author tags),
> what copyright claims are listed in the files, etc.  There's nothing
> inherently evil about it.  It doesn't matter what SCO or anyone else
> did with a keyword scanner - we're trying to have it used to protect
> ourselves and just as importantly, other copyright holders like Sun.

The keyword scan would be another tool in the Harmony IP-cleanliness
toolkit, alongside the Contributor Questionnaire and Bulk Contribution
Policy.  I'd like to see such a tool used not only on incoming bulk
contributions but also used regularly on the day-to-day developed code
base in svn.

Such tools and processes will never be perfect, and can only provide
assistance with limited aspects (copyright/trademark) of the
IP-cleanliness goal; however, it does set the tone for the project --
that we care about such things for the Harmony code, and that we respect
the IP rights of code outside Harmony to not be misappropriated into
Harmony.

That said, I agree with Leo that naming BlackDuck as the provider of
such cleanliness checks limits the Bulk Contribution Policy in a manner
that is unnecessary.  The PPMC should be in a position to decide
whether the actual checks performed by a contributor are sufficient or
whether they think further checks are required.

> b) We use a tool internally to check code for which the contributor
> can't provide our ACQ for each author.  Ok, the tool isn't open source,
> but I don't know of any options, and we need something like this
> *now*.  I'd love to see us create a toolsuite like this (because one of
> my goals is to work out a process that we can share with the rest of
> the ASF....), but we don't have the luxury of time to do it.

I have no experience of using BlackDuck, and no reason to believe they
are anything other than a fine bunch of people.  IMHO we will be more
successful by informing people of the risks and adopting good working
practices rather than looking for the biggest stick to hit offenders (I
know that you are not advocating that approach!).

So my constructive suggestion is to keep the extra questions in the
questionnaire, but remove the single sentence:
  "For example, the contribution may be compared against known
   proprietary implementations of similar technology using a
   service such as that offered by Black Duck or XXXXXXXXXX."

maybe replacing it with a reference to current best practice.

Regards,
Tim

-- 

Tim Ellison (t.p.ellison@gmail.com)
IBM Java technology centre, UK.

Re: [legal] Proposed changes for the Bulk Contributor Questionnaire

Posted by "Geir Magnusson Jr." <ge...@apache.org>.

On Nov 14, 2005, at 9:57 AM, Stefano Mazzocchi wrote:

> Leo Simons wrote:
>
>> Rant below. Decided not to tone it down.
>> On Mon, Nov 14, 2005 at 12:11:57AM -0500, Geir Magnusson Jr. wrote:
>>
>>> Comments welcome.
>>>
>> I like everything but the references to "Black Duck Software". I took
>> a look at their website and their licensing policies and everything
>> about it "feels" wrong. I don't like basing a big part of our processes
>> on some commercial black box "service-like" offering.
>>
>> Taking another look around the web for similar companies, they seem to
>> be about "open source risk management" where the risk is to avoid
>> "contaminating" proprietary stuff with "open source" stuff. I resent the
>> idea of "open source" being "contaminating" or anything like that (GPL
>> is viral, but most other stuff is not). There's this entire category of
>> companies who capitalize on FUD. I can imagine SCO having stock options
>> on some of 'em.
>>
>> I think we should avoid the ASF being seen as being part of any of that.
>>
>> ---
>> Leading Open Source Foundation Does Not Trust Its Own Processes
>>
>> The ASF has recently started using the same tools that intellectual
>> property sharks use when figuring out whom to send cease and desist
>> letters.
>>
>> When asked for comments, the ASF said: "We finally gave up trying to
>> understand why people are so scared of open source, so now we're just
>> using some incomprehensible piece of commercial software which makes us
>> feel secure. We think it's pretty silly, but if we already have run the
>> tools, at least companies like SCO can't really use them as grounds for
>> suing us since we'll look pretty clean when they run the tool."
>>
>> Darl McBride said: "We think the ASF is making a very smart decision
>> by employing code scanning techniques. It's the only way to be safe from
>> prosecution. Of course, most other open source organisations don't
>> employ code scanning techniques (since they do have a brain of their
>> own) so we're just going to sue all of those."
>>
>> IP firm XXX said: "What Darl said. Don't use any of that scary open
>> source stuff. Even the ASF understands that now. Won't be long before
>> they turn into a commercial entity themselves!"
>> ---
>>
>> Grrrrr.
>>
>> Hmm. Didn't SCO run keyword scanners and the like? Didn't they find out
>> that they'd actually taken code from open source codebases? Didn't much
>> of the same happen at JBoss some time ago?
>>
>> I doubt there's a lot of keyword scanning tools or any kind of other
>> automated technology that I wouldn't be able to circumvent with a few
>> hours of work. It's just such a stupid idea. If I take source code from
>> (say) the sun jdk, work on it for a few weeks to make it look completely
>> different so no line of the original code remains, I still have a
>> derivative work but no scanner is going to be able to detect that. Just
>> like spam still manages to make it into my inbox.
>>
>> I can imagine how some people or companies would feel safe if we were
>> to say "we scanned everything using this intellectual property risk
>> management tool XXX" but we'd be legitimizing something silly and giving a
>> false sense of security.
>>
>> Now, if these tools were open source and I'd be able to take a look at
>> how they work I might put some trust in them. But fancy websites, lots
>> of press releases, not a lot of technical details, anal usage
>> restrictions and total lack of a "download" button just sets off a lot
>> of alarm bells.
>>
>> With my infra@ hat on I'd probably be against running this kind of
>> black box software under this kind of policy on ASF hardware. With
>> something like jira, I at least know how it works (or doesn't work) and what
>> technology is under the cover and can get at the source code if I want to.
>>
>
> Leo++

I'm sorry, but I don't understand the issue here.  I'm proposing that

a) We suggest to people that are about to contribute to us to do some  
careful inspection before they do that.  The assumption here is that  
people are well-meaning but sometimes make mistakes or are lazy, and  
we want them to think before they contribute.  A keyword scanner  
(which is a glorified "grep") is a great way to find things that you  
weren't aware were there, such as who authors were (if there are  
author tags), what copyright claims are listed in the files, etc.    
There's nothing inherently evil about it.  It doesn't matter what SCO  
or anyone else did with a keyword scanner - we're trying to have it  
used to protect ourselves and just as importantly, other copyright  
holders like Sun.

b) We use a tool internally to check code for which the contributor  
can't provide our ACQ for each author.  Ok, the tool isn't open  
source, but I don't know of any options, and we need something like  
this *now*.  I'd love to see us create a toolsuite like this (because  
one of my goals is to work out a process that we can share with the  
rest of the ASF....), but we don't have the luxury of time to do it.
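
To make point a) concrete, here is a minimal sketch in Python of the kind of "glorified grep" keyword scan described above. The two patterns (author tags and copyright lines) follow the examples given in the text; the function name `scan_tree` and the exact regular expressions are illustrative assumptions, not the project's actual keyword list.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical keyword patterns; a real list would be agreed on by the project.
PATTERNS = {
    "author": re.compile(r"@author\s+(.+)", re.IGNORECASE),
    "copyright": re.compile(r"copyright\s+(?:\(c\)\s*)?(.+)", re.IGNORECASE),
}


def scan_tree(root):
    """Return (relative path, line number, kind, matched text) for each hit under root."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        # Undecodable bytes are replaced so binary files do not abort the scan.
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for kind, pattern in PATTERNS.items():
                match = pattern.search(line)
                if match:
                    hits.append(
                        (str(path.relative_to(root)), lineno, kind, match.group(1).strip())
                    )
    return hits


if __name__ == "__main__":
    # Tiny self-contained demonstration on a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "Foo.java").write_text(
            "// Copyright 2004 Example Corp.\n"
            "// @author Jane Doe\n"
            "class Foo {}\n",
            encoding="utf-8",
        )
        for hit in scan_tree(tmp):
            print(hit)
```

The output is only a list of places for a human to look at; deciding whether a hit matters is still up to the contributor and the project.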

geir

-- 
Geir Magnusson Jr                                  +1-203-665-6437
geirm@apache.org

Re: [legal] Proposed changes for the Bulk Contributor Questionnaire

Posted by Tim Ellison <t....@gmail.com>.

Geir Magnusson Jr. wrote:
> We need to have the code we compare
> against accessible by someone in the community willing to look at it.
> We have people that don't care if they glimpse JRL code (and by the way
> things are working out, Sun won't care if people are exposed to JRL
> code as long as they don't make copies...)

Good point -- today we (quite rightly) have an ACQ that 'limits'
contributions based on certain types of prior access; I'm pleased that
you have described how people with prior access to a component can still
assist the project by helping out in this way.

Regards,
Tim

-- 

Tim Ellison (t.p.ellison@gmail.com)
IBM Java technology centre, UK.

Re: [legal] Proposed changes for the Bulk Contributor Questionnaire

Posted by Dalibor Topic <ro...@kaffe.org>.

Geir Magnusson Jr. wrote:
> 
> On Nov 15, 2005, at 8:20 PM, Dalibor Topic wrote:

>> Stuff beyond that would probably go beyond uncovering simple
>> accidents, and would require quite a bit of cooperation from Sun to
>> disclose the pedigree of their implementation's code and equivalent
>> cooperation from the contributors.
>>
> 
> Why?  The source is available under the JRL.

But not its history, and that's the interesting part about code
pedigree: where does this particular line come from in code base X and
do they come from a common source and if so, was that OK?

I don't know if one can find out the pedigree of JRLd code (or
Microsoft's implementation, or IBM's closed source stuff, any other
closed implementation we'd like to avoid having enter our code base by
accident) without extensive (and presumably expensive) cooperation from
people who own the code.

>> How do we determine for sure who wrote what when, and who copied what
>> from whom, if that was OK then, and if the contributor has the right to
>> contribute his changes? In case of conflicting opinions, what do we do?
>> Or even worse, if code comes from a now defunct and dead open source
>> project from 1997 [1], with no one around any more, the web site and
>> archives wiped out, what do we do? :)
>>
> 
> I presume that the answer isn't "stick head in sand".
> 

No. Have a process, etc.:)

> Let me ask you this - if the above software did exist and it made it 
> into Harmony's SVN,  would you prefer that
> 
> a) we knew about it and could explain the decision to include it
> 
> b) We were surprised at some future date
> 

I'd prefer a) all the time over b). But I don't think it's possible to
avoid b, no matter how advanced keyword scanning (or other) tools we
use. Mistakes are bound to happen and to slip through, we're dealing
with humans.

> I agree that it *can* get very complicated in the hypotheticals, but
> I'd bet that the majority of what we'd find - if we'd even find
> anything at all - would be due to simple misunderstandings and mistakes.

Yeah, I wouldn't expect anything other than that, either.

> I'd sleep better knowing that we at least tried.  One of our best
> defenses in the event something went wrong would be a demonstrable,
> good faith effort to do reasonable oversight.

Sure. I misinterpreted your original post as putting a lot more faith into
tools than I'd be comfortable with, and your subsequent posts have made
it clearer what the rationale is, and how it is supposed to work out.
Thanks for taking the time to clarify it.

cheers,
dalibor topic

Re: [legal] Proposed changes for the Bulk Contributor Questionnaire

Posted by "Geir Magnusson Jr." <ge...@apache.org>.

On Nov 15, 2005, at 8:20 PM, Dalibor Topic wrote:

> Geir Magnusson Jr. wrote:
>
>> You get a list of files.  You can go check them.  Is how those matches
>> were done significant?  Can you tell me the algorithm your head uses? :)
>>
>
> Well, we could simply throw a dice, if how the matches were done is not
> significant. :)

If the algorithm is bad, we find either nothing or lots of false  
positives.  In either case we stop using the tool.

If the algorithm is good, we know what it would find if there's  
anything to find, so we can use it.

In either case, we can (I believe) do a reasonable job of figuring it  
out even if we don't have the source.

(I never had the source to Windows NT, but I knew it wasn't "good")

>
>
>> Ah - yes.  That's the key.  We would only compare against code that we
>> were comfortable having someone look at.  Specifically, I'm afraid of
>> Sun code accidentally getting into our codebase, because the stuff is
>> so prevalent in the Java community.  It's in every Sun J2SE distro....
>>
>
> That should be easy enough: just grep for "confidential J2SE software
> from Sun, play nice and play fair!", or whatever the copyright headers
> on such Sun software say.
>
> Stuff beyond that would probably go beyond uncovering simple
> accidents, and would require quite a bit of cooperation from Sun to
> disclose the pedigree of their implementation's code and equivalent
> cooperation from the contributors.
>

Why?  The source is available under the JRL.

> Let me give you another scenario:
>
> Purely hypothetically speaking, Sun's implementation may include third
> party software, and changes to such software. Contributions from  
> others
> may include the same (open source, for the sake of argument) software
> and similar changes in order to meet specific, common goals defined by
> common specs.
>

For example, Sun's code includes apache code :)

> How do we determine for sure who wrote what when, and who copied what
> from whom, if that was OK then, and if the contributor has the right to
> contribute his changes? In case of conflicting opinions, what do we do?
> Or even worse, if code comes from a now defunct and dead open source
> project from 1997 [1], with no one around any more, the web site and
> archives wiped out, what do we do? :)
>

I presume that the answer isn't "stick head in sand".

Let me ask you this - if the above software did exist and it made it  
into Harmony's SVN,  would you prefer that

a) we knew about it and could explain the decision to include it

b) We were surprised at some future date


> I guess the point I'm trying to get across is that the best we can do
> with our resources are very simple, almost trivial checks like checking
> if the copyright headers are sane.

I believe we can do better than that.

>
> Anything beyond that gets very, very joyously complicated very quickly,
> without permanent active assistance from everyone, including the
> copyright holders of the proprietary implementations. Whether copyright
> holders of proprietary implementations would be pleased to dedicate
> resources for Harmony's potential regular inquiries about their code's
> pedigree, I don't know. I'm not sure a pull model scales well in this
> case. :)

I agree that it *can* get very complicated in the hypotheticals, but  
I'd bet that the majority of what we'd find - if we'd even find  
anything at all - would be due to simple misunderstandings and  
mistakes.  I'd sleep better knowing that we at least tried.  One of  
our best defenses in the event something went wrong would be a  
demonstrable, good faith effort to do reasonable oversight.

geir

>
> cheers,
> dalibor topic
>
> [1] Ueber-hypothetically, a BSD-ish licensed fork of Kaffe from back
> then. It used to have a BSD-ish license, back in the days, and it got
> forked quite a bit, afaict from the mailing list archives. All of that
> was before my days, and quite a few of those forks are ... resting.:)
>

-- 
Geir Magnusson Jr                                  +1-203-665-6437
geirm@apache.org

Re: [legal] Proposed changes for the Bulk Contributor Questionnaire

Posted by Dalibor Topic <ro...@kaffe.org>.

Geir Magnusson Jr. wrote:
> You get a list of files.  You can go check them.  Is how those matches
> were done significant?  Can you tell me the algorithm your head uses? :)

Well, we could simply roll a die, if how the matches were done is not
significant. :)

> Ah - yes.  That's the key.  We would only compare against code that  we
> were comfortable having someone look at.  Specifically, I'm afraid  of
> Sun code accidentally getting into our codebase, because the stuff  is
> so prevalent in the Java community.  It's in every Sun J2SE  distro....

That should be easy enough: just grep for "confidential J2SE software
from Sun, play nice and play fair!", or whatever the copyright headers
on such Sun software say.

Stuff beyond that would probably go beyond uncovering simple
accidents, and would require quite a bit of cooperation from Sun to
disclose the pedigree of their implementation's code and equivalent
cooperation from the contributors.

Let me give you another scenario:

Purely hypothetically speaking, Sun's implementation may include third
party software, and changes to such software. Contributions from others
may include the same (open source, for the sake of argument) software
and similar changes in order to meet specific, common goals defined by
common specs.

How do we determine for sure who wrote what when, and who copied what
from whom, if that was OK then, and if the contributor has the right to
contribute his changes? In case of conflicting opinions, what do we do?
 Or even worse, if code comes from a now defunct and dead open source
project from 1997 [1], with no one around any more, the web site and
archives wiped out, what do we do? :)

I guess the point I'm trying to get across is that the best we can do
with our resources are very simple, almost trivial checks like checking
if the copyright headers are sane.
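
As a rough illustration of what such a trivial header check could look
like, here is a minimal Python sketch. The header pattern, the 30-line
window, and the function names are assumptions made for the example;
nothing in this thread specifies them, and real headers vary far more
than this pattern allows.

```python
import re

# Assumed pattern: a "Copyright" or "(c)" marker, an optional year or
# year range, then the holder name. Hypothetical, for illustration only.
HEADER_RE = re.compile(
    r"(?:Copyright|\(c\))\s*(?:\(c\)\s*)?(?:\d{4}(?:\s*-\s*\d{4})?,?\s*)?(.+)",
    re.IGNORECASE,
)


def copyright_holders(text, max_lines=30):
    """Return the holder names found in the first max_lines lines of text."""
    holders = []
    for line in text.splitlines()[:max_lines]:
        match = HEADER_RE.search(line)
        if match:
            # Trim comment decoration such as " */" around the name.
            holders.append(match.group(1).strip(" */"))
    return holders


def header_is_sane(text, expected_holders):
    """True when a header exists and every holder found is an expected one."""
    holders = copyright_holders(text)
    return bool(holders) and all(h in expected_holders for h in holders)
```

A file with no header, or with a holder nobody expected, only gets
flagged for a human to look at. The check says nothing about pedigree.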

Anything beyond that gets very, very joyously complicated very quickly,
without permanent active assistance from everyone, including the
copyright holders of the proprietary implementations. Whether copyright
holders of proprietary implementations would be pleased to dedicate
resources for Harmony's potential regular inquiries about their code's
pedigree, I don't know. I'm not sure a pull model scales well in this
case. :)

cheers,
dalibor topic

[1] Ueber-hypothetically, a BSD-ish licensed fork of Kaffe from back
then. It used to have a BSD-ish license, back in the days, and it got
forked quite a bit, afaict from the mailing list archives. All of that
was before my days, and quite a few of those forks are ... resting.:)

Re: [legal] Proposed changes for the Bulk Contributor Questionnaire

Posted by "Geir Magnusson Jr." <ge...@apache.org>.

On Nov 15, 2005, at 4:28 PM, Dalibor Topic wrote:

> Geir Magnusson Jr. wrote:
>
>>
>> On Nov 14, 2005, at 3:18 PM, Dalibor Topic wrote:
>>
>>
>>> On Mon, Nov 14, 2005 at 09:57:48AM -0500, Stefano Mazzocchi wrote:
>>>
>>>
>>>> Leo Simons wrote:
>>>>
>>>>
>>>>> Rant below. Decided not to tone it down.
>>>>>
>>>>
>>>>
>>>> Leo++
>>>>
>>>>
>>>
>>> +1 from me, too. sounds like an excellent way to shoot oneself to
>>> slashdot with headlines like "Apache foundation rejects code from  
>>> IBM,
>>> claims it was stolen from FSF!". Political suicide, should it ever
>>> happen, as it'd force the ASF to play arbiter in disputes that don't
>>> exist.
>>>
>>
>>
>> I don't understand this.  I'm suggesting we use a tool internally to
>> help us *find* problems, both at contribution time as well as ongoing
>> to ensure that inappropriate 3rd party code doesn't come in  
>> during  the
>> regular flow of activity.  We'd then examine any issues raised,  and
>> make a judgement based on that.
>>
>
> OK. I'm uncomfortable delegating such a potentially sensitive issue  
> to a
> proprietary black box, as in the worst case that leaves us with little
> chance to explore why the black box oracle came up with a wrong or  
> right
> analysis.

I'm confused as I don't understand how you are thinking of this.

First, we mention Blackduck as an example of tools that we might use  
in a specific case of contribution, suggesting that contributors do a  
similar thing before contributing if they choose.  There is no  
requirement.

Second, there's no analysis from BD and its ilk, no "thumbs up" or  
"thumbs down" - it's simply "these files seem to be like those files"  
and we humans then go look and judge.

We're not turning over any decision making to anyone.

>
> Checking code pedigree makes sense. It just needs to be transparent.

You get a list of files.  You can go check them.  Is how those  
matches were done significant?  Can you tell me the algorithm your  
head uses? :)

>
>
>> Suppose a contribution had code from the FSF. (IBM's doesn't.    
>> Period)
>>
>
> Yeah, I didn't mean to imply it had, just as an ugly worst case
> scenario. I can come up with an even worse one, actually, in which a
> hypothetical IBM contribution had traces of Microsoft's VM code.
> Microsoft should be scarier than the FSF to most people on this  
> list, I
> guess, as the FSF has an interest in working together with us, whereas
> Microsoft's interests probably aren't aligned with open source J2SE.

I think that's a safe assumption :)

>
>
>> Would you prefer that we don't find it until much later,  like  
>> after a
>> release?  Or if we do find it, just accept it to avoid  having to  
>> commit
>> "political suicide" by pointing it out to the  contributor?
>>
>
> It'd be fine as long as nothing bad is found, or the cases flagged by
> the black box oracle are actual issues. I'm trying to view it from a
> worst-case perspective.

We can determine them, because the "oracle" is a really fancy grep,  
which just shows files that have similarity.  We then have to verify.
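
To make the "list of similar files" idea concrete, here is a minimal
Python sketch. It is not how Black Duck or any other product works;
that is not described in this thread. It assumes a plain line-by-line
comparison using the standard library's difflib, and the function
names and the 0.8 threshold are made up for the example.

```python
import difflib


def normalize(text):
    """Strip whitespace and drop blank lines so reformatting does not matter."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def similarity(text_a, text_b):
    """Ratio in [0, 1] of matching normalized lines between two texts."""
    matcher = difflib.SequenceMatcher(None, normalize(text_a), normalize(text_b))
    return matcher.ratio()


def similar_pairs(contribution, reference, threshold=0.8):
    """List (contributed_name, reference_name, score) pairs at or above threshold.

    Both arguments are dicts mapping a file name to that file's text.
    """
    pairs = []
    for name_a, text_a in sorted(contribution.items()):
        for name_b, text_b in sorted(reference.items()):
            score = similarity(text_a, text_b)
            if score >= threshold:
                pairs.append((name_a, name_b, score))
    return pairs
```

The output is only a list of file pairs with a score. Deciding whether
a pair is an actual problem stays with whoever goes and reads the files.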

>
> The trouble would start if we end up having a false positive.
>
> How do we figure out that we have a false positive, without either
> access to say, the database, the source code of the oracle, the  
> complete
> legal history of some bit of proprietary code including the merges,
> transactions, copyright transfers and relicensing operations, etc?

Ah - yes.  That's the key.  We would only compare against code that  
we were comfortable having someone look at.  Specifically, I'm afraid  
of Sun code accidentally getting into our codebase, because the stuff  
is so prevalent in the Java community.  It's in every Sun J2SE  
distro....


>
> Such a 'discovery' process could take quite a bit of time, provided  
> all
> parties involved (including the makers of the black box oracle) would
> have any business interest in participating (in absence of an actual
> legal case). If, say, Microsoft takes their time to talk to Apache  
> about
> the legal history of Microsoft's VM, (what'd be in it for Microsoft,
> after all? :) where does it leave a contribution that'd be flagged as
> potentially infringing on Microsoft's code?
>
> I'd guesstimate a resolution could take a few years, as a worst  
> case. Is
> any contribution that stays in limbo for a few years going to be
> relevant after a claim is shown to be false after a few years?
>
> That's where the 'political suicide' scenario I mentioned comes in, as
> it could force us to act as an arbiter in determining how trustworthy
> either IBM, Black Duck or Microsoft are, based on little more than a
> black box. Not a position I'd like to find myself in, in particular if
> it all turns out to be just a software glitch.[1] :)

I see.  I think that there are some assumptions here that you made,  
that I wasn't ever thinking of.  We need to have the code we compare  
against accessible by someone in the community willing to look at  
it.  We have people that don't care if they glimpse JRL code (and by  
the way things are working out, Sun won't care if people are exposed  
to JRL code as long as they don't make copies...)

So that's the kind of things we want to compare against : open source  
(kaffe, GNU classpath, etc) and code like Sun's for which there are  
no limits on retention after exposure.

>
>
>> If we find code stolen from *any* copyright holder, we will   
>> definitely
>> reject the code.
>>
>
> +1
>
>
>> Because there is a complete  implementation under a
>> non-opensource license that has been very,  very widely  
>> distributed, it
>> behooves us to take what steps we can to  ensure that we don't
>> accidentally incorporate it into our codebase.
>>
>
> +1, too.
>
> We just need to make sure that the steps we take are equally  
> transparent
> to everyone involved (and the outsiders), as the rest of the  
> process is,
> in my opinion. A black box oracle doesn't have its place in such a  
> process.
>

Agreed.

> cheers,
> dalibor topic
>
> [1] Yeah, I know, I'm assuming that the Black Duck software is not
> perfect and error free without having ever seen it. It's a worst case
> scenario, though, so I am taking some freedoms with things that can go
> wrong. :)
>


Freedom (TM)

:)

geir

-- 
Geir Magnusson Jr                                  +1-203-665-6437
geirm@apache.org

Re: [legal] Proposed changes for the Bulk Contributor Questionnaire

Posted by Dalibor Topic <ro...@kaffe.org>.

Geir Magnusson Jr. wrote:
> 
> On Nov 14, 2005, at 3:18 PM, Dalibor Topic wrote:
> 
>> On Mon, Nov 14, 2005 at 09:57:48AM -0500, Stefano Mazzocchi wrote:
>>
>>> Leo Simons wrote:
>>>
>>>> Rant below. Decided not to tone it down.
>>>
>>>
>>> Leo++
>>>
>>
>> +1 from me, too. sounds like an excellent way to shoot oneself to
>> slashdot with headlines like "Apache foundation rejects code from IBM,
>> claims it was stolen from FSF!". Political suicide, should it ever
>> happen, as it'd force the ASF to play arbiter in disputes that don't
>> exist.
> 
> 
> I don't understand this.  I'm suggesting we use a tool internally to 
> help us *find* problems, both at contribution time as well as ongoing 
> to ensure that inappropriate 3rd party code doesn't come in during  the
> regular flow of activity.  We'd then examine any issues raised,  and
> make a judgement based on that.

OK. I'm uncomfortable delegating such a potentially sensitive issue to a
proprietary black box, as in the worst case that leaves us with little
chance to explore why the black box oracle came up with a wrong or right
analysis.

Checking code pedigree makes sense. It just needs to be transparent.

> Suppose a contribution had code from the FSF. (IBM's doesn't.   Period) 

Yeah, I didn't mean to imply it had, just as an ugly worst case
scenario. I can come up with an even worse one, actually, in which a
hypothetical IBM contribution had traces of Microsoft's VM code.
Microsoft should be scarier than the FSF to most people on this list, I
guess, as the FSF has an interest in working together with us, whereas
Microsoft's interests probably aren't aligned with open source J2SE.

> Would you prefer that we don't find it until much later,  like after a
> release?  Or if we do find it, just accept it to avoid  having to commit
> "political suicide" by pointing it out to the  contributor?

It'd be fine as long as nothing bad is found, or the cases flagged by
the black box oracle are actual issues. I'm trying to view it from a
worst-case perspective.

The trouble would start if we end up having a false positive.

How do we figure out that we have a false positive, without either
access to say, the database, the source code of the oracle, the complete
legal history of some bit of proprietary code including the merges,
transactions, copyright transfers and relicensing operations, etc?

Such a 'discovery' process could take quite a bit of time, provided all
parties involved (including the makers of the black box oracle) would
have any business interest in participating (in absence of an actual
legal case). If, say, Microsoft takes their time to talk to Apache about
the legal history of Microsoft's VM, (what'd be in it for Microsoft,
after all? :) where does it leave a contribution that'd be flagged as
potentially infringing on Microsoft's code?

I'd guesstimate a resolution could take a few years, as a worst case. Is
any contribution that stays in limbo for a few years going to be
relevant after a claim is shown to be false after a few years?

That's where the 'political suicide' scenario I mentioned comes in, as
it could force us to act as an arbiter in determining how trustworthy
either IBM, Black Duck or Microsoft are, based on little more than a
black box. Not a position I'd like to find myself in, in particular if
it all turns out to be just a software glitch.[1] :)

> If we find code stolen from *any* copyright holder, we will  definitely
> reject the code.   

+1

> Because there is a complete  implementation under a
> non-opensource license that has been very,  very widely distributed, it
> behooves us to take what steps we can to  ensure that we don't
> accidentally incorporate it into our codebase.

+1, too.

We just need to make sure that the steps we take are equally transparent
to everyone involved (and the outsiders), as the rest of the process is,
in my opinion. A black box oracle doesn't have its place in such a process.

cheers,
dalibor topic

[1] Yeah, I know, I'm assuming that the Black Duck software is not
perfect and error free without having ever seen it. It's a worst case
scenario, though, so I am taking some freedoms with things that can go
wrong. :)

Re: [legal] Proposed changes for the Bulk Contributor Questionnaire

Posted by "Geir Magnusson Jr." <ge...@apache.org>.

On Nov 14, 2005, at 3:18 PM, Dalibor Topic wrote:

> On Mon, Nov 14, 2005 at 09:57:48AM -0500, Stefano Mazzocchi wrote:
>
>> Leo Simons wrote:
>>
>>> Rant below. Decided not to tone it down.
>>
>> Leo++
>>
>
> +1 from me, too. sounds like an excellent way to shoot oneself to
> slashdot with headlines like "Apache foundation rejects code from IBM,
> claims it was stolen from FSF!". Political suicide, should it ever
> happen, as it'd force the ASF to play arbiter in disputes that don't
> exist.

I don't understand this.  I'm suggesting we use a tool internally to  
help us *find* problems, both at contribution time as well as ongoing  
to ensure that inappropriate 3rd party code doesn't come in during  
the regular flow of activity.  We'd then examine any issues raised,  
and make a judgement based on that.

Suppose a contribution had code from the FSF. (IBM's doesn't.   
Period)  Would you prefer that we don't find it until much later,  
like after a release?  Or if we do find it, just accept it to avoid  
having to commit "political suicide" by pointing it out to the  
contributor?

If we find code stolen from *any* copyright holder, we will  
definitely reject the code.    Because there is a complete  
implementation under a non-opensource license that has been very,  
very widely distributed, it behooves us to take what steps we can to  
ensure that we don't accidentally incorporate it into our codebase.

geir

(I love it when I can use "behoove" in a sentence.  Not as good as  
"festoon" or "huggermuggery", but close...)

-- 
Geir Magnusson Jr                                  +1-203-665-6437
geirm@apache.org

Re: [legal] Proposed changes for the Bulk Contributor Questionnaire

Posted by Dalibor Topic <ro...@kaffe.org>.

On Mon, Nov 14, 2005 at 09:57:48AM -0500, Stefano Mazzocchi wrote:
> Leo Simons wrote:
> >Rant below. Decided not to tone it down.
> >
> >On Mon, Nov 14, 2005 at 12:11:57AM -0500, Geir Magnusson Jr. wrote:
> >>Comments welcome.
> >
> >I like everything but the references to "Black Duck Software". I took
> >a look at their website and their licensing policies and everything
> >about it "feels" wrong. I don't like basing a big part of our processes
> >on some commercial black box "service-like" offering.
> >
> >Taking another look around the web for similar companies, they seem to
> >be about "open source risk management" where the risk is to avoid
> >"contaminating" propietary stuff with "open source" stuff. I resent the
> >idea of "open source" being "contaminating" or anything like that (GPL
> >is viral, but most other stuff is not). There's this entire category of
> >companies who capitalize on FUD. I can imagine SCO having stock options
> >on some of 'em.
> >
> >I think we should avoid the ASF being seen as being part of any of that.
> >
> >---
> >Leading Open Source Foundation Does Not Trust Its Own Processes
> >
> >The ASF has recently started using the same tools that intellectual
> >property sharks use when figuring out whom to send cease and desist
> >letters.
> >
> >When asked for comments, the ASF said: "We finally gave up trying to
> >understand why people are so scared of open source, so now we're just
> >using some incomprehensible piece of commercial software which makes us
> >feel secure. We think its pretty silly, but if we already have run the
> >tools, at least companies like SCO can't really use them as grounds for
> >suing us since we'll look pretty clean when they run the tool."
> >
> >Darl McBride said: "We think the ASF is making a very smart decision
> >by employing code scanning techniques. Its the only way to be safe from
> >prosecution. Of course, most other open source organisations don't
> >employ code scanning techniques (since they do have a brain of their
> >own) so we're just going to sue all of those."
> >
> >IP firm XXX said: "What Darl said. Don't use any of that scary open
> >source stuff. Even the ASF understands that now. Won't be long before
> >they turn into a commercial entity themselves!"
> >---
> >
> >Grrrrr.
> >
> >Hmm. Didn't SCO run keyword scanners and the like? Didn't they find out
> >that they'd actually taken code from open source codebases? Didn't much
> >of the same happen at JBoss some time ago?
> >
> >I doubt there's a lot of keyword scanning tools or any kind of other
> >automated technology that I wouldn't be able to circumvent with a few
> >hours of work. Its just such a stupid idea. If I take source code from
> >(say) the sun jdk, work on it for a few weeks to make it look completely
> >different so no line of the original code remains, I still have a
> >derivative work but no scanner is going to be able to detect that. Just
> >like spam still manages to make it into my inbox.
> >
> >I can imagine how some people or companies would feel safe if we were
> >to say "we scanned everything using this intellectual property risk
> >management tool XXX" but we'd be legitimizing something silly and giving a
> >false sense of security.
> >
> >Now, if these tools were open source and I'd be able to take a look at
> >how they work I might put some trust in them. But fancy websites, lots
> >of press releases, not a lot of technical details, anal usage
> >restrictions and total lack of a "download" button just sets off a lot
> >of alarm bells.
> >
> >With my infra@ hat on I'd probably be against running this kind of
> >black box software under this kind of policy on ASF hardware. With
> >something like jira, I at least know how it works (or doesn't work) and 
> >what
> >technology is under the cover and can get at the source code if I want to.
> 
> Leo++

+1 from me, too. sounds like an excellent way to shoot oneself to
slashdot with headlines like "Apache foundation rejects code from IBM,
claims it was stolen from FSF!". Political suicide, should it ever
happen, as it'd force the ASF to play arbiter in disputes that don't
exist.

cheers,
dalibor topic


> 
> -- 
> Stefano.
>

Re: [legal] Proposed changes for the Bulk Contributor Questionnaire

Posted by Stefano Mazzocchi <st...@apache.org>.

Leo Simons wrote:
> Rant below. Decided not to tone it down.
> 
> On Mon, Nov 14, 2005 at 12:11:57AM -0500, Geir Magnusson Jr. wrote:
>> Comments welcome.
> 
> I like everything but the references to "Black Duck Software". I took
> a look at their website and their licensing policies and everything
> about it "feels" wrong. I don't like basing a big part of our processes
> on some commercial black box "service-like" offering.
> 
> Taking another look around the web for similar companies, they seem to
> be about "open source risk management" where the risk is to avoid
> "contaminating" propietary stuff with "open source" stuff. I resent the
> idea of "open source" being "contaminating" or anything like that (GPL
> is viral, but most other stuff is not). There's this entire category of
> companies who capitalize on FUD. I can imagine SCO having stock options
> on some of 'em.
> 
> I think we should avoid the ASF being seen as being part of any of that.
> 
> ---
> Leading Open Source Foundation Does Not Trust Its Own Processes
> 
> The ASF has recently started using the same tools that intellectual
> property sharks use when figuring out whom to send cease and desist
> letters.
> 
> When asked for comments, the ASF said: "We finally gave up trying to
> understand why people are so scared of open source, so now we're just
> using some incomprehensible piece of commercial software which makes us
> feel secure. We think its pretty silly, but if we already have run the
> tools, at least companies like SCO can't really use them as grounds for
> suing us since we'll look pretty clean when they run the tool."
> 
> Darl McBride said: "We think the ASF is making a very smart decision
> by employing code scanning techniques. Its the only way to be safe from
> prosecution. Of course, most other open source organisations don't
> employ code scanning techniques (since they do have a brain of their
> own) so we're just going to sue all of those."
> 
> IP firm XXX said: "What Darl said. Don't use any of that scary open
> source stuff. Even the ASF understands that now. Won't be long before
> they turn into a commercial entity themselves!"
> ---
> 
> Grrrrr.
> 
> Hmm. Didn't SCO run keyword scanners and the like? Didn't they find out
> that they'd actually taken code from open source codebases? Didn't much
> of the same happen at JBoss some time ago?
> 
> I doubt there's a lot of keyword scanning tools or any kind of other
> automated technology that I wouldn't be able to circumvent with a few
> hours of work. Its just such a stupid idea. If I take source code from
> (say) the sun jdk, work on it for a few weeks to make it look completely
> different so no line of the original code remains, I still have a
> derivative work but no scanner is going to be able to detect that. Just
> like spam still manages to make it into my inbox.
> 
> I can imagine how some people or companies would feel safe if we were
> to say "we scanned everything using this intellectual property risk
> management tool XXX" but we'd be legitimizing something silly and giving a
> false sense of security.
> 
> Now, if these tools were open source and I'd be able to take a look at
> how they work I might put some trust in them. But fancy websites, lots
> of press releases, not a lot of technical details, anal usage
> restrictions and total lack of a "download" button just sets off a lot
> of alarm bells.
> 
> With my infra@ hat on I'd probably be against running this kind of
> black box software under this kind of policy on ASF hardware. With
> something like jira, I at least know how it works (or doesn't work) and what
> technology is under the cover and can get at the source code if I want to.

Leo++

-- 
Stefano.

Re: [legal] Proposed changes for the Bulk Contributor Questionnaire

Posted by Leo Simons <ma...@leosimons.com>.

On Mon, Nov 14, 2005 at 01:51:36AM -0800, Leo Simons wrote:
> Rant below. Decided not to tone it down.

Oh that's a nice exemplary attitude Leo. Go and behave just a little
will you?

I spent some more time thinking about this and soul searching and I
talked to Geir for a little bit to get more of an idea of what is
actually and what is actually not the end of the world as we know it
[1].

> On Mon, Nov 14, 2005 at 12:11:57AM -0500, Geir Magnusson Jr. wrote:
> > Comments welcome.
> 
> I like everything but the references to "Black Duck Software". I took
> a look at their website and their licensing policies and everything
> about it "feels" wrong. I don't like basing a big part of our processes
> on some commercial black box "service-like" offering.

Apologies to Black Duck for taking some cheap shots at 'em but I'll stick
to the black box bit. And my dislike of fancy marketing stuff in place of
technical facts.

Anyway...

Let's turn this around. The key with harmony is to be as open and as
transparent about anything and everything as humanly possible, and preferably
just a little more than that. If someone says, "yo people, I wrote this code
and it's all mine and let's use it" then that's that. If someone says "we have
this code at our company which we've worked on for 5 years but the details of
what constitutes 'we' and 'this' is a bit different from what you guys
expect", then we say, "err, sure, that's okay too, let's just all take a good
look. Here's tools that might help with that".

Tools are a good thing. Getting more people using grep on a daily basis
seems to be a good thing, too (let's not have a grep vs spotlight debate). Fear
of tools or lack of understanding of tools is the bad thing, and basing
processes on those tools is worse.

> Leading Open Source Foundation Does Not Trust Its Own Processes
<snip/>

I think I wrote down all of my own FUD about this rather well :-). Luckily
the way to dissolve these fears also seems easy enough:

> Now, if these tools were open source and I'd be able to take a look at
> how they work I might put some trust in them.

Perhaps I'm suffering from a bad case of "Not Invented Here" syndrome, but a
headline like

  Open Source Code Analysis Tool Proves Open Source Is Not A Risk At All

  The Apache Software Foundation recently started offering a new source
  code analysis tool which can be useful in detecting the origins of
  software. "Our codebases have always been real shiny and clean and we
  have now developed some tools that prove this point. Writing and running
  some automated software is a lot cheaper than lawsuits!", one Apache
  zealot said. "Besides, we know grep way better than friggin' SCO!"

is not inconceivable either.

Everything looks so grim on mondays, doesn't it?

LSD

[1] -- http://www.astro.washington.edu/endsofworld/

Re: [legal] Proposed changes for the Bulk Contributor Questionnaire

Posted by "Geir Magnusson Jr." <ge...@apache.org>.

On Nov 14, 2005, at 4:51 AM, Leo Simons wrote:

> Rant below. Decided not to tone it down.

That's our Leo :)

>
> On Mon, Nov 14, 2005 at 12:11:57AM -0500, Geir Magnusson Jr. wrote:
>
>> Comments welcome.
>>
>
> I like everything but the references to "Black Duck Software". I took
> a look at their website and their licensing policies and everything
> about it "feels" wrong. I don't like basing a big part of our  
> processes
> on some commercial black box "service-like" offering.

Clearly this is something we'll want to talk about.  The key is to  
give people an indication that we're serious about this and will be  
using tools to help us along.

>
> Taking another look around the web for similar companies, they seem to
> be about "open source risk management" where the risk is to avoid
> "contaminating" propietary stuff with "open source" stuff. I resent  
> the
> idea of "open source" being "contaminating" or anything like that (GPL
> is viral, but most other stuff is not). There's this entire  
> category of
> companies who capitalize on FUD. I can imagine SCO having stock  
> options
> on some of 'em.

Well, I have a different view, but that's because I spent some time  
with them trying to understand.  I think what they originally set out  
to do is to provide risk management by letting you know what's  
happening in your codebases wrt license mingling.  If a developer  
mistakenly brings GPL-ed software into your product codebase and you  
distribute it, you have a big problem, right?

None of that is intrinsically evil or a comment on OSS - developers  
don't understand the nuances of OSS licensing, and this is bound to  
happen.  They also do it for proprietary codebases that they have  
access to (Sun's Java code for example) and you can load in your own.

Now, I'm not defending BD in any way here.  I was really interested  
in how we could use their technology to help us respect the rights of  
other IP holders, as well as ensure that what we accept and create is  
ok.

Another thing - they are really interested in working with OSS  
communities.  I have a note to go to infra@ about this which I'll  
post later today if I can get the time.

>
> I think we should avoid the ASF being seen as being part of any of  
> that.
>
> ---
> Leading Open Source Foundation Does Not Trust Its Own Processes
>
> The ASF has recently started using the same tools that intellectual
> property sharks use when figuring out whom to send cease and desist
> letters.

LOL

>
> When asked for comments, the ASF said: "We finally gave up trying to
> understand why people are so scared of open source, so now we're just
> using some incomprehensible piece of commercial software which  
> makes us
> feel secure. We think its pretty silly, but if we already have run the
> tools, at least companies like SCO can't really use them as grounds  
> for
> suing us since we'll look pretty clean when they run the tool."
>
> Darl McBride said: "We think the ASF is making a very smart decision
> by employing code scanning techniques. Its the only way to be safe  
> from
> prosecution. Of course, most other open source organisations don't
> employ code scanning techniques (since they do have a brain of their
> own) so we're just going to sue all of those."
>
> IP firm XXX said: "What Darl said. Don't use any of that scary open
> source stuff. Even the ASF understands that now. Won't be long before
> they turn into a commercial entity themselves!"
> ---
>
> Grrrrr.
>
> Hmm. Didn't SCO run keyword scanners and the like? Didn't they find  
> out
> that they'd actually taken code from open source codebases? Didn't  
> much
> of the same happen at JBoss some time ago?

There have been many times when keyword scanners have informed us of  
code that had accidentally snuck into our codebase.  There's nothing  
intrinsically wrong with using tooling to find code that shouldn't be  
there.

The point of mentioning a keyword scanner (e.g. grep -R ....) is to  
get people to look at the code, and do some basic due diligence.

>
> I doubt there's a lot of keyword scanning tools or any kind of other
> automated technology that I wouldn't be able to circumvent with a few
> hours of work. Its just such a stupid idea. If I take source code from
> (say) the sun jdk, work on it for a few weeks to make it look  
> completely
> different so no line of the original code remains, I still have a
> derivative work but no scanner is going to be able to detect that.  
> Just
> like spam still manages to make it into my inbox.

Right.  We are *never* secure from the efforts of a bad actor.   
Ever.  People can lie on their ICLA, their CCLA, the software grant.   
They can change the copyright, license and munge the code around a bit.

We're not trying to stop that - we're trying to stop accidents, and  
create a very clean developer base.

So the keyword scanner is for people to use on their code before  
contribution, to make them look at the list and ensure that what they  
find doesn't surprise them.

I ran a keyword scanner on the IBM contribution and found "Sun".  :)   
That made me want to look, and there is some code that is (c) Sun that  
is included, and can legally be.  But it just made me go look,  
because there are lots of bits of Sun code that can't be included.

So again - the scanner is to just get people to think and do some  
work, rather than just tossing code over the fence at us.
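
For anyone who wants something slightly more structured than
grep -R, a keyword scan along these lines could be sketched in Python
as below. The keyword list is hypothetical (only "Sun" comes from this
thread; the proposed list has not been posted here), and the function
name is an assumption for the example.

```python
import os
import re

# Hypothetical keywords; only "Sun" is taken from the thread.
KEYWORDS = ["Sun", "Microsystems", "Confidential", "Proprietary"]


def keyword_scan(root, keywords=KEYWORDS):
    """Return (path, line_number, keyword) for each whole-word hit under root."""
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in keywords) + r")\b"
    )
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            # errors="replace" keeps the scan going on files that are not UTF-8.
            with open(path, encoding="utf-8", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    for match in pattern.finditer(line):
                        hits.append((path, number, match.group(1)))
    return hits
```

Calling keyword_scan on the directory of a contribution gives a list
of places to go and read. The match is case-sensitive and whole-word,
so "Sunday" does not count as a hit for "Sun". A hit is a prompt to
look, not a verdict.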

>
> I can imagine how some people or companies would feel safe if we were
> to say "we scanned everything using this intellectual property risk
> management tool XXX" but we'd be legitimizing something silly and  
> giving a
> false sense of security.

Ah - this is for our purpose, IMO.  Any company that is significantly  
worried about this will do their own examination of anything we  
produce.  We're not trying to make a warranty claim about our  
software, but just do whatever we can to help ensure that stuff we  
don't want doesn't get in via bulk contributions, and our day to day  
efforts working on the codebase don't allow things to accidentally  
slip in either.

>
> Now, if these tools were open source and I'd be able to take a look at
> how they work I might put some trust in them. But fancy websites, lots
> of press releases, not a lot of technical details, anal usage
> restrictions and total lack of a "download" button just sets off a lot
> of alarm bells.

Yes, well.... that's one of the reasons I decided to go say "howdy"  
to them.  This is just the beginning.  If this doesn't work for us,  
it won't work for us.  But it's worth looking into because they do  
some pretty nice and interesting things.  That's the subject for  
another post, though.

>
> With my infra@ hat on I'd probably be against running this kind of
> black box software under this kind of policy on ASF hardware. With
> something like jira, I at least know how it works (or doesn't work)  
> and what
> technology is under the cover and can get at the source code if I  
> want to.

Understood.

geir

>
> - LSD
>
>

-- 
Geir Magnusson Jr                                  +1-203-665-6437
geirm@apache.org

Re: [legal] Proposed changes for the Bulk Contributor Questionnaire

Posted by Leo Simons <ma...@leosimons.com>.

Rant below. Decided not to tone it down.

On Mon, Nov 14, 2005 at 12:11:57AM -0500, Geir Magnusson Jr. wrote:
> Comments welcome.

I like everything but the references to "Black Duck Software". I took
a look at their website and their licensing policies and everything
about it "feels" wrong. I don't like basing a big part of our processes
on some commercial black box "service-like" offering.

Taking another look around the web for similar companies, they seem to
be about "open source risk management" where the risk is to avoid
"contaminating" proprietary stuff with "open source" stuff. I resent the
idea of "open source" being "contaminating" or anything like that (GPL
is viral, but most other stuff is not). There's this entire category of
companies who capitalize on FUD. I can imagine SCO having stock options
on some of 'em.

I think we should avoid the ASF being seen as being part of any of that.

---
Leading Open Source Foundation Does Not Trust Its Own Processes

The ASF has recently started using the same tools that intellectual
property sharks use when figuring out whom to send cease and desist
letters.

When asked for comments, the ASF said: "We finally gave up trying to
understand why people are so scared of open source, so now we're just
using some incomprehensible piece of commercial software which makes us
feel secure. We think it's pretty silly, but if we already have run the
tools, at least companies like SCO can't really use them as grounds for
suing us since we'll look pretty clean when they run the tool."

Darl McBride said: "We think the ASF is making a very smart decision
by employing code scanning techniques. It's the only way to be safe from
prosecution. Of course, most other open source organisations don't
employ code scanning techniques (since they do have a brain of their
own) so we're just going to sue all of those."

IP firm XXX said: "What Darl said. Don't use any of that scary open
source stuff. Even the ASF understands that now. Won't be long before
they turn into a commercial entity themselves!"
---

Grrrrr.

Hmm. Didn't SCO run keyword scanners and the like? Didn't they find out
that they'd actually taken code from open source codebases? Didn't much
of the same happen at JBoss some time ago?

I doubt there's a lot of keyword scanning tools or any kind of other
automated technology that I wouldn't be able to circumvent with a few
hours of work. It's just such a stupid idea. If I take source code from
(say) the sun jdk, work on it for a few weeks to make it look completely
different so no line of the original code remains, I still have a
derivative work but no scanner is going to be able to detect that. Just
like spam still manages to make it into my inbox.

I can imagine how some people or companies would feel safe if we were
to say "we scanned everything using this intellectual property risk
management tool XXX" but we'd be legitimizing something silly and giving a
false sense of security.

Now, if these tools were open source and I'd be able to take a look at
how they work I might put some trust in them. But fancy websites, lots
of press releases, not a lot of technical details, anal usage
restrictions and total lack of a "download" button just sets off a lot
of alarm bells.

With my infra@ hat on I'd probably be against running this kind of
black box software under this kind of policy on ASF hardware. With
something like jira, I at least know how it works (or doesn't work) and what
technology is under the cover and can get at the source code if I want to.

- LSD

Re: [legal] Take II (Re: [legal] Proposed changes for the Bulk Contributor Questionnaire)

Posted by "Geir Magnusson Jr." <ge...@apache.org>.

I'm going to post this to the site since we seem to be comfortable,  
and any misconceptions due to my lack of clarity seem to be taken  
care of.

Please bellow if there is anything else...

geir

On Nov 15, 2005, at 5:37 PM, Geir Magnusson Jr. wrote:

>
> Ok - we've gotten some good feedback on this.   I'd like to emphasize  
> that these changes really are for what I hope are corner cases,  
> because for any contribution for which the contributor has all the ACQs  
> (like the 3 we have already), these new questions aren't asked.
>
> The following is the same proposal with the keyword scan question  
> removed (I'll move that to the contribution guidelines page) and the  
> suggestion pointing out BlackDuck by name changed (maybe we  
> offer a list of entities that could help potential contributors on  
> our website?).
>
> Also, I saw Zoe's note offering her keyword scanner software  
> (although I missed the note because of our mail problems), and I'm  
> all for it.  Zoe, if you can just put it in a JIRA, that would be  
> great.  If there's anyone interested in working on this kind of  
> software for our use (and the ASF use), step up and we can make a  
> little subproject out of it...
>
> -----
>
>
>         The Apache Software Foundation
>            Apache Harmony Project
>         Bulk Contribution Checklist
>                v 1.1 20051114
>
> The Apache Harmony project is dedicated to producing a codebase that
> has clear IP pedigree and protects the IP rights of others.  As part
> of this effort, we ask the following questions of all contributions
> of software that has been created outside of the project.  Our goal is
> to provide clear and consistent oversight of the project codebase, as
> well as encourage our contributors to carefully examine their
> contributions before bringing them to the project.
>
> Please Note : This document and your answers are considered public
> information, and shall be part of the Apache Harmony project public
> records.
>
>
> Part I :  Identification
>
>    Please provide the following information
>
>       Name : ___________________________________________
>     E-mail : ___________________________________________
>
>     Mailing address :
>         ___________________________________________
>         ___________________________________________
>         ___________________________________________
>         ___________________________________________
>
>      Employer :  ___________________________________________
>
>
> Part II : Description
>
>     Please describe the contribution :
>
>
>
>
>
> Part III :  Statement of Origination
>
> a) Have you personally written all of the code or other material
>    that you are intending to contribute to this project, and if so,
>    are you an Authorized Contributor for all parts of the  
> contribution?
>
>   [ ] Yes
>   [ ] No
>
>   If "yes", you're done with Part III, skip to Part IV
>   If "no" please continue with the rest of Part III
>
> b) Have you verified the development history of the code to
>    identify ALL of the authors?
>
>    Please list the other authors:
>
>
> c) Do you have a written agreement with all of the authors that
>    either gives you ownership of the material or otherwise provides
>    you sufficient rights to submit this material to the project
>    on their behalf?
>
>    Please provide the details of this agreement:
>
>
> d) Are all of the authors Authorized Contributors for the part of
>    the contribution written/created by each author?
>
>   [ ] Yes – if "yes", you're done with Part III, skip to Part IV.
>   [ ] No – if "no", please continue with the rest of Part III
>
>
> e) Was the code written prior to May 2005 (when the Harmony Project
>    was initiated)?
>
>   [ ] Yes
>   [ ] No
>
>   (i)  If No, you must provide Authorized Contributor Questionnaires
>        for the authors of the code created after May 2005 such that
>        those authors  are classified as Authorized Contributors for
>        the portions of the contribution  written by them
>        after May 2005.
>
> f) Did any of the authors of the code have access to third
>    party implementations of similar technology while developing the
>    contribution?
>
>   [ ] Yes
>   [ ] No
>
>   If "yes", please give details below :
>
>
>
>
> g) Was the code developed in accordance with a  development
>    process which was designed to prevent unauthorized inclusion
>    of third party  intellectual property rights into the code?
>    (e.g., does the process require that developers not have
>    concurrent access to third party implementations of similar
>    technology during development?)
>
>   [ ] Yes
>   [ ] No
>
>   If "no", the code isn't eligible for the Harmony Project.
>
>   If "yes", please provide a short description of the process,
>   focusing on protections related to third party intellectual
>   property :
>
>
>
>
> Note : The Apache Harmony project generally performs additional
> scans of its codebase, including bulk contributions, to help
> confirm code pedigree.  Prior to submitting any contribution,
> we strongly encourage you to verify that the contribution is
> acceptable.  Please see http://<URL> for more information.
>
>
>
>
> Part IV : Checklist
>
>   [ ] Contribution is licensed under the Apache License v2.0
>
>   [ ] Software Grant or Corporate Contributor License Agreement and
>       Software Grant executed and submitted
>
>
>  Signature : ___________________________________________
> Print Name : ___________________________________________
>       Date : ___________________________________________
>
>
>
> v1.0  20051114
>
>
> -- 
> Geir Magnusson Jr                                  +1-203-665-6437
> geir@optonline.net
>
>

-- 
Geir Magnusson Jr                                  +1-203-665-6437
geirm@apache.org

[legal] Take II (Re: [legal] Proposed changes for the Bulk Contributor Questionnaire)

Posted by "Geir Magnusson Jr." <ge...@apache.org>.

Ok - we've gotten some good feedback on this.   I'd like to emphasize  
that these changes really are for what I hope are corner cases,  
because for any contribution for which the contributor has all the ACQs  
(like the 3 we have already), these new questions aren't asked.

The following is the same proposal with the keyword scan question  
removed (I'll move that to the contribution guidelines page) and the  
suggestion pointing out BlackDuck by name changed (maybe we  
offer a list of entities that could help potential contributors on  
our website?).

Also, I saw Zoe's note offering her keyword scanner software  
(although I missed the note because of our mail problems), and I'm  
all for it.  Zoe, if you can just put it in a JIRA, that would be  
great.  If there's anyone interested in working on this kind of  
software for our use (and the ASF use), step up and we can make a  
little subproject out of it...

-----


         The Apache Software Foundation
            Apache Harmony Project
         Bulk Contribution Checklist
                v 1.1 20051114

The Apache Harmony project is dedicated to producing a codebase that
has clear IP pedigree and protects the IP rights of others.  As part
of this effort, we ask the following questions of all contributions
of software that has been created outside of the project.  Our goal is
to provide clear and consistent oversight of the project codebase, as
well as encourage our contributors to carefully examine their
contributions before bringing them to the project.

Please Note : This document and your answers are considered public
information, and shall be part of the Apache Harmony project public
records.


Part I :  Identification

    Please provide the following information

       Name : ___________________________________________
     E-mail : ___________________________________________

     Mailing address :
         ___________________________________________
         ___________________________________________
         ___________________________________________
         ___________________________________________

      Employer :  ___________________________________________


Part II : Description

     Please describe the contribution :





Part III :  Statement of Origination

a) Have you personally written all of the code or other material
    that you are intending to contribute to this project, and if so,
    are you an Authorized Contributor for all parts of the contribution?

   [ ] Yes
   [ ] No

   If "yes", you're done with Part III, skip to Part IV
   If "no" please continue with the rest of Part III

b) Have you verified the development history of the code to
    identify ALL of the authors?

    Please list the other authors:


c) Do you have a written agreement with all of the authors that
    either gives you ownership of the material or otherwise provides
    you sufficient rights to submit this material to the project
    on their behalf?

    Please provide the details of this agreement:


d) Are all of the authors Authorized Contributors for the part of
    the contribution written/created by each author?

   [ ] Yes – if "yes", you're done with Part III, skip to Part IV.
   [ ] No – if "no", please continue with the rest of Part III


e) Was the code written prior to May 2005 (when the Harmony Project
    was initiated)?

   [ ] Yes
   [ ] No

   (i)  If No, you must provide Authorized Contributor Questionnaires
        for the authors of the code created after May 2005 such that
        those authors  are classified as Authorized Contributors for
        the portions of the contribution  written by them
        after May 2005.

f) Did any of the authors of the code have access to third
    party implementations of similar technology while developing the
    contribution?

   [ ] Yes
   [ ] No

   If "yes", please give details below :




g) Was the code developed in accordance with a  development
    process which was designed to prevent unauthorized inclusion
    of third party  intellectual property rights into the code?
    (e.g., does the process require that developers not have
    concurrent access to third party implementations of similar
    technology during development?)

   [ ] Yes
   [ ] No

   If "no", the code isn't eligible for the Harmony Project.

   If "yes", please provide a short description of the process,
   focusing on protections related to third party intellectual
   property :




Note : The Apache Harmony project generally performs additional
scans of its codebase, including bulk contributions, to help
confirm code pedigree.  Prior to submitting any contribution,
we strongly encourage you to verify that the contribution is
acceptable.  Please see http://<URL> for more information.




Part IV : Checklist

   [ ] Contribution is licensed under the Apache License v2.0

   [ ] Software Grant or Corporate Contributor License Agreement and
       Software Grant executed and submitted


  Signature : ___________________________________________
Print Name : ___________________________________________
       Date : ___________________________________________



v1.0  20051114


-- 
Geir Magnusson Jr                                  +1-203-665-6437
geir@optonline.net