Posted to dev@daffodil.apache.org by "Beckerle, Mike" <mb...@owlcyberdefense.com> on 2021/09/09 15:16:41 UTC

daffodil-vscode - how to package and identify the contribution - some git questions

So the daffodil-vscode code-base wants to be granted to become part of the Daffodil project.

One question arises which is "what is the contribution?" exactly.

The normal way this is identified is by creating a tarball of the source files and specifying a SHA or MD5 hash of that file.

However, this code base is perhaps different from usual.

It started by creating a detached fork of the vscode debugger example code base. That code base is MIT-licensed, which is a compatible license.

The files were then edited. There are around 100 commits on top of the base that came from the vscode debugger repository.

So the contribution is that set of 100 commits - the patches/change-sets they represent.

These commits often edit the original files of the vscode debugger example to add the daffodil-specific functionality. That is, the contribution material is in several cases intermingled in the lines of the existing files. I think that's OK so long as the modified file had an MIT license.

There's some value in preserving the 100 commits by our contributors rather than squashing them down to one commit, though if it's really not sensible to proceed otherwise, we can choose to squash them.

Furthermore, the vscode debugger example repo itself had many commits in it. The current daffodil-vscode repo preserves all these commits as well. I don't see value in preserving these commits, and would rather they were squashed into a single "starting point" commit, with a dependencies file specifying the githash where we forked from, just so we can refer back if necessary.

So a starting suggestion (subject to discussion of other alternatives) is this:

Plan A:

1. squash all commits up to and including the last Microsoft commit, together into one.
2. rebase the remaining commits on top of that.
* I'm a bit worried about this rebase. There are merge commits, etc. in the history. I'm not sure this will all rebase cleanly while preserving all the commits, but maybe it will "just work".
3. create a "patch set" corresponding to the 100 or so commits that make up the "contribution".
* I don't know if this is even feasible for this many commits.
4. create a tar/zip of this aggregate patch set.
5. compute an md5 of this patch set.

The patch set tar/zip file and its md5 hash are "the granted software".

The problem with this idea is that there's no obvious way to review a patch set, shy of applying it.

A better way may be to replace steps 3 - 5 above with:

Plan B:

3. push the main branch to a new empty git repository
The point of this is to remove all historic stuff from the repository, i.e., have a minimal git repo that contains only the contribution and the single other commit it must be based on.

4. create a tarball of this git repository, and md5 hash of it

5. document that the contribution is from githash X (after the first commit) to githash Y (the final commit) of this repository

This has the advantage that the contribution is a self-contained review-able thing.

Other ideas are welcome. (Plans C, D, etc) The only requirements I know of are:

1. a single file containing the contribution, and its md5 hash
2. a sensible way one can review the contents of this contribution file
3. preserve history of derivation from the vscode debugger example.
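
Whichever plan is chosen, requirement 1 (a single file plus its hash) could be sketched roughly as follows in Python. This is only an illustration: the function names `make_tarball` and `file_digest`, the directory name, and the choice of a gzip-compressed tarball are assumptions for the example, not existing project or ASF tooling.

```python
import hashlib
import tarfile
from pathlib import Path


def make_tarball(source_dir: str, tarball_path: str) -> None:
    """Pack source_dir into a gzip-compressed tarball (hypothetical helper)."""
    src = Path(source_dir)
    with tarfile.open(tarball_path, "w:gz") as tar:
        # arcname stores entries under the directory's own name
        # instead of its absolute path on the packager's machine.
        tar.add(src, arcname=src.name)


def file_digest(path: str, algorithm: str = "md5") -> str:
    """Return the hex digest of a file, reading in chunks to bound memory use."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()
```

For example, `make_tarball("daffodil-vscode", "contribution.tar.gz")` followed by `file_digest("contribution.tar.gz")` would give the MD5, and `file_digest("contribution.tar.gz", "sha256")` a SHA-256. Note that the digest identifies one specific tarball file: re-packing the same tree generally produces a different digest because the archive records timestamps, so the tarball that was hashed is the one that has to be kept.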

Mike Beckerle | Principal Engineer

mbeckerle@owlcyberdefense.com

P +1-781-330-0412

Re: daffodil-vscode - how to package and identify the contribution - some git questions

Posted by Steve Lawrence <sl...@apache.org>.

For comparison, it looks like Apache OpenWhisk has a vscode extension:

  https://github.com/apache/openwhisk-vscode-extension

And here's the IP Clearance page for that:

  https://incubator.apache.org/ip-clearance/openwhisk-vscode-extension.html

It looks like they simply imported the 1.0 release without any git
history. And it looks like it was already licensed as ALv2.

I think it would be worth the effort to remove as much Microsoft
MIT-licensed code as possible, and then license whatever is left over as
ALv2 (assuming the authors have the rights to do so). This will make it
clearer what is actually MIT vs ALv2 and will likely ease the IP
clearance process.
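
As a rough way to see what is actually still MIT, one could compare a checkout of the upstream example at the fork point against the current tree and sort files into unmodified, new, and blended (started from upstream, then modified). The Python below is only a sketch under assumptions: the function names and directory arguments are hypothetical, both directories are assumed to be plain exports without a `.git` folder, and a renamed file will show up as "new" because matching is by path only.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def tree_hashes(root: str) -> dict[str, str]:
    """Map each file path (relative to root) to the SHA-256 of its contents."""
    base = Path(root)
    return {
        p.relative_to(base).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(base.rglob("*"))
        if p.is_file()
    }


def classify(upstream_root: str, fork_root: str) -> dict[str, list[str]]:
    """Sort files in the fork into unmodified / new / blended by content hash."""
    upstream = tree_hashes(upstream_root)
    fork = tree_hashes(fork_root)
    result: dict[str, list[str]] = {"unmodified": [], "new": [], "blended": []}
    for path, digest in sorted(fork.items()):
        if path not in upstream:
            result["new"].append(path)         # no upstream counterpart
        elif upstream[path] == digest:
            result["unmodified"].append(path)  # byte-identical to upstream
        else:
            result["blended"].append(path)     # upstream file with local edits
    return result
```

The "blended" list is the one that would need a closer look by hand, since a content hash only says that a file changed, not how much of the original remains.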



Re: daffodil-vscode - how to package and identify the contribution - some git questions

Posted by "Beckerle, Mike" <mb...@owlcyberdefense.com>.

I think I have a good example for you to use.

The Apache PLC4X project experimented with Daffodil and created a DFDL schema for an industrial control protocol called S7.

So it's a not-too-difficult binary data format, but it is a real format.

Everything is already Apache-licensed, and at Owl we test it against every Daffodil release, so we know it continues to work.

Tarball attached.


________________________________
From: Beckerle, Mike <mb...@owlcyberdefense.com>
Sent: Thursday, September 16, 2021 2:56 PM
To: dev@daffodil.apache.org <de...@daffodil.apache.org>
Subject: Re: daffodil-vscode - how to package and identify the contribution - some git questions

Suggest you just excise this file, and any tests that depend on it for now.

We can wire in a new example workspace and add in tests subsequently before the first "release" of this.





________________________________
From: John Wass <jw...@gmail.com>
Sent: Thursday, September 16, 2021 10:48 AM
To: dev@daffodil.apache.org <de...@daffodil.apache.org>
Subject: Re: daffodil-vscode - how to package and identify the contribution - some git questions

> I know of one file in the repo which will have to be removed which is the
> jpeg.dfdl.xsd file, which is there just as an example workspace.

I assume this issue remains, and needs to be addressed prior to giving this
the done stamp.

We could just remove that sample workspace; the setup is trivial and is
addressed in the docs. But that schema and jpg are also used by unit tests.

I'm looking through the test resources in Daffodil now; any suggestions on a
good candidate are welcome.



On Thu, Sep 9, 2021 at 2:11 PM Beckerle, Mike <mb...@owlcyberdefense.com>
wrote:

> I know of one file in the repo which will have to be removed which is the
> jpeg.dfdl.xsd file, which is there just as an example workspace.
>
> The copyright and provisions of that are not compatible with Apache
> licensing.
>
> We can find a DFDL schema that we created that has Apache license to use
> instead.
>
> For the other files under src, server, and build, can we generate a list
> of files identifying which are:
>
> (a) original MIT-licensed, unmodified
> (b) new - can be ASL
> (c) blended - started from MIT-licensed source, modified with
> daffodil-vscode-specific changes.
>
> It is these blended files that are the problematic ones.
>
>
>
> ________________________________
> From: Steve Lawrence <sl...@apache.org>
> Sent: Thursday, September 9, 2021 1:38 PM
> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
> Subject: Re: daffodil-vscode - how to package and identify the
> contribution - some git questions
>
> Correct. For more information about Apache license compatibility:
>
>   https://www.apache.org/legal/resolved.html
>
> MIT is Category A and is fine. EPL is Category B and is also okay, but
> generally only in its binary form. So these top-level dependencies look
> okay, assuming their transitive dependencies are also okay.
>
> We'll also need to verify the licenses of all code in the repo.
> Hopefully little of that is original Microsoft MIT code, and the rest can
> be granted to ASF and relicensed.
>
>
> On 9/9/21 1:30 PM, Beckerle, Mike wrote:
> > The requirement is that the entire dependency tree (transitively)
> > cannot depend on any software that has an Apache-incompatible (aka
> > restrictive) license.
> >
> > So we need the transitive closure of all dependencies.
> >
> >
> > ________________________________
> > From: Adam Rosien <ad...@rosien.net>
> > Sent: Thursday, September 9, 2021 12:44 PM
> > To: dev@daffodil.apache.org <de...@daffodil.apache.org>
> > Subject: Re: daffodil-vscode - how to package and identify the
> > contribution - some git questions
> >
> > (I don't understand the requirements of licencing + transitive
> > dependencies, so I'm giving some surface level license info)
> >
> > "ch.qos.logback" % "logback-classic" % "1.2.3" - EPL
> > http://logback.qos.ch/license.html
> > "com.microsoft.java" % "com.microsoft.java.debug.core" % "0.31.1" - EPL 1.0
> > "co.fs2" %% "fs2-io" % "3.0.4" - MIT
> > "com.monovore" %% "decline-effect" % "2.1.0" - APL 2.0
> > "org.typelevel" %% "log4cats-slf4j" % "2.1.0" - APL 2.0
> >
> > On Thu, Sep 9, 2021 at 9:35 AM Adam Rosien <ad...@rosien.net> wrote:
> >
> >> I can relay the list of dependencies and their licenses.
> >>
> >> On Thu, Sep 9, 2021 at 9:20 AM Steve Lawrence <sl...@apache.org>
> >> wrote:
> >>
> >>> I personally don't care too much about having the existing git history
> >>> once it's part of ASF, especially if it makes things any easier (as you
> >>> mention, squash/rebase can be difficult through merges). So I'd say we
> >>> just do plan B--create a tarball of the current state (without the git
> >>> history), and the content of that tarball is what goes through the IP
> >>> clearance process, and is the content of the initial commit when adding
> >>> to the apache/daffodil-vscode repo.
> >>>
> >>> Note that I think the incubator will still want access to the existing
> >>> repo so they can view the full git history. Understanding where
> >>> everything came from and verifying the provenance is important to
> >>> ensuring we have all the appropriate CLAs. So while the tarball is
> >>> maybe what is officially voted on, they will want access to the repo.
> >>>
> >>> That said, I don't think we are going to get CLAs for any
> >>> Microsoft-contributed code. So all Microsoft-contributed code will
> >>> need to be either kept MIT or removed from the codebase. And it feels
> >>> a bit odd to grant something to ASF where the original codebase stays
> >>> MIT and isn't part of that grant.
> >>>
> >>> I think understanding how much code still exists that is Microsoft/MIT
> >>> is going to be important to getting this through the IP clearance
> >>> process.
> >>>
> >>> So I'm curious how much of that original Microsoft code still exists? I
> >>> assume since it was just example code it has mostly been replaced? If
> >>> that's the case, we could potentially say Microsoft has no ownership of
> >>> this code, and so their CLA and MIT license aren't necessary?
> >>>
> >>> We should also have a good understanding of the dependencies. If any of
> >>> them are not compatible with ALv2, then going through this process
> >>> isn't even worth it until they are replaced. Do you have a list of the
> >>> dependencies?

Re: daffodil-vscode - how to package and identify the contribution - some git questions

Posted by "Beckerle, Mike" <mb...@owlcyberdefense.com>.

Suggest you just excise this file, and any tests that depend on it for now.

We can wire in a new example workspace and add in tests subsequently before the first "release" of this.





Re: daffodil-vscode - how to package and identify the contribution - some git questions

Posted by John Wass <jw...@gmail.com>.

> I know of one file in the repo which will have to be removed which is the
> jpeg.dfdl.xsd file, which is there just as an example workspace.

I assume this issue remains, and needs to be addressed prior to giving this
the done stamp.

We could just remove that sample workspace; the setup is trivial and is
addressed in the docs. But that schema and jpg are also used by unit tests.

I'm looking through the test resources in Daffodil now; any suggestions on a
good candidate are welcome.



> >>> assume since it was just example code it has mostly been replaced? If
> >>> that's the case, we could potentially say Microsoft has no ownership of
> >>> this code, and so their CLA and MIT license aren't necessary?
> >>>
> >>> We should also have a good understanding of the dependencies. If any of
> >>> them are not compatible with ALv2, then going through this process
> isn't
> >>> even worth it until they are replaced. Do you have a list of the
> >>> dependencies?
> >>>
> >>>
> >>> On 9/9/21 11:16 AM, Beckerle, Mike wrote:
> >>>> So the daffodil-vscode code-base wants to be granted to become part of
> >>> the
> >>>> Daffodil project.
> >>>>
> >>>> One question arises which is "what is the contribution?" exactly.
> >>>>
> >>>> The normal way this is identified is by creating a tarball of the
> >>> source files
> >>>> and specifying an sha or md5 hash of that file.
> >>>>
> >>>> However, this code base is perhaps different from usual.
> >>>>
> >>>> It started by creating a detached fork of the vscode debugger example
> >>> code base.
> >>>> This is MIT-Licensed which is a compatible license.
> >>>>
> >>>> The files are then edited. There are around 100 commits on top of the
> >>> base that
> >>>> came from the vscode debugger repository.
> >>>>
> >>>> So the contribution is that set of 100 commits - the
> >>> patches/change-sets they
> >>>> represent.
> >>>>
> >>>> These commits often edit the original files of the vscode debugger
> >>> example to
> >>>> add the daffodil-specific functionality. That is, the contribution
> >>> material is
> >>>> in several cases intermingled in the lines of the existing files.
> >>> That's ok I
> >>>> think so long as the modified file had MIT license.
> >>>>
> >>>> There's some value in preserving the 100 commits by our contributors,
> >>> not
> >>>> squashing it down to one commit, though if it's really not sensible to
> >>> proceed
> >>>> otherwise, we can choose to squash it down to one commit.
> >>>>
> >>>> Furthermore, the vscode debugger example repo itself had many commits
> >>> in it. The
> >>>> current daffodil-vscode repo preserves all these commits as well. I
> >>> don't see
> >>>> value in preserving these commits, and would rather they were squashed
> >>> into a
> >>>> single "starting point" commit, with a dependencies file specifying
> the
> >>> githash
> >>>> where we forked from, just so we can refer back if necessary.
> >>>>
> >>>> So as a starting suggestion (subject to discussion of other
> >>> alternatives) is this:
> >>>>
> >>>> Plan A:
> >>>>
> >>>>  1. squash all commits up to and including the last Microsoft commit,
> >>> together
> >>>>     into one.
> >>>>  2. rebase the remaining commits on top of that.
> >>>>      1. I'm a bit worried about this rebase. There are merge commits,
> >>> etc. in
> >>>>         the history. I'm not sure this will just all rebase while
> >>> preserving all
> >>>>         the commits, but maybe it will "just work"
> >>>>  3. create a "patch set" corresponding to the 100 or so commits that
> >>> make up the
> >>>>     "contribution".
> >>>>      1. I don't know if this is even feasible for this many commits.
> >>>>  4. create a tar/zip of this aggregate patch set.
> >>>>  5. compute an md5 of this patch set.
> >>>>
> >>>> The patch set tar/zip file and its md5 hash are "the granted
> software".
> >>>>
> >>>> The problem with this idea is that there's no obvious way to review a
> >>> patch set,
> >>>> shy of applying it.
> >>>>
> >>>> A better way may be to change steps 3 - 5 above to
> >>>>
> >>>> Plan B:
> >>>>
> >>>>     3. push the main branch to a new empty git repository
> >>>>          The point of this is to remove all historic stuff from the
> >>> repository,
> >>>>     i.e., have a minimal git repo that contains only the contribution
> >>> and the
> >>>>     single other commit it must be based on.
> >>>>
> >>>>     4. create a tarball of this git repository, and md5 hash of it
> >>>>
> >>>>     5. document that the contribution is from githash X (after the
> >>> first commit)
> >>>>     to githash Y (the final commit) of this repository
> >>>>
> >>>>
> >>>> This has the advantage that the contribution is a self-contained
> >>> review-able thing.
> >>>>
> >>>> Other ideas are welcome. (Plans C, D, etc) The only requirements I
> know
> >>> of are:
> >>>>
> >>>>  1. a single file containing the contribution, and its md5 hash
> >>>>  2. a sensible way one can review the contents of this contribution
> file
> >>>>  3. preserve history of derivation from the vscode debugger example.
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> Mike Beckerle | Principal Engineer
> >>>>
> >>>> mbeckerle@owlcyberdefense.com <ma...@owlcyberdefense.com>
> >>>>
> >>>> P +1-781-330-0412
> >>>>
> >>>
> >>>
> >
>
>

Re: daffodil-vscode - how to package and identify the contribution - some git questions

Posted by Steve Lawrence <sl...@apache.org>.

I don't think I would try squashing anything. The full and accurate git
history will likely be wanted by the IP clearance process. A tarball
will be the result of that process, and we can add it to the git repo in
one commit. Trying to maintain git history just complicates things for, I
think, not much benefit, especially if things like bug/PR IDs in commit
messages are incorrect.

It's more important that files in the repo have correct license headers
added, to make it clear who the copyright holder is for each file.
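
As a rough illustration of the header check described above, here is a minimal
Python sketch that sorts source files by the license header found near the top
of each file. The marker strings, the file suffixes, and the function names are
assumptions made for this example; a real audit would more likely rely on a
dedicated tool such as Apache RAT.

```python
from pathlib import Path

# Assumed marker phrases: the opening words of the standard ASF source header
# and the company name used in Microsoft copyright notices. Matching is
# case-insensitive because the exact casing of the notices is not verified here.
APACHE_MARKER = "licensed to the apache software foundation"
MICROSOFT_MARKER = "microsoft corporation"


def classify_header(text, head_lines=30):
    """Return 'apache', 'microsoft', 'both', or 'none' for one file's text."""
    head = "\n".join(text.splitlines()[:head_lines]).lower()
    has_apache = APACHE_MARKER in head
    has_microsoft = MICROSOFT_MARKER in head
    if has_apache and has_microsoft:
        return "both"
    if has_apache:
        return "apache"
    if has_microsoft:
        return "microsoft"
    return "none"


def audit(root, suffixes=(".ts", ".scala")):
    """Map each source file under root (relative path) to its classification."""
    root_path = Path(root)
    result = {}
    for path in sorted(root_path.rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="replace")
            result[str(path.relative_to(root_path))] = classify_header(text)
    return result
```

Files reported as "none" or "both" are the ones that would need a human to
decide who the copyright holder is.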


On 9/9/21 9:44 PM, John Wass wrote:
> I had a few more (6) source files as modified..
> 
> extension.ts
> debugAdapter.ts
> daffodilRuntime.ts
> daffodilDebug.ts
> adapter.test.ts
> activateDaffodilDebug.ts
> 
>> It would seem an IDE (probably vscode!) decided to restyle/reindent this
>> code.
> 
> We added opinionated code formatting... apparently trying to make this
> process as hard as possible :/
> 
> That reformat commit was done on 08/25/2021, title of PR was Prettier.
> Looking prior to that commit might give a little better idea of what
> changed.
> 
> 
>> squash/rebase can be difficut through merges
> 
> Here is a quick pass on (1) squashing the MS source in a single commit (2)
> placing that commit on top of an init commit in a repo (3) and then
> rewriting our commits on top of all of that.
> 
> It preserves our authorship.  Can be cleaned up a little bit still but I am
> not going to put time into it if we don't want this.  I just wanted to note
> how it could look.
> 
> https://github.com/jw3/rewrite-daffodil-vscode-1
> 
> One issue I could see here is the linking of the example repo PR IDs in the
> commit messages will conflict once we start adding PRs in the new repo.
> Now would be the time to rewrite these commit messages and strip/modify
> those #ID tags.
> 
> Thoughts on that rewrite repo?
> 
> 
> 
> 
> 
> On Thu, Sep 9, 2021 at 5:42 PM Beckerle, Mike <mb...@owlcyberdefense.com>
> wrote:
> 
>> So via some git trickery I was able to determine the "blended" files.
>>
>> I'm ignoring the various configuration files which are generally json
>> files.
>>
>> Of the ".ts" files only 3 are blended:
>>
>> src/debugAdapter.ts - 72 lines - only maybe 6 lines are different
>> src/extension.ts - 179 lines
>> src/tests/adapter.test.ts - 137 lines (50 of which are commented-out code)
>>
>> The delta between these files and the original files of the same name are
>> larger than expected due to changes in whitespace, and removal of ";" at
>> end of line (which I guess are optional in many places in typescript).
>>
>> It would seem an IDE (probably vscode!) decided to restyle/reindent this
>> code.
>>
>> So it's a bit hard to figure out what the "real" deltas are.
>>
>> src/debugAdapter.ts appears to be only trivially different. The name
>> MockDebugSession was replaced by DaffodilDebugSession, and "./mockDebug"
>> was changed to "./daffodilDebug".
>>
>> The other two files do appear to be where all the real blended code is.
>>
>>
>>
>> ________________________________
>> From: Beckerle, Mike <mb...@owlcyberdefense.com>
>> Sent: Thursday, September 9, 2021 4:21 PM
>> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
>> Subject: Re: daffodil-vscode - how to package and identify the
>> contribution - some git questions
>>
>> Whether it's a PR or series of PRs, or a software grant, that still
>> doesn't resolve the issue of the blended files which are part MIT-licensed
>> original code, and part new code deltas by the daffodil-vscode contributors.
>>
>> We need to understand whether those blended files can be teased apart
>> somehow so that it is clear going forward what is an MIT-licensed library
>> and what is Apache Licensed.
>>
>> I just did a grep -R -i microsoft  in a clone of the
>> openwhisk-vscode-extension and got zero hits. So no files still carry
>> microsoft copyright and in fact their NOTICES.txt file does not indicate
>> any dependency on MIT-licensed code at all.  So I think
>> openwhisk-vscode-extension is not going to help us figure out how to surf
>> this issue.
>>
>>
>> ________________________________
>> From: Steve Lawrence <sl...@apache.org>
>> Sent: Thursday, September 9, 2021 3:54 PM
>> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
>> Subject: Re: daffodil-vscode - how to package and identify the
>> contribution - some git questions
>>
>> The concern is that this code was developed outside of Apache and so
>> didn't follow standard Apache process. From the IP clearance page:
>>
>> https://incubator.apache.org/ip-clearance/
>>
>>> Any code that was developed outside of the ASF SVN repository and
>>> our public mailing lists must be processed like this, even if the
>>> external developer is already an ASF committer.
>>
>> I suppose that submitting it as a PR does follow some of that process,
>> but there is maybe less assurance of ownership. Because it was not
>> developed in an ASF repository, that code is presumed to be owned by
>> you, multiple developers, or a company, and so that ownership must be
>> granted to ASF via the IP clearance process, with appropriate software
>> grant, CLA's, etc. (At least, that's my admittedly limited understanding
>> of the process).
>>
>> - Steve
>>
>>
>> On 9/9/21 3:34 PM, John Wass wrote:
>>> Couldn't we (the vscode contributors) submit a series of PRs against the
>>> new repo to move the code, and just archive the example repo as-is?
>>>
>>> I noted some thoughts on that a while back
>>> https://github.com/jw3/example-daffodil-vscode/issues/77
>>>
>>>
>>>
>>> On Thu, Sep 9, 2021 at 2:11 PM Beckerle, Mike <
>> mbeckerle@owlcyberdefense.com>
>>> wrote:
>>>
>>>> I know of one file in the repo which will have to be removed which is
>> the
>>>> jpeg.dfdl.xsd file, which is there just as an example workspace.
>>>>
>>>> The copyright and provisions of that are not compatible with Apache
>>>> licensing.
>>>>
>>>> We can find a DFDL schema that we created that has Apache license to use
>>>> instead.
>>>>
>>>> For the other files under src, server, and build, can we generate a list
>>>> of files identifying which are:
>>>>
>>>> (a) original MIT-licensed, unmodified
>>>> (b) new - can be ASL
>>>> (c) blended - started from MIT-licensed source, modified with
>>>> daffodil-vscode-specific changes.
>>>>
>>>> It is these blended files that are the problematic ones.
>>>>
>>>>
>>>>
>>>> ________________________________
>>>> From: Steve Lawrence <sl...@apache.org>
>>>> Sent: Thursday, September 9, 2021 1:38 PM
>>>> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
>>>> Subject: Re: daffodil-vscode - how to package and identify the
>>>> contribution - some git questions
>>>>
>>>> Correct. For more information about Apache license compatibility:
>>>>
>>>>   https://www.apache.org/legal/resolved.html
>>>>
>>>> MIT is Category A and is fine. EPL is Category B and is also okay, but
>>>> generally only in its binary form. So these top-level dependencies look
>>>> okay, assuming their transitive dependencies are also okay.
>>>>
>>>> We'll also need to verify the licenses of all code in the repo.
>>>> Hopefully little of that is original microsoft MIT and can be granted to
>>>> ASF and relicensed.
>>>>
>>>>
>>>> On 9/9/21 1:30 PM, Beckerle, Mike wrote:
>>>>> The requirement, is that the entire dependency tree (transitively)
>>>> cannot depend on any software that has an Apache-incompatible (aka
>>>> restrictive) license.
>>>>>
>>>>> So we need the transitive closure of all dependencies.
>>>>>
>>>>>
>>>>> ________________________________
>>>>> From: Adam Rosien <ad...@rosien.net>
>>>>> Sent: Thursday, September 9, 2021 12:44 PM
>>>>> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
>>>>> Subject: Re: daffodil-vscode - how to package and identify the
>>>> contribution - some git questions
>>>>>
>>>>> (I don't understand the requirements of licencing + transitive
>>>>> dependencies, so I'm giving some surface level license info)
>>>>>
>>>>> "ch.qos.logback" % "logback-classic" % "1.2.3" - EPL
>>>>> http://logback.qos.ch/license.html
>>>>> "com.microsoft.java" % "com.microsoft.java.debug.core" % "0.31.1" - EPL
>>>> 1.0
>>>>> "co.fs2" %% "fs2-io" % "3.0.4" - MIT
>>>>> "com.monovore" %% "decline-effect" % "2.1.0" - APL 2.0
>>>>> "org.typelevel" %% "log4cats-slf4j" % "2.1.0" - APL 2.0
>>>>>
>>>>> On Thu, Sep 9, 2021 at 9:35 AM Adam Rosien <ad...@rosien.net> wrote:
>>>>>
>>>>>> I can relay the list of dependencies and their licenses.
>>>>>>
>>>>>> On Thu, Sep 9, 2021 at 9:20 AM Steve Lawrence <sl...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> I personally don't care too much about having the existing git
>> history
>>>>>>> once its part of ASF, especially if it makes thing any easier (as you
>>>>>>> mention, squash/rebase can be difficut through merges). So I'd say we
>>>>>>> just do plan B--create a tarball of the current state (without the
>> git
>>>>>>> history), and the content of that tarball is what goes through the IP
>>>>>>> clearance process, and is the content of the inital commit when
>> adding
>>>>>>> to the apache/daffodil-vscode repo.
>>>>>>>
>>>>>>> Note that I think the incubator will still want access to the
>> existing
>>>>>>> repo so they can view the full git history. Understanding where
>>>>>>> everything came from and verifying the provenance is important to
>>>>>>> ensuring we have all the appropriate CLA's. So while the tarball is
>>>>>>> maybe what is officially voted on, they will want access to the repo.
>>>>>>>
>>>>>>> That said, I don't think we are going to get CLA's for any Microsoft
>>>>>>> contribute code. So either all Microsoft contributed code will need
>> to
>>>>>>> be kept MIT, or removed from the codebase. And if feels a bit odd to
>>>>>>> grant something to ASF where the original codebase stays MIT and
>> isn't
>>>>>>> part of that grant.
>>>>>>>
>>>>>>> I think understanding how much code still exists that is
>> Microsoft/MIT
>>>>>>> is going to be important to getting this through the IP clearance
>>>> process.
>>>>>>>
>>>>>>> So I'm curious how much of that original Microsoft code still
>> exists? I
>>>>>>> assume since it was just example code it has mostly been replaced? If
>>>>>>> that's the case, we could potentially say Microsoft has no ownership
>> of
>>>>>>> this code, and so their CLA and MIT license aren't necessary?
>>>>>>>
>>>>>>> We should also have a good understanding of the dependencies. If any
>> of
>>>>>>> them are not compatible with ALv2, then going through this process
>>>> isn't
>>>>>>> even worth it until they are replaced. Do you have a list of the
>>>>>>> dependencies?
>>>>>>>
>>>>>>>
>>>>>>> On 9/9/21 11:16 AM, Beckerle, Mike wrote:
>>>>>>>> So the daffodil-vscode code-base wants to be granted to become part
>> of
>>>>>>> the
>>>>>>>> Daffodil project.
>>>>>>>>
>>>>>>>> One question arises which is "what is the contribution?" exactly.
>>>>>>>>
>>>>>>>> The normal way this is identified is by creating a tarball of the
>>>>>>> source files
>>>>>>>> and specifying an sha or md5 hash of that file.
>>>>>>>>
>>>>>>>> However, this code base is perhaps different from usual.
>>>>>>>>
>>>>>>>> It started by creating a detached fork of the vscode debugger
>> example
>>>>>>> code base.
>>>>>>>> This is MIT-Licensed which is a compatible license.
>>>>>>>>
>>>>>>>> The files are then edited. There are around 100 commits on top of
>> the
>>>>>>> base that
>>>>>>>> came from the vscode debugger repository.
>>>>>>>>
>>>>>>>> So the contribution is that set of 100 commits - the
>>>>>>> patches/change-sets they
>>>>>>>> represent.
>>>>>>>>
>>>>>>>> These commits often edit the original files of the vscode debugger
>>>>>>> example to
>>>>>>>> add the daffodil-specific functionality. That is, the contribution
>>>>>>> material is
>>>>>>>> in several cases intermingled in the lines of the existing files.
>>>>>>> That's ok I
>>>>>>>> think so long as the modified file had MIT license.
>>>>>>>>
>>>>>>>> There's some value in preserving the 100 commits by our
>> contributors,
>>>>>>> not
>>>>>>>> squashing it down to one commit, though if it's really not sensible
>> to
>>>>>>> proceed
>>>>>>>> otherwise, we can choose to squash it down to one commit.
>>>>>>>>
>>>>>>>> Furthermore, the vscode debugger example repo itself had many
>> commits
>>>>>>> in it. The
>>>>>>>> current daffodil-vscode repo preserves all these commits as well. I
>>>>>>> don't see
>>>>>>>> value in preserving these commits, and would rather they were
>> squashed
>>>>>>> into a
>>>>>>>> single "starting point" commit, with a dependencies file specifying
>>>> the
>>>>>>> githash
>>>>>>>> where we forked from, just so we can refer back if necessary.
>>>>>>>>
>>>>>>>> So as a starting suggestion (subject to discussion of other
>>>>>>> alternatives) is this:
>>>>>>>>
>>>>>>>> Plan A:
>>>>>>>>
>>>>>>>>  1. squash all commits up to and including the last Microsoft
>> commit,
>>>>>>> together
>>>>>>>>     into one.
>>>>>>>>  2. rebase the remaining commits on top of that.
>>>>>>>>      1. I'm a bit worried about this rebase. There are merge
>> commits,
>>>>>>> etc. in
>>>>>>>>         the history. I'm not sure this will just all rebase while
>>>>>>> preserving all
>>>>>>>>         the commits, but maybe it will "just work"
>>>>>>>>  3. create a "patch set" corresponding to the 100 or so commits that
>>>>>>> make up the
>>>>>>>>     "contribution".
>>>>>>>>      1. I don't know if this is even feasible for this many commits.
>>>>>>>>  4. create a tar/zip of this aggregate patch set.
>>>>>>>>  5. compute an md5 of this patch set.
>>>>>>>>
>>>>>>>> The patch set tar/zip file and its md5 hash are "the granted
>>>> software".
>>>>>>>>
>>>>>>>> The problem with this idea is that there's no obvious way to review
>> a
>>>>>>> patch set,
>>>>>>>> shy of applying it.
>>>>>>>>
>>>>>>>> A better way may be to change steps 3 - 5 above to
>>>>>>>>
>>>>>>>> Plan B:
>>>>>>>>
>>>>>>>>     3. push the main branch to a new empty git repository
>>>>>>>>          The point of this is to remove all historic stuff from the
>>>>>>> repository,
>>>>>>>>     i.e., have a minimal git repo that contains only the
>> contribution
>>>>>>> and the
>>>>>>>>     single other commit it must be based on.
>>>>>>>>
>>>>>>>>     4. create a tarball of this git repository, and md5 hash of it
>>>>>>>>
>>>>>>>>     5. document that the contribution is from githash X (after the
>>>>>>> first commit)
>>>>>>>>     to githash Y (the final commit) of this repository
>>>>>>>>
>>>>>>>>
>>>>>>>> This has the advantage that the contribution is a self-contained
>>>>>>> review-able thing.
>>>>>>>>
>>>>>>>> Other ideas are welcome. (Plans C, D, etc) The only requirements I
>>>> know
>>>>>>> of are:
>>>>>>>>
>>>>>>>>  1. a single file containing the contribution, and its md5 hash
>>>>>>>>  2. a sensible way one can review the contents of this contribution
>>>> file
>>>>>>>>  3. preserve history of derivation from the vscode debugger example.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Mike Beckerle | Principal Engineer
>>>>>>>>
>>>>>>>> mbeckerle@owlcyberdefense.com <ma...@owlcyberdefense.com>
>>>>>>>>
>>>>>>>> P +1-781-330-0412
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
>

Re: daffodil-vscode - how to package and identify the contribution - some git questions

Posted by John Wass <jw...@gmail.com>.

> How hard is it to refactor these 6 files so that all new code is in
> separate files from all preserved original code?

Will take a look at this today.

On Fri, Sep 10, 2021 at 1:47 PM Beckerle, Mike <
mbeckerle@owlcyberdefense.com> wrote:

> How hard is it to refactor these 6 files so that all new code is in
> separate files from all preserved original code?
>
> Assume one-liner changes to original files (like calling MockDebugger
> changed to call DaffodilDebugger) are allowed.
>
> We either have to separate these 6 blended files, or convince legal and
> the incubator-pmc that blended files are ok because they originally had the
> MIT license.
>
> I definitely don't want to bother with that unless the refactoring
> exercise here is hard.
> ________________________________
> From: John Wass <jw...@gmail.com>
> Sent: Friday, September 10, 2021 1:02 PM
> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
> Subject: Re: daffodil-vscode - how to package and identify the
> contribution - some git questions
>
> Mike - Those were renames from the original versions that had "mock" in
> their names.
>
> commit 383fd4882a8fe51adf21b5ae31fe252056800447
>
> On Fri, Sep 10, 2021 at 12:54 PM Beckerle, Mike <
> mbeckerle@owlcyberdefense.com> wrote:
>
> >
> > John Wass said:
> >
> > I had a few more (6) source files as modified..
> >
> > extension.ts
> > debugAdapter.ts
> > daffodilRuntime.ts
> > daffodilDebug.ts
> > adapter.test.ts
> > activateDaffodilDebug.ts
> >
> > The 3 files with daffodil or Daffodil in their names, aren't those new
> > files? Or were those based on provided files, but the file was renamed as
> > well as the content modified?
> >
> > ...mikeb
> >
> >
>

Re: daffodil-vscode - how to package and identify the contribution - some git questions

Posted by "Beckerle, Mike" <mb...@owlcyberdefense.com>.

How hard is it to refactor these 6 files so that all new code is in separate files from all preserved original code?

Assume one-liner changes to original files (like calling MockDebugger changed to call DaffodilDebugger) are allowed.

We either have to separate these 6 blended files, or convince legal and the incubator-pmc that blended files are ok because they originally had the MIT license.

I definitely don't want to bother with convincing them unless the refactoring exercise here is hard.
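
To make the unmodified / new / blended split concrete, here is a minimal Python
sketch that compares a snapshot of the original example code with the current
tree. The function name and the dict-of-bytes input shape are assumptions for
illustration only. Note the limitation: a file that was renamed and then edited
(as the "mock" to "daffodil" renames were) is reported as "new", so those cases
still need rename detection or manual review.

```python
def classify(base, current):
    """Classify every path in `current` against the `base` snapshot.

    Both arguments map a relative path to that file's bytes.
    Returns a dict of path -> 'unmodified' | 'blended' | 'new'.
    """
    base_contents = set(base.values())
    result = {}
    for path, data in current.items():
        if path in base:
            # Same path as the original: either untouched or edited in place.
            result[path] = "unmodified" if base[path] == data else "blended"
        elif data in base_contents:
            # Identical content under a different name: a pure rename.
            result[path] = "unmodified"
        else:
            # No match by path or content. This may still be a renamed and
            # edited original, which this sketch cannot detect.
            result[path] = "new"
    return result
```

Only the paths reported as "blended", plus any renamed-and-edited files found
by hand, would need to be teased apart.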
________________________________
From: John Wass <jw...@gmail.com>
Sent: Friday, September 10, 2021 1:02 PM
To: dev@daffodil.apache.org <de...@daffodil.apache.org>
Subject: Re: daffodil-vscode - how to package and identify the contribution - some git questions

Mike - Those were renames from the original versions that had "mock" in
their names.

commit 383fd4882a8fe51adf21b5ae31fe252056800447

On Fri, Sep 10, 2021 at 12:54 PM Beckerle, Mike <
mbeckerle@owlcyberdefense.com> wrote:

>
> John Wass said:
>
> I had a few more (6) source files as modified..
>
> extension.ts
> debugAdapter.ts
> daffodilRuntime.ts
> daffodilDebug.ts
> adapter.test.ts
> activateDaffodilDebug.ts
>
> The 3 files with daffodil or Daffodil in their names, aren't those new
> files? Or were those based on provided files, but the file was renamed as
> well as the content modified?
>
> ...mikeb
>
>

Re: daffodil-vscode - how to package and identify the contribution - some git questions

Posted by John Wass <jw...@gmail.com>.

Mike - Those were renames from the original versions that had "mock" in
their names.

commit 383fd4882a8fe51adf21b5ae31fe252056800447

On Fri, Sep 10, 2021 at 12:54 PM Beckerle, Mike <
mbeckerle@owlcyberdefense.com> wrote:

>
> John Wass said:
>
> I had a few more (6) source files as modified..
>
> extension.ts
> debugAdapter.ts
> daffodilRuntime.ts
> daffodilDebug.ts
> adapter.test.ts
> activateDaffodilDebug.ts
>
> The 3 files with daffodil or Daffodil in their names, aren't those new
> files? Or were those based on provided files, but the file was renamed as
> well as the content modified?
>
> ...mikeb
>
>

Re: daffodil-vscode - how to package and identify the contribution - some git questions

Posted by "Beckerle, Mike" <mb...@owlcyberdefense.com>.

John Wass said:

I had a few more (6) source files as modified..

extension.ts
debugAdapter.ts
daffodilRuntime.ts
daffodilDebug.ts
adapter.test.ts
activateDaffodilDebug.ts

The 3 files with daffodil or Daffodil in their names, aren't those new files? Or were those based on provided files, but the file was renamed as well as the content modified?

...mikeb

Re: daffodil-vscode - how to package and identify the contribution - some git questions

Posted by John Wass <jw...@gmail.com>.

I had a few more source files (6) listed as modified:

extension.ts
debugAdapter.ts
daffodilRuntime.ts
daffodilDebug.ts
adapter.test.ts
activateDaffodilDebug.ts

> It would seem an IDE (probably vscode!) decided to restyle/reindent this
> code.

We added opinionated code formatting... apparently trying to make this
process as hard as possible :/

That reformat commit was made on 08/25/2021; the title of the PR was
"Prettier". Looking at the code prior to that commit might give a better
idea of what changed.
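
Another way to see past the formatting noise, besides diffing against the
pre-Prettier commit, is to normalize both versions before comparing them. The
Python sketch below is only an approximation under stated assumptions: it
collapses whitespace and drops trailing semicolons and blank lines, but it does
not handle lines that the formatter re-wrapped or quote-style changes. The
function names are hypothetical.

```python
import difflib
import re


def normalize(source):
    """Collapse whitespace, drop trailing ';' and blank lines."""
    lines = []
    for raw in source.splitlines():
        line = re.sub(r"\s+", " ", raw).strip()
        line = line.rstrip(";").rstrip()
        if line:
            lines.append(line)
    return lines


def real_deltas(original, modified):
    """Return the removed ('- ') and added ('+ ') lines left after normalizing."""
    a, b = normalize(original), normalize(modified)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    deltas = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            deltas += ["- " + line for line in a[i1:i2]]
        if tag in ("insert", "replace"):
            deltas += ["+ " + line for line in b[j1:j2]]
    return deltas
```

For a file like src/debugAdapter.ts, where only a class name and an import path
changed, the result should shrink to just those few lines.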


> squash/rebase can be difficut through merges

Here is a quick pass on (1) squashing the MS source in a single commit (2)
placing that commit on top of an init commit in a repo (3) and then
rewriting our commits on top of all of that.

It preserves our authorship.  Can be cleaned up a little bit still but I am
not going to put time into it if we don't want this.  I just wanted to note
how it could look.

https://github.com/jw3/rewrite-daffodil-vscode-1

One issue I can see here is that the example repo PR IDs linked in the
commit messages will conflict with PR numbers in the new repo once we
start adding PRs there. Now would be the time to rewrite these commit
messages and strip/modify those #ID tags.

Thoughts on that rewrite repo?
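
For the #ID rewrite, one option is to turn each bare "#123" into a
repo-qualified reference so it keeps pointing at the example repo. The Python
sketch below shows only the message transformation; the function name is
hypothetical, the old repo name is taken from the issue link earlier in this
thread, and applying it to every commit would still need a history-rewriting
tool.

```python
import re

# Assumption: the old PRs live in this repo (from the issue URL in the thread).
OLD_REPO = "jw3/example-daffodil-vscode"

# Match "#123" only when it is not already part of "owner/repo#123"
# and is not glued to a preceding word character.
BARE_REF = re.compile(r"(?<![\w/])#(\d+)\b")


def qualify_refs(message, old_repo=OLD_REPO):
    """Rewrite bare '#N' references as 'owner/repo#N'."""
    return BARE_REF.sub(lambda m: f"{old_repo}#{m.group(1)}", message)
```

A message such as "Add launch config (#42)" would become
"Add launch config (jw3/example-daffodil-vscode#42)", while references that are
already qualified are left alone.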





On Thu, Sep 9, 2021 at 5:42 PM Beckerle, Mike <mb...@owlcyberdefense.com>
wrote:

> So via some git trickery I was able to determine the "blended" files.
>
> I'm ignoring the various configuration files which are generally json
> files.
>
> Of the ".ts" files only 3 are blended:
>
> src/debugAdapter.ts - 72 lines - only maybe 6 lines are different
> src/extension.ts - 179 lines
> src/tests/adapter.test.ts - 137 lines (50 of which are commented-out code)
>
> The delta between these files and the original files of the same name are
> larger than expected due to changes in whitespace, and removal of ";" at
> end of line (which I guess are optional in many places in typescript).
>
> It would seem an IDE (probably vscode!) decided to restyle/reindent this
> code.
>
> So it's a bit hard to figure out what the "real" deltas are.
>
> src/debugAdapter.ts appears to be only trivially different. The name
> MockDebugSession was replaced by DaffodilDebugSession, and "./mockDebug"
> was changed to "./daffodilDebug".
>
> The other two files do appear to be where all the real blended code is.
>
>
>
> ________________________________
> From: Beckerle, Mike <mb...@owlcyberdefense.com>
> Sent: Thursday, September 9, 2021 4:21 PM
> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
> Subject: Re: daffodil-vscode - how to package and identify the
> contribution - some git questions
>
> Whether it's a PR or series of PRs, or a software grant, that still
> doesn't resolve the issue of the blended files which are part MIT-licensed
> original code, and part new code deltas by the daffodil-vscode contributors.
>
> We need to understand whether those blended files can be teased apart
> somehow so that it is clear going forward what is an MIT-licensed library
> and what is Apache Licensed.
>
> I just did a grep -R -i microsoft  in a clone of the
> openwhisk-vscode-extension and got zero hits. So no files still carry
> microsoft copyright and in fact their NOTICES.txt file does not indicate
> any dependency on MIT-licensed code at all.  So I think
> openwhisk-vscode-extension is not going to help us figure out how to surf
> this issue.
>
>
> ________________________________
> From: Steve Lawrence <sl...@apache.org>
> Sent: Thursday, September 9, 2021 3:54 PM
> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
> Subject: Re: daffodil-vscode - how to package and identify the
> contribution - some git questions
>
> The concern is that this code was developed outside of Apache and so
> didn't follow standard Apache process. From the IP clearance page:
>
> https://incubator.apache.org/ip-clearance/
>
> > Any code that was developed outside of the ASF SVN repository and
> > our public mailing lists must be processed like this, even if the
> > external developer is already an ASF committer.
>
> I suppose that submitting it as a PR does follow some of that process,
> but there is maybe less assurance of ownership. Because it was not
> developed in an ASF repository, that code is presumed to be owned by
> you, multiple developers, or a company, and so that ownership must be
> granted to ASF via the IP clearance process, with appropriate software
> grant, CLA's, etc. (At least, that's my admittedly limited understanding
> of the process).
>
> - Steve
>
>
> On 9/9/21 3:34 PM, John Wass wrote:
> > Couldn't we (the vscode contributors) submit a series of PRs against the
> > new repo to move the code, and just archive the example repo as-is?
> >
> > I noted some thoughts on that a while back
> > https://github.com/jw3/example-daffodil-vscode/issues/77
> >
> >
> >
> > On Thu, Sep 9, 2021 at 2:11 PM Beckerle, Mike <
> mbeckerle@owlcyberdefense.com>
> > wrote:
> >
> >> I know of one file in the repo which will have to be removed which is
> the
> >> jpeg.dfdl.xsd file, which is there just as an example workspace.
> >>
> >> The copyright and provisions of that are not compatible with Apache
> >> licensing.
> >>
> >> We can find a DFDL schema that we created that has Apache license to use
> >> instead.
> >>
> >> For the other files under src, server, and build, can we generate a list
> >> of files identifying which are:
> >>
> >> (a) original MIT-licensed, unmodified
> >> (b) new - can be ASL
> >> (c) blended - started from MIT-licensed source, modified with
> >> daffodil-vscode-specific changes.
> >>
> >> It is these blended files that are the problematic ones.
> >>
> >>
> >>
> >> ________________________________
> >> From: Steve Lawrence <sl...@apache.org>
> >> Sent: Thursday, September 9, 2021 1:38 PM
> >> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
> >> Subject: Re: daffodil-vscode - how to package and identify the
> >> contribution - some git questions
> >>
> >> Correct. For more information about Apache license compatibility:
> >>
> >>   https://www.apache.org/legal/resolved.html
> >>
> >> MIT is Category A and is fine. EPL is Category B and is also okay, but
> >> generally only in its binary form. So these top-level dependencies look
> >> okay, assuming their transitive dependencies are also okay.
> >>
> >> We'll also need to verify the licenses of all code in the repo.
> >> Hopefully little of that is original microsoft MIT and can be granted to
> >> ASF and relicensed.
> >>
> >>
> >> On 9/9/21 1:30 PM, Beckerle, Mike wrote:
> >>> The requirement is that the entire dependency tree (transitively)
> >>> cannot depend on any software that has an Apache-incompatible (aka
> >>> restrictive) license.
> >>>
> >>> So we need the transitive closure of all dependencies.
> >>>
> >>>
> >>> ________________________________
> >>> From: Adam Rosien <ad...@rosien.net>
> >>> Sent: Thursday, September 9, 2021 12:44 PM
> >>> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
> >>> Subject: Re: daffodil-vscode - how to package and identify the
> >> contribution - some git questions
> >>>
> >>> (I don't understand the requirements of licencing + transitive
> >>> dependencies, so I'm giving some surface level license info)
> >>>
> >>> "ch.qos.logback" % "logback-classic" % "1.2.3" - EPL
> >>> http://logback.qos.ch/license.html
> >>> "com.microsoft.java" % "com.microsoft.java.debug.core" % "0.31.1" - EPL 1.0
> >>> "co.fs2" %% "fs2-io" % "3.0.4" - MIT
> >>> "com.monovore" %% "decline-effect" % "2.1.0" - APL 2.0
> >>> "org.typelevel" %% "log4cats-slf4j" % "2.1.0" - APL 2.0
> >>>
> >>> On Thu, Sep 9, 2021 at 9:35 AM Adam Rosien <ad...@rosien.net> wrote:
> >>>
> >>>> I can relay the list of dependencies and their licenses.
> >>>>
> >>>> On Thu, Sep 9, 2021 at 9:20 AM Steve Lawrence <sl...@apache.org>
> >>>> wrote:
> >>>>
> >>>>> I personally don't care too much about having the existing git history
> >>>>> once it's part of ASF, especially if it makes things any easier (as you
> >>>>> mention, squash/rebase can be difficult through merges). So I'd say we
> >>>>> just do plan B--create a tarball of the current state (without the git
> >>>>> history), and the content of that tarball is what goes through the IP
> >>>>> clearance process, and is the content of the initial commit when adding
> >>>>> to the apache/daffodil-vscode repo.
> >>>>>
> >>>>> Note that I think the incubator will still want access to the
> existing
> >>>>> repo so they can view the full git history. Understanding where
> >>>>> everything came from and verifying the provenance is important to
> >>>>> ensuring we have all the appropriate CLA's. So while the tarball is
> >>>>> maybe what is officially voted on, they will want access to the repo.
> >>>>>
> >>>>> That said, I don't think we are going to get CLA's for any
> >>>>> Microsoft-contributed code. So either all Microsoft-contributed code
> >>>>> will need to be kept MIT, or removed from the codebase. And it feels a
> >>>>> bit odd to grant something to ASF where the original codebase stays MIT
> >>>>> and isn't part of that grant.
> >>>>>
> >>>>> I think understanding how much code still exists that is
> Microsoft/MIT
> >>>>> is going to be important to getting this through the IP clearance
> >> process.
> >>>>>
> >>>>> So I'm curious how much of that original Microsoft code still
> exists? I
> >>>>> assume since it was just example code it has mostly been replaced? If
> >>>>> that's the case, we could potentially say Microsoft has no ownership
> of
> >>>>> this code, and so their CLA and MIT license aren't necessary?
> >>>>>
> >>>>> We should also have a good understanding of the dependencies. If any
> of
> >>>>> them are not compatible with ALv2, then going through this process
> >> isn't
> >>>>> even worth it until they are replaced. Do you have a list of the
> >>>>> dependencies?
> >>>>>
> >>>>>
> >>>>> On 9/9/21 11:16 AM, Beckerle, Mike wrote:
> >>>>>> So the daffodil-vscode code-base wants to be granted to become part
> of
> >>>>> the
> >>>>>> Daffodil project.
> >>>>>>
> >>>>>> One question arises which is "what is the contribution?" exactly.
> >>>>>>
> >>>>>> The normal way this is identified is by creating a tarball of the
> >>>>> source files
> >>>>>> and specifying an sha or md5 hash of that file.
> >>>>>>
> >>>>>> However, this code base is perhaps different from usual.
> >>>>>>
> >>>>>> It started by creating a detached fork of the vscode debugger
> example
> >>>>> code base.
> >>>>>> This is MIT-Licensed which is a compatible license.
> >>>>>>
> >>>>>> The files are then edited. There are around 100 commits on top of
> the
> >>>>> base that
> >>>>>> came from the vscode debugger repository.
> >>>>>>
> >>>>>> So the contribution is that set of 100 commits - the
> >>>>> patches/change-sets they
> >>>>>> represent.
> >>>>>>
> >>>>>> These commits often edit the original files of the vscode debugger
> >>>>> example to
> >>>>>> add the daffodil-specific functionality. That is, the contribution
> >>>>> material is
> >>>>>> in several cases intermingled in the lines of the existing files.
> >>>>> That's ok I
> >>>>>> think so long as the modified file had MIT license.
> >>>>>>
> >>>>>> There's some value in preserving the 100 commits by our
> contributors,
> >>>>> not
> >>>>>> squashing it down to one commit, though if it's really not sensible
> to
> >>>>> proceed
> >>>>>> otherwise, we can choose to squash it down to one commit.
> >>>>>>
> >>>>>> Furthermore, the vscode debugger example repo itself had many
> commits
> >>>>> in it. The
> >>>>>> current daffodil-vscode repo preserves all these commits as well. I
> >>>>> don't see
> >>>>>> value in preserving these commits, and would rather they were
> squashed
> >>>>> into a
> >>>>>> single "starting point" commit, with a dependencies file specifying
> >> the
> >>>>> githash
> >>>>>> where we forked from, just so we can refer back if necessary.
> >>>>>>
> >>>>>> So as a starting suggestion (subject to discussion of other
> >>>>> alternatives) is this:
> >>>>>>
> >>>>>> Plan A:
> >>>>>>
> >>>>>>  1. squash all commits up to and including the last Microsoft
> commit,
> >>>>> together
> >>>>>>     into one.
> >>>>>>  2. rebase the remaining commits on top of that.
> >>>>>>      1. I'm a bit worried about this rebase. There are merge
> commits,
> >>>>> etc. in
> >>>>>>         the history. I'm not sure this will just all rebase while
> >>>>> preserving all
> >>>>>>         the commits, but maybe it will "just work"
> >>>>>>  3. create a "patch set" corresponding to the 100 or so commits that
> >>>>> make up the
> >>>>>>     "contribution".
> >>>>>>      1. I don't know if this is even feasible for this many commits.
> >>>>>>  4. create a tar/zip of this aggregate patch set.
> >>>>>>  5. compute an md5 of this patch set.
> >>>>>>
> >>>>>> The patch set tar/zip file and its md5 hash are "the granted
> >> software".
> >>>>>>
> >>>>>> The problem with this idea is that there's no obvious way to review
> a
> >>>>> patch set,
> >>>>>> shy of applying it.
> >>>>>>
> >>>>>> A better way may be to change steps 3 - 5 above to
> >>>>>>
> >>>>>> Plan B:
> >>>>>>
> >>>>>>     3. push the main branch to a new empty git repository
> >>>>>>          The point of this is to remove all historic stuff from the
> >>>>> repository,
> >>>>>>     i.e., have a minimal git repo that contains only the
> contribution
> >>>>> and the
> >>>>>>     single other commit it must be based on.
> >>>>>>
> >>>>>>     4. create a tarball of this git repository, and md5 hash of it
> >>>>>>
> >>>>>>     5. document that the contribution is from githash X (after the
> >>>>> first commit)
> >>>>>>     to githash Y (the final commit) of this repository
> >>>>>>
> >>>>>>
> >>>>>> This has the advantage that the contribution is a self-contained
> >>>>> review-able thing.
> >>>>>>
> >>>>>> Other ideas are welcome. (Plans C, D, etc) The only requirements I
> >> know
> >>>>> of are:
> >>>>>>
> >>>>>>  1. a single file containing the contribution, and its md5 hash
> >>>>>>  2. a sensible way one can review the contents of this contribution
> >> file
> >>>>>>  3. preserve history of derivation from the vscode debugger example.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> Mike Beckerle | Principal Engineer
> >>>>>>
> >>>>>> mbeckerle@owlcyberdefense.com <ma...@owlcyberdefense.com>
> >>>>>>
> >>>>>> P +1-781-330-0412
> >>>>>>
> >>>>>
> >>>>>
> >>>
> >>
> >>
> >
>
>

Re: daffodil-vscode - how to package and identify the contribution - some git questions

Posted by "Beckerle, Mike" <mb...@owlcyberdefense.com>.

So via some git trickery I was able to determine the "blended" files.

I'm ignoring the various configuration files which are generally json files.

Of the ".ts" files only 3 are blended:

src/debugAdapter.ts - 72 lines - only maybe 6 lines are different
src/extension.ts - 179 lines
src/tests/adapter.test.ts - 137 lines (50 of which are commented-out code)

The deltas between these files and the original files of the same name are larger than expected due to changes in whitespace and the removal of ";" at end of line (which I guess is optional in many places in TypeScript).

It would seem an IDE (probably vscode!) decided to restyle/reindent this code.

So it's a bit hard to figure out what the "real" deltas are.
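
One way to approximate the "real" deltas is to normalize both versions before comparing them, so that reindentation and dropped semicolons no longer count as changes. The following is only a rough sketch of that idea, not the procedure actually used in this thread. The function names `normalize` and `real_delta` are made up for illustration, and the normalization is deliberately crude: it would also collapse whitespace inside string literals.

```python
import difflib
import re


def normalize(source: str) -> list[str]:
    """Collapse runs of whitespace, drop trailing semicolons, skip blank lines."""
    lines = []
    for raw in source.splitlines():
        line = re.sub(r"\s+", " ", raw).strip()
        line = line.rstrip(";").rstrip()
        if line:
            lines.append(line)
    return lines


def real_delta(original: str, modified: str) -> list[str]:
    """Return removed ("- ") and added ("+ ") lines that survive normalization."""
    a, b = normalize(original), normalize(modified)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        changes += [f"- {line}" for line in a[i1:i2]]
        changes += [f"+ {line}" for line in b[j1:j2]]
    return changes


if __name__ == "__main__":
    # Hypothetical file contents, loosely modeled on the rename described here.
    before = "import { MockDebugSession } from './mockDebug';\n\n    run();\n"
    after = "import { DaffodilDebugSession } from './daffodilDebug'\nrun()\n"
    for line in real_delta(before, after):
        print(line)
```

With the hypothetical inputs above, only the import line is reported as changed; the reindented `run()` call and its missing semicolon drop out of the comparison.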

src/debugAdapter.ts appears to be only trivially different. The name MockDebugSession was replaced by DaffodilDebugSession, and "./mockDebug" was changed to "./daffodilDebug".

The other two files do appear to be where all the real blended code is.
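
For reference, the three-way split raised earlier in the thread (original and unmodified, new, blended) could be computed along these lines. This is a minimal sketch under stated assumptions: the two trees are given as hypothetical path-to-content mappings (for example, read from the fork-point commit and from the current head), and `Origin` and `classify` are illustrative names, not anything from the repository. A file that was renamed after the fork would be reported as new here, so renames would need a separate check.

```python
from __future__ import annotations

from enum import Enum


class Origin(Enum):
    ORIGINAL = "original MIT-licensed, unmodified"
    NEW = "new"
    BLENDED = "blended"


def classify(upstream: dict[str, str], current: dict[str, str]) -> dict[str, Origin]:
    """Label each current file by comparing it with the upstream file of the same path."""
    result = {}
    for path, text in current.items():
        if path not in upstream:
            result[path] = Origin.NEW        # no file of that name at the fork point
        elif upstream[path] == text:
            result[path] = Origin.ORIGINAL   # byte-for-byte unchanged
        else:
            result[path] = Origin.BLENDED    # same path, edited content
    return result


if __name__ == "__main__":
    # Hypothetical contents; only the paths echo the ones named in this message.
    upstream = {"src/debugAdapter.ts": "old", "src/unchanged.ts": "same"}
    current = {
        "src/debugAdapter.ts": "edited",
        "src/unchanged.ts": "same",
        "src/daffodilDebug.ts": "brand new",
    }
    for path, origin in sorted(classify(upstream, current).items()):
        print(f"{path}: {origin.value}")
```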



________________________________
From: Beckerle, Mike <mb...@owlcyberdefense.com>
Sent: Thursday, September 9, 2021 4:21 PM
To: dev@daffodil.apache.org <de...@daffodil.apache.org>
Subject: Re: daffodil-vscode - how to package and identify the contribution - some git questions

Whether it's a PR or series of PRs, or a software grant, that still doesn't resolve the issue of the blended files which are part MIT-licensed original code, and part new code deltas by the daffodil-vscode contributors.

We need to understand whether those blended files can be teased apart somehow so that it is clear going forward what is an MIT-licensed library and what is Apache Licensed.

I just did a grep -R -i microsoft in a clone of the openwhisk-vscode-extension and got zero hits. So no files still carry a Microsoft copyright, and in fact their NOTICES.txt file does not indicate any dependency on MIT-licensed code at all. So I think openwhisk-vscode-extension is not going to help us figure out how to surf this issue.



Re: daffodil-vscode - how to package and identify the contribution - some git questions

Posted by "Beckerle, Mike" <mb...@owlcyberdefense.com>.

Whether it's a PR or series of PRs, or a software grant, that still doesn't resolve the issue of the blended files which are part MIT-licensed original code, and part new code deltas by the daffodil-vscode contributors.

We need to understand whether those blended files can be teased apart somehow so that it is clear going forward what is an MIT-licensed library and what is Apache Licensed.

I just did a grep -R -i microsoft in a clone of the openwhisk-vscode-extension and got zero hits. So no files still carry a Microsoft copyright, and in fact their NOTICES.txt file does not indicate any dependency on MIT-licensed code at all. So I think openwhisk-vscode-extension is not going to help us figure out how to surf this issue.
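
The same kind of check can be scripted so that it returns a list of files rather than raw grep output. The sketch below is an approximation of `grep -R -i`, not an exact equivalent: the function name `files_mentioning` is made up, and skipping the `.git` directory is an assumption made here to keep the results to working-tree files.

```python
from pathlib import Path


def files_mentioning(root: str, needle: str) -> list[str]:
    """Return relative paths of files under root containing needle, ignoring case."""
    needle = needle.lower()
    hits = []
    for path in sorted(Path(root).rglob("*")):
        relative = path.relative_to(root)
        # Assumption: git metadata is not interesting for a license scan.
        if not path.is_file() or ".git" in relative.parts:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        if needle in text.lower():
            hits.append(relative.as_posix())
    return hits


if __name__ == "__main__":
    # Scans the current directory; an empty list corresponds to "zero hits".
    print(files_mentioning(".", "microsoft"))
```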


________________________________
From: Steve Lawrence <sl...@apache.org>
Sent: Thursday, September 9, 2021 3:54 PM
To: dev@daffodil.apache.org <de...@daffodil.apache.org>
Subject: Re: daffodil-vscode - how to package and identify the contribution - some git questions

The concern is that this code was developed outside of Apache and so
didn't follow standard Apache process. From the IP clearance page:

https://incubator.apache.org/ip-clearance/

> Any code that was developed outside of the ASF SVN repository and
> our public mailing lists must be processed like this, even if the
> external developer is already an ASF committer.

I suppose that submitting it as a PR does follow some of that process,
but there is maybe less assurance of ownership. Because it was not
developed in an ASF repository, that code is presumed to be owned by
you, multiple developers, or a company, and so that ownership must be
granted to ASF via the IP clearance process, with appropriate software
grant, CLA's, etc. (At least, that's my admittedly limited understanding
of the process).

- Steve


On 9/9/21 3:34 PM, John Wass wrote:
> Couldn't we (the vscode contributors) submit a series of PRs against the
> new repo to move the code, and just archive the example repo as-is?
>
> I noted some thoughts on that a while back
> https://github.com/jw3/example-daffodil-vscode/issues/77
>
>
>
> On Thu, Sep 9, 2021 at 2:11 PM Beckerle, Mike <mb...@owlcyberdefense.com>
> wrote:
>
>> I know of one file in the repo which will have to be removed which is the
>> jpeg.dfdl.xsd file, which is there just as an example workspace.
>>
>> The copyright and provisions of that are not compatible with Apache
>> licensing.
>>
>> We can find a DFDL schema that we created that has Apache license to use
>> instead.
>>
>> For the other files under src, server, and build, can we generate a list
>> of files identifying which are:
>>
>> (a) original MIT-licensed, unmodified
>> (b) new - can be ASL
>> (c) blended - started from MIT-licensed source, modified with
>> daffodil-vscode-specific changes.
>>
>> It is these blended files that are the problematic ones.
>>
>>
>>
>> ________________________________
>> From: Steve Lawrence <sl...@apache.org>
>> Sent: Thursday, September 9, 2021 1:38 PM
>> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
>> Subject: Re: daffodil-vscode - how to package and identify the
>> contribution - some git questions
>>
>> Correct. For more information about Apache license compatibility:
>>
>>   https://www.apache.org/legal/resolved.html
>>
>> MIT is Category A and is fine. EPL is Category B and is also okay, but
>> generally only in its binary form. So these top-level dependencies look
>> okay, assuming their transitive dependencies are also okay.
>>
>> We'll also need to verify the licenses of all code in the repo.
>> Hopefully little of that is original microsoft MIT and can be granted to
>> ASF and relicensed.
>>
>>
>> On 9/9/21 1:30 PM, Beckerle, Mike wrote:
>>> The requirement is that the entire dependency tree (transitively)
>>> cannot depend on any software that has an Apache-incompatible (aka
>>> restrictive) license.
>>>
>>> So we need the transitive closure of all dependencies.
>>>
>>>
>>> ________________________________
>>> From: Adam Rosien <ad...@rosien.net>
>>> Sent: Thursday, September 9, 2021 12:44 PM
>>> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
>>> Subject: Re: daffodil-vscode - how to package and identify the
>> contribution - some git questions
>>>
>>> (I don't understand the requirements of licencing + transitive
>>> dependencies, so I'm giving some surface level license info)
>>>
>>> "ch.qos.logback" % "logback-classic" % "1.2.3" - EPL
>>> http://logback.qos.ch/license.html
>>> "com.microsoft.java" % "com.microsoft.java.debug.core" % "0.31.1" - EPL 1.0
>>> "co.fs2" %% "fs2-io" % "3.0.4" - MIT
>>> "com.monovore" %% "decline-effect" % "2.1.0" - APL 2.0
>>> "org.typelevel" %% "log4cats-slf4j" % "2.1.0" - APL 2.0
>>>
>>> On Thu, Sep 9, 2021 at 9:35 AM Adam Rosien <ad...@rosien.net> wrote:
>>>
>>>> I can relay the list of dependencies and their licenses.
>>>>
>>>> On Thu, Sep 9, 2021 at 9:20 AM Steve Lawrence <sl...@apache.org>
>>>> wrote:
>>>>
>>>>> I personally don't care too much about having the existing git history
>>>>> once it's part of ASF, especially if it makes things any easier (as you
>>>>> mention, squash/rebase can be difficult through merges). So I'd say we
>>>>> just do plan B--create a tarball of the current state (without the git
>>>>> history), and the content of that tarball is what goes through the IP
>>>>> clearance process, and is the content of the initial commit when adding
>>>>> to the apache/daffodil-vscode repo.
>>>>>
>>>>> Note that I think the incubator will still want access to the existing
>>>>> repo so they can view the full git history. Understanding where
>>>>> everything came from and verifying the provenance is important to
>>>>> ensuring we have all the appropriate CLA's. So while the tarball is
>>>>> maybe what is officially voted on, they will want access to the repo.
>>>>>
>>>>> That said, I don't think we are going to get CLA's for any
>>>>> Microsoft-contributed code. So either all Microsoft-contributed code
>>>>> will need to be kept MIT, or removed from the codebase. And it feels a
>>>>> bit odd to grant something to ASF where the original codebase stays MIT
>>>>> and isn't part of that grant.
>>>>>
>>>>> I think understanding how much code still exists that is Microsoft/MIT
>>>>> is going to be important to getting this through the IP clearance
>> process.
>>>>>
>>>>> So I'm curious how much of that original Microsoft code still exists? I
>>>>> assume since it was just example code it has mostly been replaced? If
>>>>> that's the case, we could potentially say Microsoft has no ownership of
>>>>> this code, and so their CLA and MIT license aren't necessary?
>>>>>
>>>>> We should also have a good understanding of the dependencies. If any of
>>>>> them are not compatible with ALv2, then going through this process
>> isn't
>>>>> even worth it until they are replaced. Do you have a list of the
>>>>> dependencies?
>>>>>
>>>>>
>>>>> On 9/9/21 11:16 AM, Beckerle, Mike wrote:
>>>>>> So the daffodil-vscode code-base wants to be granted to become part of
>>>>> the
>>>>>> Daffodil project.
>>>>>>
>>>>>> One question arises which is "what is the contribution?" exactly.
>>>>>>
>>>>>> The normal way this is identified is by creating a tarball of the
>>>>> source files
>>>>>> and specifying an sha or md5 hash of that file.
>>>>>>
>>>>>> However, this code base is perhaps different from usual.
>>>>>>
>>>>>> It started by creating a detached fork of the vscode debugger example
>>>>> code base.
>>>>>> This is MIT-Licensed which is a compatible license.
>>>>>>
>>>>>> The files are then edited. There are around 100 commits on top of the
>>>>> base that
>>>>>> came from the vscode debugger repository.
>>>>>>
>>>>>> So the contribution is that set of 100 commits - the
>>>>> patches/change-sets they
>>>>>> represent.
>>>>>>
>>>>>> These commits often edit the original files of the vscode debugger
>>>>> example to
>>>>>> add the daffodil-specific functionality. That is, the contribution
>>>>> material is
>>>>>> in several cases intermingled in the lines of the existing files.
>>>>> That's ok I
>>>>>> think so long as the modified file had MIT license.
>>>>>>
>>>>>> There's some value in preserving the 100 commits by our contributors,
>>>>> not
>>>>>> squashing it down to one commit, though if it's really not sensible to
>>>>> proceed
>>>>>> otherwise, we can choose to squash it down to one commit.
>>>>>>
>>>>>> Furthermore, the vscode debugger example repo itself had many commits
>>>>> in it. The
>>>>>> current daffodil-vscode repo preserves all these commits as well. I
>>>>> don't see
>>>>>> value in preserving these commits, and would rather they were squashed
>>>>> into a
>>>>>> single "starting point" commit, with a dependencies file specifying
>> the
>>>>> githash
>>>>>> where we forked from, just so we can refer back if necessary.
>>>>>>
>>>>>> So as a starting suggestion (subject to discussion of other
>>>>> alternatives) is this:
>>>>>>
>>>>>> Plan A:
>>>>>>
>>>>>>  1. squash all commits up to and including the last Microsoft commit,
>>>>> together
>>>>>>     into one.
>>>>>>  2. rebase the remaining commits on top of that.
>>>>>>      1. I'm a bit worried about this rebase. There are merge commits,
>>>>> etc. in
>>>>>>         the history. I'm not sure this will just all rebase while
>>>>> preserving all
>>>>>>         the commits, but maybe it will "just work"
>>>>>>  3. create a "patch set" corresponding to the 100 or so commits that
>>>>> make up the
>>>>>>     "contribution".
>>>>>>      1. I don't know if this is even feasible for this many commits.
>>>>>>  4. create a tar/zip of this aggregate patch set.
>>>>>>  5. compute an md5 of this patch set.
>>>>>>
>>>>>> The patch set tar/zip file and its md5 hash are "the granted
>> software".
>>>>>>
>>>>>> The problem with this idea is that there's no obvious way to review a
>>>>> patch set,
>>>>>> shy of applying it.
>>>>>>
>>>>>> A better way may be to change steps 3 - 5 above to
>>>>>>
>>>>>> Plan B:
>>>>>>
>>>>>>     3. push the main branch to a new empty git repository
>>>>>>          The point of this is to remove all historic stuff from the
>>>>> repository,
>>>>>>     i.e., have a minimal git repo that contains only the contribution
>>>>> and the
>>>>>>     single other commit it must be based on.
>>>>>>
>>>>>>     4. create a tarball of this git repository, and md5 hash of it
>>>>>>
>>>>>>     5. document that the contribution is from githash X (after the
>>>>> first commit)
>>>>>>     to githash Y (the final commit) of this repository
>>>>>>
>>>>>>
>>>>>> This has the advantage that the contribution is a self-contained
>>>>> review-able thing.
>>>>>>
>>>>>> Other ideas are welcome. (Plans C, D, etc) The only requirements I
>> know
>>>>> of are:
>>>>>>
>>>>>>  1. a single file containing the contribution, and its md5 hash
>>>>>>  2. a sensible way one can review the contents of this contribution
>> file
>>>>>>  3. preserve history of derivation from the vscode debugger example.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Mike Beckerle | Principal Engineer
>>>>>>
>>>>>> mbeckerle@owlcyberdefense.com <ma...@owlcyberdefense.com>
>>>>>>
>>>>>> P +1-781-330-0412
>>>>>>
>>>>>
>>>>>
>>>
>>
>>
>
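
For the requirement quoted above (a single file containing the contribution, plus its md5 hash), a minimal Python sketch might look like the following. The function names and the paths passed to them are hypothetical, and md5 appears only because the thread asks for it; this is an illustration, not the procedure the project actually used.

```python
import hashlib
import tarfile
from pathlib import Path


def make_tarball(src_dir: str, out_path: str) -> str:
    """Pack src_dir into a gzip-compressed tarball and return the tarball path."""
    src = Path(src_dir)
    with tarfile.open(out_path, "w:gz") as tar:
        # arcname keeps archive paths relative to the directory's own name
        tar.add(src, arcname=src.name)
    return out_path


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex md5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Note that gzip and tar record timestamps, so rebuilding the tarball generally yields a different hash. The hash would need to be computed once, over the exact file that is submitted.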

Re: daffodil-vscode - how to package and identify the contribution - some git questions

Posted by Steve Lawrence <sl...@apache.org>.

Yeah, I think having accurate license/copyright headers on all the files
will make it much easier to determine the ownership and ASF
acceptability, especially for the ASF group that must approve the IP
clearance process.
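
To see which files already carry a license header, one could scan the tree for known header text. The sketch below is Python; the marker strings, the file suffixes, and the function names are all assumptions for illustration, since real headers vary in wording and the repo layout may differ.

```python
from pathlib import Path

# Assumed marker strings; actual header wording may differ from these.
APACHE_MARKER = "Licensed to the Apache Software Foundation"
MIT_MARKER = "Licensed under the MIT License"


def classify_header(path: Path, max_lines: int = 30) -> str:
    """Return 'apache', 'mit', or 'none' based on the first max_lines lines."""
    with path.open(encoding="utf-8", errors="replace") as f:
        head = "".join(line for _, line in zip(range(max_lines), f))
    if APACHE_MARKER in head:
        return "apache"
    if MIT_MARKER in head:
        return "mit"
    return "none"


def scan(root: str, suffixes=(".ts", ".scala")) -> dict:
    """Map each matching file under root (relative POSIX path) to its header class."""
    base = Path(root)
    return {
        p.relative_to(base).as_posix(): classify_header(p)
        for p in sorted(base.rglob("*"))
        if p.is_file() and p.suffix in suffixes
    }
```

Files reported as 'none' would be the ones needing a header added before review. A text match like this only shows what a file claims, not who actually owns the code in it.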

On 9/10/21 12:34 PM, Adam Rosien wrote:
> Would it help if we developers added copyright information to the repo? Or
> would it be redundant? (Most projects start out without explicit copyright
> notices anyway, and add them after some period of time.)
> 
> .. Adam
> 
> On Fri, Sep 10, 2021 at 4:42 AM Steve Lawrence <sl...@apache.org> wrote:
> 
>> I think the concern from ASF, and why they have this IP clearance
>> process, is that the copyright ownership of these files is not clear. It
>> wasn't done in a fork of a Daffodil repo, and there are contributions
>> from multiple developers, with little public oversight from Daffodil. I
>> think from the ASF's perspective, this code did not follow the ASF
>> process, and is assumed to be owned by the contributors or their
>> companies. There is no license/copyright in the code specifying
>> otherwise, so I think ASF must assume the worst, and require a software
>> grant.
>>
>> Also note that even prototype code has a copyright owner and a license.
>> Copying it into a PR doesn't change that. If you were to throw away this
>> code and start from scratch following the ASF process, then it wouldn't
>> be a problem. But if the plan is to copy prototype code not owned by ASF
>> into a PR, then there are ownership concerns.
>>
>> If all this work had been done in a fork of the apache/daffodil-vscode repo
>> by a single contributor, then I think maybe the assumption from ASF would be
>> that the code was intended to be part of the main repo and implicitly granted
>> to ASF via the PR process.
>>
>>
>> On 9/9/21 4:05 PM, John Wass wrote:
>>> Yeah I was thinking of the example repo as a prototype, just as if I was
>>> working on a feature in my fork of Daffodil.  The main project doesn't
>> own
>>> the feature until it crosses the PR threshold, and once it does cross
>> over
>>> the state of my fork is of no concern to it.
>>>
>>>
>>>
>>> On Thu, Sep 9, 2021 at 3:54 PM Steve Lawrence <sl...@apache.org>
>> wrote:
>>>
>>>> The concern is that this code was developed outside of Apache and so
>>>> didn't follow standard Apache process. From the IP clearance page:
>>>>
>>>> https://incubator.apache.org/ip-clearance/
>>>>
>>>>> Any code that was developed outside of the ASF SVN repository and
>>>>> our public mailing lists must be processed like this, even if the
>>>>> external developer is already an ASF committer.
>>>>
>>>> I suppose that submitting it as a PR does follow some of that process,
>>>> but there is maybe less assurance of ownership. Because it was not
>>>> developed in an ASF repository, that code is presumed to be owned by
>>>> you, multiple developers, or a company, and so that ownership must be
>>>> granted to ASF via the IP clearance process, with appropriate software
>>>> grant, CLA's, etc. (At least, that's my admittedly limited understanding
>>>> of the process).
>>>>
>>>> - Steve
>>>>
>>>>
>>>> On 9/9/21 3:34 PM, John Wass wrote:
>>>>> Couldn't we (the vscode contributors) submit a series of PRs against
>> the
>>>>> new repo to move the code, and just archive the example repo as-is?
>>>>>
>>>>> I noted some thoughts on that a while back
>>>>> https://github.com/jw3/example-daffodil-vscode/issues/77
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Sep 9, 2021 at 2:11 PM Beckerle, Mike <
>>>> mbeckerle@owlcyberdefense.com>
>>>>> wrote:
>>>>>
>>>>>> I know of one file in the repo which will have to be removed which is
>>>> the
>>>>>> jpeg.dfdl.xsd file, which is there just as an example workspace.
>>>>>>
>>>>>> The copyright and provisions of that are not compatible with Apache
>>>>>> licensing.
>>>>>>
>>>>>> We can find a DFDL schema that we created that has Apache license to
>> use
>>>>>> instead.
>>>>>>
>>>>>> For the other files under src, server, and build, can we generate a
>> list
>>>>>> of files identifying which are:
>>>>>>
>>>>>> (a) original MIT-licensed, unmodified
>>>>>> (b) new - can be ASL
>>>>>> (c) blended - started from MIT-licensed source, modified with
>>>>>> daffodil-vscode-specific changes.
>>>>>>
>>>>>> It is these blended files that are the problematic ones.
>>>>>>
>>>>>>
>>>>>>
>>>>>> ________________________________
>>>>>> From: Steve Lawrence <sl...@apache.org>
>>>>>> Sent: Thursday, September 9, 2021 1:38 PM
>>>>>> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
>>>>>> Subject: Re: daffodil-vscode - how to package and identify the
>>>>>> contribution - some git questions
>>>>>>
>>>>>> Correct. For more information about Apache license compatibility:
>>>>>>
>>>>>>   https://www.apache.org/legal/resolved.html
>>>>>>
>>>>>> MIT is Category A and is fine. EPL is Category B and is also okay, but
>>>>>> generally only in its binary form. So these top-level dependencies
>> look
>>>>>> okay, assuming their transitive dependencies are also okay.
>>>>>>
>>>>>> We'll also need to verify the licenses of all code in the repo.
>>>>>> Hopefully little of that is original Microsoft MIT and can be granted to
>>>>>> ASF and relicensed.
>>>>>>
>>>>>>
>>>>>> On 9/9/21 1:30 PM, Beckerle, Mike wrote:
>>>>>>> The requirement is that the entire dependency tree (transitively)
>>>>>>> cannot depend on any software that has an Apache-incompatible (aka
>>>>>>> restrictive) license.
>>>>>>>
>>>>>>> So we need the transitive closure of all dependencies.
>>>>>>>
>>>>>>>
>>>>>>> ________________________________
>>>>>>> From: Adam Rosien <ad...@rosien.net>
>>>>>>> Sent: Thursday, September 9, 2021 12:44 PM
>>>>>>> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
>>>>>>> Subject: Re: daffodil-vscode - how to package and identify the
>>>>>> contribution - some git questions
>>>>>>>
>>>>>>> (I don't understand the requirements of licencing + transitive
>>>>>>> dependencies, so I'm giving some surface level license info)
>>>>>>>
>>>>>>> "ch.qos.logback" % "logback-classic" % "1.2.3" - EPL
>>>>>>> http://logback.qos.ch/license.html
>>>>>>> "com.microsoft.java" % "com.microsoft.java.debug.core" % "0.31.1" -
>> EPL
>>>>>> 1.0
>>>>>>> "co.fs2" %% "fs2-io" % "3.0.4" - MIT
>>>>>>> "com.monovore" %% "decline-effect" % "2.1.0" - APL 2.0
>>>>>>> "org.typelevel" %% "log4cats-slf4j" % "2.1.0" - APL 2.0
>>>>>>>
>>>>>>> On Thu, Sep 9, 2021 at 9:35 AM Adam Rosien <ad...@rosien.net> wrote:
>>>>>>>
>>>>>>>> I can relay the list of dependencies and their licenses.
>>>>>>>>
>>>>>>>> On Thu, Sep 9, 2021 at 9:20 AM Steve Lawrence <slawrence@apache.org
>>>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I personally don't care too much about having the existing git history
>>>>>>>>> once it's part of ASF, especially if it makes things any easier (as you
>>>>>>>>> mention, squash/rebase can be difficult through merges). So I'd say we
>>>>>>>>> just do plan B--create a tarball of the current state (without the git
>>>>>>>>> history), and the content of that tarball is what goes through the IP
>>>>>>>>> clearance process, and is the content of the initial commit when adding
>>>>>>>>> to the apache/daffodil-vscode repo.
>>>>>>>>>
>>>>>>>>> Note that I think the incubator will still want access to the
>>>> existing
>>>>>>>>> repo so they can view the full git history. Understanding where
>>>>>>>>> everything came from and verifying the provenance is important to
>>>>>>>>> ensuring we have all the appropriate CLA's. So while the tarball is
>>>>>>>>> maybe what is officially voted on, they will want access to the
>> repo.
>>>>>>>>>
>>>>>>>>> That said, I don't think we are going to get CLA's for any Microsoft
>>>>>>>>> contributed code. So either all Microsoft contributed code will need to
>>>>>>>>> be kept MIT, or removed from the codebase. And it feels a bit odd to
>>>>>>>>> grant something to ASF where the original codebase stays MIT and isn't
>>>>>>>>> part of that grant.
>>>>>>>>>
>>>>>>>>> I think understanding how much code still exists that is
>>>> Microsoft/MIT
>>>>>>>>> is going to be important to getting this through the IP clearance
>>>>>> process.
>>>>>>>>>
>>>>>>>>> So I'm curious how much of that original Microsoft code still
>>>> exists? I
>>>>>>>>> assume since it was just example code it has mostly been replaced?
>> If
>>>>>>>>> that's the case, we could potentially say Microsoft has no
>> ownership
>>>> of
>>>>>>>>> this code, and so their CLA and MIT license aren't necessary?
>>>>>>>>>
>>>>>>>>> We should also have a good understanding of the dependencies. If
>> any
>>>> of
>>>>>>>>> them are not compatible with ALv2, then going through this process
>>>>>> isn't
>>>>>>>>> even worth it until they are replaced. Do you have a list of the
>>>>>>>>> dependencies?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 9/9/21 11:16 AM, Beckerle, Mike wrote:
>>>>>>>>>> So the daffodil-vscode code-base wants to be granted to become
>> part
>>>> of
>>>>>>>>> the
>>>>>>>>>> Daffodil project.
>>>>>>>>>>
>>>>>>>>>> One question arises which is "what is the contribution?" exactly.
>>>>>>>>>>
>>>>>>>>>> The normal way this is identified is by creating a tarball of the
>>>>>>>>> source files
>>>>>>>>>> and specifying an sha or md5 hash of that file.
>>>>>>>>>>
>>>>>>>>>> However, this code base is perhaps different from usual.
>>>>>>>>>>
>>>>>>>>>> It started by creating a detached fork of the vscode debugger
>>>> example
>>>>>>>>> code base.
>>>>>>>>>> This is MIT-Licensed which is a compatible license.
>>>>>>>>>>
>>>>>>>>>> The files are then edited. There are around 100 commits on top of
>>>> the
>>>>>>>>> base that
>>>>>>>>>> came from the vscode debugger repository.
>>>>>>>>>>
>>>>>>>>>> So the contribution is that set of 100 commits - the
>>>>>>>>> patches/change-sets they
>>>>>>>>>> represent.
>>>>>>>>>>
>>>>>>>>>> These commits often edit the original files of the vscode debugger
>>>>>>>>> example to
>>>>>>>>>> add the daffodil-specific functionality. That is, the contribution
>>>>>>>>> material is
>>>>>>>>>> in several cases intermingled in the lines of the existing files.
>>>>>>>>> That's ok I
>>>>>>>>>> think so long as the modified file had MIT license.
>>>>>>>>>>
>>>>>>>>>> There's some value in preserving the 100 commits by our
>>>> contributors,
>>>>>>>>> not
>>>>>>>>>> squashing it down to one commit, though if it's really not
>> sensible
>>>> to
>>>>>>>>> proceed
>>>>>>>>>> otherwise, we can choose to squash it down to one commit.
>>>>>>>>>>
>>>>>>>>>> Furthermore, the vscode debugger example repo itself had many
>>>> commits
>>>>>>>>> in it. The
>>>>>>>>>> current daffodil-vscode repo preserves all these commits as well.
>> I
>>>>>>>>> don't see
>>>>>>>>>> value in preserving these commits, and would rather they were
>>>> squashed
>>>>>>>>> into a
>>>>>>>>>> single "starting point" commit, with a dependencies file
>> specifying
>>>>>> the
>>>>>>>>> githash
>>>>>>>>>> where we forked from, just so we can refer back if necessary.
>>>>>>>>>>
>>>>>>>>>> So as a starting suggestion (subject to discussion of other
>>>>>>>>> alternatives) is this:
>>>>>>>>>>
>>>>>>>>>> Plan A:
>>>>>>>>>>
>>>>>>>>>>  1. squash all commits up to and including the last Microsoft
>>>> commit,
>>>>>>>>> together
>>>>>>>>>>     into one.
>>>>>>>>>>  2. rebase the remaining commits on top of that.
>>>>>>>>>>      1. I'm a bit worried about this rebase. There are merge
>>>> commits,
>>>>>>>>> etc. in
>>>>>>>>>>         the history. I'm not sure this will just all rebase while
>>>>>>>>> preserving all
>>>>>>>>>>         the commits, but maybe it will "just work"
>>>>>>>>>>  3. create a "patch set" corresponding to the 100 or so commits
>> that
>>>>>>>>> make up the
>>>>>>>>>>     "contribution".
>>>>>>>>>>      1. I don't know if this is even feasible for this many
>> commits.
>>>>>>>>>>  4. create a tar/zip of this aggregate patch set.
>>>>>>>>>>  5. compute an md5 of this patch set.
>>>>>>>>>>
>>>>>>>>>> The patch set tar/zip file and its md5 hash are "the granted
>>>>>> software".
>>>>>>>>>>
>>>>>>>>>> The problem with this idea is that there's no obvious way to
>> review
>>>> a
>>>>>>>>> patch set,
>>>>>>>>>> shy of applying it.
>>>>>>>>>>
>>>>>>>>>> A better way may be to change steps 3 - 5 above to
>>>>>>>>>>
>>>>>>>>>> Plan B:
>>>>>>>>>>
>>>>>>>>>>     3. push the main branch to a new empty git repository
>>>>>>>>>>          The point of this is to remove all historic stuff from
>> the
>>>>>>>>> repository,
>>>>>>>>>>     i.e., have a minimal git repo that contains only the
>>>> contribution
>>>>>>>>> and the
>>>>>>>>>>     single other commit it must be based on.
>>>>>>>>>>
>>>>>>>>>>     4. create a tarball of this git repository, and md5 hash of it
>>>>>>>>>>
>>>>>>>>>>     5. document that the contribution is from githash X (after the
>>>>>>>>> first commit)
>>>>>>>>>>     to githash Y (the final commit) of this repository
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> This has the advantage that the contribution is a self-contained
>>>>>>>>> review-able thing.
>>>>>>>>>>
>>>>>>>>>> Other ideas are welcome. (Plans C, D, etc) The only requirements I
>>>>>> know
>>>>>>>>> of are:
>>>>>>>>>>
>>>>>>>>>>  1. a single file containing the contribution, and its md5 hash
>>>>>>>>>>  2. a sensible way one can review the contents of this
>> contribution
>>>>>> file
>>>>>>>>>>  3. preserve history of derivation from the vscode debugger
>> example.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Mike Beckerle | Principal Engineer
>>>>>>>>>>
>>>>>>>>>> mbeckerle@owlcyberdefense.com <mailto:bhummel@owlcyberdefense.com
>>>
>>>>>>>>>>
>>>>>>>>>> P +1-781-330-0412
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
>
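
The three-way file list asked for in the quoted text above ((a) unmodified MIT, (b) new, (c) blended) could be approximated by comparing the current tree against a checkout of the upstream commit the fork started from. This is a hedged Python sketch: the function names are hypothetical, both directories are assumed to be plain checkouts on disk, and a renamed upstream file would show up as "new" here.

```python
import hashlib
from pathlib import Path


def file_hashes(root: str) -> dict:
    """Map relative POSIX path -> SHA-256 of contents for each file under root."""
    base = Path(root)
    return {
        p.relative_to(base).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in base.rglob("*")
        # skip git metadata so only working-tree files are compared
        if p.is_file() and ".git" not in p.relative_to(base).parts
    }


def classify(upstream_root: str, current_root: str) -> dict:
    """Label each current file 'unmodified', 'new', or 'blended'."""
    upstream = file_hashes(upstream_root)
    result = {}
    for rel, digest in sorted(file_hashes(current_root).items()):
        if rel not in upstream:
            result[rel] = "new"         # (b) no upstream counterpart
        elif upstream[rel] == digest:
            result[rel] = "unmodified"  # (a) byte-identical to upstream
        else:
            result[rel] = "blended"     # (c) upstream file with local edits
    return result
```

Comparing content hashes rather than timestamps keeps the result independent of how the two trees were checked out.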

Re: daffodil-vscode - how to package and identify the contribution - some git questions

Posted by Adam Rosien <ad...@rosien.net>.

Would it help if we developers added copyright information to the repo? Or
would it be redundant? (Most projects start out without explicit copyright
notices anyway, and add them after some period of time.)

.. Adam

On Fri, Sep 10, 2021 at 4:42 AM Steve Lawrence <sl...@apache.org> wrote:

> I think the concern from ASF, and why they have this IP clearance
> process, is that the copyright ownership of these files is not clear. It
> wasn't done in a fork of a Daffodil repo, and there are contributions
> from multiple developers, with little public oversight from Daffodil. I
> think from the ASF's perspective, this code did not follow the ASF
> process, and is assumed to be owned by the contributors or their
> companies. There is no license/copyright in the code specifying
> otherwise, so I think ASF must assume the worst, and require a software
> grant.
>
> Also note that even prototype code has a copyright owner and a license.
> Copying it into a PR doesn't change that. If you were to throw away this
> code and start from scratch following the ASF process, then it wouldn't
> be a problem. But if the plan is to copy prototype code not owned by ASF
> into a PR, then there are ownership concerns.
>
> If all this work had been done in a fork of the apache/daffodil-vscode repo
> by a single contributor, then I think maybe the assumption from ASF would be
> that the code was intended to be part of the main repo and implicitly granted
> to ASF via the PR process.
>
>
> On 9/9/21 4:05 PM, John Wass wrote:
> > Yeah I was thinking of the example repo as a prototype, just as if I was
> > working on a feature in my fork of Daffodil.  The main project doesn't
> own
> > the feature until it crosses the PR threshold, and once it does cross
> over
> > the state of my fork is of no concern to it.
> >
> >
> >
> > On Thu, Sep 9, 2021 at 3:54 PM Steve Lawrence <sl...@apache.org>
> wrote:
> >
> >> The concern is that this code was developed outside of Apache and so
> >> didn't follow standard Apache process. From the IP clearance page:
> >>
> >> https://incubator.apache.org/ip-clearance/
> >>
> >>> Any code that was developed outside of the ASF SVN repository and
> >>> our public mailing lists must be processed like this, even if the
> >>> external developer is already an ASF committer.
> >>
> >> I suppose that submitting it as a PR does follow some of that process,
> >> but there is maybe less assurance of ownership. Because it was not
> >> developed in an ASF repository, that code is presumed to be owned by
> >> you, multiple developers, or a company, and so that ownership must be
> >> granted to ASF via the IP clearance process, with appropriate software
> >> grant, CLA's, etc. (At least, that's my admittedly limited understanding
> >> of the process).
> >>
> >> - Steve
> >>
> >>
> >> On 9/9/21 3:34 PM, John Wass wrote:
> >>> Couldn't we (the vscode contributors) submit a series of PRs against
> the
> >>> new repo to move the code, and just archive the example repo as-is?
> >>>
> >>> I noted some thoughts on that a while back
> >>> https://github.com/jw3/example-daffodil-vscode/issues/77
> >>>
> >>>
> >>>
> >>> On Thu, Sep 9, 2021 at 2:11 PM Beckerle, Mike <
> >> mbeckerle@owlcyberdefense.com>
> >>> wrote:
> >>>
> >>>> I know of one file in the repo which will have to be removed which is
> >> the
> >>>> jpeg.dfdl.xsd file, which is there just as an example workspace.
> >>>>
> >>>> The copyright and provisions of that are not compatible with Apache
> >>>> licensing.
> >>>>
> >>>> We can find a DFDL schema that we created that has Apache license to
> use
> >>>> instead.
> >>>>
> >>>> For the other files under src, server, and build, can we generate a
> list
> >>>> of files identifying which are:
> >>>>
> >>>> (a) original MIT-licensed, unmodified
> >>>> (b) new - can be ASL
> >>>> (c) blended - started from MIT-licensed source, modified with
> >>>> daffodil-vscode-specific changes.
> >>>>
> >>>> It is these blended files that are the problematic ones.
> >>>>
> >>>>
> >>>>
> >>>> ________________________________
> >>>> From: Steve Lawrence <sl...@apache.org>
> >>>> Sent: Thursday, September 9, 2021 1:38 PM
> >>>> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
> >>>> Subject: Re: daffodil-vscode - how to package and identify the
> >>>> contribution - some git questions
> >>>>
> >>>> Correct. For more information about Apache license compatibility:
> >>>>
> >>>>   https://www.apache.org/legal/resolved.html
> >>>>
> >>>> MIT is Category A and is fine. EPL is Category B and is also okay, but
> >>>> generally only in its binary form. So these top-level dependencies
> look
> >>>> okay, assuming their transitive dependencies are also okay.
> >>>>
> >>>> We'll also need to verify the licenses of all code in the repo.
> >>>> Hopefully little of that is original Microsoft MIT and can be granted to
> >>>> ASF and relicensed.
> >>>>
> >>>>
> >>>> On 9/9/21 1:30 PM, Beckerle, Mike wrote:
> >>>>> The requirement is that the entire dependency tree (transitively)
> >>>>> cannot depend on any software that has an Apache-incompatible (aka
> >>>>> restrictive) license.
> >>>>>
> >>>>> So we need the transitive closure of all dependencies.
> >>>>>
> >>>>>
> >>>>> ________________________________
> >>>>> From: Adam Rosien <ad...@rosien.net>
> >>>>> Sent: Thursday, September 9, 2021 12:44 PM
> >>>>> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
> >>>>> Subject: Re: daffodil-vscode - how to package and identify the
> >>>> contribution - some git questions
> >>>>>
> >>>>> (I don't understand the requirements of licencing + transitive
> >>>>> dependencies, so I'm giving some surface level license info)
> >>>>>
> >>>>> "ch.qos.logback" % "logback-classic" % "1.2.3" - EPL
> >>>>> http://logback.qos.ch/license.html
> >>>>> "com.microsoft.java" % "com.microsoft.java.debug.core" % "0.31.1" -
> EPL
> >>>> 1.0
> >>>>> "co.fs2" %% "fs2-io" % "3.0.4" - MIT
> >>>>> "com.monovore" %% "decline-effect" % "2.1.0" - APL 2.0
> >>>>> "org.typelevel" %% "log4cats-slf4j" % "2.1.0" - APL 2.0
> >>>>>
> >>>>> On Thu, Sep 9, 2021 at 9:35 AM Adam Rosien <ad...@rosien.net> wrote:
> >>>>>
> >>>>>> I can relay the list of dependencies and their licenses.
> >>>>>>
> >>>>>> On Thu, Sep 9, 2021 at 9:20 AM Steve Lawrence <slawrence@apache.org
> >
> >>>>>> wrote:
> >>>>>>
> >>>>>>> I personally don't care too much about having the existing git history
> >>>>>>> once it's part of ASF, especially if it makes things any easier (as you
> >>>>>>> mention, squash/rebase can be difficult through merges). So I'd say we
> >>>>>>> just do plan B--create a tarball of the current state (without the git
> >>>>>>> history), and the content of that tarball is what goes through the IP
> >>>>>>> clearance process, and is the content of the initial commit when adding
> >>>>>>> to the apache/daffodil-vscode repo.
> >>>>>>>
> >>>>>>> Note that I think the incubator will still want access to the
> >> existing
> >>>>>>> repo so they can view the full git history. Understanding where
> >>>>>>> everything came from and verifying the provenance is important to
> >>>>>>> ensuring we have all the appropriate CLA's. So while the tarball is
> >>>>>>> maybe what is officially voted on, they will want access to the
> repo.
> >>>>>>>
> >>>>>>> That said, I don't think we are going to get CLA's for any Microsoft
> >>>>>>> contributed code. So either all Microsoft contributed code will need to
> >>>>>>> be kept MIT, or removed from the codebase. And it feels a bit odd to
> >>>>>>> grant something to ASF where the original codebase stays MIT and isn't
> >>>>>>> part of that grant.
> >>>>>>>
> >>>>>>> I think understanding how much code still exists that is
> >> Microsoft/MIT
> >>>>>>> is going to be important to getting this through the IP clearance
> >>>> process.
> >>>>>>>
> >>>>>>> So I'm curious how much of that original Microsoft code still
> >> exists? I
> >>>>>>> assume since it was just example code it has mostly been replaced?
> If
> >>>>>>> that's the case, we could potentially say Microsoft has no
> ownership
> >> of
> >>>>>>> this code, and so their CLA and MIT license aren't necessary?
> >>>>>>>
> >>>>>>> We should also have a good understanding of the dependencies. If
> any
> >> of
> >>>>>>> them are not compatible with ALv2, then going through this process
> >>>> isn't
> >>>>>>> even worth it until they are replaced. Do you have a list of the
> >>>>>>> dependencies?
> >>>>>>>
> >>>>>>>
> >>>>>>> On 9/9/21 11:16 AM, Beckerle, Mike wrote:
> >>>>>>>> So the daffodil-vscode code-base wants to be granted to become
> part
> >> of
> >>>>>>> the
> >>>>>>>> Daffodil project.
> >>>>>>>>
> >>>>>>>> One question arises which is "what is the contribution?" exactly.
> >>>>>>>>
> >>>>>>>> The normal way this is identified is by creating a tarball of the
> >>>>>>> source files
> >>>>>>>> and specifying an sha or md5 hash of that file.
> >>>>>>>>
> >>>>>>>> However, this code base is perhaps different from usual.
> >>>>>>>>
> >>>>>>>> It started by creating a detached fork of the vscode debugger
> >> example
> >>>>>>> code base.
> >>>>>>>> This is MIT-Licensed which is a compatible license.
> >>>>>>>>
> >>>>>>>> The files are then edited. There are around 100 commits on top of
> >> the
> >>>>>>> base that
> >>>>>>>> came from the vscode debugger repository.
> >>>>>>>>
> >>>>>>>> So the contribution is that set of 100 commits - the
> >>>>>>> patches/change-sets they
> >>>>>>>> represent.
> >>>>>>>>
> >>>>>>>> These commits often edit the original files of the vscode debugger
> >>>>>>> example to
> >>>>>>>> add the daffodil-specific functionality. That is, the contribution
> >>>>>>> material is
> >>>>>>>> in several cases intermingled in the lines of the existing files.
> >>>>>>> That's ok I
> >>>>>>>> think so long as the modified file had MIT license.
> >>>>>>>>
> >>>>>>>> There's some value in preserving the 100 commits by our
> >> contributors,
> >>>>>>> not
> >>>>>>>> squashing it down to one commit, though if it's really not
> sensible
> >> to
> >>>>>>> proceed
> >>>>>>>> otherwise, we can choose to squash it down to one commit.
> >>>>>>>>
> >>>>>>>> Furthermore, the vscode debugger example repo itself had many
> >> commits
> >>>>>>> in it. The
> >>>>>>>> current daffodil-vscode repo preserves all these commits as well.
> I
> >>>>>>> don't see
> >>>>>>>> value in preserving these commits, and would rather they were
> >> squashed
> >>>>>>> into a
> >>>>>>>> single "starting point" commit, with a dependencies file
> specifying
> >>>> the
> >>>>>>> githash
> >>>>>>>> where we forked from, just so we can refer back if necessary.
> >>>>>>>>
> >>>>>>>> So as a starting suggestion (subject to discussion of other
> >>>>>>> alternatives) is this:
> >>>>>>>>
> >>>>>>>> Plan A:
> >>>>>>>>
> >>>>>>>>  1. squash all commits up to and including the last Microsoft
> >> commit,
> >>>>>>> together
> >>>>>>>>     into one.
> >>>>>>>>  2. rebase the remaining commits on top of that.
> >>>>>>>>      1. I'm a bit worried about this rebase. There are merge
> >> commits,
> >>>>>>> etc. in
> >>>>>>>>         the history. I'm not sure this will just all rebase while
> >>>>>>> preserving all
> >>>>>>>>         the commits, but maybe it will "just work"
> >>>>>>>>  3. create a "patch set" corresponding to the 100 or so commits
> that
> >>>>>>> make up the
> >>>>>>>>     "contribution".
> >>>>>>>>      1. I don't know if this is even feasible for this many
> commits.
> >>>>>>>>  4. create a tar/zip of this aggregate patch set.
> >>>>>>>>  5. compute an md5 of this patch set.
> >>>>>>>>
> >>>>>>>> The patch set tar/zip file and its md5 hash are "the granted
> >>>> software".
> >>>>>>>>
> >>>>>>>> The problem with this idea is that there's no obvious way to
> review
> >> a
> >>>>>>> patch set,
> >>>>>>>> shy of applying it.
> >>>>>>>>
> >>>>>>>> A better way may be to change steps 3 - 5 above to
> >>>>>>>>
> >>>>>>>> Plan B:
> >>>>>>>>
> >>>>>>>>     3. push the main branch to a new empty git repository
> >>>>>>>>          The point of this is to remove all historic stuff from
> the
> >>>>>>> repository,
> >>>>>>>>     i.e., have a minimal git repo that contains only the
> >> contribution
> >>>>>>> and the
> >>>>>>>>     single other commit it must be based on.
> >>>>>>>>
> >>>>>>>>     4. create a tarball of this git repository, and md5 hash of it
> >>>>>>>>
> >>>>>>>>     5. document that the contribution is from githash X (after the
> >>>>>>> first commit)
> >>>>>>>>     to githash Y (the final commit) of this repository
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> This has the advantage that the contribution is a self-contained
> >>>>>>> review-able thing.
> >>>>>>>>
> >>>>>>>> Other ideas are welcome. (Plans C, D, etc) The only requirements I
> >>>> know
> >>>>>>> of are:
> >>>>>>>>
> >>>>>>>>  1. a single file containing the contribution, and its md5 hash
> >>>>>>>>  2. a sensible way one can review the contents of this
> contribution
> >>>> file
> >>>>>>>>  3. preserve history of derivation from the vscode debugger
> example.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Mike Beckerle | Principal Engineer
> >>>>>>>>
> >>>>>>>> mbeckerle@owlcyberdefense.com <mailto:bhummel@owlcyberdefense.com
> >
> >>>>>>>>
> >>>>>>>> P +1-781-330-0412
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>
> >>>>
> >>>>
> >>>
> >>
> >>
> >
>
>
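
The "transitive closure of all dependencies" requirement quoted above can be sketched as a graph walk followed by a license lookup. This is illustrative Python only: the function names are hypothetical, the dependency graph would have to come from the build tool, and the category table lists just the licenses named in the thread (MIT and Apache 2.0 as Category A, EPL 1.0 as Category B).

```python
from collections import deque

# Categories as described in the thread; anything absent is treated as unresolved.
ASF_CATEGORY = {"Apache-2.0": "A", "MIT": "A", "EPL-1.0": "B"}


def transitive_closure(graph: dict, roots: list) -> set:
    """Return every dependency reachable from roots (roots included)."""
    seen = set(roots)
    queue = deque(roots)
    while queue:
        for dep in graph.get(queue.popleft(), ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


def unresolved(graph: dict, roots: list, licenses: dict) -> list:
    """List reachable dependencies whose license is unknown or not Category A/B."""
    return sorted(
        dep
        for dep in transitive_closure(graph, roots)
        if ASF_CATEGORY.get(licenses.get(dep)) not in ("A", "B")
    )
```

An empty result would only mean that nothing was flagged by this table. Category B dependencies, as noted in the thread, are generally acceptable only in binary form, so they would still need a manual look.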

Re: daffodil-vscode - how to package and identify the contribution - some git questions

Posted by Steve Lawrence <sl...@apache.org>.

I think the concern from ASF, and why they have this IP clearance
process, is that the copyright ownership of these files is not clear. It
wasn't done in a fork of a Daffodil repo, and there are contributions
from multiple developers, with little public oversight from Daffodil. I
think from the ASF's perspective, this code did not follow the ASF
process, and is assumed to be owned by the contributors or their
companies. There is no license/copyright in the code specifying
otherwise, so I think ASF must assume the worst, and require a software
grant.

Also note that even prototype code has a copyright owner and a license.
Copying it into a PR doesn't change that. If you were to throw away this
code and start from scratch following the ASF process, then it wouldn't
be a problem. But if the plan is to copy prototype code not owned by ASF
into a PR, then there are ownership concerns.

If all this work had been done in a fork of the apache/daffodil-vscode repo
by a single contributor, then I think maybe the assumption from ASF would be
that the code was intended to be part of the main repo and implicitly granted
to ASF via the PR process.


On 9/9/21 4:05 PM, John Wass wrote:
> Yeah I was thinking of the example repo as a prototype, just as if I was
> working on a feature in my fork of Daffodil.  The main project doesn't own
> the feature until it crosses the PR threshold, and once it does cross over
> the state of my fork is of no concern to it.

Re: daffodil-vscode - how to package and identify the contribution - some git questions

Posted by John Wass <jw...@gmail.com>.

Yeah, I was thinking of the example repo as a prototype, just as if I were
working on a feature in my fork of Daffodil. The main project doesn't own
the feature until it crosses the PR threshold, and once it does cross over,
the state of my fork is of no concern to it.



On Thu, Sep 9, 2021 at 3:54 PM Steve Lawrence <sl...@apache.org> wrote:

> The concern is that this code was developed outside of Apache and so
> didn't follow standard Apache process. From the IP clearance page:
>
> https://incubator.apache.org/ip-clearance/
>
> > Any code that was developed outside of the ASF SVN repository and
> > our public mailing lists must be processed like this, even if the
> > external developer is already an ASF committer.
>
> I suppose that submitting it as a PR does follow some of that process,
> but there is maybe less assurance of ownership. Because it was not
> developed in an ASF repository, that code is presumed to be owned by
> you, multiple developers, or a company, and so that ownership must be
> granted to ASF via the IP clearance process, with the appropriate software
> grant, CLAs, etc. (At least, that's my admittedly limited understanding
> of the process.)
>
> - Steve

Re: daffodil-vscode - how to package and identify the contribution - some git questions

Posted by Steve Lawrence <sl...@apache.org>.

The concern is that this code was developed outside of Apache and so
didn't follow standard Apache process. From the IP clearance page:

https://incubator.apache.org/ip-clearance/

> Any code that was developed outside of the ASF SVN repository and
> our public mailing lists must be processed like this, even if the
> external developer is already an ASF committer.

I suppose that submitting it as a PR does follow some of that process,
but there is maybe less assurance of ownership. Because it was not
developed in an ASF repository, that code is presumed to be owned by
you, multiple developers, or a company, and so that ownership must be
granted to ASF via the IP clearance process, with the appropriate software
grant, CLAs, etc. (At least, that's my admittedly limited understanding
of the process.)

- Steve


On 9/9/21 3:34 PM, John Wass wrote:
> Couldn't we (the vscode contributors) submit a series of PRs against the
> new repo to move the code, and just archive the example repo as-is?
> 
> I noted some thoughts on that a while back
> https://github.com/jw3/example-daffodil-vscode/issues/77

Re: daffodil-vscode - how to package and identify the contribution - some git questions

Posted by John Wass <jw...@gmail.com>.

Couldn't we (the vscode contributors) submit a series of PRs against the
new repo to move the code, and just archive the example repo as-is?

I noted some thoughts on that a while back
https://github.com/jw3/example-daffodil-vscode/issues/77



On Thu, Sep 9, 2021 at 2:11 PM Beckerle, Mike <mb...@owlcyberdefense.com>
wrote:

> I know of one file in the repo that will have to be removed, namely the
> jpeg.dfdl.xsd file, which is there just as an example workspace.
>
> The copyright and provisions of that are not compatible with Apache
> licensing.
>
> We can find a DFDL schema that we created, under the Apache license, to use
> instead.
>
> For the other files under src, server, and build, can we generate a list
> of files identifying which are:
>
> (a) original MIT-licensed, unmodified
> (b) new - can be ASL
> (c) blended - started from MIT-licensed source, modified with
> daffodil-vscode-specific changes.
>
> It is these blended files that are the problematic ones.

Re: daffodil-vscode - how to package and identify the contribution - some git questions

Posted by "Beckerle, Mike" <mb...@owlcyberdefense.com>.

I know of one file in the repo that will have to be removed: jpeg.dfdl.xsd, which is there just as part of an example workspace.

Its copyright and license provisions are not compatible with Apache licensing.

We can find a DFDL schema that we created, and that has an Apache license, to use instead.

For the other files under src, server, and build, can we generate a list of files identifying which are:

(a) original MIT-licensed, unmodified
(b) new - can be ASL
(c) blended - started from MIT-licensed source, modified with daffodil-vscode-specific changes.

It is these blended files that are the problematic ones.
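
A rough sketch of how such a list could be generated from git itself, assuming the last upstream Microsoft commit (the fork point) is known. The helper names and the rule that "added" means (b) and "modified or renamed with changes" means (c) are assumptions for illustration, not an agreed procedure:

```python
import subprocess


def classify(name_status: str, head_files: list[str]) -> dict[str, list[str]]:
    """Sort the files present at HEAD into the buckets (a), (b), (c).

    name_status: output of `git diff --name-status -M <fork-point> HEAD`
    head_files:  lines of `git ls-tree -r --name-only HEAD`
    """
    touched = {}
    for line in name_status.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        # For renames the line is "R<score>\told\tnew"; the last field is the current path.
        touched[fields[-1]] = fields[0]

    buckets = {"unmodified": [], "new": [], "blended": []}
    for path in head_files:
        status = touched.get(path)
        if status is None or status == "R100":
            buckets["unmodified"].append(path)  # (a) content identical to the fork point
        elif status.startswith("A"):
            buckets["new"].append(path)         # (b) did not exist at the fork point
        else:
            buckets["blended"].append(path)     # (c) modified, or renamed with changes
    return buckets


def classify_repo(fork_point: str, repo: str = ".") -> dict[str, list[str]]:
    """Run the two git commands in `repo` and classify the result."""
    def git(*args: str) -> str:
        done = subprocess.run(["git", "-C", repo, *args],
                              capture_output=True, text=True, check=True)
        return done.stdout

    diff = git("diff", "--name-status", "-M", fork_point, "HEAD")
    files = git("ls-tree", "-r", "--name-only", "HEAD").splitlines()
    return classify(diff, files)
```

Calling `classify_repo("<fork-point-hash>")` in a checkout would return the three lists; the result still needs a human pass, since a file git reports as "added" could contain text pasted from an MIT-licensed original.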



________________________________
From: Steve Lawrence <sl...@apache.org>
Sent: Thursday, September 9, 2021 1:38 PM
To: dev@daffodil.apache.org <de...@daffodil.apache.org>
Subject: Re: daffodil-vscode - how to package and identify the contribution - some git questions

Correct. For more information about Apache license compatibility:

  https://www.apache.org/legal/resolved.html

MIT is Category A and is fine. EPL is Category B and is also okay, but
generally only in its binary form. So these top-level dependencies look
okay, assuming their transitive dependencies are also okay.

We'll also need to verify the licenses of all code in the repo.
Hopefully little of that is original Microsoft MIT code, so that most of
it can be granted to the ASF and relicensed.



Re: daffodil-vscode - how to package and identify the contribution - some git questions

Posted by Steve Lawrence <sl...@apache.org>.

Correct. For more information about Apache license compatibility:

  https://www.apache.org/legal/resolved.html

MIT is Category A and is fine. EPL is Category B and is also okay, but
generally only in its binary form. So these top-level dependencies look
okay, assuming their transitive dependencies are also okay.

We'll also need to verify the licenses of all code in the repo.
Hopefully little of that is original Microsoft MIT code, so that most of
it can be granted to the ASF and relicensed.
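
To make the category check concrete, here is a small sketch that sorts the dependencies Adam listed by the categories described above. The license labels are copied as written in the thread; the table covers only those labels, and the bucket names are made up for illustration:

```python
# ASF third-party license categories, limited to the labels used in this thread.
# "A" can be included freely; "B" is generally acceptable only in binary form.
CATEGORY = {"MIT": "A", "APL 2.0": "A", "EPL": "B", "EPL 1.0": "B"}

# Top-level dependencies and licenses as reported in the thread.
DEPENDENCIES = {
    "ch.qos.logback:logback-classic:1.2.3": "EPL",
    "com.microsoft.java:com.microsoft.java.debug.core:0.31.1": "EPL 1.0",
    "co.fs2:fs2-io:3.0.4": "MIT",
    "com.monovore:decline-effect:2.1.0": "APL 2.0",
    "org.typelevel:log4cats-slf4j:2.1.0": "APL 2.0",
}


def review(deps: dict[str, str]) -> dict[str, list[str]]:
    """Group dependencies by what their license category allows."""
    result = {"ok": [], "binary_only": [], "needs_review": []}
    for name, license_label in deps.items():
        category = CATEGORY.get(license_label)
        if category == "A":
            result["ok"].append(name)
        elif category == "B":
            result["binary_only"].append(name)
        else:
            # Any license not in the table must be checked by hand against the policy.
            result["needs_review"].append(name)
    return result


for bucket, names in review(DEPENDENCIES).items():
    print(bucket, names)
```

This only covers the top-level dependencies; the same check would have to be run over the transitive ones as well.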


On 9/9/21 1:30 PM, Beckerle, Mike wrote:
> The requirement is that the entire dependency tree, including transitive dependencies, contains no software with an Apache-incompatible (aka restrictive) license.
> 
> So we need the transitive closure of all dependencies.

Re: daffodil-vscode - how to package and identify the contribution - some git questions

Posted by "Beckerle, Mike" <mb...@owlcyberdefense.com>.

The requirement is that the entire dependency tree, including transitive dependencies, contains no software with an Apache-incompatible (aka restrictive) license.

So we need the transitive closure of all dependencies.
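
As a sketch of what "transitive closure" means here: given each artifact's direct dependencies, walk the graph until no new artifacts turn up. The graph below is a made-up illustration, not the output of the real build; in practice the edges would come from the build tool's dependency report.

```python
from collections import deque


def transitive_closure(direct: dict[str, list[str]], roots: list[str]) -> set[str]:
    """Return every artifact reachable from `roots` via direct-dependency edges."""
    seen: set[str] = set()
    queue = deque(roots)
    while queue:
        artifact = queue.popleft()
        if artifact in seen:
            continue  # already visited; this also guards against cycles
        seen.add(artifact)
        queue.extend(direct.get(artifact, ()))
    return seen


# Hypothetical edges, for illustration only.
example_graph = {
    "debugger": ["logback-classic", "fs2-io"],
    "logback-classic": ["logback-core", "slf4j-api"],
    "fs2-io": ["fs2-core"],
    "fs2-core": [],
}

print(sorted(transitive_closure(example_graph, ["debugger"])))
```

Every artifact in the resulting set, not just the top-level ones, then needs its license checked.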


________________________________
From: Adam Rosien <ad...@rosien.net>
Sent: Thursday, September 9, 2021 12:44 PM
To: dev@daffodil.apache.org <de...@daffodil.apache.org>
Subject: Re: daffodil-vscode - how to package and identify the contribution - some git questions

(I don't understand the requirements of licencing + transitive
dependencies, so I'm giving some surface level license info)

"ch.qos.logback" % "logback-classic" % "1.2.3" - EPL
http://logback.qos.ch/license.html
"com.microsoft.java" % "com.microsoft.java.debug.core" % "0.31.1" - EPL 1.0
"co.fs2" %% "fs2-io" % "3.0.4" - MIT
"com.monovore" %% "decline-effect" % "2.1.0" - APL 2.0
"org.typelevel" %% "log4cats-slf4j" % "2.1.0" - APL 2.0


Re: daffodil-vscode - how to package and identify the contribution - some git questions

Posted by Adam Rosien <ad...@rosien.net>.

(I don't understand the requirements of licensing + transitive
dependencies, so I'm giving some surface-level license info)

"ch.qos.logback" % "logback-classic" % "1.2.3" - EPL (http://logback.qos.ch/license.html)
"com.microsoft.java" % "com.microsoft.java.debug.core" % "0.31.1" - EPL 1.0
"co.fs2" %% "fs2-io" % "3.0.4" - MIT
"com.monovore" %% "decline-effect" % "2.1.0" - Apache License 2.0
"org.typelevel" %% "log4cats-slf4j" % "2.1.0" - Apache License 2.0
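
To make the compatibility question a bit more concrete, here is a rough Python sketch that sorts the dependencies listed above into the ASF third-party license categories. The category mapping reflects my reading of the ASF policy (A = may be included, B = binary-only inclusion with labeling) and should be checked against the current policy page. The names `ASF_CATEGORY`, `DEPENDENCIES`, `categorize`, and `needs_attention` are made up for illustration, and treating the plain "EPL" entry for logback as EPL 1.0 is an assumption.

```python
# ASF third-party license categories, per my reading of the ASF policy:
#   A = may be included, B = binary-only inclusion with labeling.
# Anything missing from this table is reported as "unknown", not as forbidden.
ASF_CATEGORY = {
    "Apache-2.0": "A",
    "MIT": "A",
    "EPL-1.0": "B",
}

# Coordinates and licenses copied from the list above.
# Assumption: the bare "EPL" given for logback means EPL 1.0.
DEPENDENCIES = {
    "ch.qos.logback:logback-classic:1.2.3": "EPL-1.0",
    "com.microsoft.java:com.microsoft.java.debug.core:0.31.1": "EPL-1.0",
    "co.fs2:fs2-io:3.0.4": "MIT",
    "com.monovore:decline-effect:2.1.0": "Apache-2.0",
    "org.typelevel:log4cats-slf4j:2.1.0": "Apache-2.0",
}


def categorize(dependencies, categories=ASF_CATEGORY):
    """Map each dependency to its ASF category, or "unknown" if unlisted."""
    return {name: categories.get(lic, "unknown") for name, lic in dependencies.items()}


def needs_attention(dependencies):
    """Return the dependencies that are not plain Category A, sorted by name."""
    return sorted(name for name, cat in categorize(dependencies).items() if cat != "A")
```

Calling `needs_attention(DEPENDENCIES)` would flag the two EPL entries for a closer look. It does not cover transitive dependencies, which would need the same treatment.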

On Thu, Sep 9, 2021 at 9:35 AM Adam Rosien <ad...@rosien.net> wrote:

> I can relay the list of dependencies and their licenses.

Re: daffodil-vscode - how to package and identify the contribution - some git questions

Posted by Adam Rosien <ad...@rosien.net>.

I can relay the list of dependencies and their licenses.

On Thu, Sep 9, 2021 at 9:20 AM Steve Lawrence <sl...@apache.org> wrote:

> I personally don't care too much about having the existing git history
> once it's part of ASF, especially if it makes things any easier (as you
> mention, squash/rebase can be difficult through merges). So I'd say we
> just do plan B--create a tarball of the current state (without the git
> history), and the content of that tarball is what goes through the IP
> clearance process, and is the content of the initial commit when adding
> to the apache/daffodil-vscode repo.
>
> Note that I think the incubator will still want access to the existing
> repo so they can view the full git history. Understanding where
> everything came from and verifying the provenance is important to
> ensuring we have all the appropriate CLAs. So while the tarball is
> maybe what is officially voted on, they will want access to the repo.
>
> That said, I don't think we are going to get CLAs for any
> Microsoft-contributed code. So either all Microsoft-contributed code will
> need to be kept MIT, or removed from the codebase. And it feels a bit odd to
> grant something to ASF where the original codebase stays MIT and isn't
> part of that grant.
>
> I think understanding how much code still exists that is Microsoft/MIT
> is going to be important to getting this through the IP clearance process.
>
> So I'm curious how much of that original Microsoft code still exists? I
> assume since it was just example code it has mostly been replaced? If
> that's the case, we could potentially say Microsoft has no ownership of
> this code, and so their CLA and MIT license aren't necessary?
>
> We should also have a good understanding of the dependencies. If any of
> them are not compatible with ALv2, then going through this process isn't
> even worth it until they are replaced. Do you have a list of the
> dependencies?

Re: daffodil-vscode - how to package and identify the contribution - some git questions

Posted by Steve Lawrence <sl...@apache.org>.

I personally don't care too much about having the existing git history
once it's part of ASF, especially if it makes things any easier (as you
mention, squash/rebase can be difficult through merges). So I'd say we
just do plan B--create a tarball of the current state (without the git
history), and the content of that tarball is what goes through the IP
clearance process, and is the content of the initial commit when adding
to the apache/daffodil-vscode repo.
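
As a rough illustration of the tarball-plus-hash step (a sketch only, not an agreed procedure), something like the following Python would package a working tree without its `.git` directory and compute a digest of the result. The function names `package_contribution` and `file_digest` are hypothetical, and whether the clearance paperwork wants md5, a sha variant, or both is left open here.

```python
import hashlib
import tarfile
from pathlib import Path


def package_contribution(src_dir, out_path):
    """Write a gzip-compressed tarball of src_dir, leaving out the .git directory."""
    src = Path(src_dir)

    def skip_git(member):
        # Returning None drops the member; for a directory, tarfile then
        # does not descend into it, so all of .git is left out.
        return None if ".git" in Path(member.name).parts else member

    with tarfile.open(out_path, "w:gz") as tar:
        tar.add(src, arcname=src.name, filter=skip_git)
    return Path(out_path)


def file_digest(path, algorithm="sha256"):
    """Hash a file in chunks so a large tarball need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, `package_contribution("daffodil-vscode", "contribution.tar.gz")` followed by `file_digest("contribution.tar.gz", "md5")` would give the file and hash to record. The gzip header carries a timestamp, so the hash identifies that particular file and repackaging the same tree will generally produce a different hash.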

Note that I think the incubator will still want access to the existing
repo so they can view the full git history. Understanding where
everything came from and verifying the provenance is important to
ensuring we have all the appropriate CLAs. So while the tarball is
maybe what is officially voted on, they will want access to the repo.

That said, I don't think we are going to get CLAs for any
Microsoft-contributed code. So either all Microsoft-contributed code will
need to be kept MIT, or removed from the codebase. And it feels a bit odd to
grant something to ASF where the original codebase stays MIT and isn't
part of that grant.

I think understanding how much code still exists that is Microsoft/MIT
is going to be important to getting this through the IP clearance process.

So I'm curious how much of that original Microsoft code still exists? I
assume since it was just example code it has mostly been replaced? If
that's the case, we could potentially say Microsoft has no ownership of
this code, and so their CLA and MIT license aren't necessary?

We should also have a good understanding of the dependencies. If any of
them are not compatible with ALv2, then going through this process isn't
even worth it until they are replaced. Do you have a list of the
dependencies?

On 9/9/21 11:16 AM, Beckerle, Mike wrote:
> So the daffodil-vscode code-base wants to be granted to become part of the 
> Daffodil project.
> 
> One question arises which is "what is the contribution?" exactly.
> 
> The normal way this is identified is by creating a tarball of the source files 
> and specifying an sha or md5 hash of that file.
> 
> However, this code base is perhaps different from usual.
> 
> It started by creating a detached fork of the vscode debugger example code base. 
> This is MIT-Licensed which is a compatible license.
> 
> The files are then edited. There are around 100 commits on top of the base that 
> came from the vscode debugger repository.
> 
> So the contribution is that set of 100 commits - the patches/change-sets they 
> represent.
> 
> These commits often edit the original files of the vscode debugger example to 
> add the daffodil-specific functionality. That is, the contribution material is 
> in several cases intermingled in the lines of the existing files.  That's ok I 
> think so long as the modified file had MIT license.
> 
> There's some value in preserving the 100 commits by our contributors, not 
> squashing it down to one commit, though if it's really not sensible to proceed 
> otherwise, we can choose to squash it down to one commit.
> 
> Furthermore, the vscode debugger example repo itself had many commits in it. The 
> current daffodil-vscode repo preserves all these commits as well. I don't see 
> value in preserving these commits, and would rather they were squashed into a 
> single "starting point" commit, with a dependencies file specifying the githash 
> where we forked from, just so we can refer back if necessary.
> 
> So as a starting suggestion (subject to discussion of other alternatives) is this:
> 
> Plan A:
> 
>  1. squash all commits up to and including the last Microsoft commit, together
>     into one.
>  2. rebase the remaining commits on top of that.
>      1. I'm a bit worried about this rebase. There are merge commits, etc. in
>         the history. I'm not sure this will just all rebase while preserving all
>         the commits, but maybe it will "just work"
>  3. create a "patch set" corresponding to the 100 or so commits that make up the
>     "contribution".
>      1. I don't know if this is even feasible for this many commits. 
>  4. create a tar/zip of this aggregate patch set.
>  5. compute an md5 of this patch set.
> 
> The patch set tar/zip file and its md5 hash are "the granted software".
> 
> The problem with this idea is that there's no obvious way to review a patch set, 
> shy of applying it.
> 
> A better way may be to change steps 3 - 5 above to
> 
> Plan B:
> 
>     3. push the main branch to a new empty git repository
>          The point of this is to remove all historic stuff from the repository,
>     i.e., have a minimal git repo that contains only the contribution and the
>     single other commit it must be based on.
> 
>     4. create a tarball of this git repository, and md5 hash of it
> 
>     5. document that the contribution is from githash X (after the first commit)
>     to githash Y (the final commit) of this repository
> 
> 
> This has the advantage that the contribution is a self-contained review-able thing.
> 
> Other ideas are welcome. (Plans C, D, etc) The only requirements I know of are:
> 
>  1. a single file containing the contribution, and its md5 hash
>  2. a sensible way one can review the contents of this contribution file
>  3. preserve history of derivation from the vscode debugger example.
>
> Mike Beckerle | Principal Engineer
> 
> mbeckerle@owlcyberdefense.com <ma...@owlcyberdefense.com>
> 
> P +1-781-330-0412
>