Posted to legal-discuss@apache.org by Sean Owen <sr...@apache.org> on 2018/06/25 14:34:10 UTC

Re: LICENSE and NOTICE file content

@legal-discuss, brief recap:

In Spark's test source code and release, there are some JAR files which
exist to test handling of JAR files. Example: TestSerDe.jar in
https://github.com/apache/spark/tree/master/sql/hive/src/test/resources/data/files


Justin raises the legitimate question: these don't belong in a source
release, do they?

My operating theory had been that they are more like binary blobs w.r.t.
Spark, like a test JPEG or data file, and are not the compiled version of
any test code in Spark. They need to exist in order to run the tests from a
source release. So it's not quite a case of shipping compiled Spark code in
a source release.

I can imagine three opinions:

1) It's OK.
2) It's OK, but you need to include the source code for even those test JAR
files somewhere
3) It's not fine, and the toolchain has to separately build these from
source first automatically

I found https://markmail.org/thread/nf3lsdy5m3c3ovbr on legal-discuss
previously, which seems to incline towards 2.

I'm also inclined towards 2, as 3 is probably relatively tricky in practice
even though that's a nice-to-have.

I'd welcome opinions on this one.

Sean


On Sat, Jun 23, 2018 at 7:34 PM Justin Mclean <ju...@classsoftware.com>
wrote:

> > It's not test code; test code would indeed have to be distributed as
> source as well. They are binary blobs, if you like, needed by test code,
> that happen to be JARs here and not JPEGs or .docx files or something.
> These help test handling of JAR files.
>
> Which IMO is still not allowed in a source release, but as I said it would
> be best for you to check on legal discuss.
>
>

Re: LICENSE and NOTICE file content

Posted by Rob Vesse <rv...@dotnetrdf.org>.
 

From: Jan Lahoda <la...@gmail.com>
Reply-To: <le...@apache.org>
Date: Tuesday, 26 June 2018 at 20:11
To: <le...@apache.org>
Subject: Re: LICENSE and NOTICE file content

 

Spending days on administrivia to publish a test data package for a bugfix feels to be a little bit on the heavy side.

 

You seem to have latched on to the extreme interpretation of the suggestion. Yes, you would need to periodically publish the test data package as an official artefact, most likely alongside your official releases. Therefore you could have a single vote, with the test data source being one of the items voted upon.

 

However, you would not need to publish it as an official release for every single change during the course of development. There is absolutely no reason that the test data package depended on can’t be a snapshot in development the same as the rest of your internal project dependencies are. Tests and test data will inevitably change during the development and really only need to be static at the time of a release.

 

Rob


Re: LICENSE and NOTICE file content

Posted by Alex Harui <ah...@adobe.com.INVALID>.

From: Jan Lahoda <la...@gmail.com>
Reply-To: "legal-discuss@apache.org" <le...@apache.org>
Date: Tuesday, June 26, 2018 at 12:11 PM
To: "legal-discuss@apache.org" <le...@apache.org>
Subject: Re: LICENSE and NOTICE file content

On Tue, Jun 26, 2018 at 7:28 PM, Alex Harui <ah...@adobe.com.invalid> wrote:
What doesn’t seem right to me is that you can ship a binary without any way to recreate that binary from sources in a way sufficient to make it useful to others.  That doesn’t seem “open” or “source” to me.

I guess I struggle with what it means to "recreate test data". As an example, in NetBeans there's a test that the classfile reading library does not crash when it reads a (broken) classfile produced by a specific (fairly old) version of JDK. If one had the source code, then it would be possible to compile the source code, but unless one has the given JDK, that's not recreating the test data. Current JDKs won't (AFAIK) produce the problematic classfile, and then the test proves nothing.

The projects I work on have their own compiler.  In order to debug the binary output, we have a “dump” utility that converts the byte code into human-readable form.  On my to-do list is a tool that converts the human-readable form back into binary form.  If you have such tools, then the source package would ship the human-readable form and the build script would convert it back to .class files and run the usual tests.  IMO, this would satisfy the objectives.  The human-readable form could contain comments that describe the exact pattern that is being tested.  The files would be text and thus less likely to be an attack surface.

You could probably also just Base64-encode the .class files, but having a human-readable, annotated text ‘source’ for the binary seems like it could be useful.
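
A minimal sketch of the Base64 idea, written in Python purely for illustration. The binary fixture is stored as text with a leading comment line, and test setup decodes it back to bytes. The helper names, the `.b64` suffix, and the `#` comment convention are assumptions of this example, not something any project in this thread is known to use.

```python
import base64
from pathlib import Path


def encode_fixture(binary_path: Path, text_path: Path, note: str) -> None:
    """Store a binary fixture as Base64 text with one leading '#' comment line."""
    encoded = base64.encodebytes(binary_path.read_bytes()).decode("ascii")
    # The '#' comment convention is an assumption of this sketch; '#' is not
    # part of the Base64 alphabet, so comment lines cannot be confused with data.
    text_path.write_text(f"# {note}\n{encoded}", encoding="ascii")


def decode_fixture(text_path: Path) -> bytes:
    """Rebuild the original bytes, skipping '#' comment lines."""
    lines = text_path.read_text(encoding="ascii").splitlines()
    payload = "".join(line for line in lines if not line.startswith("#"))
    return base64.b64decode(payload)
```

A build or test-setup step would call `decode_fixture` and write the bytes to a temporary .class file before the tests run. Note that this only changes the packaging: the text is still not a source from which the binary could be meaningfully modified.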

My 2 cents,
-Alex



Re: LICENSE and NOTICE file content

Posted by Jan Lahoda <la...@gmail.com>.
On Tue, Jun 26, 2018 at 7:28 PM, Alex Harui <ah...@adobe.com.invalid>
wrote:

>
>
>
>
> *From: *Jan Lahoda <la...@gmail.com>
> *Reply-To: *"legal-discuss@apache.org" <le...@apache.org>
> *Date: *Tuesday, June 26, 2018 at 1:44 AM
> *To: *"legal-discuss@apache.org" <le...@apache.org>
> *Subject: *Re: LICENSE and NOTICE file content
>
>
>
> (As NetBeans has (among others) a library for reading classfiles, I
> guess this discussion also relates to it, and I'd like to share some of my
> thoughts.)
>
>
>
> On Tue, Jun 26, 2018 at 7:15 AM, Alex Harui <ah...@adobe.com.invalid>
> wrote:
>
> code unless there is some way to solve the “security/safety” goal.  Maybe
> it is good enough to give the file a different suffix so it appears as a
> non-executable file.  But I would probably just have the tool’s source
> package build script download the convenience binary of the upstream test
> source package.
>
>
>
> That is extra overhead for sure, but I don’t think that is ‘impractical’.
> And I still wouldn’t hold up any release for this kind of issue.
> Incrementally make improvements in subsequent releases.  Create a test-data
> source release.  Then adjust the main source package to download the test
> jar.
>
>
>
> I may be too pessimistic, but in my experience when creating a test is
> more complicated, the probability of having a test decreases. And not
> having a test feels like a sub-optimal software engineering practice.
>
> FTR, I think there are multiple variants to avoid having classfiles in the
> repository, like maybe using jcod (not sure if that's OK or not); at the
> same time, I think having an approach that does not discourage proper
> engineering practices has benefits.
>
>
>
> Lots of things in ASF open source are “more complicated”.  As you noted,
> it takes at least 3 days to make a release.  You can’t just turn to a
> colleague, make a major decision and implement it.   Do these things
>

I'd like to point out I was not talking about major decisions. I was
talking about a (simple or not) bugfix. And for that bugfix, when one wants
to create a test case (or a set of test cases; which by themselves are
still in the source code), there may be a need to have data over which the
test runs. This test data may resemble some kind of source code, or
classfile, or something else; but it is really data processed by the
test. Spending days on administrivia to publish a test data package for a
bugfix feels to be a little bit on the heavy side.


> discourage proper engineering practices?  Maybe, but they exist to help
> ensure the “sharing” and “safety”, and as you also mention, there are
> probably variants, or creative ways to not sacrifice integrity of your
> software.
>

I can imagine quite a few possibilities to avoid having test data classfiles
in the repository, but, frankly, none of them feels to me like an obvious
right solution that would fulfill all the requirements well. (Placing
binary classfiles is not such an obvious right solution either, of course,
but is simpler than the others I can imagine.)


> Hopefully you have a team of folks reviewing commits that would catch that
> a test is missing or use a tool to check that sufficient tests exist.  If a
> bug were found in the test source package, you could put both packages up
> for vote at the same time.
>
>
>
> What doesn’t seem right to me is that you can ship a binary without any
> way to recreate that binary from sources in a way sufficient to make
> it useful to others.  That doesn’t seem “open” or “source” to me.
>

I guess I struggle with what it means to "recreate test data". As an
example, in NetBeans there's a test that the classfile reading library does
not crash when it reads a (broken) classfile produced by a specific (fairly
old) version of JDK. If one had the source code, then it would be possible
to compile the source code, but unless one has the given JDK, that's not
recreating the test data. Current JDKs won't (AFAIK) produce the
problematic classfile, and then the test proves nothing.

Jan



>
> My 2 cents,
>
> -Alex
>

Re: LICENSE and NOTICE file content

Posted by Alex Harui <ah...@adobe.com.INVALID>.

From: Jan Lahoda <la...@gmail.com>
Reply-To: "legal-discuss@apache.org" <le...@apache.org>
Date: Tuesday, June 26, 2018 at 1:44 AM
To: "legal-discuss@apache.org" <le...@apache.org>
Subject: Re: LICENSE and NOTICE file content

(As NetBeans has (among others) a library for reading classfiles, I guess this discussion also relates to it, and I'd like to share some of my thoughts.)

On Tue, Jun 26, 2018 at 7:15 AM, Alex Harui <ah...@adobe.com.invalid> wrote:
code unless there is some way to solve the “security/safety” goal.  Maybe it is good enough to give the file a different suffix so it appears as a non-executable file.  But I would probably just have the tool’s source package build script download the convenience binary of the upstream test source package.

That is extra overhead for sure, but I don’t think that is ‘impractical’.  And I still wouldn’t hold up any release for this kind of issue.  Incrementally make improvements in subsequent releases.  Create a test-data source release.  Then adjust the main source package to download the test jar.

I may be too pessimistic, but in my experience when creating a test is more complicated, the probability of having a test decreases. And not having a test feels like a sub-optimal software engineering practice.
FTR, I think there are multiple variants to avoid having classfiles in the repository, like maybe using jcod (not sure if that's OK or not); at the same time, I think having an approach that does not discourage proper engineering practices has benefits.

Lots of things in ASF open source are “more complicated”.  As you noted, it takes at least 3 days to make a release.  You can’t just turn to a colleague, make a major decision and implement it.   Do these things discourage proper engineering practices?  Maybe, but they exist to help ensure the “sharing” and “safety”, and as you also mention, there are probably variants, or creative ways to not sacrifice integrity of your software.  Hopefully you have a team of folks reviewing commits that would catch that a test is missing or use a tool to check that sufficient tests exist.  If a bug were found in the test source package, you could put both packages up for vote at the same time.

What doesn’t seem right to me is that you can ship a binary without any way to recreate that binary from sources in a way sufficient to make it useful to others.  That doesn’t seem “open” or “source” to me.

My 2 cents,
-Alex

Re: LICENSE and NOTICE file content

Posted by Jan Lahoda <la...@gmail.com>.
(As NetBeans has (among others) a library for reading classfiles, I
guess this discussion also relates to it, and I'd like to share some of my
thoughts.)

On Tue, Jun 26, 2018 at 7:15 AM, Alex Harui <ah...@adobe.com.invalid>
wrote:

> AIUI, our primary objectives for open source are about “sharing” (and
> openness in general) and “security/safety”.  So yeah, there is some
> overhead to being an open source project.  We want source packages that
> folks can use in other ways, and that folks can use without fear of getting
> infected by a virus.  We generally recommend that folks use our code by
> building from sources.  If the source package contains executable code,
> there is always a chance that some evil person will find a way to exploit
> that.
>
>
>
> So, IMO, even a byte-code manipulation tool that has test data, probably
> had that test data compiled from some source.  We should make that source
> available so that folks can try different variations, or even fix a bug
> that’s “been there forever that nobody found until just now”.
>

I think one should be very (very, very) careful when modifying test data.
One needs to be absolutely sure the test still tests what it was testing
before, otherwise "fixing" a "bug" in the test data may actually make the
test useless. "Negative" tests (tests that verify that something (usually a
crash/exception) does not happen) are particularly prone to such an
accidental invalidation.


>
>
> However, I don’t know of any Apache policy or convention that dictates
> that the source for the test byte code must be provided in the same package
> as the byte-code manipulation tool.  You could create a separate release of
> the source for the test byte code and never release it again if it never
> changes.  Then that would be an upstream dependency for the byte-code
> manipulation tool source package.  But if it were up to me, the tool’s
> source package still would not contain the byte
>

I assume when a bug is fixed, a new test would (ideally) be written, which
means a new set of test data, which means a new release of the test data,
right? So a bug absolutely cannot be fixed (with a test) quicker than in 3
days (3+3 days for podlings)? (And one needs to be careful not to change
the existing test data in convenience binaries, just add the new one, of
course.)


> code unless there is some way to solve the “security/safety” goal.  Maybe
> it is good enough to give the file a different suffix so it appears as a
> non-executable file.  But I would probably just have the tool’s source
> package build script download the convenience binary of the upstream test
> source package.
>
>
>
> That is extra overhead for sure, but I don’t think that is ‘impractical’.
> And I still wouldn’t hold up any release for this kind of issue.
> Incrementally make improvements in subsequent releases.  Create a test-data
> source release.  Then adjust the main source package to download the test
> jar.
>

I may be too pessimistic, but in my experience when creating a test is more
complicated, the probability of having a test decreases. And not having a
test feels like a sub-optimal software engineering practice.

FTR, I think there are multiple variants to avoid having classfiles in the
repository, like maybe using jcod (not sure if that's OK or not); at the
same time, I think having an approach that does not discourage proper
engineering practices has benefits.

Jan


>
>
> My 2 cents,
>
> -Alex
>
>
>
> *From: *David Jencks <da...@gmail.com>
> *Reply-To: *"legal-discuss@apache.org" <le...@apache.org>
> *Date: *Monday, June 25, 2018 at 1:31 PM
> *To: *"legal-discuss@apache.org Discuss" <le...@apache.org>
> *Subject: *Re: LICENSE and NOTICE file content
>
>
>
> I don’t know what function these files serve here, but IMO a blanket
> condemnation of precompiled classes for test data in Apache source releases
> makes certain kinds of projects impractical to develop at Apache.  If I was
> developing a byte code manipulation tool, I would want as test data a wide
> variety of unchanging byte code samples.  For instance, one category might
> result from a particular ALv2 Java file compiled with every possible
> compiler I could find, possibly also modified by every other byte code
> manipulation tool I could find.  I’d expect that I’d also want saved
> “output” byte code to check that the output doesn’t change.  Building such
> binary artifacts as part of the build completely eliminates their
> usefulness as test data. Of course how the byte code was constructed needs
> to be carefully documented.
>
>
>
> David Jencks
>
>
>
> On Jun 25, 2018, at 12:44 PM, Sean Owen <sr...@apache.org> wrote:
>
>
>
> Yes the code in there is ALv2 licensed; appears to be either created for
> Spark or copied from Hive. Yes, irrespective of the policy issue, it's
> important to be able to recreate these JARs somehow, and I don't think we
> have the source in the repo for all of them (at least, the ones that
> originate from Spark). That much seems like a must-do.
>
>
>
> After that, seems worth figuring out just how hard it is to build these
> artifacts from source. If it's easy, great. If not, either the test can be
> removed or we figure out just how hard a requirement this is.
>
> On Mon, Jun 25, 2018 at 11:34 AM Alex Harui <ah...@adobe.com.invalid>
> wrote:
>
> I am not an official answer person, but IMO, the first question is:  “Is
> the source for TestSerDe.jar ‘open source’ under an ALv2-compatible
> license?”.
>
>
>
> If “yes”, then supply the source in the source release and not the JAR.
> One of the reasons for “no compiled code in a source release” is that it is
> very difficult to verify that compiled code is “correct” and not corrupted,
> infected with a virus, etc.
>
>
>
> If “no”, then treat as a 3rd-party dependency.  Which may mean you can’t
> use it or need to treat it as optional, or a runtime dependency.
>
>
>
> The related question is:  How do folks modify this JAR?  If it was a JPEG,
> there are plenty of JPEG modification tools.  There really aren’t JAR
> modification tools that modify a JAR’s internal .class files; you really
> should use the source files.  I am still surprised/puzzled by the answer in
> the thread you linked to.  It still seems in both cases that a “binary” is
> being supplied for “convenience”.  IMO, there should be very few, if any,
> things in an Apache source repo that are “unmodifiable”.
>
>
>
> The “workaround” of renaming the .jar or .class files to something else so
> it isn’t seen as executable code seems like it still doesn’t fully meet the
> spirit of an open source release, either, but better than shipping
> executable code in a source package.
>
>
>
> On the other hand, I would not hold up a release for an issue like this.
> Fix it in some future release.
>
>
>
> My 2 cents,
>
> -Alex
>
>
>
>
>

Re: LICENSE and NOTICE file content

Posted by Alex Harui <ah...@adobe.com.INVALID>.
AIUI, our primary objectives for open source are about “sharing” (and openness in general) and “security/safety”.  So yeah, there is some overhead to being an open source project.  We want source packages that folks can use in other ways, and that folks can use without fear of getting infected by a virus.  We generally recommend that folks use our code by building from sources.  If the source package contains executable code, there is always a chance that some evil person will find a way to exploit that.

So, IMO, even a byte-code manipulation tool that has test data, probably had that test data compiled from some source.  We should make that source available so that folks can try different variations, or even fix a bug that’s “been there forever that nobody found until just now”.

However, I don’t know of any Apache policy or convention that dictates that the source for the test byte code must be provided in the same package as the byte-code manipulation tool.  You could create a separate release of the source for the test byte code and never release it again if it never changes.  Then that would be an upstream dependency for the byte-code manipulation tool source package.  But if it were up to me, the tool’s source package still would not contain the byte code unless there is some way to solve the “security/safety” goal.  Maybe it is good enough to give the file a different suffix so it appears as a non-executable file.  But I would probably just have the tool’s source package build script download the convenience binary of the upstream test source package.

That is extra overhead for sure, but I don’t think that is ‘impractical’.  And I still wouldn’t hold up any release for this kind of issue.  Incrementally make improvements in subsequent releases.  Create a test-data source release.  Then adjust the main source package to download the test jar.
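
A minimal sketch of how a build script might check a downloaded test jar before using it, assuming the source package pins a SHA-256 digest for each fixture. This is a hypothetical Python helper for illustration only: the function name and the digest-pinning approach are assumptions of this example, not an Apache policy or any project's actual build step, and the download itself is left out.

```python
import hashlib
from pathlib import Path


def verify_fixture(path: Path, expected_sha256: str) -> None:
    """Raise ValueError if the file's SHA-256 digest differs from the pinned one."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so a large JAR is not loaded into memory at once.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    actual = digest.hexdigest()
    # Hex digests are compared case-insensitively.
    if actual != expected_sha256.strip().lower():
        raise ValueError(f"{path.name}: expected {expected_sha256}, got {actual}")
```

A check like this only shows that the downloaded bytes match what was pinned; it says nothing about whether the binary can be rebuilt from the published test-data source.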

My 2 cents,
-Alex

From: David Jencks <da...@gmail.com>
Reply-To: "legal-discuss@apache.org" <le...@apache.org>
Date: Monday, June 25, 2018 at 1:31 PM
To: "legal-discuss@apache.org Discuss" <le...@apache.org>
Subject: Re: LICENSE and NOTICE file content

I don’t know what function these files serve here, but IMO a blanket condemnation of precompiled classes for test data in Apache source releases makes certain kinds of projects impractical to develop at Apache.  If I was developing a byte code manipulation tool, I would want as test data a wide variety of unchanging byte code samples.  For instance, one category might result from a particular ALv2 Java file compiled with every possible compiler I could find, possibly also modified by every other byte code manipulation tool I could find.  I’d expect that I’d also want saved “output” byte code to check that the output doesn’t change.  Building such binary artifacts as part of the build completely eliminates their usefulness as test data. Of course, how the byte code was constructed needs to be carefully documented.

David Jencks


On Jun 25, 2018, at 12:44 PM, Sean Owen <sr...@apache.org> wrote:

Yes the code in there is ALv2 licensed; appears to be either created for Spark or copied from Hive. Yes, irrespective of the policy issue, it's important to be able to recreate these JARs somehow, and I don't think we have the source in the repo for all of them (at least, the ones that originate from Spark). That much seems like a must-do.

After that, seems worth figuring out just how hard it is to build these artifacts from source. If it's easy, great. If not, either the test can be removed or we figure out just how hard a requirement this is.
On Mon, Jun 25, 2018 at 11:34 AM Alex Harui <ah...@adobe.com.invalid> wrote:
I am not an official answer person, but IMO, the first question is:  “Is the source for TestSerDe.jar ‘open source’ under an ALv2-compatible license?”.

If “yes”, then supply the source in the source release and not the JAR.  One of the reasons for “no compiled code in a source release” is that it is very difficult to verify that compiled code is “correct” and not corrupted, infected with a virus, etc.

If “no”, then treat as a 3rd-party dependency.  Which may mean you can’t use it or need to treat it as optional, or a runtime dependency.

The related question is:  How do folks modify this JAR?  If it was a JPEG, there are plenty of JPEG modification tools.  There really aren’t JAR modification tools that modify a JAR’s internal .class files; you really should use the source files.  I am still surprised/puzzled by the answer in the thread you linked to.  It still seems in both cases that a “binary” is being supplied for “convenience”.  IMO, there should be very few, if any, things in an Apache source repo that are “unmodifiable”.

The “workaround” of renaming the .jar or .class files to something else so it isn’t seen as executable code seems like it still doesn’t fully meet the spirit of an open source release, either, but better than shipping executable code in a source package.

On the other hand, I would not hold up a release for an issue like this.  Fix it in some future release.

My 2 cents,
-Alex



Re: LICENSE and NOTICE file content

Posted by David Jencks <da...@gmail.com>.
I don’t know what function these files serve here, but IMO a blanket condemnation of precompiled classes for test data in Apache source releases makes certain kinds of projects impractical to develop at Apache.  If I was developing a byte code manipulation tool, I would want as test data a wide variety of unchanging byte code samples.  For instance, one category might result from a particular ALv2 Java file compiled with every possible compiler I could find, possibly also modified by every other byte code manipulation tool I could find.  I’d expect that I’d also want saved “output” byte code to check that the output doesn’t change.  Building such binary artifacts as part of the build completely eliminates their usefulness as test data. Of course, how the byte code was constructed needs to be carefully documented.

David Jencks

> On Jun 25, 2018, at 12:44 PM, Sean Owen <sr...@apache.org> wrote:
> 
> Yes the code in there is ALv2 licensed; appears to be either created for Spark or copied from Hive. Yes, irrespective of the policy issue, it's important to be able to recreate these JARs somehow, and I don't think we have the source in the repo for all of them (at least, the ones that originate from Spark). That much seems like a must-do.
> 
> After that, seems worth figuring out just how hard it is to build these artifacts from source. If it's easy, great. If not, either the test can be removed or we figure out just how hard a requirement this is.
> 
> On Mon, Jun 25, 2018 at 11:34 AM Alex Harui <ah...@adobe.com.invalid> wrote:
> I am not an official answer person, but IMO, the first question is:  “Is the source for TestSerDe.jar ‘open source’ under an ALv2-compatible license?”.
> 
>  
> 
> If “yes”, then supply the source in the source release and not the JAR.  One of the reasons for “no compiled code in a source release” is that it is very difficult to verify that compiled code is “correct” and not corrupted, infected with a virus, etc.
> 
>  
> 
> If “no”, then treat as a 3rd-party dependency.  Which may mean you can’t use it or need to treat it as optional, or a runtime dependency.
> 
>  
> 
> The related question is:  How do folks modify this JAR?  If it was a JPEG, there are plenty of JPEG modification tools.  There really aren’t JAR modification tools that modify a JAR’s internal .class files; you really should use the source files.  I am still surprised/puzzled by the answer in the thread you linked to.  It still seems in both cases that a “binary” is being supplied for “convenience”.  IMO, there should be very few, if any, things in an Apache source repo that are “unmodifiable”.
> 
>  
> 
> The “workaround” of renaming the .jar or .class files to something else so it isn’t seen as executable code seems like it still doesn’t fully meet the spirit of an open source release, either, but better than shipping executable code in a source package.
> 
>  
> 
> On the other hand, I would not hold up a release for an issue like this.  Fix it in some future release.
> 
>  
> 
> My 2 cents,
> 
> -Alex
> 
>  
> 


Re: LICENSE and NOTICE file content

Posted by Sean Owen <sr...@apache.org>.
Yes the code in there is ALv2 licensed; appears to be either created for
Spark or copied from Hive. Yes, irrespective of the policy issue, it's
important to be able to recreate these JARs somehow, and I don't think we
have the source in the repo for all of them (at least, the ones that
originate from Spark). That much seems like a must-do.

After that, seems worth figuring out just how hard it is to build these
artifacts from source. If it's easy, great. If not, either the test can be
removed or we figure out just how hard a requirement this is.

On Mon, Jun 25, 2018 at 11:34 AM Alex Harui <ah...@adobe.com.invalid>
wrote:

> I am not an official answer person, but IMO, the first question is:  “Is
> the source for TestSerDe.jar ‘open source’ under an ALv2-compatible
> license?”.
>
>
>
> If “yes”, then supply the source in the source release and not the JAR.
> One of the reasons for “no compiled code in a source release” is that it is
> very difficult to verify that compiled code is “correct” and not corrupted,
> infected with a virus, etc.
>
>
>
> If “no”, then treat as a 3rd-party dependency.  Which may mean you can’t
> use it or need to treat it as optional, or a runtime dependency.
>
>
>
> The related question is:  How do folks modify this JAR?  If it was a JPEG,
> there are plenty of JPEG modification tools.  There really aren’t JAR
> modification tools that modify a JAR’s internal .class files; you really
> should use the source files.  I am still surprised/puzzled by the answer in
> the thread you linked to.  It still seems in both cases that a “binary” is
> being supplied for “convenience”.  IMO, there should be very few, if any,
> things in an Apache source repo that are “unmodifiable”.
>
>
>
> The “workaround” of renaming the .jar or .class files to something else so
> it isn’t seen as executable code seems like it still doesn’t fully meet the
> spirit of an open source release, either, but better than shipping
> executable code in a source package.
>
>
>
> On the other hand, I would not hold up a release for an issue like this.
> Fix it in some future release.
>
>
>
> My 2 cents,
>
> -Alex
>
>
>
>

Re: LICENSE and NOTICE file content

Posted by Alex Harui <ah...@adobe.com.INVALID>.
I am not an official answer person, but IMO, the first question is:  “Is the source for TestSerDe.jar ‘open source’ under an ALv2-compatible license?”.

If “yes”, then supply the source in the source release and not the JAR.  One of the reasons for “no compiled code in a source release” is that it is very difficult to verify that compiled code is “correct” and not corrupted, infected with a virus, etc.

If “no”, then treat as a 3rd-party dependency.  Which may mean you can’t use it or need to treat it as optional, or a runtime dependency.

The related question is:  How do folks modify this JAR?  If it was a JPEG, there are plenty of JPEG modification tools.  There really aren’t JAR modification tools that modify a JAR’s internal .class files; you really should use the source files.  I am still surprised/puzzled by the answer in the thread you linked to.  It still seems in both cases that a “binary” is being supplied for “convenience”.  IMO, there should be very few, if any, things in an Apache source repo that are “unmodifiable”.

The “workaround” of renaming the .jar or .class files to something else so it isn’t seen as executable code seems like it still doesn’t fully meet the spirit of an open source release, either, but is better than shipping executable code in a source package.

On the other hand, I would not hold up a release for an issue like this.  Fix it in some future release.

My 2 cents,
-Alex

From: Sean Owen <sr...@apache.org>
Reply-To: "legal-discuss@apache.org" <le...@apache.org>
Date: Monday, June 25, 2018 at 7:34 AM
To: "legal-discuss@apache.org" <le...@apache.org>
Cc: "justin@classsoftware.com" <ju...@classsoftware.com>, "dev@spark.apache.org" <de...@spark.apache.org>
Subject: Re: LICENSE and NOTICE file content

@legal-discuss, brief recap:

In Spark's test source code and release, there are some JAR files which exist to test handling of JAR files. Example: TestSerDe.jar in https://github.com/apache/spark/tree/master/sql/hive/src/test/resources/data/files

Justin raises the legitimate question: these don't belong in a source release, do they?

My operating theory had been that they are more like binary blobs w.r.t. Spark, like a test JPEG or data file, and are not the compiled version of any test code in Spark. They need to exist in order to run the tests from a source release. So it's not quite a case of shipping compiled Spark code in a source release.

I can imagine three opinions:

1) It's OK.
2) It's OK, but you need to include the source code to even those test JAR files somewhere
3) It's not fine, and the toolchain has to separately build these from source first automatically

I found https://markmail.org/thread/nf3lsdy5m3c3ovbr on legal-discuss previously, which seems to incline towards 2.

I'm also inclined towards 2, as 3 is probably relatively tricky in practice even though that's a nice-to-have.

I'd welcome opinions on this one.

Sean


On Sat, Jun 23, 2018 at 7:34 PM Justin Mclean <ju...@classsoftware.com> wrote:
> It's not test code; test code would indeed have to be distributed as source as well. They are binary blobs, if you like, needed by test code, that happen to be JARs here and not JPEGs or .docx files or something. These help test handling of JAR files.

Which IMO is still not allowed in a source release, but as I said it would be best for you to check on legal discuss.
