Posted to common-dev@hadoop.apache.org by Steve Loughran <st...@hortonworks.com> on 2016/11/07 18:14:18 UTC

Json.org licensing, amazon-AWS and Jackson versions

https://issues.apache.org/jira/browse/HADOOP-13794: the ASF now forbids distributing anything under the JSON.org license.


Which means we can't make any Hadoop releases with the AWS SDK JARs <= 1.11.0 in them, meaning https://issues.apache.org/jira/browse/HADOOP-13050 has moved up from a minor issue to a blocker, and we are going to have to worry about the older branches.

1. The latest Amazon AWS SDKs absolutely do not work with the shipping Jackson version: they even reference artifacts that don't appear until Jackson 2.3.3, and need to be on a later version than that to actually work.
2. AWS SDK updates have generally needed code changes (example: HADOOP-12269).

For 2.8.x we can increment the AWS SDK, and take this as a time to increment Jackson, which an XEE vulnerability was hinting at anyway (https://issues.apache.org/jira/browse/HADOOP-12705). I know this has a risk of problems, but Sean Mackrory has done the due diligence to show that Jackson 2.7.8 doesn't break existing API use in Hadoop; after that Jackson goes incompatible (again).


For branch 2.6.x we may just want to take the easy way out and not bundle the (very dated) AWS JAR: just strip it out of the final set of artifacts included in the project dist, and tell people that if they want to use s3a in 2.6.x (which I think people should really avoid, given it took until 2.7.1 to stabilize), then they need to install it manually.


Which leaves Hadoop 2.7.x, doesn't it? What to do? People are using s3a, it's working well, and pulling the AWS JARs is going to cause problems. But pushing out a Jackson update in a 2.7.x update is going to be traumatic.

-Steve

Re: Json.org licensing, amazon-AWS and Jackson versions

Posted by Steve Loughran <st...@hortonworks.com>.
In this case the json.org classes are lurking inside the AWS JARs, so they're not swappable, and it's not immediately obvious there's a problem. You need something to scan all the JARs for forbidden .class files.

Oddly enough, Java ships with a tool to scan all the JARs for specific .class files; we call it "the classloader". It would be possible for someone to write a parameter-driven test suite which attempted a getResource() of the forbidden classes, failing a test if one was there. Subclass this suite into the various separate modules at the end of the DAG (hadoop-aws, hadoop-azure), and we can use JUnit to do the work.
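
The classloader check described above could be sketched roughly as follows. This is only an illustration under assumptions: the class name ForbiddenClassCheck, the method findPresent() and the list of forbidden class names are hypothetical, and it uses plain Java rather than JUnit so that it runs without extra dependencies. It looks up each class as a resource via ClassLoader.getResource(), so nothing is actually loaded or initialized.

```java
import java.net.URL;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Sketch: ask the classloader whether any forbidden class is visible
 * on the classpath. All names in this class are hypothetical.
 */
public class ForbiddenClassCheck {

  /** Returns the subset of classNames that the given classloader can locate. */
  public static List<String> findPresent(ClassLoader loader, List<String> classNames) {
    List<String> found = new ArrayList<>();
    for (String name : classNames) {
      // Look up the .class file as a resource instead of loading the class,
      // so no static initializers run.
      String resource = name.replace('.', '/') + ".class";
      URL location = loader.getResource(resource);
      if (location != null) {
        found.add(name);
      }
    }
    return found;
  }

  public static void main(String[] args) {
    // Assumed list of forbidden classes; extend as needed.
    List<String> forbidden = Arrays.asList("org.json.JSONObject", "org.json.JSONArray");
    List<String> found =
        findPresent(ForbiddenClassCheck.class.getClassLoader(), forbidden);
    if (!found.isEmpty()) {
      // In a JUnit suite this would be a test failure instead.
      throw new AssertionError("Forbidden classes on classpath: " + found);
    }
    System.out.println("No forbidden classes found");
  }
}
```

In a real suite the body of main() would presumably become a parameterized JUnit test, subclassed in each module so that it runs against that module's own classpath.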



On 7 Nov 2016, at 19:14, Andrew Wang <an...@cloudera.com> wrote:

> Have we looked into swapping in the Android cleanroom implementation of json.org? The issue with Jackson bumps is always the classpath clashes with downstream projects.
>
> https://wiki.debian.org/qa.debian.org/jsonevil
> https://android.googlesource.com/platform/libcore/+/master/json/
>
> Maybe we need to build it ourselves, but it's still better than bumping the Jackson version.


I'm wondering if we can't just produce our own shaded derivative of the AWS JAR: merge in the AWS artifacts unshaded, and shade in its Jackson dependency. This would let us use it in 2.7+ without worrying about Jackson versions.

I'd still avoid it for 2.6.x, because I doubt new versions will be compatible with Java 6; it's not worth worrying about.

I think I might give a lightning talk at ApacheCon Big Data next week, "just because it's a project's right to use an incompatible version of Jackson, doesn't mean it's a duty". I can reminisce fondly about the Elder Days when Xerces didn't come with the JVM; every project bundled Xerces and Xalan on the CP, but at least they were single JAR releases with stable APIs.




Re: Json.org licensing, amazon-AWS and Jackson versions

Posted by Andrew Wang <an...@cloudera.com>.
To answer my own question, I looked at the Android implementation, and it's
not provided as a standalone JAR. Robolectric has a single JAR of all the
Android classes:

https://mvnrepository.com/artifact/org.robolectric/android-all

Apache Commons has a recent thread to the board about providing
self-contained badly-licensed deps like this and findbugs-annotations as a
service to other projects. No reply yet.

https://lists.apache.org/thread.html/1685c0e985df284d2b60e1291a4eefa05b8243f5ee8665bc3f546cfc@%3Clegal-discuss.apache.org%3E

I added my +1 to this thread and am watching it. Hopefully the Commons team
can get this turned around quickly.


Re: Json.org licensing, amazon-AWS and Jackson versions

Posted by Andrew Wang <an...@cloudera.com>.
Have we looked into swapping in the Android cleanroom implementation of
json.org? The issue with Jackson bumps is always the classpath clashes with
downstream projects.

https://wiki.debian.org/qa.debian.org/jsonevil
https://android.googlesource.com/platform/libcore/+/master/json/

Maybe we need to build it ourselves, but it's still better than bumping the
Jackson version.
