Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/12/09 13:33:42 UTC

[GitHub] [iceberg] rymurr opened a new pull request #1895: Add nessie to spark3/spark2 runtime jar

rymurr opened a new pull request #1895:
URL: https://github.com/apache/iceberg/pull/1895


   Following the discussion in #1887, this adds the Nessie catalog to the Spark runtime jars. It adds ~496K to the 18M runtime jar.
   
   Note: the extra dependency `org.glassfish.jersey.media:jersey-media-json-jackson` is the only Nessie dependency that isn't already included in Spark. Its version is pinned to the specific version of Jersey used in Spark 2.4 and 3 to ensure compatibility in the shared jar.
   
   This has been tested externally to ensure the catalog and all its dependencies are loaded correctly.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #1895: Add nessie to spark3/spark2 runtime jar

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1895:
URL: https://github.com/apache/iceberg/pull/1895#issuecomment-751895301


   @rymurr or @jacques-n, any update on this? I would like to get it into the next release.




[GitHub] [iceberg] rdblue commented on pull request #1895: Add nessie to spark3/spark2 runtime jar

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1895:
URL: https://github.com/apache/iceberg/pull/1895#issuecomment-753187437


   Thanks @rymurr! I merged this. Do you also want to create PRs for the Flink and Hive runtime Jars?




[GitHub] [iceberg] rdblue commented on a change in pull request #1895: Add nessie to spark3/spark2 runtime jar

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1895:
URL: https://github.com/apache/iceberg/pull/1895#discussion_r542921537



##########
File path: spark3-runtime/LICENSE
##########
@@ -579,3 +579,19 @@ License text:
 | NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
 | SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 
+--------------------------------------------------------------------------------
+
+This binary artifact contains code from Project Nessie.
+
+Copyright: 2020 Dremio Corporation.
+Home page: https://projectnessie.org/
+License: http://www.apache.org/licenses/LICENSE-2.0
+
+--------------------------------------------------------------------------------
+
+This binary artifact contains Eclipse Jersey bindings for jackson.
+
+Copyright: 201,2020 Oracle and/or its affiliates.
+Home page: https://github.com/eclipse-ee4j/jersey
+License: http://www.eclipse.org/legal/epl-2.0

Review comment:
       We need to include a copy of the license text here.






[GitHub] [iceberg] rymurr commented on pull request #1895: Add nessie to spark3/spark2 runtime jar

Posted by GitBox <gi...@apache.org>.
rymurr commented on pull request #1895:
URL: https://github.com/apache/iceberg/pull/1895#issuecomment-742677126


   I've added Nessie and Jersey to `NOTICE` and `LICENSE`. I'm not 100% sure about Jersey, as it's CDDL-licensed in Spark 2. @jacques-n and @laurentgo, does this look OK to you?




[GitHub] [iceberg] rymurr commented on pull request #1895: Add nessie to spark3/spark2 runtime jar

Posted by GitBox <gi...@apache.org>.
rymurr commented on pull request #1895:
URL: https://github.com/apache/iceberg/pull/1895#issuecomment-741968113


   > What about the Flink and Hive runtimes?
   
   I haven't tested Flink or Hive yet, so I want to do those as separate PRs. Working on Hive today :-)




[GitHub] [iceberg] rdblue commented on a change in pull request #1895: Add nessie to spark3/spark2 runtime jar

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1895:
URL: https://github.com/apache/iceberg/pull/1895#discussion_r542921370



##########
File path: spark3-runtime/NOTICE
##########
@@ -495,3 +495,128 @@ file:
 |   * HOMEPAGE:
 |     * http://www.opensource.apple.com/source/configd/configd-453.19/dnsinfo/dnsinfo.h
 
+--------------------------------------------------------------------------------
+
+This binary artifact includes Project Nessie with the following in its NOTICE
+file:
+
+| Dremio
+| Copyright 2015-2017 Dremio Corporation
+|
+| This product includes software developed at
+| The Apache Software Foundation (http://www.apache.org/).

Review comment:
       I don't think that we typically need the copyright part, or acknowledgement that there is ASF software. Since this is boilerplate and the relevant information is in LICENSE, we can probably omit it.






[GitHub] [iceberg] jacques-n commented on pull request #1895: Add nessie to spark3/spark2 runtime jar

Posted by GitBox <gi...@apache.org>.
jacques-n commented on pull request #1895:
URL: https://github.com/apache/iceberg/pull/1895#issuecomment-751897739


   We just merged [Nessie#575](https://github.com/projectnessie/nessie/pull/575), which removes the JAX-RS/Jersey requirement entirely. We'll push a release shortly that updates the dependency to avoid this extra set of licenses.




[GitHub] [iceberg] rymurr commented on a change in pull request #1895: Add nessie to spark3/spark2 runtime jar

Posted by GitBox <gi...@apache.org>.
rymurr commented on a change in pull request #1895:
URL: https://github.com/apache/iceberg/pull/1895#discussion_r543326240



##########
File path: spark3-runtime/LICENSE
##########
@@ -579,3 +579,19 @@ License text:
 | NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
 | SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 
+--------------------------------------------------------------------------------
+
+This binary artifact contains code from Project Nessie.
+
+Copyright: 2020 Dremio Corporation.
+Home page: https://projectnessie.org/
+License: http://www.apache.org/licenses/LICENSE-2.0
+
+--------------------------------------------------------------------------------
+
+This binary artifact contains Eclipse Jersey bindings for jackson.

Review comment:
       FWIW Spark distributes [LICENSE-binary](https://github.com/apache/spark/blob/master/LICENSE-binary) and [NOTICE-binary](https://github.com/apache/spark/blob/master/NOTICE-binary) with a footnote for any non-Apache licenses.
   
   Hopefully I can remove Jersey entirely and skip EPL completely.






[GitHub] [iceberg] rdblue commented on pull request #1895: Add nessie to spark3/spark2 runtime jar

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1895:
URL: https://github.com/apache/iceberg/pull/1895#issuecomment-742175606


   Adding new classes and dependencies into the runtime Jars requires an update to the LICENSE and NOTICE files for those Jars. Could you also update those in this PR, @rymurr?
   
   @jacques-n, can you help with the license update validation?




[GitHub] [iceberg] rdblue commented on a change in pull request #1895: Add nessie to spark3/spark2 runtime jar

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1895:
URL: https://github.com/apache/iceberg/pull/1895#discussion_r542924715



##########
File path: spark3-runtime/LICENSE
##########
@@ -579,3 +579,19 @@ License text:
 | NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
 | SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 
+--------------------------------------------------------------------------------
+
+This binary artifact contains code from Project Nessie.
+
+Copyright: 2020 Dremio Corporation.
+Home page: https://projectnessie.org/
+License: http://www.apache.org/licenses/LICENSE-2.0
+
+--------------------------------------------------------------------------------
+
+This binary artifact contains Eclipse Jersey bindings for jackson.

Review comment:
       EPL 2.0 is listed under [Category B, "weak" copyleft licenses](https://www.apache.org/legal/resolved.html). The relevant sections from that page are:
   
   > Software under the following licenses may be included in binary form within an Apache product if the inclusion is appropriately labeled (see above)
   
   > By including only the object/binary form, there is less exposed surface area of the third-party work from which a work might be derived; this addresses the second guiding principle of this policy.
   
   And here's the part about labelling:
   
   > APPROPRIATELY LABELLED CONDITION
   > In all Category B cases our users should not be surprised at their inclusion in our products. By attaching an appropriate and prominent label to the distribution users are less likely to be unaware of restrictions significantly different from those of the Apache License. An appropriate and prominent label is a label the user will read while learning about the distribution - for example in a README, and it should identify the third-party product, its licensing, and a url to the its homepage. Please also ensure to comply with any attribution/notice requirements in the specific license in question.
   
   I think that means we will need a "prominent label" that a user will read while learning about the distribution. I would add one as a new section to the project's README, as well as on the releases page. We should also link to it from anywhere we recommend downloading a runtime Jar on the site. We should also add any notes needed to comply with [EPL's requirements](https://www.eclipse.org/legal/epl-2.0/#requirements).
   
   Is it possible to remove this dependency? That might be easier.






[GitHub] [iceberg] jacques-n commented on a change in pull request #1895: Add nessie to spark3/spark2 runtime jar

Posted by GitBox <gi...@apache.org>.
jacques-n commented on a change in pull request #1895:
URL: https://github.com/apache/iceberg/pull/1895#discussion_r542928816



##########
File path: spark3-runtime/LICENSE
##########
@@ -579,3 +579,19 @@ License text:
 | NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
 | SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 
+--------------------------------------------------------------------------------
+
+This binary artifact contains code from Project Nessie.
+
+Copyright: 2020 Dremio Corporation.
+Home page: https://projectnessie.org/
+License: http://www.apache.org/licenses/LICENSE-2.0
+
+--------------------------------------------------------------------------------
+
+This binary artifact contains Eclipse Jersey bindings for jackson.

Review comment:
       Somehow I apparently lost the comments that I thought I had posted on this ticket. @rymurr, we should look at how other tools handle this. Jersey is used by most Apache projects, and most of it is already packaged by tools like Spark, Hive, etc. The only thing we're actually doing here is adding the Jersey extension for Jackson encoding/decoding.
   
   @rdblue, we're also exploring avoiding Jersey entirely, as Hive 2.x has a nasty old version of it that is causing us headaches.






[GitHub] [iceberg] jackye1995 commented on pull request #1895: Add nessie to spark3/spark2 runtime jar

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on pull request #1895:
URL: https://github.com/apache/iceberg/pull/1895#issuecomment-741947293


   What about the Flink and Hive runtimes?




[GitHub] [iceberg] rdblue merged pull request #1895: Add nessie to spark3/spark2 runtime jar

Posted by GitBox <gi...@apache.org>.
rdblue merged pull request #1895:
URL: https://github.com/apache/iceberg/pull/1895


   




[GitHub] [iceberg] jacques-n commented on pull request #1895: Add nessie to spark3/spark2 runtime jar

Posted by GitBox <gi...@apache.org>.
jacques-n commented on pull request #1895:
URL: https://github.com/apache/iceberg/pull/1895#issuecomment-742176946


   Will do.




[GitHub] [iceberg] rymurr commented on pull request #1895: Add nessie to spark3/spark2 runtime jar

Posted by GitBox <gi...@apache.org>.
rymurr commented on pull request #1895:
URL: https://github.com/apache/iceberg/pull/1895#issuecomment-753197126


   Thanks @rdblue!
   
   I am running through some Flink tests and will raise the relevant PR soon.
   
   Hive is a bit more work as the discussion around properties is relevant to how we specify custom catalogs. Hope to have something for this soon too.
   
   Happy new year!




[GitHub] [iceberg] rymurr commented on pull request #1895: Add nessie to spark3/spark2 runtime jar

Posted by GitBox <gi...@apache.org>.
rymurr commented on pull request #1895:
URL: https://github.com/apache/iceberg/pull/1895#issuecomment-752929358


   I have updated this to remove JAX-RS and correct the license and notice. This should be ready to merge once #2016 is merged.

