Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/02/13 11:00:37 UTC

[GitHub] [hudi] pan3793 commented on issue #4793: [SUPPORT] Integration test broken after upgrading from 0.10.0 to 0.10.1

pan3793 commented on issue #4793:
URL: https://github.com/apache/hudi/issues/4793#issuecomment-1038016578


   @nsivabalan thanks for your reply.
   
   > may I know which hudi bundle or artifact you are using?
   
   We use the vanilla jars instead of the bundle jar for two reasons:
   
   - The Hudi bundle jar name contains the exact Spark patch version, e.g. `hudi-spark3.1.2-bundle*`. If we choose it and later want to upgrade Spark to 3.1.3 (currently in the voting phase), do we need to wait for, or ask, the Hudi community to publish a `hudi-spark3.1.3-bundle*` jar?
   
   - The Hudi bundle jar contains many classes from transitive dependencies **WITHOUT** relocation, which creates a high risk of class conflicts if the user also provides the original jars, e.g. `kotlin`, `curator`.
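
To make the class-conflict risk concrete, here is a minimal sketch of how one might check a bundle jar against other jars on the classpath. It relies only on the fact that a jar is a zip archive; the function names `class_entries` and `find_conflicts` are hypothetical helpers written for this illustration, not part of Hudi or Spark.

```python
import zipfile


def class_entries(jar_path):
    """Return the set of .class entry names inside a jar (a jar is a zip file)."""
    with zipfile.ZipFile(jar_path) as jar:
        return {name for name in jar.namelist() if name.endswith(".class")}


def find_conflicts(bundle_jar, other_jars):
    """Map each other jar to the sorted class entries it shares with the bundle.

    A shared entry means the same fully qualified class is provided twice,
    so which copy gets loaded depends on classpath order.
    """
    bundle_classes = class_entries(bundle_jar)
    conflicts = {}
    for other in other_jars:
        shared = bundle_classes & class_entries(other)
        if shared:
            conflicts[str(other)] = sorted(shared)
    return conflicts
```

Calling `find_conflicts` with the bundle jar and the jars already shipped with the application would list every class that is provided twice. An empty result suggests that no unrelocated class overlaps with those particular jars.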
    
   I think Hudi has room to improve the bundle jar and so reduce the dependency maintenance effort for users and downstream projects. Compare other data lake formats: Delta avoids pulling in dependencies other than Spark, and [delta-core](https://mvnrepository.com/artifact/io.delta/delta-core_2.12/1.1.0) has only one transitive dependency, `jackson-core-asl`, that is not already included in the Spark runtime jars. Iceberg provides a `runtime` jar, which is similar to the Hudi bundle jars but differs in the following ways:
   1. The Iceberg runtime jar does not contain classes that already exist in the Spark runtime libraries, e.g. `curator`.
   2. The Iceberg runtime jar relocates nearly every class outside the `org.apache.iceberg` package to avoid potential conflicts with user classes.
   3. Iceberg provides a runtime jar for each supported Spark minor version, e.g. [`iceberg-spark-runtime-0.13.0.jar`](https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-spark-runtime/0.13.0/iceberg-spark-runtime-0.13.0.jar) for Spark 2.4.x, [`iceberg-spark3-runtime-0.13.0.jar`](https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-spark3-runtime/0.13.0/iceberg-spark3-runtime-0.13.0.jar) for Spark 3.0.x, [`iceberg-spark-runtime-3.1_2.12-0.13.0.jar`](https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-spark-runtime-3.1_2.12/0.13.0/iceberg-spark-runtime-3.1_2.12-0.13.0.jar) for Spark 3.1.x, and [`iceberg-spark-runtime-3.2_2.12-0.13.0.jar`](https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-spark-runtime-3.2_2.12/0.13.0/iceberg-spark-runtime-3.2_2.12-0.13.0.jar) for Spark 3.2.x.
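
One rough way to see how thoroughly a bundle or runtime jar relocates its dependencies is to count the classes that sit outside the project's own package prefix. The sketch below is only an illustration: `packages_outside` is a hypothetical helper, the allowed prefixes are supplied by the caller (I am not asserting which relocation prefix any project actually uses), and `depth=3` is an arbitrary grouping choice.

```python
import zipfile
from collections import Counter


def packages_outside(jar_path, allowed_prefixes, depth=3):
    """Count classes per package that are not under any allowed prefix.

    Package names are truncated to `depth` path segments so that, for
    example, all classes under org/apache/curator are grouped together.
    """
    counts = Counter()
    with zipfile.ZipFile(jar_path) as jar:
        for name in jar.namelist():
            # Skip non-class entries and anything under META-INF/.
            if not name.endswith(".class") or name.startswith("META-INF/"):
                continue
            if any(name.startswith(prefix) for prefix in allowed_prefixes):
                continue
            package = "/".join(name.split("/")[:-1][:depth])
            counts[package] += 1
    return counts
```

For a jar that relocates everything under its own namespace, calling this with that namespace as the only allowed prefix should return an empty counter. Any package it does report is a candidate for a conflict with a user-provided jar.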

