Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/05/31 00:21:31 UTC

[GitHub] [hudi] eshu opened a new issue, #5719: [SUPPORT] Hudi 0.11.0 for Spark 3.1 build issues

eshu opened a new issue, #5719:
URL: https://github.com/apache/hudi/issues/5719

   I'm trying to build a fat JAR with the Hudi bundle and Spark 3.1 (the AWS Glue version), using Scala 2.12.
   
   None of these issues exist in Hudi 0.10.1 and earlier versions.
   
   1. Dependencies:
   > [error] Modules were resolved with conflicting cross-version suffixes in ProjectRef(uri("file:/Users/shu/workspace/daas-glue-core/"), "root"):
   > [error]    org.json4s:json4s-ast _2.12, _2.11
   > [error]    org.json4s:json4s-jackson _2.12, _2.11
   > [error]    org.json4s:json4s-core _2.12, _2.11
   > [error]    org.json4s:json4s-scalap _2.12, _2.11
   Why do I have dependencies for both Scala 2.12 and 2.11?
   Workaround: I added an exclusion rule:
   ```
   ("org.apache.hudi" %% "hudi-spark3" % HudiVersion).excludeAll(ExclusionRule("org.json4s", "json4s-jackson_2.11"))
   ```
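   A broader exclusion sketch (assuming sbt; module names are taken from the error output above, and `HudiVersion` is your own setting) would drop every transitive Scala 2.11 artifact at once:
   ```scala
   // Hedged sketch: exclude all Scala 2.11 artifacts the error above reports,
   // plus the stray hudi-spark-common_2.11 transitive dependency.
   libraryDependencies += ("org.apache.hudi" %% "hudi-spark3" % HudiVersion)
     .excludeAll(
       ExclusionRule("org.json4s", "json4s-ast_2.11"),
       ExclusionRule("org.json4s", "json4s-core_2.11"),
       ExclusionRule("org.json4s", "json4s-jackson_2.11"),
       ExclusionRule("org.json4s", "json4s-scalap_2.11"),
       ExclusionRule("org.apache.hudi", "hudi-spark-common_2.11")
     )
   ```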
   There is also a dependency on `hudi-spark-common_2.11`; you can check https://mvnrepository.com/artifact/org.apache.hudi/hudi-spark3_2.12/0.11.0
   
   **Why are there dependencies on Scala 2.11?**
   
   2. Multiple sources found for hudi (org.apache.hudi.Spark2DefaultSource, org.apache.hudi.Spark3DefaultSource)
   When trying to write the empty dataset:
   ```
   [info] - should write the empty dataset *** FAILED ***
   [info]   org.apache.spark.sql.AnalysisException: Multiple sources found for hudi (org.apache.hudi.Spark2DefaultSource, org.apache.hudi.Spark3DefaultSource), please specify the fully qualified class name.
   [info]   at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:720)
   [info]   at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSourceV2(DataSource.scala:746)
   [info]   at org.apache.spark.sql.DataFrameWriter.lookupV2Provider(DataFrameWriter.scala:993)
   [info]   at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:311)
   [info]   at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:293)
   ```
   Workaround: I created the following class in the package `org.apache.hudi`:
   ```
   package org.apache.hudi
   
   class Spark2DefaultSource extends DefaultSource  {
     override def shortName(): String = "hudi-spark2"
   }
   ```
   This class shadows the Hudi implementation, and I can discard the bundled copy in my merge rules.
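   The merge rule itself might look like the following (a sketch for sbt-assembly; whether `MergeStrategy.first` actually keeps the local stub depends on classpath ordering, so treat it as an assumption to verify, not a recipe):
   ```scala
   // Hypothetical sbt-assembly rule: resolve the duplicate
   // Spark2DefaultSource class file instead of failing with a
   // deduplicate error; everything else falls through to the default.
   assembly / assemblyMergeStrategy := {
     case PathList("org", "apache", "hudi", "Spark2DefaultSource.class") =>
       MergeStrategy.first // keeps one copy; verify it is the local stub
     case other =>
       val previous = (assembly / assemblyMergeStrategy).value
       previous(other)
   }
   ```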
   
   **Why are there two conflicting definitions?**
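   For context on why two definitions can collide at all: Spark discovers data sources through `java.util.ServiceLoader`, reading every `META-INF/services/org.apache.spark.sql.sources.DataSourceRegister` file on the classpath, so a fat JAR that concatenates the Spark 2 and Spark 3 registration files advertises both. A small diagnostic (a sketch; the object name is made up) lists which jars contribute such a registration file:
   ```scala
   // Sketch: list every DataSourceRegister service file visible on the
   // classpath. Each returned URL names the jar that registers a data
   // source, which shows where a duplicate "hudi" registration comes from.
   object ListDataSourceRegistrations {
     private val ServicePath =
       "META-INF/services/org.apache.spark.sql.sources.DataSourceRegister"

     def serviceFiles(
         cl: ClassLoader = Thread.currentThread.getContextClassLoader): List[String] = {
       val urls = cl.getResources(ServicePath)
       var found = List.empty[String]
       while (urls.hasMoreElements) found = urls.nextElement().toString :: found
       found.reverse
     }

     def main(args: Array[String]): Unit = serviceFiles().foreach(println)
   }
   ```
   Running it inside the fat JAR should print one URL per registering jar; with the bundle approach recommended below in the thread, only a single registration survives.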
   
   3. Spark 3.1 is not supported
   The same test as in the previous example:
   ```
   [info] - should write the empty dataset *** FAILED ***
   [info]   java.lang.ClassNotFoundException: org.apache.spark.sql.adapter.Spark3_1Adapter
   [info]   at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
   [info]   at sbt.internal.ManagedClassLoader.findClass(ManagedClassLoader.java:102)
   [info]   at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
   [info]   at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
   [info]   at org.apache.hudi.SparkAdapterSupport.sparkAdapter(SparkAdapterSupport.scala:37)
   [info]   at org.apache.hudi.SparkAdapterSupport.sparkAdapter$(SparkAdapterSupport.scala:29)
   [info]   at org.apache.hudi.HoodieSparkUtils$.sparkAdapter$lzycompute(HoodieSparkUtils.scala:46)
   [info]   at org.apache.hudi.HoodieSparkUtils$.sparkAdapter(HoodieSparkUtils.scala:46)
   [info]   at org.apache.hudi.AvroConversionUtils$.convertStructTypeToAvroSchema(AvroConversionUtils.scala:150)
   [info]   at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:241)
   ```
   
   I did not find a good workaround for this issue. The class Spark3_1Adapter does not exist; I found only Spark3_2Adapter, but there are many references to Spark 3.2 in the code.
   
   **Has support for Spark 3.1 been dropped?**


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] eshu commented on issue #5719: [SUPPORT] Hudi 0.11.0 for Spark 3.1 build issues

Posted by GitBox <gi...@apache.org>.
eshu commented on issue #5719:
URL: https://github.com/apache/hudi/issues/5719#issuecomment-1141605686

   @xushiyan If I remove the Spark 3 dependency, I get this error:
   ```
   [info] - should write the empty dataset *** FAILED ***
   [info]   java.lang.ClassNotFoundException: Failed to find data source: hudi. Please find packages at http://spark.apache.org/third-party-projects.html
   [info]   at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:692)
   [info]   at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSourceV2(DataSource.scala:746)
   [info]   at org.apache.spark.sql.DataFrameWriter.lookupV2Provider(DataFrameWriter.scala:993)
   [info]   at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:311)
   [info]   at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:293)
   [info]   at jp.ne.paypay.daas.data.load.HudiLoad.jp$ne$paypay$daas$data$load$HudiLoad$$$anonfun$apply$4(HudiLoad.scala:106)
   [info]   at jp.ne.paypay.daas.data.load.HudiLoad$$anonfun$apply$6.apply(HudiLoad.scala:91)
   [info]   at jp.ne.paypay.daas.data.load.HudiLoad$$anonfun$apply$6.apply(HudiLoad.scala:91)
   [info]   at jp.ne.paypay.daas.data.load.HudiLoadTest.$anonfun$new$1(HudiLoadTest.scala:27)
   [info]   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
   [info]   ...
   [info]   Cause: java.lang.ClassNotFoundException: hudi.DefaultSource
   [info]   at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
   [info]   at sbt.internal.ManagedClassLoader.findClass(ManagedClassLoader.java:102)
   [info]   at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
   [info]   at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
   [info]   at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$lookupDataSource$5(DataSource.scala:666)
   [info]   at scala.util.Try$.apply(Try.scala:213)
   [info]   at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$lookupDataSource$4(DataSource.scala:666)
   [info]   at scala.util.Failure.orElse(Try.scala:224)
   [info]   at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:666)
   [info]   at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSourceV2(DataSource.scala:746)
   [info]   ...
   ```
   
   Does this mean that Hudi does not support Spark 3 and only supports obsolete versions?



[GitHub] [hudi] xushiyan commented on issue #5719: [SUPPORT] Hudi 0.11.0 for Spark 3.1 build issues

Posted by GitBox <gi...@apache.org>.
xushiyan commented on issue #5719:
URL: https://github.com/apache/hudi/issues/5719#issuecomment-1141598872

   > There is also a dependency on hudi-spark-common_2.11; you can check https://mvnrepository.com/artifact/org.apache.hudi/hudi-spark3_2.12/0.11.0
   
   This is a module-level jar which is not meant to be imported directly; we don't support such a build process. Only the `hudi-*-bundle` jars are meant to be used, which have the right dependencies packaged within them. Please use only bundle jars. If you have to use module-level jars as dependencies, then you'll need to build the project yourself with the right Maven profile (refer to README.md) and use the locally built jars.
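   For the local-build option, the invocation is roughly the following (the profile flags are from memory of the 0.11-era README, so treat them as assumptions and check the README for the exact names):
   ```shell
   # Hedged sketch: build Hudi locally for Spark 3.1 / Scala 2.12.
   # Verify the -Dspark3.1 and -Dscala-2.12 flags against the project README.
   git clone https://github.com/apache/hudi.git && cd hudi
   mvn clean package -DskipTests -Dspark3.1 -Dscala-2.12
   ```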




[GitHub] [hudi] xushiyan commented on issue #5719: [SUPPORT] Hudi 0.11.0 for Spark 3.1 build issues

Posted by GitBox <gi...@apache.org>.
xushiyan commented on issue #5719:
URL: https://github.com/apache/hudi/issues/5719#issuecomment-1141612187

   Hudi supports Spark 3; please refer to the quickstart guide: https://hudi.apache.org/docs/quick-start-guide
   And please use bundle jars as mentioned above.
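   Concretely, consuming the bundle (rather than module-level jars) from an sbt build would look something like this sketch; `hudi-spark3.1-bundle_2.12` is the coordinate Maven Central lists for 0.11.0, but please double-check it for your Spark version:
   ```scala
   // Hedged sketch: depend on the self-contained Spark 3.1 bundle jar
   // instead of module-level jars such as hudi-spark3_2.12.
   libraryDependencies +=
     "org.apache.hudi" % "hudi-spark3.1-bundle_2.12" % "0.11.0"
   ```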




[GitHub] [hudi] eshu closed issue #5719: [SUPPORT] Hudi 0.11.0 for Spark 3.1 build issues

Posted by GitBox <gi...@apache.org>.
eshu closed issue #5719: [SUPPORT] Hudi 0.11.0 for Spark 3.1 build issues
URL: https://github.com/apache/hudi/issues/5719

