Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/09/10 05:14:01 UTC

[GitHub] [hudi] deshpandeanoop opened a new issue #3635: [SUPPORT] java.lang.NoSuchMethodError while launching Spark 2.3.1 in stand alone cluster mode.

deshpandeanoop opened a new issue #3635:
URL: https://github.com/apache/hudi/issues/3635


   Hi Team,
   
   I'm getting a `java.lang.NoSuchMethodError` when I try to launch the Spark application in standalone mode.
   **Exception Trace:**
   ```
   Exception in thread "main" java.lang.NoSuchMethodError: org.apache.avro.Schema.createUnion([Lorg/apache/avro/Schema;)Lorg/apache/avro/Schema;
   	at org.apache.hudi.spark.org.apache.spark.sql.avro.SchemaConverters$.toAvroType(SchemaConverters.scala:185)
   	at org.apache.hudi.spark.org.apache.spark.sql.avro.SchemaConverters$$anonfun$5.apply(SchemaConverters.scala:176)
   	at org.apache.hudi.spark.org.apache.spark.sql.avro.SchemaConverters$$anonfun$5.apply(SchemaConverters.scala:174)
   	at scala.collection.Iterator$class.foreach(Iterator.scala:893)
   	at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
   	at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
   	at org.apache.spark.sql.types.StructType.foreach(StructType.scala:99)
   	at org.apache.hudi.spark.org.apache.spark.sql.avro.SchemaConverters$.toAvroType(SchemaConverters.scala:174)
   	at org.apache.hudi.AvroConversionUtils$.convertStructTypeToAvroSchema(AvroConversionUtils.scala:52)
   	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:139)
   	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:134)
   	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
   	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
   	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
   	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
   	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
   	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
   	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
   	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
   	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
   	at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:654)
   	at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:654)
   	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
   	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:654)
   	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:273)
   	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:267)
   	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:225)
   ```
   
   Below is my `build.sbt` file:
   ```
    scalaVersion := "2.11.8"

    libraryDependencies += ("org.apache.spark" % "spark-core_2.11" % "2.3.1" % "provided")
      .exclude("org.apache.avro", "avro")
      .exclude("org.apache.avro", "avro-ipc")
      .exclude("org.apache.avro", "avro-mapred")

    libraryDependencies += ("org.apache.spark" % "spark-sql_2.11" % "2.3.1" % "provided")
      .exclude("org.apache.avro", "avro")

    libraryDependencies += "org.apache.hudi" % "hudi-spark-bundle_2.11" % "0.7.0"

    libraryDependencies += "org.apache.spark" %% "spark-avro" % "2.4.4"

    libraryDependencies += ("com.typesafe.play" %% "play-json" % "2.4.0-M3")
      .exclude("org.slf4j", "slf4j-api")
      .exclude("org.slf4j", "slf4j-log4j12")
      .exclude("org.slf4j", "jcl-over-slf4j")
      .exclude("io.netty", "netty-all")
   ```
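   A workaround sometimes used for this kind of version conflict (a sketch, not verified against this exact build) is to force a single Avro version across the whole dependency graph with sbt's `dependencyOverrides`, instead of excluding `avro` from individual modules:
   ```
   // Hypothetical build.sbt fragment: pin Avro to 1.8.2 (the version the
   // Hudi Spark bundle expects) for every transitive dependency.
   dependencyOverrides += "org.apache.avro" % "avro" % "1.8.2"
   ```
   Note this only shapes the application's own dependency graph; the cluster's Spark distribution still ships its own Avro on the runtime classpath, which is the usual source of this `NoSuchMethodError`.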
   
   Spark submit command:
   ```
   spark-submit --master local --deploy-mode client \
       --jars <base-dir>/avro-1.8.2.jar \
       --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
       --class "com.explore.hudi.HudiServiceMainJob" \
       ApacheHudiService.jar
   ```
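   When a `NoSuchMethodError` like this appears, a quick way to see which jar actually supplied a class at runtime is to ask its `ProtectionDomain`. This is a generic diagnostic sketch (the `WhichJar` object and `locate` method are invented for illustration); in the real application you would pass `"org.apache.avro.Schema"`:
   ```scala
   // Diagnostic sketch: print the jar a class was loaded from, to confirm
   // which Avro version is actually on the driver classpath at runtime.
   object WhichJar {
     def locate(className: String): String = {
       val cls = Class.forName(className)
       Option(cls.getProtectionDomain.getCodeSource) match {
         case Some(src) => src.getLocation.toString
         case None      => "(bootstrap classpath)" // core JDK classes have no code source
       }
     }

     def main(args: Array[String]): Unit = {
       // In the Spark app, use "org.apache.avro.Schema" instead.
       println(locate("scala.Option"))
     }
   }
   ```
   If this reports an `avro-1.7.7.jar` from the Spark distribution rather than the 1.8.2 jar passed via `--jars`, the error above is explained: `--jars` does not put user jars ahead of Spark's own classpath by default (`spark.driver.extraClassPath` or `spark.driver.userClassPathFirst=true` are the usual knobs to try).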
   
   Versions of the software installed on my system:
   
   - Scala: `2.11`
   - Spark: `spark-2.3.1-bin-hadoop2.7`
   
   Just to add: I'm reading a `csv` extract and creating a `hudi` table from it. Below is the sample code that runs when my Spark application is launched.
   ```
    // the Dataset .map below needs implicit Encoders from sparkSession.implicits._
    import sparkSession.implicits._

    sparkSession
         .read
         .csv(inputCsvAbsPath)
         .map(row => LibraryCheckoutInfo(
           bibNumber = row.getString(0),
           itemBarcode = row.getString(1),
           itemType = row.getString(2),
           collection = row.getString(3),
           callNumber = row.getString(4)))
         .write
         .format(AppConstants.SPARK_FORMAT_HUDI)
         .options(QuickstartUtils.getQuickstartWriteConfigs)
         .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "bibNumber")
         .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "itemType")
         .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, "collection")
         .option(HoodieWriteConfig.TABLE_NAME, hudiTableName)
         .mode(SaveMode.Overwrite)
         .save(hudiTableBasePath)
   ```
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] xushiyan closed issue #3635: [SUPPORT] java.lang.NoSuchMethodError while launching Spark 2.3.1 in stand alone cluster mode.

Posted by GitBox <gi...@apache.org>.
xushiyan closed issue #3635:
URL: https://github.com/apache/hudi/issues/3635


   





[GitHub] [hudi] yanghua commented on issue #3635: [SUPPORT] java.lang.NoSuchMethodError while launching Spark 2.3.1 in stand alone cluster mode.

yanghua commented on issue #3635:
URL: https://github.com/apache/hudi/issues/3635#issuecomment-918728589


   Hi @vinothchandar, can you chime in to answer this question?





[GitHub] [hudi] deshpandeanoop commented on issue #3635: [SUPPORT] java.lang.NoSuchMethodError while launching Spark 2.3.1 in stand alone cluster mode.

deshpandeanoop commented on issue #3635:
URL: https://github.com/apache/hudi/issues/3635#issuecomment-920664496


   @vinothchandar : A gentle reminder to help us out here :) 





[GitHub] [hudi] xushiyan commented on issue #3635: [SUPPORT] java.lang.NoSuchMethodError while launching Spark 2.3.1 in stand alone cluster mode.

xushiyan commented on issue #3635:
URL: https://github.com/apache/hudi/issues/3635#issuecomment-926332253


   @deshpandeanoop I noticed that you are using a different version, `"spark-avro" % "2.4.4"`, rather than one matching your main Spark version, 2.3.1.
   
   More importantly, it was discussed a while back that we only support Spark 2.4+:
   
   https://lists.apache.org/thread.html/r19ec206e33f8b63e95a840bfba519ab89d1b7af790adef0bc369d618%40%3Cdev.hudi.apache.org%3E
   
   I strongly suggest you make an effort to upgrade Spark in some way; there are important Avro updates in newer versions. Spark 2.3.1 uses Avro 1.7.7, which can be very problematic:
   https://github.com/apache/spark/blob/30aaa5a3a1076ca52439a905274b1fcf498bc562/pom.xml#L142
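   The missing method in the trace, `Schema.createUnion([LSchema;)`, is the varargs overload; to my knowledge it was only added in the Avro 1.8 line, while Avro 1.7.7 offers just the `List`-based overload, which matches the diagnosis above. As a hedged sketch (the `AvroProbe` object and method names are invented for illustration), a reflection probe can report whether the Avro on a given classpath has the varargs overload, without compiling against Avro at all:
   ```scala
   // Compatibility probe sketch: detect via reflection whether the Avro on a
   // classpath provides the varargs Schema.createUnion(Schema...) overload.
   object AvroProbe {
     def hasVarargsCreateUnion(loader: ClassLoader): Boolean =
       try {
         val schemaCls = Class.forName("org.apache.avro.Schema", false, loader)
         // Build the Schema[] class object to look up createUnion(Schema[]).
         val arrayCls = java.lang.reflect.Array.newInstance(schemaCls, 0).getClass
         schemaCls.getMethod("createUnion", arrayCls)
         true
       } catch {
         case _: ClassNotFoundException => false // no Avro on this classpath at all
         case _: NoSuchMethodException  => false // Avro present, but pre-1.8
       }

     def main(args: Array[String]): Unit =
       println(s"varargs createUnion available: ${hasVarargsCreateUnion(getClass.getClassLoader)}")
   }
   ```
   Running something like this in the driver (or a `spark-shell`) tells you whether the classpath Spark actually assembled is new enough for the Hudi bundle.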





[GitHub] [hudi] yanghua commented on issue #3635: [SUPPORT] java.lang.NoSuchMethodError while launching Spark 2.3.1 in stand alone cluster mode.

yanghua commented on issue #3635:
URL: https://github.com/apache/hudi/issues/3635#issuecomment-917891687


   Could you please choose a Spark version of 2.4 or higher?





[GitHub] [hudi] deshpandeanoop commented on issue #3635: [SUPPORT] java.lang.NoSuchMethodError while launching Spark 2.3.1 in stand alone cluster mode.

deshpandeanoop commented on issue #3635:
URL: https://github.com/apache/hudi/issues/3635#issuecomment-918219776


   @yanghua : Is there any way to make this work with Spark `2.3.1`? Currently I'm launching on my local machine for testing purposes. Once my development is done, I'll be launching it on our on-prem cluster (shared by multiple teams), which has Spark `2.3.1` installed, and we have no control over upgrading it.

