Posted to commits@hudi.apache.org by "GroovyDan (via GitHub)" <gi...@apache.org> on 2023/02/21 14:44:30 UTC

[GitHub] [hudi] GroovyDan opened a new issue, #8007: [SUPPORT] java.lang.NoClassDefFoundError: org/apache/spark/sql/avro/SchemaConverters

GroovyDan opened a new issue, #8007:
URL: https://github.com/apache/hudi/issues/8007

   **Describe the problem you faced**
   
   We recently started receiving the following error when trying to load data into our Data Lake via AWS Glue and Apache Hudi: 
   ```
   py4j.protocol.Py4JJavaError: An error occurred while calling o915.pyWriteDynamicFrame.
   : java.lang.NoClassDefFoundError: org/apache/spark/sql/avro/SchemaConverters$
   ```
   
   We have not changed any code on our side and the process has been running without issue for months.
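   
   Looking the class up by hand confirms what is actually missing at runtime. Here is a minimal probe (a sketch; it assumes it runs inside the Glue job, where a SparkSession can be obtained):
   
   ```
   from awsglue.context import GlueContext
   from pyspark.context import SparkContext
   
   spark = GlueContext(SparkContext.getOrCreate()).spark_session
   
   # SchemaConverters is a Scala object, so its JVM class name ends in "$".
   # This probes only the driver classpath, which is where the write fails
   # (see HoodieSparkSqlWriter in the stack trace below).
   try:
       spark.sparkContext._jvm.java.lang.Class.forName(
           "org.apache.spark.sql.avro.SchemaConverters$"
       )
       print("spark-avro is on the driver classpath")
   except Exception as err:
       print("spark-avro is NOT on the driver classpath:", err)
   ```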
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Use the AWS Glue Hudi connector and try to upsert into an existing table.
   
   **Expected behavior**
   
   I would expect the data to be upserted into the table.
   
   **Environment Description**
   
   * Hudi version : 0.10.1
   
   * Spark version : 3.1.1
   
   * Hive version : N/A
   
   * Hadoop version : N/A
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : no
   
   
   **Additional context**
   
   * Glue Version: 3.0
   * Connector: https://709825985650.dkr.ecr.us-east-1.amazonaws.com/amazon-web-services/glue/hudi:0.10.1-glue3.0
   
   * Example Hudi Configurations:
   ```
   {
       "className": "org.apache.hudi",
       "hoodie.datasource.hive_sync.use_jdbc": "false",
       "hoodie.datasource.write.precombine.field": "_sdc_sequence",
       "hoodie.datasource.write.recordkey.field": "appsflyer_id,event_name,event_time",
       "hoodie.datasource.write.keygenerator.class": "org.apache.hudi.keygen.ComplexKeyGenerator",
       "hoodie.datasource.write.hive_style_partitioning": "true",
       "hoodie.datasource.write.row.writer.enable": "true",
       "hoodie.parquet.compression.codec": "snappy",
       "hoodie.table.name": "in_app_events",
       "hoodie.datasource.hive_sync.database": "everydollar_android_appsflyer_stitch_parquet",
       "hoodie.datasource.hive_sync.table": "in_app_events",
       "hoodie.datasource.hive_sync.enable": "true",
       "path": "s3://rs-prod-stitch-everydollar-android-appsflyer/data/reporting/in_app_events",
       "hoodie.index.type": "BLOOM",
       "hoodie.bloom.index.update.partition.path": "true",
       "hoodie.parquet.small.file.limit": "104857600",
       "hoodie.upsert.shuffle.parallelism": 20,
       "hoodie.datasource.write.operation": "upsert",
       "hoodie.cleaner.policy": "KEEP_LATEST_COMMITS",
       "hoodie.cleaner.commits.retained": 10,
       "hoodie.datasource.write.partitionpath.field": "partn_date",
       "hoodie.datasource.hive_sync.partition_extractor_class": "org.apache.hudi.hive.MultiPartKeysValueExtractor",
       "hoodie.datasource.hive_sync.partition_fields": "partn_date"
   }
   ```
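   
   For reference, the options above are handed to the connector roughly like this (a trimmed sketch of our write path; the sample row, the elided options dict, and the connection name `hudi-connection` are placeholders):
   
   ```
   from awsglue.context import GlueContext
   from awsglue.dynamicframe import DynamicFrame
   from pyspark.context import SparkContext
   
   glue_context = GlueContext(SparkContext.getOrCreate())
   spark = glue_context.spark_session
   
   # Placeholder batch carrying the record key, precombine, and partition fields.
   df = spark.createDataFrame(
       [("id-1", "purchase", "2023-02-21T00:00:00Z", 1, "2023-02-21")],
       ["appsflyer_id", "event_name", "event_time", "_sdc_sequence", "partn_date"],
   )
   dyf = DynamicFrame.fromDF(df, glue_context, "in_app_events")
   
   hudi_options = {
       "className": "org.apache.hudi",
       "hoodie.table.name": "in_app_events",
       # ... remaining options from the config shown above ...
   }
   
   glue_context.write_dynamic_frame.from_options(
       frame=dyf,
       connection_type="marketplace.spark",
       # "hudi-connection" is a placeholder Glue connection name.
       connection_options={**hudi_options, "connectionName": "hudi-connection"},
   )
   ```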
   
   **Stacktrace**
   
   
   ```
   py4j.protocol.Py4JJavaError: An error occurred while calling o915.pyWriteDynamicFrame.
   : java.lang.NoClassDefFoundError: org/apache/spark/sql/avro/SchemaConverters$
   	at org.apache.hudi.AvroConversionUtils$.convertStructTypeToAvroSchema(AvroConversionUtils.scala:63)
   	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:226)
   	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:164)
   	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
   	at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:185)
   	at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:223)
   	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
   	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:220)
   	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:181)
   	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:134)
   	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:133)
   	at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)
   	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
   	at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
   	at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:110)
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:135)
   	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
   	at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:135)
   	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:253)
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:134)
   	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
   	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
   	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989)
   	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438)
   	at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415)
   	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:301)
   	at com.amazonaws.services.glue.marketplace.connector.SparkCustomDataSink.writeDynamicFrame(CustomDataSink.scala:45)
   	at com.amazonaws.services.glue.DataSink.pyWriteDynamicFrame(DataSink.scala:71)
   	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   	at java.lang.reflect.Method.invoke(Method.java:498)
   	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
   	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
   	at py4j.Gateway.invoke(Gateway.java:282)
   	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
   	at py4j.commands.CallCommand.execute(CallCommand.java:79)
   	at py4j.GatewayConnection.run(GatewayConnection.java:238)
   	at java.lang.Thread.run(Thread.java:750)
   Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.avro.SchemaConverters$
   	at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
   	at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
   	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
   	at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
   	... 43 more
   ```
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] GroovyDan commented on issue #8007: [SUPPORT] java.lang.NoClassDefFoundError: org/apache/spark/sql/avro/SchemaConverters

Posted by "GroovyDan (via GitHub)" <gi...@apache.org>.
GroovyDan commented on issue #8007:
URL: https://github.com/apache/hudi/issues/8007#issuecomment-1438610047

   I plan on opening a case with AWS about this issue, but wanted to post it here in case there is a quick fix or others are running into a similar problem.


[GitHub] [hudi] GroovyDan commented on issue #8007: [SUPPORT] java.lang.NoClassDefFoundError: org/apache/spark/sql/avro/SchemaConverters

Posted by "GroovyDan (via GitHub)" <gi...@apache.org>.
GroovyDan commented on issue #8007:
URL: https://github.com/apache/hudi/issues/8007#issuecomment-1438868177

   I switched to Glue Version 4.0 and am no longer getting this error. I am leaving the case with AWS open and will be awaiting their response to see what the issue was with Glue Version 3.0. Closing this issue.
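   
   For anyone stuck on Glue 3.0 in the meantime, a workaround often suggested for this class of error (untested on our side) is to ship the matching spark-avro jar with the job via the `--extra-jars` job parameter. Glue 3.0 runs Spark 3.1.1 on Scala 2.12, so the artifact would be spark-avro_2.12-3.1.1:
   
   ```
   # The S3 path is a placeholder; upload the jar to a bucket the job can read.
   --extra-jars s3://your-bucket/jars/spark-avro_2.12-3.1.1.jar
   ```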


[GitHub] [hudi] GroovyDan closed issue #8007: [SUPPORT] java.lang.NoClassDefFoundError: org/apache/spark/sql/avro/SchemaConverters

Posted by "GroovyDan (via GitHub)" <gi...@apache.org>.
GroovyDan closed issue #8007: [SUPPORT] java.lang.NoClassDefFoundError: org/apache/spark/sql/avro/SchemaConverters
URL: https://github.com/apache/hudi/issues/8007

