Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/09/16 07:00:43 UTC

[GitHub] [hudi] eshu opened a new issue, #6692: [SUPPORT] ClassCastException after migration to Hudi 0.12.0

eshu opened a new issue, #6692:
URL: https://github.com/apache/hudi/issues/6692

   A ClassCastException was thrown when writing data.
   
   **Environment Description**
   
   * Hudi version : 0.12.0
   
   * Spark version : 3.1.1
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : No
   
   **Additional context**
   
   AWS Glue 3 environment
   
   **Stacktrace**
   
   ```
   java.lang.ClassCastException: org.apache.hudi.org.apache.avro.Schema$RecordSchema cannot be cast to org.apache.avro.Schema
   	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:246)
   	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:183)
   	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
   	at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:185)
   	at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:223)
   	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
   	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:220)
   	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:181)
   	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:134)
   	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:133)
   	at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)
   	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
   	at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
   	at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:110)
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:135)
   	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
   	at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:135)
   	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:253)
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:134)
   	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
   	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
   	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989)
   	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438)
   	at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415)
   	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:293)
   ```
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] eshu commented on issue #6692: [SUPPORT] ClassCastException after migration to Hudi 0.12.0

Posted by GitBox <gi...@apache.org>.
eshu commented on issue #6692:
URL: https://github.com/apache/hudi/issues/6692#issuecomment-1273877741

   @xushiyan 
   We build our own fat jar that includes the Hudi Spark 3.1 bundle, hudi-aws-bundle, and some other libraries, such as connectors to Kafka, BigQuery, and ES. There is also a set of libraries provided by the Glue environment.
   
   But let me recap the situation.
   
   1. The error message is `org.apache.hudi.org.apache.avro.Schema$RecordSchema cannot be cast to org.apache.avro.Schema`. That answers the question of which library is shading Avro: do you see the `org.apache.hudi` prefix in the shaded class name?
   2. The library works with Hudi 0.11.1. Unit tests and the staging and production environments run without such issues.
   3. Unit tests show that Hudi 0.12.0 works on freshly created data.
   4. The error happens when I try to use it on staging datasets that were created with Hudi 0.11.1.
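   
   The prefix argument in point 1 can be checked mechanically. The following is a minimal sketch, assuming only that a relocated class keeps its original package name after the relocation prefix; `ShadingHint` and `relocationPrefix` are hypothetical names, not part of Hudi or Avro.
   
   ```scala
   object ShadingHint {
     /** Returns the relocation prefix if `className` looks like a relocated copy
       * of a class under `originalPackage`, or None otherwise. */
     def relocationPrefix(className: String, originalPackage: String): Option[String] = {
       val idx = className.indexOf(originalPackage + ".")
       // idx == 0: the class lives in the original package (not relocated).
       // idx == -1: the package name does not occur at all.
       if (idx > 0 && className.charAt(idx - 1) == '.') Some(className.substring(0, idx - 1))
       else None
     }
   
     def main(args: Array[String]): Unit = {
       // Class name copied from the exception message in this issue.
       val fromError = "org.apache.hudi.org.apache.avro.Schema$RecordSchema"
       println(relocationPrefix(fromError, "org.apache.avro"))                // Some(org.apache.hudi)
       println(relocationPrefix("org.apache.avro.Schema", "org.apache.avro")) // None
     }
   }
   ```
   
   A non-empty result for the source class and `None` for the target class means the two sides of the cast come from differently packaged copies of Avro, which matches the exception above.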
   




[GitHub] [hudi] eshu commented on issue #6692: [SUPPORT] ClassCastException after migration to Hudi 0.12.0

Posted by GitBox <gi...@apache.org>.
eshu commented on issue #6692:
URL: https://github.com/apache/hudi/issues/6692#issuecomment-1272232164

   @xushiyan I use the fat jar, but I do not know what is added to the classpath by AWS in Glue 3.
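   
   One way to see what the runtime classpath actually contains is to ask the class loader for every copy of a class file. This is a sketch under the assumption that it runs inside the job (for example at the start of the driver code); `ClasspathCopies` and `find` are hypothetical names, and the two resource paths are taken from the class names in the exception message.
   
   ```scala
   import scala.collection.mutable.ListBuffer
   
   object ClasspathCopies {
     /** Lists the URL of every classpath entry that provides `resource`,
       * e.g. "org/apache/avro/Schema.class". */
     def find(resource: String): List[String] = {
       // Prefer the context class loader, fall back to the loader of this object.
       val loader = Option(Thread.currentThread().getContextClassLoader)
         .getOrElse(getClass.getClassLoader)
       val urls = loader.getResources(resource)
       val found = ListBuffer.empty[String]
       while (urls.hasMoreElements) found += urls.nextElement().toString
       found.toList
     }
   
     def main(args: Array[String]): Unit = {
       val candidates = Seq(
         "org/apache/avro/Schema.class",                // unshaded Avro
         "org/apache/hudi/org/apache/avro/Schema.class" // Avro relocated under the Hudi prefix
       )
       for (resource <- candidates) {
         println(resource)
         find(resource).foreach(url => println("  " + url))
       }
     }
   }
   ```
   
   Each printed URL names the jar that supplies that copy, so jars added by the environment show up alongside the ones packed into the fat jar.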




[GitHub] [hudi] eshu commented on issue #6692: [SUPPORT] ClassCastException after migration to Hudi 0.12.0

Posted by GitBox <gi...@apache.org>.
eshu commented on issue #6692:
URL: https://github.com/apache/hudi/issues/6692#issuecomment-1271018152

   @xushiyan But I see spark-avro as a transitive dependency for hudi-aws-bundle. And hudi-aws-bundle has it as a direct and transitive dependency too.




[GitHub] [hudi] xushiyan commented on issue #6692: [SUPPORT] ClassCastException after migration to Hudi 0.12.0

Posted by GitBox <gi...@apache.org>.
xushiyan commented on issue #6692:
URL: https://github.com/apache/hudi/issues/6692#issuecomment-1271738970

   I think the issue originates from the `org.apache.avro.*` package shaded in hadoop-mr-bundle. @eshu can you list all the jars you add to the classpath, specifically the Hudi jars?




[GitHub] [hudi] yihua commented on issue #6692: [SUPPORT] ClassCastException after migration to Hudi 0.12.0

Posted by GitBox <gi...@apache.org>.
yihua commented on issue #6692:
URL: https://github.com/apache/hudi/issues/6692#issuecomment-1258476253

   @eshu To help reproduce the problem, could you also provide the schema used for the write?




[GitHub] [hudi] xushiyan commented on issue #6692: [SUPPORT] ClassCastException after migration to Hudi 0.12.0

Posted by GitBox <gi...@apache.org>.
xushiyan commented on issue #6692:
URL: https://github.com/apache/hudi/issues/6692#issuecomment-1277589735

   > We create our own fat jar that including Hudi Spark 3.1 bundle, hudi-aws-bundle 
   
   @eshu I see the problem. In 0.12.0, hudi-aws-bundle shades Avro, which conflicts with the Spark bundle. This is fixed in 0.12.1 (releasing soon).
   




[GitHub] [hudi] eshu commented on issue #6692: [SUPPORT] ClassCastException after migration to Hudi 0.12.0

Posted by GitBox <gi...@apache.org>.
eshu commented on issue #6692:
URL: https://github.com/apache/hudi/issues/6692#issuecomment-1250010727

   @xushiyan It worked on 0.11.1. When I reverted to 0.11.1, it started working again.
   
   Sorry, I do not have a simple code snippet for this case. The settings are roughly:
   ```
   DataSourceWriteOptions.ASYNC_COMPACT_ENABLE -> false,
   DataSourceWriteOptions.HIVE_STYLE_PARTITIONING -> true,
   DataSourceWriteOptions.PAYLOAD_CLASS_NAME -> classOf[<class inherited from org.apache.hudi.common.model.EventTimeAvroPayload>].getName,
   DataSourceWriteOptions.PRECOMBINE_FIELD -> DaaS.Columns.InternalTimestamp,
   DataSourceWriteOptions.TABLE_TYPE -> DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL,
   FileSystemViewStorageConfig.INCREMENTAL_TIMELINE_SYNC_ENABLE -> false,
   HoodieCleanConfig.CLEANER_FILE_VERSIONS_RETAINED -> 3,
   HoodieCleanConfig.CLEANER_INCREMENTAL_MODE_ENABLE -> true,
   HoodieCleanConfig.CLEANER_POLICY -> HoodieCleaningPolicy.KEEP_LATEST_FILE_VERSIONS,
   HoodieCompactionConfig.INLINE_COMPACT -> true,
   HoodiePayloadConfig.EVENT_TIME_FIELD -> Columns.InternalTimestamp,
   HoodiePayloadConfig.ORDERING_FIELD -> Columns.InternalTimestamp,
   // https://github.com/apache/hudi/issues/4230
   HoodieWriteConfig.MARKERS_TYPE -> MarkerType.DIRECT,
   // TODO Remove it when https://github.com/apache/hudi/issues/6283 is fixed:
   HoodieWriteConfig.ROLLBACK_USING_MARKERS_ENABLE -> false,
   
   HoodieSyncConfig.META_SYNC_DATABASE_NAME -> parameters(HiveDatabaseParameter),
   HoodieSyncConfig.META_SYNC_TABLE_NAME -> parameters(HiveTableParameter).toLowerCase,
   HoodieSyncConfig.META_SYNC_ENABLED -> "true",
   HiveSyncConfig.HIVE_SYNC_MODE -> HiveSyncMode.GLUE.name,
   DataSourceWriteOptions.META_SYNC_CLIENT_TOOL_CLASS_NAME -> classOf[AwsGlueCatalogSyncTool].getName,
   
   HoodieWriteConfig.UPSERT_PARALLELISM_VALUE -> 200,
   DataSourceWriteOptions.OPERATION -> WriteOperationType.UPSERT,
   HoodieMetadataConfig.ENABLE -> true,
   
   DataSourceWriteOptions.PARTITIONPATH_FIELD -> partition,
   HoodieWriteConfig.TBL_NAME -> table,
   DataSourceWriteOptions.TABLE_NAME -> table,
   DataSourceWriteOptions.RECORDKEY_FIELD -> recordKey
   ```
   
   It breaks on the piece of code:
   ```
         df.write
           .mode(saveMode) // Append
           .format("hudi")
           .options(...)
           .save(path)
   ```




[GitHub] [hudi] nsivabalan commented on issue #6692: [SUPPORT] ClassCastException after migration to Hudi 0.12.0

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #6692:
URL: https://github.com/apache/hudi/issues/6692#issuecomment-1287945367

   Feel free to open a new issue if you need further assistance.




[GitHub] [hudi] eshu commented on issue #6692: [SUPPORT] ClassCastException after migration to Hudi 0.12.0

Posted by GitBox <gi...@apache.org>.
eshu commented on issue #6692:
URL: https://github.com/apache/hudi/issues/6692#issuecomment-1284836860

   @xushiyan Thank you!




[GitHub] [hudi] eshu commented on issue #6692: [SUPPORT] ClassCastException after migration to Hudi 0.12.0

Posted by GitBox <gi...@apache.org>.
eshu commented on issue #6692:
URL: https://github.com/apache/hudi/issues/6692#issuecomment-1271026057

   hadoop-common have a dependency on avro, hbase-hadoop-compat, hadoop-mapreduce-client-core.
   
   I do not depend on spark-avro, but Glue 3 probably contains it, and that is out of my control.
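   
   Whether the environment supplies a library can be tested at runtime by loading one of its classes and reading where it came from. This is a minimal sketch; `WhichJar` and `locate` are hypothetical names, and `org.apache.spark.sql.avro.SchemaConverters` is used on the assumption that it is a class shipped in spark-avro.
   
   ```scala
   object WhichJar {
     /** Returns the location (usually a jar URL) a class is loaded from.
       * None means the class is absent, failed to link, or has no recorded
       * code source (as can happen for JDK classes). */
     def locate(className: String): Option[String] =
       try {
         val loader = Option(Thread.currentThread().getContextClassLoader)
           .getOrElse(getClass.getClassLoader)
         // initialize = false: load the class without running its static initializers.
         val cls = Class.forName(className, false, loader)
         Option(cls.getProtectionDomain.getCodeSource)
           .flatMap(source => Option(source.getLocation))
           .map(_.toString)
       } catch {
         case _: ClassNotFoundException | _: LinkageError => None
       }
   
     def main(args: Array[String]): Unit = {
       Seq(
         "org.apache.spark.sql.avro.SchemaConverters", // assumed to come from spark-avro
         "org.apache.avro.Schema",
         "org.apache.hudi.org.apache.avro.Schema"
       ).foreach(name => println(name + " -> " + locate(name).getOrElse("not found")))
     }
   }
   ```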




[GitHub] [hudi] xushiyan commented on issue #6692: [SUPPORT] ClassCastException after migration to Hudi 0.12.0

Posted by GitBox <gi...@apache.org>.
xushiyan commented on issue #6692:
URL: https://github.com/apache/hudi/issues/6692#issuecomment-1249135728

   @eshu can you share more details, such as the code snippet you used and the configs for this job? Thanks.




[GitHub] [hudi] yihua commented on issue #6692: [SUPPORT] ClassCastException after migration to Hudi 0.12.0

Posted by GitBox <gi...@apache.org>.
yihua commented on issue #6692:
URL: https://github.com/apache/hudi/issues/6692#issuecomment-1258476717

   CC @umehrot2 @rahil-c 




[GitHub] [hudi] eshu commented on issue #6692: [SUPPORT] ClassCastException after migration to Hudi 0.12.0

Posted by GitBox <gi...@apache.org>.
eshu commented on issue #6692:
URL: https://github.com/apache/hudi/issues/6692#issuecomment-1258814398

   @yihua There is nothing unusual in the schema, just string, numeric, and timestamp fields. This issue appears on all tables that I tried.
   But in unit tests, when I create and fill datasets with 0.12.0 from the start, it does not happen. So I think this is a migration issue.




[GitHub] [hudi] xushiyan commented on issue #6692: [SUPPORT] ClassCastException after migration to Hudi 0.12.0

Posted by GitBox <gi...@apache.org>.
xushiyan commented on issue #6692:
URL: https://github.com/apache/hudi/issues/6692#issuecomment-1272232597

   > @xushiyan I use the fat jar, but I do not know what is added to the classpath by AWS in Glue 3.
   
   @eshu By fat jar, do you mean the bundle jar? Is it the Hudi Spark 3.1 bundle, and is it the only bundle you used? I need to reproduce this using the same jars as you did, so please provide info on which jars you added to your Glue job. Thanks.




[GitHub] [hudi] nsivabalan closed issue #6692: [SUPPORT] ClassCastException after migration to Hudi 0.12.0

Posted by GitBox <gi...@apache.org>.
nsivabalan closed issue #6692: [SUPPORT] ClassCastException after migration to Hudi 0.12.0
URL: https://github.com/apache/hudi/issues/6692

