You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by "Ethan Guo (Jira)" <ji...@apache.org> on 2023/02/07 23:45:00 UTC

[jira] [Closed] (HUDI-5689) CDC fails in Deltastreamer

     [ https://issues.apache.org/jira/browse/HUDI-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Guo closed HUDI-5689.
---------------------------
    Resolution: Fixed

> CDC fails in Deltastreamer
> --------------------------
>
>                 Key: HUDI-5689
>                 URL: https://issues.apache.org/jira/browse/HUDI-5689
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Ethan Guo
>            Assignee: Raymond Xu
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 0.13.0
>
>
> After enabling CDC, Deltastreamer fails to ingest data (0.13.0-rc1):
> {code:java}
> spark-submit 
> --master yarn 
> --jars /mnt1/hudi-jars/hudi-spark-bundle.jar,/mnt1/hudi-jars/hudi-utilities-slim-bundle.jar 
> --deploy-mode cluster 
> --conf spark.serializer=org.apache.spark.serializer.KryoSerializer 
> --conf spark.sql.avro.datetimeRebaseModeInRead=CORRECTED 
> --conf spark.sql.avro.datetimeRebaseModeInWrite=CORRECTED 
> --conf spark.sql.parquet.datetimeRebaseModeInRead=CORRECTED 
> --conf spark.sql.parquet.datetimeRebaseModeInWrite=CORRECTED 
> --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer /mnt1/hudi-jars/hudi-utilities-slim-bundle.jar 
> --table-type COPY_ON_WRITE 
> --source-ordering-field ts 
> --source-class org.apache.hudi.utilities.sources.ParquetDFSSource 
> --target-base-path <base_path>
> --target-table emr 
> --payload-class org.apache.hudi.common.model.AWSDmsAvroPayload 
> --hoodie-conf hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.SimpleKeyGenerator 
> --hoodie-conf hoodie.datasource.write.recordkey.field=_id 
> --hoodie-conf hoodie.table.cdc.enabled=true 
> --hoodie-conf hoodie.table.cdc.supplemental.logging.mode=cdc_data_before_after 
> --hoodie-conf hoodie.datasource.write.partitionpath.field=partition 
> --hoodie-conf hoodie.deltastreamer.source.dfs.root=<source_path>{code}
> {code:java}
> 23/02/01 22:37:12 ERROR Client: Application diagnostics message: User class threw exception: org.apache.hudi.exception.HoodieException: Commit 20230201223554790 failed and rolled-back !
> 	at org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:740)
> 	at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:393)
> 	at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.lambda$sync$2(HoodieDeltaStreamer.java:206)
> 	at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
> 	at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:204)
> 	at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:573)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:742)
> Exception in thread "main" org.apache.spark.SparkException: Application application_1675271857569_0003 finished with failed status
> 	at org.apache.spark.deploy.yarn.Client.run(Client.scala:1354)
> 	at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1776)
> 	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1006)
> 	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
> 	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
> 	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
> 	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1095)
> 	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1104)
> 	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)