You are viewing a plain text version of this content. The canonical link for it is here.

Posted to commits@hudi.apache.org by "lenhardtx (via GitHub)" <gi...@apache.org> on 2023/03/12 14:56:14 UTC

[GitHub] [hudi] lenhardtx opened a new issue, #8159: [SUPPORT]

lenhardtx opened a new issue, #8159:
URL: https://github.com/apache/hudi/issues/8159

   **Environment Description**
   
   * Hudi version :
   *0.13
   
   * Spark version :
   3.3.2
   
   * Hive version :
   apache-hive-4
   
   * Hadoop version :
   
   * Storage (HDFS/S3/GCS..) :
   Minio
   
   * Running on Docker? (yes/no) :
   no
   
   **Additional context**
   Job:
   [root@spark-hudi ~]# cat testeHudi_v3.sh
   #!/bin/bash
   spark-submit --name "hudi_sbr_ped_venda" --master spark://0.0.0.0:7077 --deploy-mode client --driver-memory 1G \
     --packages org.apache.hadoop:hadoop-aws:3.3.4 \
     --jars "/root/hudi/hudi-utilities-bundle_2.12-0.13.0.jar,/root/hudi/spark-avro_2.13-3.3.2.jar" \
     --conf spark.executor.memory=2g --conf spark.cores.max=100 \
     --conf 'spark.hadoop.fs.s3a.access.key=admin' \
     --conf 'spark.hadoop.fs.s3a.secret.key=XXXXXXX'\
     --conf 'spark.hadoop.fs.s3a.endpoint=http://0.0.0.0:9000' \
     --conf 'spark.hadoop.fs.s3a.path.style.access=true' \
     --conf 'fs.s3a.signing-algorithm=S3SignerType' \
     --conf 'spark.sql.catalogImplementation=hive' \
     --conf 'spark.debug.maxToStringFields=500' \
     --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer /root/hudi/hudi-utilities-bundle_2.12-0.13.0.jar \
     --table-type COPY_ON_WRITE --op UPSERT \
     --target-base-path s3a://hudi/sbr_ped_venda \
     --target-table sbr_ped_venda --continuous \
     --min-sync-interval-seconds 60 \
     --source-class org.apache.hudi.utilities.sources.debezium.PostgresDebeziumSource \
     --source-ordering-field _event_lsn \
     --payload-class org.apache.hudi.common.model.debezium.PostgresDebeziumAvroPayload \
     --hoodie-conf schema.registry.url=http://0.0.0.0:8081 \
     --hoodie-conf hoodie.deltastreamer.schemaprovider.registry.url=http://0.0.0.0:8081/subjects/pgprd.public.sbr_ped_venda-value/versions/latest \
     --hoodie-conf bootstrap.servers=0.0.0.0:9092 \
     --hoodie-conf hoodie.deltastreamer.source.kafka.value.deserializer.class=io.confluent.kafka.serializers.KafkaAvroDeserializer \
     --hoodie-conf hoodie.deltastreamer.source.kafka.topic=pgprd.public.sbr_ped_venda \
     --hoodie-conf hoodie.deltastreamer.source.kafka.group.id=datalake \
     --hoodie-conf auto.offset.reset=earliest \
     --hoodie-conf group.id=datalake \
     --hoodie-conf hoodie.datasource.write.recordkey.field=id \
     --hoodie-conf validate.non.null=false \
     --hoodie-conf hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.NonpartitionedKeyGenerator \
     --hoodie-conf hoodie.datasource.write.hive_style_partitioning=false
   
   I've tried several ways, this is the last test I did.
   
   ![image](https://user-images.githubusercontent.com/59495895/224552503-e7d3cd0f-0665-434c-bad7-8e223f9037be.png)
   
   
   **Stacktrace**
   [root@spark-hudi ~]# ./testeHudi_v3.sh
   Warning: Ignoring non-Spark config property: fs.s3a.signing-algorithm
   :: loading settings :: url = jar:file:/opt/spark/jars/ivy-2.5.1.jar!/org/apache/ivy/core/settings/ivysettings.xml
   Ivy Default Cache set to: /root/.ivy2/cache
   The jars for the packages stored in: /root/.ivy2/jars
   org.apache.hadoop#hadoop-aws added as a dependency
   :: resolving dependencies :: org.apache.spark#spark-submit-parent-5dcabf23-e567-4306-b4b1-acb277c8e07d;1.0
           confs: [default]
           found org.apache.hadoop#hadoop-aws;3.3.4 in central
           found com.amazonaws#aws-java-sdk-bundle;1.12.262 in central
           found org.wildfly.openssl#wildfly-openssl;1.0.7.Final in central
   :: resolution report :: resolve 466ms :: artifacts dl 25ms
           :: modules in use:
           com.amazonaws#aws-java-sdk-bundle;1.12.262 from central in [default]
           org.apache.hadoop#hadoop-aws;3.3.4 from central in [default]
           org.wildfly.openssl#wildfly-openssl;1.0.7.Final from central in [default]
           ---------------------------------------------------------------------
           |                  |            modules            ||   artifacts   |
           |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
           ---------------------------------------------------------------------
           |      default     |   3   |   0   |   0   |   0   ||   3   |   0   |
           ---------------------------------------------------------------------
   :: retrieving :: org.apache.spark#spark-submit-parent-5dcabf23-e567-4306-b4b1-acb277c8e07d
           confs: [default]
           0 artifacts copied, 3 already retrieved (0kB/14ms)
   23/03/12 11:40:38 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
   23/03/12 11:40:39 WARN SchedulerConfGenerator: Job Scheduling Configs will not be in effect as spark.scheduler.mode is not set to FAIR at instantiation time. Continuing without scheduling configs
   23/03/12 11:40:39 INFO SparkContext: Running Spark version 3.3.2
   23/03/12 11:40:39 INFO ResourceUtils: ==============================================================
   23/03/12 11:40:39 INFO ResourceUtils: No custom resources configured for spark.driver.
   23/03/12 11:40:39 INFO ResourceUtils: ==============================================================
   23/03/12 11:40:39 INFO SparkContext: Submitted application: delta-streamer-sbr_ped_venda
   23/03/12 11:40:39 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 2048, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
   23/03/12 11:40:39 INFO ResourceProfile: Limiting resource is cpu
   23/03/12 11:40:39 INFO ResourceProfileManager: Added ResourceProfile id: 0
   23/03/12 11:40:39 INFO SecurityManager: Changing view acls to: root
   23/03/12 11:40:39 INFO SecurityManager: Changing modify acls to: root
   23/03/12 11:40:39 INFO SecurityManager: Changing view acls groups to:
   23/03/12 11:40:39 INFO SecurityManager: Changing modify acls groups to:
   23/03/12 11:40:39 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
   23/03/12 11:40:39 INFO deprecation: mapred.output.compression.codec is deprecated. Instead, use mapreduce.output.fileoutputformat.compress.codec
   23/03/12 11:40:39 INFO deprecation: mapred.output.compression.type is deprecated. Instead, use mapreduce.output.fileoutputformat.compress.type
   23/03/12 11:40:39 INFO deprecation: mapred.output.compress is deprecated. Instead, use mapreduce.output.fileoutputformat.compress
   23/03/12 11:40:40 INFO Utils: Successfully started service 'sparkDriver' on port 45603.
   23/03/12 11:40:40 INFO SparkEnv: Registering MapOutputTracker
   23/03/12 11:40:40 INFO SparkEnv: Registering BlockManagerMaster
   23/03/12 11:40:40 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
   23/03/12 11:40:40 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
   23/03/12 11:40:40 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
   23/03/12 11:40:40 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-8cfb4a38-de92-4ea7-881f-e49d4d4750ea
   23/03/12 11:40:40 INFO MemoryStore: MemoryStore started with capacity 434.4 MiB
   23/03/12 11:40:40 INFO SparkEnv: Registering OutputCommitCoordinator
   23/03/12 11:40:40 INFO Utils: Successfully started service 'SparkUI' on port 8090.
   23/03/12 11:40:40 INFO SparkContext: Added JAR file:///root/hudi/hudi-utilities-bundle_2.12-0.13.0.jar at spark://spark-hudi.smartbr.com:45603/jars/hudi-utilities-bundle_2.12-0.13.0.jar with timestamp 1678632039164
   23/03/12 11:40:40 INFO SparkContext: Added JAR file:///root/hudi/spark-avro_2.13-3.3.2.jar at spark://spark-hudi.smartbr.com:45603/jars/spark-avro_2.13-3.3.2.jar with timestamp 1678632039164
   23/03/12 11:40:40 INFO SparkContext: Added JAR file:///root/.ivy2/jars/org.apache.hadoop_hadoop-aws-3.3.4.jar at spark://spark-hudi.smartbr.com:45603/jars/org.apache.hadoop_hadoop-aws-3.3.4.jar with timestamp 1678632039164
   23/03/12 11:40:40 INFO SparkContext: Added JAR file:///root/.ivy2/jars/com.amazonaws_aws-java-sdk-bundle-1.12.262.jar at spark://spark-hudi.smartbr.com:45603/jars/com.amazonaws_aws-java-sdk-bundle-1.12.262.jar with timestamp 1678632039164
   23/03/12 11:40:40 INFO SparkContext: Added JAR file:///root/.ivy2/jars/org.wildfly.openssl_wildfly-openssl-1.0.7.Final.jar at spark://spark-hudi.smartbr.com:45603/jars/org.wildfly.openssl_wildfly-openssl-1.0.7.Final.jar with timestamp 1678632039164
   23/03/12 11:40:40 INFO SparkContext: The JAR file:/root/hudi/hudi-utilities-bundle_2.12-0.13.0.jar at spark://spark-hudi.smartbr.com:45603/jars/hudi-utilities-bundle_2.12-0.13.0.jar has been added already. Overwriting of added jar is not supported in the current version.
   23/03/12 11:40:40 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://0.0.0.0:7077...
   23/03/12 11:40:41 INFO TransportClientFactory: Successfully created connection to /0.0.0.0.0:7077 after 50 ms (0 ms spent in bootstraps)
   23/03/12 11:40:41 INFO StandaloneSchedulerBackend: Connected to Spark cluster with app ID app-20230312114041-0014
   23/03/12 11:40:41 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20230312114041-0014/0 on worker-20230312102039-0.0.0.0-40923 (192.168.200.15:40923) with 2 core(s)
   23/03/12 11:40:41 INFO StandaloneSchedulerBackend: Granted executor ID app-20230312114041-0014/0 on hostPort 0.0.0.0:40923 with 2 core(s), 2.0 GiB RAM
   23/03/12 11:40:41 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 44591.
   23/03/12 11:40:41 INFO NettyBlockTransferService: Server created on spark-hudi.smartbr.com:44591
   23/03/12 11:40:41 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
   23/03/12 11:40:41 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, spark-hudi.smartbr.com, 44591, None)
   23/03/12 11:40:41 INFO BlockManagerMasterEndpoint: Registering block manager spark-hudi:44591 with 434.4 MiB RAM, BlockManagerId(driver, spark-hudi, 44591, None)
   23/03/12 11:40:41 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, spark-hudi, 44591, None)
   23/03/12 11:40:41 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, spark-hudi, 44591, None)
   23/03/12 11:40:41 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20230312114041-0014/0 is now RUNNING
   23/03/12 11:40:41 INFO StandaloneSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
   23/03/12 11:40:42 WARN MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
   23/03/12 11:40:42 INFO MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
   23/03/12 11:40:42 INFO MetricsSystemImpl: s3a-file-system metrics system started
   23/03/12 11:40:44 WARN DFSPropertiesConfiguration: Cannot find HUDI_CONF_DIR, please set it as the dir of hudi-defaults.conf
   23/03/12 11:40:44 WARN DFSPropertiesConfiguration: Properties file file:/etc/hudi/conf/hudi-defaults.conf not found. Ignoring to load props file
   23/03/12 11:40:44 INFO UtilHelpers: Adding overridden properties to file properties.
   23/03/12 11:40:44 WARN SparkContext: Using an existing SparkContext; some configuration may not take effect.
   23/03/12 11:40:45 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from s3a://hudi/sbr_ped_venda
   23/03/12 11:40:45 INFO HoodieTableConfig: Loading table properties from s3a://hudi/sbr_ped_venda/.hoodie/hoodie.properties
   23/03/12 11:40:45 INFO HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from s3a://hudi/sbr_ped_venda
   23/03/12 11:40:45 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir.
   23/03/12 11:40:45 INFO SharedState: Warehouse path is 'file:/root/spark-warehouse'.
   23/03/12 11:40:48 INFO HoodieDeltaStreamer: Creating delta streamer with configs:
   auto.offset.reset: earliest
   bootstrap.servers: 0.0.0.0:9092
   group.id: datalake
   hoodie.auto.adjust.lock.configs: true
   hoodie.datasource.write.hive_style_partitioning: false
   hoodie.datasource.write.keygenerator.class: org.apache.hudi.keygen.NonpartitionedKeyGenerator
   hoodie.datasource.write.reconcile.schema: false
   hoodie.datasource.write.recordkey.field: id
   hoodie.deltastreamer.schemaprovider.registry.url: http://0.0.0.0:8081/subjects/pgprd.public.sbr_ped_venda-value/versions/latest
   hoodie.deltastreamer.source.kafka.group.id: datalake
   hoodie.deltastreamer.source.kafka.topic: pgprd.public.sbr_ped_venda
   hoodie.deltastreamer.source.kafka.value.deserializer.class: io.confluent.kafka.serializers.KafkaAvroDeserializer
   schema.registry.url: http://0.0.0.0.:8081
   validate.non.null: false
   
   23/03/12 11:40:48 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from s3a://hudi/sbr_ped_venda
   23/03/12 11:40:48 INFO HoodieTableConfig: Loading table properties from s3a://hudi/sbr_ped_venda/.hoodie/hoodie.properties
   23/03/12 11:40:48 INFO HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from s3a://hudi/sbr_ped_venda
   23/03/12 11:40:48 INFO HoodieActiveTimeline: Loaded instants upto : Optional.empty
   23/03/12 11:40:49 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from s3a://hudi/sbr_ped_venda
   23/03/12 11:40:49 INFO HoodieTableConfig: Loading table properties from s3a://hudi/sbr_ped_venda/.hoodie/hoodie.properties
   23/03/12 11:40:49 INFO HoodieTableMetaClient: Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from s3a://hudi/sbr_ped_venda
   23/03/12 11:40:49 INFO HoodieActiveTimeline: Loaded instants upto : Optional.empty
   23/03/12 11:40:49 INFO DeltaSync: Checkpoint to resume from : Optional.empty
   23/03/12 11:40:49 INFO ConsumerConfig: ConsumerConfig values:
           allow.auto.create.topics = true
           auto.commit.interval.ms = 5000
           auto.offset.reset = earliest
           bootstrap.servers = [0.0.0.0:9092]
           check.crcs = true
           client.dns.lookup = default
           client.id =
           client.rack =
           connections.max.idle.ms = 540000
           default.api.timeout.ms = 60000
           enable.auto.commit = true
           exclude.internal.topics = true
           fetch.max.bytes = 52428800
           fetch.max.wait.ms = 500
           fetch.min.bytes = 1
           group.id = datalake
           group.instance.id = null
           heartbeat.interval.ms = 3000
           interceptor.classes = []
           internal.leave.group.on.close = true
           isolation.level = read_uncommitted
           key.deserializer = class org.apache.kafka.common.serialization.StringDeserializer
           max.partition.fetch.bytes = 1048576
           max.poll.interval.ms = 300000
           max.poll.records = 500
           metadata.max.age.ms = 300000
           metric.reporters = []
           metrics.num.samples = 2
           metrics.recording.level = INFO
           metrics.sample.window.ms = 30000
           partition.assignment.strategy = [class org.apache.kafka.clients.consumer.RangeAssignor]
           receive.buffer.bytes = 65536
           reconnect.backoff.max.ms = 1000
           reconnect.backoff.ms = 50
           request.timeout.ms = 30000
           retry.backoff.ms = 100
           sasl.client.callback.handler.class = null
           sasl.jaas.config = null
           sasl.kerberos.kinit.cmd = /usr/bin/kinit
           sasl.kerberos.min.time.before.relogin = 60000
           sasl.kerberos.service.name = null
           sasl.kerberos.ticket.renew.jitter = 0.05
           sasl.kerberos.ticket.renew.window.factor = 0.8
           sasl.login.callback.handler.class = null
           sasl.login.class = null
           sasl.login.refresh.buffer.seconds = 300
           sasl.login.refresh.min.period.seconds = 60
           sasl.login.refresh.window.factor = 0.8
           sasl.login.refresh.window.jitter = 0.05
           sasl.mechanism = GSSAPI
           security.protocol = PLAINTEXT
           security.providers = null
           send.buffer.bytes = 131072
           session.timeout.ms = 10000
           ssl.cipher.suites = null
           ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
           ssl.endpoint.identification.algorithm = https
           ssl.key.password = null
           ssl.keymanager.algorithm = SunX509
           ssl.keystore.location = null
           ssl.keystore.password = null
           ssl.keystore.type = JKS
           ssl.protocol = TLS
           ssl.provider = null
           ssl.secure.random.implementation = null
           ssl.trustmanager.algorithm = PKIX
           ssl.truststore.location = null
           ssl.truststore.password = null
           ssl.truststore.type = JKS
           value.deserializer = class io.confluent.kafka.serializers.KafkaAvroDeserializer
   
   23/03/12 11:40:49 INFO KafkaAvroDeserializerConfig: KafkaAvroDeserializerConfig values:
           bearer.auth.token = [hidden]
           schema.registry.url = [http://0.0.0.0:8081]
           basic.auth.user.info = [hidden]
           auto.register.schemas = true
           max.schemas.per.subject = 1000
           basic.auth.credentials.source = URL
           schema.registry.basic.auth.user.info = [hidden]
           bearer.auth.credentials.source = STATIC_TOKEN
           specific.avro.reader = false
           value.subject.name.strategy = class io.confluent.kafka.serializers.subject.TopicNameStrategy
           key.subject.name.strategy = class io.confluent.kafka.serializers.subject.TopicNameStrategy
   
   23/03/12 11:40:49 WARN ConsumerConfig: The configuration 'validate.non.null' was supplied but isn't a known config.
   23/03/12 11:40:49 WARN ConsumerConfig: The configuration 'hoodie.deltastreamer.source.kafka.value.deserializer.class' was supplied but isn't a known config.
   23/03/12 11:40:49 INFO AppInfoParser: Kafka version: 2.4.1
   23/03/12 11:40:49 INFO AppInfoParser: Kafka commitId: c57222ae8cd7866b
   23/03/12 11:40:49 INFO AppInfoParser: Kafka startTimeMs: 1678632049794
   23/03/12 11:40:50 INFO Metadata: [Consumer clientId=consumer-datalake-1, groupId=datalake] Cluster ID: 0EpmDQKcTSK7kGDKCviAwQ
   23/03/12 11:40:50 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (192.168.200.15:56696) with ID 0,  ResourceProfileId 0
   23/03/12 11:40:50 INFO KafkaOffsetGen: SourceLimit not configured, set numEvents to default value : 5000000
   23/03/12 11:40:50 INFO DebeziumSource: About to read 1532804 from Kafka for topic :pgprd.public.sbr_ped_venda
   23/03/12 11:40:51 WARN KafkaUtils: overriding enable.auto.commit to false for executor
   23/03/12 11:40:51 WARN KafkaUtils: overriding auto.offset.reset to none for executor
   23/03/12 11:40:51 WARN KafkaUtils: overriding executor group.id to spark-executor-datalake
   23/03/12 11:40:51 WARN KafkaUtils: overriding receive.buffer.bytes to 65536 see KAFKA-3135
   23/03/12 11:40:51 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.200.15:33183 with 1048.8 MiB RAM, BlockManagerId(0, 0.0.0.0, 33183, None)
   23/03/12 11:40:52 INFO SparkContext: Starting job: isEmpty at AvroConversionUtils.scala:120
   23/03/12 11:40:52 INFO DAGScheduler: Got job 0 (isEmpty at AvroConversionUtils.scala:120) with 1 output partitions
   23/03/12 11:40:52 INFO DAGScheduler: Final stage: ResultStage 0 (isEmpty at AvroConversionUtils.scala:120)
   23/03/12 11:40:52 INFO DAGScheduler: Parents of final stage: List()
   23/03/12 11:40:52 INFO DAGScheduler: Missing parents: List()
   23/03/12 11:40:52 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at DebeziumSource.java:159), which has no missing parents
   23/03/12 11:40:52 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 6.0 KiB, free 434.4 MiB)
   23/03/12 11:40:53 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 3.5 KiB, free 434.4 MiB)
   23/03/12 11:40:53 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on spark-hudi.smartbr.com:44591 (size: 3.5 KiB, free: 434.4 MiB)
   23/03/12 11:40:53 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1513
   23/03/12 11:40:53 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at DebeziumSource.java:159) (first 15 tasks are for partitions Vector(0))
   23/03/12 11:40:53 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks resource profile 0
   23/03/12 11:40:58 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0) (0.0.0.0, executor 0, partition 0, PROCESS_LOCAL, 4378 bytes) taskResourceAssignments Map()
   23/03/12 11:40:59 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 0.0.0.0:33183 (size: 3.5 KiB, free: 1048.8 MiB)
   23/03/12 11:41:02 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 3868 ms on 0.0.0.0 (executor 0) (1/1)
   23/03/12 11:41:02 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
   23/03/12 11:41:02 INFO DAGScheduler: ResultStage 0 (isEmpty at AvroConversionUtils.scala:120) finished in 9.974 s
   23/03/12 11:41:02 INFO DAGScheduler: Job 0 is finished. Cancelling potential speculative or zombie tasks for this job
   23/03/12 11:41:02 INFO TaskSchedulerImpl: Killing all running tasks in stage 0: Stage finished
   23/03/12 11:41:02 INFO DAGScheduler: Job 0 finished: isEmpty at AvroConversionUtils.scala:120, took 10.185954 s
   23/03/12 11:41:03 INFO BlockManagerInfo: Removed broadcast_0_piece0 on 0.0.0.0:33183 in memory (size: 3.5 KiB, free: 1048.8 MiB)
   23/03/12 11:41:03 INFO BlockManagerInfo: Removed broadcast_0_piece0 on spark-hudi.smartbr.com:44591 in memory (size: 3.5 KiB, free: 434.4 MiB)
   23/03/12 11:41:05 WARN package: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.sql.debug.maxToStringFields'.
   **23/03/12 11:41:07 INFO DebeziumSource: Date fields: []**
   23/03/12 11:41:07 INFO HoodieDeltaStreamer: Delta Sync shutdown. Error ?false
   23/03/12 11:41:07 INFO HoodieDeltaStreamer: DeltaSync shutdown. Closing write client. Error?true
   23/03/12 11:41:07 ERROR HoodieAsyncService: Service shutdown with error
   java.util.concurrent.ExecutionException: java.lang.NoSuchMethodError: 'org.apache.spark.sql.catalyst.plans.logical.LogicalPlan org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(scala.PartialFunction)'
           at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
           at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
           at org.apache.hudi.async.HoodieAsyncService.waitForShutdown(HoodieAsyncService.java:103)
           at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.lambda$sync$1(HoodieDeltaStreamer.java:195)
           at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
           at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:192)
           at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:573)
           at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
           at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
           at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
           at java.base/java.lang.reflect.Method.invoke(Method.java:566)
           at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
           at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
           at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
           at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
           at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
           at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
           at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
           at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
   Caused by: java.lang.NoSuchMethodError: 'org.apache.spark.sql.catalyst.plans.logical.LogicalPlan org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(scala.PartialFunction)'
           at org.apache.spark.sql.hudi.analysis.HoodiePruneFileSourcePartitions.apply(HoodiePruneFileSourcePartitions.scala:42)
           at org.apache.spark.sql.hudi.analysis.HoodiePruneFileSourcePartitions.apply(HoodiePruneFileSourcePartitions.scala:40)
           at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:211)
           at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
           at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
           at scala.collection.immutable.List.foldLeft(List.scala:91)
           at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:208)
           at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:200)
           at scala.collection.immutable.List.foreach(List.scala:431)
           at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:200)
           at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:179)
           at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:88)
           at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:179)
           at org.apache.spark.sql.execution.QueryExecution.$anonfun$optimizedPlan$1(QueryExecution.scala:126)
           at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
           at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:185)
           at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:510)
           at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:185)
           at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
           at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:184)
           at org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:122)
           at org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:118)
           at org.apache.spark.sql.execution.QueryExecution.assertOptimized(QueryExecution.scala:136)
           at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:154)
           at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:151)
           at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:173)
           at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:172)
           at org.apache.spark.sql.Dataset.rdd$lzycompute(Dataset.scala:3396)
           at org.apache.spark.sql.Dataset.rdd(Dataset.scala:3394)
           at org.apache.hudi.utilities.sources.debezium.DebeziumSource.convertColumnToNullable(DebeziumSource.java:220)
           at org.apache.hudi.utilities.sources.debezium.DebeziumSource.toDataset(DebeziumSource.java:167)
           at org.apache.hudi.utilities.sources.debezium.DebeziumSource.fetchNextBatch(DebeziumSource.java:123)
           at org.apache.hudi.utilities.sources.RowSource.fetchNewData(RowSource.java:43)
           at org.apache.hudi.utilities.sources.Source.fetchNext(Source.java:76)
           at org.apache.hudi.utilities.deltastreamer.SourceFormatAdapter.fetchNewDataInAvroFormat(SourceFormatAdapter.java:71)
           at org.apache.hudi.utilities.deltastreamer.DeltaSync.fetchFromSource(DeltaSync.java:530)
           at org.apache.hudi.utilities.deltastreamer.DeltaSync.readFromSource(DeltaSync.java:460)
           at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:364)
           at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$1(HoodieDeltaStreamer.java:716)
           at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
           at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
           at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
           at java.base/java.lang.Thread.run(Thread.java:829)
   23/03/12 11:41:07 INFO DeltaSync: Shutting down embedded timeline server
   23/03/12 11:41:07 INFO SparkUI: Stopped Spark web UI at http://spark-hudi:8090
   23/03/12 11:41:07 INFO StandaloneSchedulerBackend: Shutting down all executors
   23/03/12 11:41:07 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Asking each executor to shut down
   23/03/12 11:41:08 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
   23/03/12 11:41:08 INFO MemoryStore: MemoryStore cleared
   23/03/12 11:41:08 INFO BlockManager: BlockManager stopped
   23/03/12 11:41:08 INFO BlockManagerMaster: BlockManagerMaster stopped
   23/03/12 11:41:08 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
   23/03/12 11:41:08 INFO SparkContext: Successfully stopped SparkContext
   Exception in thread "main" org.apache.hudi.exception.HoodieException: java.lang.NoSuchMethodError: 'org.apache.spark.sql.catalyst.plans.logical.LogicalPlan org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(scala.PartialFunction)'
           at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.lambda$sync$1(HoodieDeltaStreamer.java:197)
           at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
           at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:192)
           at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:573)
           at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
           at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
           at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
           at java.base/java.lang.reflect.Method.invoke(Method.java:566)
           at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
           at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
           at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
           at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
           at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
           at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
           at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
           at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
   Caused by: java.util.concurrent.ExecutionException: java.lang.NoSuchMethodError: 'org.apache.spark.sql.catalyst.plans.logical.LogicalPlan org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(scala.PartialFunction)'
           at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
           at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
           at org.apache.hudi.async.HoodieAsyncService.waitForShutdown(HoodieAsyncService.java:103)
           at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.lambda$sync$1(HoodieDeltaStreamer.java:195)
           ... 15 more
   Caused by: java.lang.NoSuchMethodError: 'org.apache.spark.sql.catalyst.plans.logical.LogicalPlan org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(scala.PartialFunction)'
           at org.apache.spark.sql.hudi.analysis.HoodiePruneFileSourcePartitions.apply(HoodiePruneFileSourcePartitions.scala:42)
           at org.apache.spark.sql.hudi.analysis.HoodiePruneFileSourcePartitions.apply(HoodiePruneFileSourcePartitions.scala:40)
           at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:211)
           at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
           at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
           at scala.collection.immutable.List.foldLeft(List.scala:91)
           at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:208)
           at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:200)
           at scala.collection.immutable.List.foreach(List.scala:431)
           at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:200)
           at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:179)
           at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:88)
           at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:179)
           at org.apache.spark.sql.execution.QueryExecution.$anonfun$optimizedPlan$1(QueryExecution.scala:126)
           at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
           at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:185)
           at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:510)
           at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:185)
           at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
           at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:184)
           at org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:122)
           at org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:118)
           at org.apache.spark.sql.execution.QueryExecution.assertOptimized(QueryExecution.scala:136)
           at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:154)
           at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:151)
           at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:173)
           at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:172)
           at org.apache.spark.sql.Dataset.rdd$lzycompute(Dataset.scala:3396)
           at org.apache.spark.sql.Dataset.rdd(Dataset.scala:3394)
           at org.apache.hudi.utilities.sources.debezium.DebeziumSource.convertColumnToNullable(DebeziumSource.java:220)
           at org.apache.hudi.utilities.sources.debezium.DebeziumSource.toDataset(DebeziumSource.java:167)
           at org.apache.hudi.utilities.sources.debezium.DebeziumSource.fetchNextBatch(DebeziumSource.java:123)
           at org.apache.hudi.utilities.sources.RowSource.fetchNewData(RowSource.java:43)
           at org.apache.hudi.utilities.sources.Source.fetchNext(Source.java:76)
           at org.apache.hudi.utilities.deltastreamer.SourceFormatAdapter.fetchNewDataInAvroFormat(SourceFormatAdapter.java:71)
           at org.apache.hudi.utilities.deltastreamer.DeltaSync.fetchFromSource(DeltaSync.java:530)
           at org.apache.hudi.utilities.deltastreamer.DeltaSync.readFromSource(DeltaSync.java:460)
           at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:364)
           at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$1(HoodieDeltaStreamer.java:716)
           at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
           at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
           at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
           at java.base/java.lang.Thread.run(Thread.java:829)
   23/03/12 11:41:08 INFO ShutdownHookManager: Shutdown hook called
   23/03/12 11:41:08 INFO ShutdownHookManager: Deleting directory /tmp/spark-5dccedc5-681f-446c-987b-0e989840983e
   23/03/12 11:41:08 INFO ShutdownHookManager: Deleting directory /tmp/spark-c05b2c1a-a206-4c9d-9e2b-c4f414f7f03b
   23/03/12 11:41:08 INFO MetricsSystemImpl: Stopping s3a-file-system metrics system...
   23/03/12 11:41:08 INFO MetricsSystemImpl: s3a-file-system metrics system stopped.
   23/03/12 11:41:08 INFO MetricsSystemImpl: s3a-file-system metrics system shutdown complete.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hudi] ad1happy2go commented on issue #8159: [SUPPORT] - Debezium PostgreSQL

Posted by "ad1happy2go (via GitHub)" <gi...@apache.org>.

ad1happy2go commented on issue #8159:
URL: https://github.com/apache/hudi/issues/8159#issuecomment-1558604508

   @lenhardtx Gentle ping.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hudi] codope closed issue #8159: [SUPPORT] - Debezium PostgreSQL

Posted by "codope (via GitHub)" <gi...@apache.org>.

codope closed issue #8159: [SUPPORT] - Debezium PostgreSQL
URL: https://github.com/apache/hudi/issues/8159


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hudi] ad1happy2go commented on issue #8159: [SUPPORT] - Debezium PostgreSQL

Posted by "ad1happy2go (via GitHub)" <gi...@apache.org>.

ad1happy2go commented on issue #8159:
URL: https://github.com/apache/hudi/issues/8159#issuecomment-1532914763

   @lenhardtx There is known in-compatibility for spark 3.3.2 and Hudi 0.13.0: https://github.com/apache/hudi/pull/8082, can you try the patch to see if it resolves your problem?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hudi] ad1happy2go commented on issue #8159: [SUPPORT] - Debezium PostgreSQL

Posted by "ad1happy2go (via GitHub)" <gi...@apache.org>.

ad1happy2go commented on issue #8159:
URL: https://github.com/apache/hudi/issues/8159#issuecomment-1539434631

   @lenhardtx Did you got a chance to test out with the patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hudi] ad1happy2go commented on issue #8159: [SUPPORT] - Debezium PostgreSQL

Posted by "ad1happy2go (via GitHub)" <gi...@apache.org>.

ad1happy2go commented on issue #8159:
URL: https://github.com/apache/hudi/issues/8159#issuecomment-1621296969

   @lenhardtx Thanks. Closing out the issue then. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hudi] lenhardtx commented on issue #8159: [SUPPORT] - Debezium PostgreSQL

Posted by "lenhardtx (via GitHub)" <gi...@apache.org>.

lenhardtx commented on issue #8159:
URL: https://github.com/apache/hudi/issues/8159#issuecomment-1561851001

   Sorry for late, I was passing the jars wrong, its works.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org