Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/11/24 01:33:09 UTC
[GitHub] [hudi] giaosudau edited a comment on pull request #2208: [HUDI-1040] Make Hudi support Spark 3
giaosudau edited a comment on pull request #2208:
URL: https://github.com/apache/hudi/pull/2208#issuecomment-732523090
Hi everyone,
I am running the Spark 3 build from https://github.com/apache/hudi/pull/2208 with DeltaStreamer and SqlQueryBasedTransformer on Debezium data:
```
spark-submit \
--class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
--driver-memory 2g \
--conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
--conf spark.sql.hive.convertMetastoreParquet=false \
--packages org.apache.spark:spark-avro_2.12:3.0.1 \
~/workspace/hudi/packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.12-0.6.1-SNAPSHOT.jar \
--table-type MERGE_ON_READ \
--source-ordering-field ts_ms \
--schemaprovider-class org.apache.hudi.utilities.schema.SchemaRegistryProvider \
--source-class org.apache.hudi.utilities.sources.AvroKafkaSource \
--target-base-path /Users/users/Downloads/roi/debezium/by_test/ \
--target-table users \
--props ./hudi_base.properties \
--transformer-class org.apache.hudi.utilities.transform.SqlQueryBasedTransformer

# Contents of ./hudi_base.properties
hoodie.upsert.shuffle.parallelism=2
hoodie.insert.shuffle.parallelism=2
hoodie.bulkinsert.shuffle.parallelism=2
# Key fields, for kafka example
hoodie.datasource.write.storage.type=MERGE_ON_READ
hoodie.datasource.write.recordkey.field=id
hoodie.datasource.write.partitionpath.field=ts_ms
hoodie.deltastreamer.keygen.timebased.timestamp.type=EPOCHMILLISECONDS
hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.NonpartitionedKeyGenerator
hoodie.deltastreamer.keygen.timebased.output.dateformat=yyyy-MM-dd
# schema provider configs
hoodie.deltastreamer.schemaprovider.registry.url=http://localhost:8081/subjects/dbz1.by_test.users-value/versions/latest
# Kafka props
hoodie.deltastreamer.source.kafka.topic=dbz1.by_test.users
metadata.broker.list=localhost:9092
bootstrap.servers=localhost:9092
auto.offset.reset=earliest
schema.registry.url=http://localhost:8081
hoodie.deltastreamer.transformer.sql=SELECT ts_ms, op, after.* FROM <SRC> WHERE op IN ('u', 'c')
```
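To make the transformer step concrete, here is a minimal plain-Python sketch of what the `hoodie.deltastreamer.transformer.sql` query above does to Debezium change events. It assumes that SqlQueryBasedTransformer registers each incoming batch as a temporary view and replaces `<SRC>` with that view's name. The view-name prefix, the sample events, and the helpers `substitute_src` and `apply_transform` are hypothetical and only illustrate the query's semantics; this is not Hudi's implementation.

```python
import uuid

# Hypothetical Debezium-style change events (illustrative data only).
# Debezium op codes: 'c' = create, 'u' = update, 'd' = delete, 'r' = snapshot read.
EVENTS = [
    {"ts_ms": 1, "op": "r", "before": None, "after": {"id": 1, "name": "a"}},
    {"ts_ms": 2, "op": "c", "before": None, "after": {"id": 2, "name": "b"}},
    {"ts_ms": 3, "op": "u", "before": {"id": 2, "name": "b"}, "after": {"id": 2, "name": "B"}},
    {"ts_ms": 4, "op": "d", "before": {"id": 1, "name": "a"}, "after": None},
]

SQL = "SELECT ts_ms, op, after.* FROM <SRC> WHERE op IN ('u', 'c')"


def substitute_src(sql):
    """Replace every <SRC> placeholder with a unique temp-view name.

    The prefix below is an assumption used for illustration only.
    """
    view = "HOODIE_SRC_TMP_TABLE_" + uuid.uuid4().hex
    return sql.replace("<SRC>", view), view


def apply_transform(events):
    """Plain-Python equivalent of the query: keep creates/updates, flatten `after`."""
    rows = []
    for event in events:
        if event["op"] not in ("u", "c"):
            continue  # snapshot reads ('r') and deletes ('d') are filtered out
        row = {"ts_ms": event["ts_ms"], "op": event["op"]}
        row.update(event["after"])  # `after.*` expands the struct into top-level columns
        rows.append(row)
    return rows


if __name__ == "__main__":
    sql, view = substitute_src(SQL)
    print(sql)
    for row in apply_transform(EVENTS):
        print(row)
```

With the sample events, only the create and the update survive, each flattened to `ts_ms`, `op`, `id`, `name`. Note that, as written, the query also drops snapshot rows (`op = 'r'`), which matters when consuming the topic from `earliest`.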
Running the spark-submit command above crashes the JVM:
```
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x000000010f4cbad0, pid=33960, tid=0x0000000000013e03
#
# JRE version: OpenJDK Runtime Environment (8.0_265-b01) (build 1.8.0_265-b01)
# Java VM: OpenJDK 64-Bit Server VM (25.265-b01 mixed mode bsd-amd64 compressed oops)
# Problematic frame:
# V [libjvm.dylib+0xcbad0]
```