Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/06/14 16:53:28 UTC
[GitHub] [hudi] ksrihari93 commented on issue #5822: Hudi Clustering not working
ksrihari93 commented on issue #5822:
URL: https://github.com/apache/hudi/issues/5822#issuecomment-1155454113
> may be this could be the issue. can you try adding this to spark-submit command
>
> ```
> --hoodie-conf hoodie.clustering.async.enabled=true
> ```
Hi,
I have already passed this in the source.properties file, and I also tried setting
hoodie.clustering.plan.strategy.max.bytes.per.group
The properties below are what I passed:
hoodie.insert.shuffle.parallelism=50
hoodie.bulkinsert.shuffle.parallelism=200
hoodie.embed.timeline.server=true
hoodie.filesystem.view.type=EMBEDDED_KV_STORE
hoodie.compact.inline=false
hoodie.bulkinsert.sort.mode=none
#cleaner properties
hoodie.cleaner.policy=KEEP_LATEST_FILE_VERSIONS
hoodie.cleaner.fileversions.retained=60
hoodie.clean.async=true
#archival
hoodie.keep.min.commits=12
hoodie.keep.max.commits=15
#datasource properties
hoodie.deltastreamer.schemaprovider.registry.url=
hoodie.datasource.write.recordkey.field=
hoodie.deltastreamer.source.kafka.topic=
hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.CustomKeyGenerator
hoodie.datasource.write.partitionpath.field=timestamp:TIMESTAMP
hoodie.deltastreamer.kafka.source.maxEvents=600000000
hoodie.deltastreamer.keygen.timebased.timestamp.type=EPOCHMILLISECONDS
hoodie.deltastreamer.keygen.timebased.input.timezone=UTC
hoodie.deltastreamer.keygen.timebased.output.timezone=UTC
hoodie.deltastreamer.keygen.timebased.output.dateformat='dt='yyyy-MM-dd
hoodie.clustering.async.enabled=true
hoodie.clustering.plan.strategy.target.file.max.bytes=3000000000
hoodie.clustering.plan.strategy.small.file.limit=200000001
hoodie.clustering.async.max.commits=1
hoodie.clustering.plan.strategy.max.num.groups=10
hoodie.clustering.plan.strategy.max.bytes.per.group=9000000000
#kafka props
bootstrap.servers=
group.id=hudi-lpe
auto.offset.reset=latest
# As I said above, the job failed when I passed earliest, so to recover I had no choice but to pass latest.
hoodie.deltastreamer.source.kafka.checkpoint.type=timestamp
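For anyone checking a properties file like this, here is a minimal sketch that parses the clustering settings, converts the byte thresholds to GiB, and flags clustering keys that do not start with `hoodie.`. It assumes plain `key=value` lines with `#` comments. The helper names `parse_props`, `to_gib` and `suspicious_keys` are hypothetical and not part of Hudi, and the prefix check is only a heuristic for spotting a mistyped key.

```python
def parse_props(text):
    """Parse key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def to_gib(num_bytes):
    """Convert a byte count (string or int) to GiB."""
    return int(num_bytes) / 2**30


def suspicious_keys(props):
    """Heuristic: clustering keys that lack the 'hoodie.' prefix."""
    return [k for k in props
            if "clustering" in k and not k.startswith("hoodie.")]


# Clustering values copied from the properties listed above.
sample = """\
hoodie.clustering.async.enabled=true
hoodie.clustering.plan.strategy.target.file.max.bytes=3000000000
hoodie.clustering.plan.strategy.small.file.limit=200000001
hoodie.clustering.async.max.commits=1
hoodie.clustering.plan.strategy.max.num.groups=10
hoodie.clustering.plan.strategy.max.bytes.per.group=9000000000
"""

props = parse_props(sample)

# Only the byte-valued settings are worth converting.
sizes = {k: round(to_gib(v), 2)
         for k, v in props.items()
         if k.endswith(".bytes") or k.endswith(".limit")
         or k.endswith(".per.group")}

for key, gib in sizes.items():
    print(f"{key} = {gib} GiB")

print("suspicious keys:", suspicious_keys(props))
```

Running this against the real source.properties instead of the inline sample is a quick way to confirm that the sizes are what was intended and that every clustering key is spelled with the full prefix.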