Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/06/14 16:53:28 UTC

[GitHub] [hudi] ksrihari93 commented on issue #5822: Hudi Clustering not working

ksrihari93 commented on issue #5822:
URL: https://github.com/apache/hudi/issues/5822#issuecomment-1155454113

   > Maybe this could be the issue. Can you try adding this to the spark-submit command?
   > 
   > ```
   > --hoodie-conf hoodie.clustering.async.enabled=true
   > ```
   
   Hi,
   I have passed `hoodie.clustering.async.enabled=true` in the source.properties file, and I also tried setting `hoodie.clustering.plan.strategy.max.bytes.per.group`.
   The full set of props passed is below:
   
   hoodie.insert.shuffle.parallelism=50
   hoodie.bulkinsert.shuffle.parallelism=200
   hoodie.embed.timeline.server=true
   hoodie.filesystem.view.type=EMBEDDED_KV_STORE
   hoodie.compact.inline=false
   hoodie.bulkinsert.sort.mode=none
   
   #cleaner properties
   hoodie.cleaner.policy=KEEP_LATEST_FILE_VERSIONS
   hoodie.cleaner.fileversions.retained=60
   hoodie.clean.async=true
   
   #archival
   hoodie.keep.min.commits=12
   hoodie.keep.max.commits=15
   
   #datasource properties
   hoodie.deltastreamer.schemaprovider.registry.url=
   hoodie.datasource.write.recordkey.field=
   hoodie.deltastreamer.source.kafka.topic=
   hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.CustomKeyGenerator
   hoodie.datasource.write.partitionpath.field=timestamp:TIMESTAMP
   hoodie.deltastreamer.kafka.source.maxEvents=600000000
   hoodie.deltastreamer.keygen.timebased.timestamp.type=EPOCHMILLISECONDS
   hoodie.deltastreamer.keygen.timebased.input.timezone=UTC
   hoodie.deltastreamer.keygen.timebased.output.timezone=UTC
   hoodie.deltastreamer.keygen.timebased.output.dateformat='dt='yyyy-MM-dd
   hoodie.clustering.async.enabled=true
   hoodie.clustering.plan.strategy.target.file.max.bytes=3000000000
   hoodie.clustering.plan.strategy.small.file.limit=200000001
   hoodie.clustering.async.max.commits=1
   hoodie.clustering.plan.strategy.max.num.groups=10
   hoodie.clustering.plan.strategy.max.bytes.per.group=9000000000
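   # My rough reading of the clustering sizes above, going by the documented meaning
   # of these configs (an estimate, not measured behaviour): ~3 GB target output files,
   # files under ~200 MB become clustering candidates, up to ~9 GB of input per group
   # (so about 3 output files per group), and at most 10 groups, i.e. roughly 90 GB
   # of data covered by a single clustering plan.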
   
   #kafka props
   bootstrap.servers=
   group.id=hudi-lpe
   auto.offset.reset=latest  (as I said above, the job failed when I passed earliest, so the only way to recover was to pass latest)
   hoodie.deltastreamer.source.kafka.checkpoint.type=timestamp
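
   For reference, a rough sketch of how I understand the suggested `--hoodie-conf` override fitting into a deltastreamer launch; the jar path, base path, table name, source class, and ordering field below are placeholders rather than my exact command:

   ```
   spark-submit \
     --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
     /path/to/hudi-utilities-bundle.jar \
     --props /path/to/source.properties \
     --source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
     --source-ordering-field timestamp \
     --target-base-path /path/to/base \
     --target-table my_table \
     --table-type COPY_ON_WRITE \
     --continuous \
     --hoodie-conf hoodie.clustering.async.enabled=true
   ```

   As far as I understand, async clustering from the deltastreamer only kicks in when the job runs in continuous mode, which is why the sketch includes `--continuous`.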

