Posted to commits@hudi.apache.org by "nsivabalan (via GitHub)" <gi...@apache.org> on 2023/03/03 18:17:59 UTC

[GitHub] [hudi] nsivabalan commented on issue #8016: Inline Clustering : Clustering failed to write to files

nsivabalan commented on issue #8016:
URL: https://github.com/apache/hudi/issues/8016#issuecomment-1453917024

   Please check out these properties. 
   
   Max num groups:
   
   hoodie.clustering.plan.strategy.max.num.groups: Maximum number of groups to create as part of a ClusteringPlan. Increasing the number of groups increases parallelism. This is not the number of output file groups; it refers to clustering groups (the parallel tasks/threads that work towards producing output file groups). The total number of output file groups also depends on the target file size, which is covered below.
   
   Max bytes per group:
   
   hoodie.clustering.plan.strategy.max.bytes.per.group: Each clustering operation can create multiple output file groups. The total amount of data processed by a clustering operation is bounded by the product of two properties (Max bytes per group * Max num groups). This config therefore caps the amount of data included in one group.
   
   Target file size max:
   
   hoodie.clustering.plan.strategy.target.file.max.bytes: Each group can produce 'N' (max group size / target file size) output file groups.
   
   
   Tuning these might help trim down the amount of data considered for clustering. Maybe we are trying to cluster too many files at the same time.
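
   As a rough sketch of how the three properties interact, the arithmetic described above can be written out in Python. The numeric values are hypothetical, picked only for illustration (they are not Hudi defaults and do not come from this issue), and the helper function names are made up for this example. Rounding N up to a whole number is also an assumption; the description above only gives the ratio.

```python
import math

# Hypothetical values for illustration only (not Hudi defaults).
MAX_NUM_GROUPS = 30
MAX_BYTES_PER_GROUP = 2 * 1024**3    # 2 GiB
TARGET_FILE_MAX_BYTES = 1 * 1024**3  # 1 GiB

# The property keys are the ones named in the comment above.
clustering_opts = {
    "hoodie.clustering.plan.strategy.max.num.groups": str(MAX_NUM_GROUPS),
    "hoodie.clustering.plan.strategy.max.bytes.per.group": str(MAX_BYTES_PER_GROUP),
    "hoodie.clustering.plan.strategy.target.file.max.bytes": str(TARGET_FILE_MAX_BYTES),
}


def max_bytes_per_plan(opts):
    """Upper bound on data in one clustering plan: max bytes per group * max num groups."""
    groups = int(opts["hoodie.clustering.plan.strategy.max.num.groups"])
    per_group = int(opts["hoodie.clustering.plan.strategy.max.bytes.per.group"])
    return groups * per_group


def max_output_file_groups_per_group(opts):
    """N = max group size / target file size (rounded up here, as an assumption)."""
    per_group = int(opts["hoodie.clustering.plan.strategy.max.bytes.per.group"])
    target = int(opts["hoodie.clustering.plan.strategy.target.file.max.bytes"])
    return math.ceil(per_group / target)


if __name__ == "__main__":
    print("max bytes per plan:", max_bytes_per_plan(clustering_opts))
    print("output file groups per group:", max_output_file_groups_per_group(clustering_opts))
```

   Lowering either of the first two values shrinks the upper bound on data per plan. With PySpark, a dict like this could be passed to the Hudi writer via `.options(**clustering_opts)` alongside the usual write options.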
   
   Reference: https://medium.com/@simpsons/storage-optimization-with-apache-hudi-clustering-aa6e23e18e77
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org