Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2023/01/10 15:36:04 UTC

[GitHub] [hudi] Zouxxyy commented on a diff in pull request #7372: [HUDI-5326] Fix clustering group building in SparkSizeBasedClusteringPlanStrategy

Zouxxyy commented on code in PR #7372:
URL: https://github.com/apache/hudi/pull/7372#discussion_r1065933284


##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestLayoutOptimization.scala:
##########
@@ -111,8 +111,7 @@ class TestLayoutOptimization extends HoodieClientTestBase {
       .option("hoodie.clustering.inline.max.commits", "1")
       .option("hoodie.clustering.plan.strategy.target.file.max.bytes", "1073741824")
       .option("hoodie.clustering.plan.strategy.small.file.limit", "629145600")
-      .option("hoodie.clustering.plan.strategy.max.bytes.per.group", Long.MaxValue.toString)

Review Comment:
   Let me explain the modification here: since we now set the maximum size of a file written by clustering to `hoodie.clustering.plan.strategy.max.bytes.per.group`, a value of `Long.MaxValue.toString` overflows once the compression-ratio allowance is added to it. The resulting limit is negative, and at that point the file is split indefinitely.
   ![image](https://user-images.githubusercontent.com/37108074/211593944-7060014a-0792-4760-88cc-b6e46aa29f76.png)
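The overflow described above can be reproduced with plain `Long` arithmetic. This is a minimal sketch, not the Hudi writer code: it assumes the effective limit is computed as the configured size plus `round(size * ratio)`, and the names `OverflowSketch`, `effectiveMaxFileSize`, and `saturatingMaxFileSize`, as well as the ratio `0.1`, are hypothetical and chosen only for illustration.

```scala
object OverflowSketch {
  // Assumed formula: configured limit plus an allowance derived from the compression ratio.
  // With plain Long addition this wraps around when maxFileSize is close to Long.MaxValue.
  def effectiveMaxFileSize(maxFileSize: Long, compressionRatio: Double): Long =
    maxFileSize + math.round(maxFileSize * compressionRatio)

  // One possible guard (an illustration, not what the PR does): saturate at Long.MaxValue
  // instead of wrapping. Assumes maxFileSize and compressionRatio are non-negative.
  def saturatingMaxFileSize(maxFileSize: Long, compressionRatio: Double): Long = {
    val allowance = math.round(maxFileSize * compressionRatio)
    if (allowance > Long.MaxValue - maxFileSize) Long.MaxValue
    else maxFileSize + allowance
  }

  def main(args: Array[String]): Unit = {
    // A 1 GiB limit behaves as expected: the effective limit is slightly larger.
    println(effectiveMaxFileSize(1073741824L, 0.1))
    // Long.MaxValue wraps around, so the effective limit is negative.
    println(effectiveMaxFileSize(Long.MaxValue, 0.1) < 0)
    // The saturating variant stays at Long.MaxValue.
    println(saturatingMaxFileSize(Long.MaxValue, 0.1) == Long.MaxValue)
  }
}
```

A negative limit means any "has the file reached its maximum size" check succeeds immediately, which would explain the endless splitting. The PR avoids the problem in this test by dropping the `Long.MaxValue` setting rather than by guarding the arithmetic.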
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org