Posted to commits@hudi.apache.org by "helpta (via GitHub)" <gi...@apache.org> on 2023/02/20 04:44:40 UTC

[GitHub] [hudi] helpta commented on a diff in pull request #7255: [HUDI-5250] use the estimate record size when estimation threshold is l…

helpta commented on code in PR #7255:
URL: https://github.com/apache/hudi/pull/7255#discussion_r1111450040


##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/UpsertPartitioner.java:
##########
@@ -372,7 +372,7 @@ protected static long averageBytesPerRecord(HoodieTimeline commitTimeline, Hoodi
     long avgSize = hoodieWriteConfig.getCopyOnWriteRecordSizeEstimate();
     long fileSizeThreshold = (long) (hoodieWriteConfig.getRecordSizeEstimationThreshold() * hoodieWriteConfig.getParquetSmallFileLimit());
     try {
-      if (!commitTimeline.empty()) {
+      if (hoodieWriteConfig.getRecordSizeEstimationThreshold() > 0 && !commitTimeline.empty()) {
         // Go over the reverse ordered commits to get a more recent estimate of average record size.
         Iterator<HoodieInstant> instants = commitTimeline.getReverseOrderedInstants().iterator();

Review Comment:
   >  if (hoodieWriteConfig.getRecordSizeEstimationThreshold() > 0 && !commitTimeline.empty()) 
   
   Shouldn't we first check whether the user has changed the default value (org.apache.hudi.config.HoodieCompactionConfig#_COPY_ON_WRITE_RECORD_SIZE_ESTIMATE)? I think that check should take priority.
   
   With the logic as changed above, adjusting the threshold can only pin avgSize to a fixed 1024 (the default size); it does not let users customize avgSize for their own workloads.
   
   If I have misunderstood, please let me know. Thanks.
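
One possible reading of the priority suggested here, as a minimal standalone sketch rather than the actual Hudi code. The class `RecordSizeEstimateSketch`, the helper `chooseAvgSize`, and its parameters are hypothetical names introduced only for illustration; the default of 1024 is taken from the comment above, and `estimateFromCommits` stands in for whatever value the loop over the commit timeline would produce.

```java
import java.util.OptionalLong;

public class RecordSizeEstimateSketch {
    // Default record size estimate in bytes, as quoted in the review comment (assumption).
    static final long DEFAULT_RECORD_SIZE_ESTIMATE = 1024L;

    /**
     * Hypothetical helper, not part of Hudi: picks the average record size.
     *
     * @param configuredEstimate  value the write config returns for the record size estimate
     * @param estimationThreshold value the write config returns for the estimation threshold
     * @param estimateFromCommits estimate derived from commit metadata, empty if none was found
     */
    static long chooseAvgSize(long configuredEstimate,
                              double estimationThreshold,
                              OptionalLong estimateFromCommits) {
        // 1. If the user moved the estimate away from the default, honor it first.
        if (configuredEstimate != DEFAULT_RECORD_SIZE_ESTIMATE) {
            return configuredEstimate;
        }
        // 2. Otherwise, use the commit-based estimate only when the threshold enables it.
        if (estimationThreshold > 0 && estimateFromCommits.isPresent()) {
            return estimateFromCommits.getAsLong();
        }
        // 3. Fall back to the (default) configured estimate.
        return configuredEstimate;
    }

    public static void main(String[] args) {
        // User-customized estimate wins over the commit-based one.
        System.out.println(chooseAvgSize(2048L, 1.0, OptionalLong.of(300L)));
        // Default estimate plus a positive threshold: the commit-based value is used.
        System.out.println(chooseAvgSize(1024L, 1.0, OptionalLong.of(300L)));
        // Threshold of 0 disables estimation from commits.
        System.out.println(chooseAvgSize(1024L, 0.0, OptionalLong.of(300L)));
    }
}
```

Under this ordering, a threshold of 0 is only consulted once the estimate is known to be the default, so a customized estimate is never overridden by commit metadata. Note that a user who explicitly sets the estimate to 1024 is indistinguishable from the default in this sketch.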
   
   @danny0405 @honeyaya @codope @nsivabalan 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org