Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/10/24 06:52:45 UTC

[GitHub] [hudi] danny0405 commented on a diff in pull request #6632: [HUDI-4753] more accurate record size estimation for log writing and spillable map

danny0405 commented on code in PR #6632:
URL: https://github.com/apache/hudi/pull/6632#discussion_r1002932637


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java:
##########
@@ -521,11 +522,16 @@ private void writeToBuffer(HoodieRecord<T> record) {
    * Checks if the number of records has reached the set threshold and then flushes the records to disk.
    */
   private void flushToDiskIfRequired(HoodieRecord record) {
+    if (numberOfRecords >= (int) (maxBlockSize / averageRecordSize) 
+        || numberOfRecords % NUMBER_OF_RECORDS_TO_ESTIMATE_RECORD_SIZE == 0) {
+      averageRecordSize = (long) (averageRecordSize * 0.8 + sizeEstimator.sizeEstimate(record) * 0.2);
+    }
+
     // Append if max number of records reached to achieve block size
     if (numberOfRecords >= (int) (maxBlockSize / averageRecordSize)) {
       // Recompute averageRecordSize before writing a new block and update existing value with
       // avg of new and old
-      LOG.info("AvgRecordSize => " + averageRecordSize);
+      LOG.info("Flush log block to disk, the current avgRecordSize => " + averageRecordSize);

Review Comment:
   What's the problem here if we only estimate the record size on flushing?



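For readers skimming the archive, here is a minimal, self-contained Java sketch of the estimation logic the patch introduces. Everything around the two if-blocks is a simplified stand-in: the SizeEstimator interface, the constants, and the buffer handling are hypothetical, not Hudi's actual classes or defaults; only the moving-average update and the flush-threshold check mirror the diff above.

import java.util.ArrayList;
import java.util.List;

public class AppendHandleSketch {

  // Hypothetical stand-in for Hudi's size estimator abstraction.
  interface SizeEstimator<T> {
    long sizeEstimate(T record);
  }

  // Assumed values for the demo; Hudi derives these from its write config.
  private static final int NUMBER_OF_RECORDS_TO_ESTIMATE_RECORD_SIZE = 100;
  private final long maxBlockSize = 1024 * 1024; // 1 MB, deliberately small so the demo flushes

  private long averageRecordSize = 1024; // assumed initial estimate, refined as records arrive
  private long numberOfRecords = 0;
  private final List<String> buffer = new ArrayList<>();

  // Toy estimator: rough in-memory size of a Java String (an assumption, not Hudi's logic).
  private final SizeEstimator<String> sizeEstimator = s -> 16L + 2L * s.length();

  void write(String record) {
    buffer.add(record);
    numberOfRecords++;
    flushToDiskIfRequired(record);
  }

  private void flushToDiskIfRequired(String record) {
    // Re-estimate the average record size when the flush threshold is hit, or
    // every NUMBER_OF_RECORDS_TO_ESTIMATE_RECORD_SIZE records, so a stale
    // initial estimate cannot pin the block size indefinitely. The 0.8/0.2
    // weights make this an exponentially weighted moving average, as in the diff.
    if (numberOfRecords >= (int) (maxBlockSize / averageRecordSize)
        || numberOfRecords % NUMBER_OF_RECORDS_TO_ESTIMATE_RECORD_SIZE == 0) {
      averageRecordSize = (long) (averageRecordSize * 0.8 + sizeEstimator.sizeEstimate(record) * 0.2);
    }

    // Flush once the buffered record count reaches the estimated capacity of a block.
    if (numberOfRecords >= (int) (maxBlockSize / averageRecordSize)) {
      System.out.println("Flush log block to disk, the current avgRecordSize => " + averageRecordSize);
      buffer.clear();      // in Hudi this is where the log block would be appended
      numberOfRecords = 0;
    }
  }

  public static void main(String[] args) {
    AppendHandleSketch handle = new AppendHandleSketch();
    for (int i = 0; i < 200_000; i++) {
      handle.write("record-" + i); // ~40 bytes each under the toy estimator
    }
  }
}

The periodic branch is the substance of the change: as the diff's own comments indicate, the pre-patch code refreshed averageRecordSize only once per block, so the threshold maxBlockSize / averageRecordSize could be skewed by a single unrepresentative sample, whereas sampling every NUMBER_OF_RECORDS_TO_ESTIMATE_RECORD_SIZE records keeps the moving average tracking the actual payload.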