Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/10/20 04:37:13 UTC

[GitHub] [hudi] nsivabalan commented on a diff in pull request #6632: [HUDI-4753] more accurate record size estimation for log writing and spillable map

nsivabalan commented on code in PR #6632:
URL: https://github.com/apache/hudi/pull/6632#discussion_r1000134094


##########
hudi-common/src/main/java/org/apache/hudi/common/util/collection/ExternalSpillableMap.java:
##########
@@ -202,22 +199,19 @@ public R get(Object key) {
 
   @Override
   public R put(T key, R value) {
+    if (this.currentInMemoryMapSize >= maxInMemorySizeInBytes || inMemoryMap.size() % NUMBER_OF_RECORDS_TO_ESTIMATE_PAYLOAD_SIZE == 0) {

Review Comment:
   I am not sure how this refactoring results in a more accurate size estimate. Can you help me understand, please?



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java:
##########
@@ -521,11 +522,16 @@ private void writeToBuffer(HoodieRecord<T> record) {
    * Checks if the number of records have reached the set threshold and then flushes the records to disk.
    */
   private void flushToDiskIfRequired(HoodieRecord record) {
+    if (numberOfRecords >= (int) (maxBlockSize / averageRecordSize) 
+        || numberOfRecords % NUMBER_OF_RECORDS_TO_ESTIMATE_RECORD_SIZE == 0) {
+      averageRecordSize = (long) (averageRecordSize * 0.8 + sizeEstimator.sizeEstimate(record) * 0.2);
+    }
+
     // Append if max number of records reached to achieve block size
     if (numberOfRecords >= (int) (maxBlockSize / averageRecordSize)) {
       // Recompute averageRecordSize before writing a new block and update existing value with
       // avg of new and old
-      LOG.info("AvgRecordSize => " + averageRecordSize);
+      LOG.info("Flush log block to disk, the current avgRecordSize => " + averageRecordSize);

Review Comment:
   With L527, isn't L535 redundant? As I understand it, we can remove it.
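
   For reference, the update added in the hunk above is a weighted moving average: 80% of the previous average plus 20% of the newly sampled record size. Below is a minimal standalone sketch of that arithmetic. The class and method names are hypothetical and are not part of Hudi; only the 0.8/0.2 weights and the `maxBlockSize / averageRecordSize` threshold come from the diff.

```java
// Hypothetical helper for illustration only; not a Hudi class.
final class RecordSizeAverager {
  // Weights taken from the diff: 0.8 for the old average, 0.2 for the new sample.
  private static final double OLD_WEIGHT = 0.8;
  private static final double NEW_WEIGHT = 0.2;

  private RecordSizeAverager() {
  }

  // Blend the previous average with a freshly estimated record size.
  static long update(long currentAverage, long sampledSize) {
    return (long) (currentAverage * OLD_WEIGHT + sampledSize * NEW_WEIGHT);
  }

  // Number of records that fit in one block at the current average size.
  // Assumes averageRecordSize > 0.
  static int recordsPerBlock(long maxBlockSize, long averageRecordSize) {
    return (int) (maxBlockSize / averageRecordSize);
  }
}
```

   With a previous average of 1000 bytes and a sampled size of 2000 bytes, `update` moves the average only part of the way toward the new sample, so one unusually large record does not immediately shrink the flush threshold.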


