Posted to commits@hudi.apache.org by "clownxc (via GitHub)" <gi...@apache.org> on 2023/04/16 03:51:00 UTC

[GitHub] [hudi] clownxc opened a new pull request, #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

clownxc opened a new pull request, #8472:
URL: https://github.com/apache/hudi/pull/8472

   ### Change Logs
   
   `WriteStatus` stores the entire `HoodieRecord`. We can optimize it to store just the required info (record key, partition path, location).
   ### Impact
   
   Optimize `WriteStatus` to store just the required info (record key, partition path, location). 
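   
   As a rough illustration of the idea (not the code in this PR), the sketch below keeps only the record key, partition path, and location for each written record and does not hold on to the payload. `FullRecord`, `SlimEntry`, and their fields are hypothetical names invented for this example; the real Hudi classes have different shapes.
   
   ```java
   import java.util.ArrayList;
   import java.util.List;

   // Hypothetical demo: remember written records by key, partition path and location only.
   final class SlimEntryDemo {
     public static void main(String[] args) {
       FullRecord record = new FullRecord("key-1", "2023/04/16", "file-0001", new byte[1024]);
       List<SlimEntry> written = new ArrayList<>();
       written.add(SlimEntry.from(record)); // the payload array is not referenced by the entry
       System.out.println(written.get(0));
     }
   }

   // Stand-in for a full record: identifying fields plus a potentially large payload.
   final class FullRecord {
     final String recordKey;
     final String partitionPath;
     final String location;
     final byte[] payload;

     FullRecord(String recordKey, String partitionPath, String location, byte[] payload) {
       this.recordKey = recordKey;
       this.partitionPath = partitionPath;
       this.location = location;
       this.payload = payload;
     }
   }

   // Immutable holder for just the identifying fields (assumed to be all the status needs).
   final class SlimEntry {
     private final String recordKey;
     private final String partitionPath;
     private final String location;

     private SlimEntry(String recordKey, String partitionPath, String location) {
       this.recordKey = recordKey;
       this.partitionPath = partitionPath;
       this.location = location;
     }

     // Copies the three identifying fields and leaves the payload behind.
     static SlimEntry from(FullRecord record) {
       return new SlimEntry(record.recordKey, record.partitionPath, record.location);
     }

     String getRecordKey() {
       return recordKey;
     }

     String getPartitionPath() {
       return partitionPath;
     }

     String getLocation() {
       return location;
     }

     @Override
     public String toString() {
       return "SlimEntry{" + recordKey + ", " + partitionPath + ", " + location + "}";
     }
   }
   ```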
   ### Risk level (write none, low medium or high below)
   
   low
   ### Documentation Update
   
   none
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] clownxc commented on pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

Posted by "clownxc (via GitHub)" <gi...@apache.org>.
clownxc commented on PR #8472:
URL: https://github.com/apache/hudi/pull/8472#issuecomment-1536940199

   > @clownxc If I understand correctly, the memory savings are coming from dropping the "data" part of the HoodieRecord? I noticed that HoodieRecord has only 2 additional members - sealed (boolean) and data (t). Are the savings due to usage of the mock class (which may have bloating compared to the original HoodieRecord)?
   > 
   > But hoodie write handles [deflate the HoodieRecord](https://github.com/apache/hudi/blob/cabcb2bf2cddedeb3a34047af3935b27cfdfb858/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieCreateHandle.java#L167) after writing, so the data portion should go away, reducing the amount of savings possible.
   > 
   > Can you run the test again with these changes:
   > 
   > 1. WriteStatus status = new WriteStatus(true, 1.0);   // enable success record tracking as errors should be rare
   > 2. Create an actual HoodieRecord and use that in the for loop instead of the mock(HoodieRecord.class)
   > 3. Call deflate on the created HoodieRecord to remove the data as the write handles do.
   > 
   > I feel the above may give a more realistic view of savings.
   > 
   > Also, how did you find this interesting optimization? I am interested as there may be other avenues of such savings within HUDI, so it would be good to know how you track these.
   
   This interesting optimization was reported by @nsivabalan and has not been implemented for a long time.




[GitHub] [hudi] hudi-bot commented on pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8472:
URL: https://github.com/apache/hudi/pull/8472#issuecomment-1536673539

   ## CI report:
   
   * ff5d944ec780dbfb0d97eea643ad12420d1cca85 UNKNOWN
   * c69a04b7c23d381b6a4fe16c1fb016f8e1363794 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16871) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8472:
URL: https://github.com/apache/hudi/pull/8472#issuecomment-1510053296

   ## CI report:
   
   * 402e4a78e4f37f7e587a23855f9042363dd70368 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] clownxc commented on pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

Posted by "clownxc (via GitHub)" <gi...@apache.org>.
clownxc commented on PR #8472:
URL: https://github.com/apache/hudi/pull/8472#issuecomment-1518674030

   > It is great if we can have numbers to illustrate the gains after the patch, like the cost reduction for memory or something.
   
   The memory occupied by `WriteStatus` after the optimization is about 1/300 of what it was before!




[GitHub] [hudi] hudi-bot commented on pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8472:
URL: https://github.com/apache/hudi/pull/8472#issuecomment-1536408666

   ## CI report:
   
   * ff5d944ec780dbfb0d97eea643ad12420d1cca85 UNKNOWN
   * 0f0011b61776e6f9a9b08481f8ad809e67e44d41 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16563) 
   * c69a04b7c23d381b6a4fe16c1fb016f8e1363794 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16871) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] clownxc commented on pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

Posted by "clownxc (via GitHub)" <gi...@apache.org>.
clownxc commented on PR #8472:
URL: https://github.com/apache/hudi/pull/8472#issuecomment-1537272950

   Following the suggestion from @prashantwason, I ran the following test:
   ```java
       // Track successful records as well, with the second argument (1.0) as suggested.
       WriteStatus status = new WriteStatus(true, 1.0);
       String partitionPath = HoodieTestDataGenerator.DEFAULT_PARTITION_PATHS[0];
       dataGen = new HoodieTestDataGenerator(new String[] {partitionPath});
       String newCommitTime = "001";
       List<HoodieRecord> records = dataGen.generateInserts(newCommitTime, 1000);
       Throwable t = new Exception("some error in writing");
       for (int i = 0; i < 1000; i++) {
         HoodieRecord data1 = records.get(i);
         status.markSuccess(data1, Option.empty());
         data1.deflate();
         // Note: the post-increment reads the same index as above, so data2 is the same
         // record as data1, and the loop counter advances by two per iteration.
         HoodieRecord data2 = records.get(i++);
         status.markFailure(data2, t, Option.empty());
         data2.deflate();
       }
       // Report the retained size of the WriteStatus object graph.
       System.out.println("status memory: " + ObjectSizeCalculator.getObjectSize(status));
   ```
   
   
   The memory footprint before the optimization (status memory: 113048) and after it (status memory: 117032) is basically unchanged. The main reasons are that `hoodie write handles deflate the HoodieRecord after writing` and that the previous test used `the mock class which may have bloating`. (I'm sorry that I didn't take these two factors into account in the previous test.)
   @prashantwason @danny0405 @vinothchandar 
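   
   To make the first reason concrete, here is a minimal sketch assuming that a status object keeps a reference to the record (not a copy) and that `deflate()` simply drops the payload. `SketchRecord` and `DeflateDemo` are hypothetical stand-ins, not Hudi classes.
   
   ```java
   import java.util.ArrayList;
   import java.util.List;

   // Hypothetical demo: a tracked record loses its payload once it is deflated.
   final class DeflateDemo {
     public static void main(String[] args) {
       List<SketchRecord> tracked = new ArrayList<>();
       SketchRecord record = new SketchRecord("key-1", new byte[1024]);
       tracked.add(record); // like markSuccess: stores a reference, not a copy
       record.deflate();    // like the write handle: drops the payload afterwards
       System.out.println("payload retained: " + tracked.get(0).hasPayload());
     }
   }

   // Stand-in for a record whose payload can be released after writing.
   final class SketchRecord {
     private final String key;
     private byte[] payload;

     SketchRecord(String key, byte[] payload) {
       this.key = key;
       this.payload = payload;
     }

     // Releases the payload; the key stays available.
     void deflate() {
       payload = null;
     }

     boolean hasPayload() {
       return payload != null;
     }

     String getKey() {
       return key;
     }
   }
   ```
   
   Under these assumptions the tracked object ends up holding little more than its key, which would be consistent with the two measurements being so close.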
   
   I wonder whether `writeStatus.markFailure` needs some optimization for the case where an exception occurs before `record.deflate()` is called:
   
   ```java
         writeStatus.markSuccess(hoodieRecord, recordMetadata);
         // deflate record payload after recording success. This will help users access payload as a
         // part of marking
         // record successful.
         hoodieRecord.deflate();
         return finalRecordOpt;
       } catch (Exception e) {
         LOG.error("Error writing record  " + hoodieRecord, e);
         writeStatus.markFailure(hoodieRecord, e, recordMetadata);
       }
   ```
   Also, in some places there is no `deflate` call at all when `writeStatus.markFailure` is invoked:
   ```java
       if (indexedRecord.isPresent()) {
         // Skip the ignored record.
         try {
           if (!indexedRecord.get().shouldIgnore(writeSchema, recordProperties)) {
             recordList.add(indexedRecord.get());
           }
         } catch (IOException e) {
           writeStatus.markFailure(record, e, record.getMetadata());
           LOG.error("Error writing record  " + indexedRecord.get(), e);
         }
       }
   ```
   
   That said, the benefit of optimizing this may not be large:
   ```java
     public void markFailure(HoodieRecord record, Throwable t, Option<Map<String, String>> optionalRecordMetadata) {
       if (failedRecords.isEmpty() || (random.nextDouble() <= failureFraction)) {
         // Guaranteed to have at-least one error
         failedRecords.add(record);
         errors.put(record.getKey(), t);
       }
       totalRecords++;
       totalErrorRecords++;
     }
   ```
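   
   The condition in `markFailure` above can be tried out in isolation. The following is a simplified sketch, not Hudi code: `SketchStatus` is a hypothetical stand-in that stores record keys instead of full records and takes a seed so the run is repeatable.
   
   ```java
   import java.util.ArrayList;
   import java.util.HashMap;
   import java.util.List;
   import java.util.Map;
   import java.util.Random;

   // Hypothetical demo: count every failure but retain only a sampled subset.
   final class FailureSamplingDemo {
     public static void main(String[] args) {
       SketchStatus status = new SketchStatus(0.0, 42L);
       for (int i = 0; i < 100; i++) {
         status.markFailure("key-" + i, new Exception("some error in writing"));
       }
       System.out.println("retained=" + status.getRetainedFailures()
           + " total=" + status.getTotalErrorRecords());
     }
   }

   // Simplified stand-in that mirrors the sampling condition quoted above.
   final class SketchStatus {
     private final List<String> failedRecords = new ArrayList<>();
     private final Map<String, Throwable> errors = new HashMap<>();
     private final Random random;
     private final double failureFraction;
     private long totalErrorRecords = 0;

     SketchStatus(double failureFraction, long seed) {
       this.failureFraction = failureFraction;
       this.random = new Random(seed);
     }

     void markFailure(String recordKey, Throwable t) {
       // The first failure is always kept; later ones are kept with the given probability.
       if (failedRecords.isEmpty() || random.nextDouble() <= failureFraction) {
         failedRecords.add(recordKey);
         errors.put(recordKey, t);
       }
       totalErrorRecords++;
     }

     int getRetainedFailures() {
       return failedRecords.size();
     }

     long getTotalErrorRecords() {
       return totalErrorRecords;
     }
   }
   ```
   
   With a fraction of 0.0, in practice only the first failure is retained while the counter still reaches 100; with 1.0, as in the test above, every failed record is kept.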
   
   I hope you can leave some comments when you have time. @prashantwason @danny0405 @vinothchandar




[GitHub] [hudi] hudi-bot commented on pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8472:
URL: https://github.com/apache/hudi/pull/8472#issuecomment-1512189863

   ## CI report:
   
   * 9da1c0da2753e7be3b6612568cc6750ba9944403 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16402) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8472:
URL: https://github.com/apache/hudi/pull/8472#issuecomment-1518669128

   ## CI report:
   
   * 9da1c0da2753e7be3b6612568cc6750ba9944403 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16402) 
   * ff5d944ec780dbfb0d97eea643ad12420d1cca85 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] clownxc commented on a diff in pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

Posted by "clownxc (via GitHub)" <gi...@apache.org>.
clownxc commented on code in PR #8472:
URL: https://github.com/apache/hudi/pull/8472#discussion_r1186566954


##########
hudi-common/src/main/java/org/apache/hudi/common/model/IndexItem.java:
##########
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.model;
+
+import org.apache.hudi.common.util.Option;
+
+import com.esotericsoftware.kryo.Kryo;
+import com.esotericsoftware.kryo.KryoSerializable;
+import com.esotericsoftware.kryo.io.Input;
+import com.esotericsoftware.kryo.io.Output;
+
+import java.io.Serializable;
+
+public class IndexItem implements Serializable, KryoSerializable {
+
+
+  /**
+   * Identifies the record across the table.
+   */
+  protected HoodieKey key;
+

Review Comment:
   > Can we make all these members private and final?
   
   Thank you very much for the review, and sorry for the late response. I will try to modify the code according to your suggestions.





[GitHub] [hudi] bvaradar commented on pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

Posted by "bvaradar (via GitHub)" <gi...@apache.org>.
bvaradar commented on PR #8472:
URL: https://github.com/apache/hudi/pull/8472#issuecomment-1537278449

   @clownxc : For failed records, we need to have them logged elsewhere and so no need to deflate. For exception cases, the write status should be marked as failure. So, I don't see any reason to change this. 




[GitHub] [hudi] clownxc commented on a diff in pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

Posted by "clownxc (via GitHub)" <gi...@apache.org>.
clownxc commented on code in PR #8472:
URL: https://github.com/apache/hudi/pull/8472#discussion_r1186567008


##########
hudi-common/src/main/java/org/apache/hudi/common/model/IndexItem.java:
##########
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.model;
+
+import org.apache.hudi.common.util.Option;
+
+import com.esotericsoftware.kryo.Kryo;
+import com.esotericsoftware.kryo.KryoSerializable;
+import com.esotericsoftware.kryo.io.Input;
+import com.esotericsoftware.kryo.io.Output;
+
+import java.io.Serializable;
+
+public class IndexItem implements Serializable, KryoSerializable {

Review Comment:
   > Give some doc to the class.
   
   Thank you very much for the review, and sorry for the late response. I will try to modify the code according to your suggestions.





[GitHub] [hudi] clownxc commented on pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

Posted by "clownxc (via GitHub)" <gi...@apache.org>.
clownxc commented on PR #8472:
URL: https://github.com/apache/hudi/pull/8472#issuecomment-1537368074

   > @clownxc : For failed records, we need to have them logged elsewhere and so no need to deflate. For exception cases, the write status should be marked as failure. So, I don't see any reason to change this.
   
   I see. Thank you very much for the review.




[GitHub] [hudi] hudi-bot commented on pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8472:
URL: https://github.com/apache/hudi/pull/8472#issuecomment-1510114805

   ## CI report:
   
   * 402e4a78e4f37f7e587a23855f9042363dd70368 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16375) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8472:
URL: https://github.com/apache/hudi/pull/8472#issuecomment-1511403545

   ## CI report:
   
   * 402e4a78e4f37f7e587a23855f9042363dd70368 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16375) 
   * 9da1c0da2753e7be3b6612568cc6750ba9944403 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16402) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8472:
URL: https://github.com/apache/hudi/pull/8472#issuecomment-1518681224

   ## CI report:
   
   * ff5d944ec780dbfb0d97eea643ad12420d1cca85 UNKNOWN
   * 6147cc8922856a286a71fa73140c7a09634a7e9a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16559) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] clownxc commented on pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

Posted by "clownxc (via GitHub)" <gi...@apache.org>.
clownxc commented on PR #8472:
URL: https://github.com/apache/hudi/pull/8472#issuecomment-1536932897

   
   
   
   > @clownxc If I understand correctly, the memory savings are coming from dropping the "data" part of the HoodieRecord? I noticed that HoodieRecord has only 2 additional members - sealed (boolean) and data (t). Are the savings due to usage of the mock class (which may have bloating compared to the original HoodieRecord)?
   > 
   > But hoodie write handles [deflate the HoodieRecord](https://github.com/apache/hudi/blob/cabcb2bf2cddedeb3a34047af3935b27cfdfb858/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieCreateHandle.java#L167) after writing, so the data portion should go away, reducing the amount of savings possible.
   > 
   > Can you run the test again with these changes:
   > 
   > 1. WriteStatus status = new WriteStatus(true, 1.0);   // enable success record tracking as errors should be rare
   > 2. Create an actual HoodieRecord and use that in the for loop instead of the mock(HoodieRecord.class)
   > 3. Call deflate on the created HoodieRecord to remove the data as the write handles do.
   > 
   > I feel the above may give a more realistic view of savings.
   > 
   > Also, how did you find this interesting optimization? I am interested as there may be other avenues of such savings within HUDI, so it would be good to know how you track these.
   
   I would be happy to do it.




[GitHub] [hudi] clownxc commented on pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

Posted by "clownxc (via GitHub)" <gi...@apache.org>.
clownxc commented on PR #8472:
URL: https://github.com/apache/hudi/pull/8472#issuecomment-1536939863

   > this interesting optimization
   
   This interesting optimization was reported by @nsivabalan and had gone unimplemented for a long time.




[GitHub] [hudi] hudi-bot commented on pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8472:
URL: https://github.com/apache/hudi/pull/8472#issuecomment-1518748097

   ## CI report:
   
   * ff5d944ec780dbfb0d97eea643ad12420d1cca85 UNKNOWN
   * 0f0011b61776e6f9a9b08481f8ad809e67e44d41 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16563) 
   


[GitHub] [hudi] danny0405 commented on pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on PR #8472:
URL: https://github.com/apache/hudi/pull/8472#issuecomment-1512389020

   It is great if we can have numbers to illustrate the gains after the patch, like the cost reduction for memory or something.




[GitHub] [hudi] clownxc commented on pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

Posted by "clownxc (via GitHub)" <gi...@apache.org>.
clownxc commented on PR #8472:
URL: https://github.com/apache/hudi/pull/8472#issuecomment-1537265812

   > @clownxc If I understand correctly, the memory savings are coming from dropping the "data" part of the HoodieRecord? I noticed that HoodieRecord has only 2 additional members - sealed (boolean) and data (t). Are the savings due to usage of the mock class (which may have bloating compared to the original HoodieRecord)?
   > 
   > But hoodie write handles [deflate the HoodieRecord ](https://github.com/apache/hudi/blob/cabcb2bf2cddedeb3a34047af3935b27cfdfb858/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieCreateHandle.java#L167)after writing so the data portion should go away reducing the amount of savings possible.
   > 
   > Can you run the test again with these changes:
   




[GitHub] [hudi] clownxc commented on pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

Posted by "clownxc (via GitHub)" <gi...@apache.org>.
clownxc commented on PR #8472:
URL: https://github.com/apache/hudi/pull/8472#issuecomment-1513964845

   > It is great if we can have numbers to illustrate the gains after the patch, like the cost reduction for memory or something.
   
   I would be happy to do it.




[GitHub] [hudi] hudi-bot commented on pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8472:
URL: https://github.com/apache/hudi/pull/8472#issuecomment-1537174682

   ## CI report:
   
   * ff5d944ec780dbfb0d97eea643ad12420d1cca85 UNKNOWN
   * c69a04b7c23d381b6a4fe16c1fb016f8e1363794 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16871) 
   * e79140252ba476a4fef89ba85caabc4cd98ce85b Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16896) 
   


[GitHub] [hudi] clownxc commented on a diff in pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

Posted by "clownxc (via GitHub)" <gi...@apache.org>.
clownxc commented on code in PR #8472:
URL: https://github.com/apache/hudi/pull/8472#discussion_r1186567773


##########
hudi-common/src/main/java/org/apache/hudi/common/model/IndexItem.java:
##########
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.model;
+
+import org.apache.hudi.common.util.Option;
+
+import com.esotericsoftware.kryo.Kryo;
+import com.esotericsoftware.kryo.KryoSerializable;
+import com.esotericsoftware.kryo.io.Input;
+import com.esotericsoftware.kryo.io.Output;
+
+import java.io.Serializable;
+
+public class IndexItem implements Serializable, KryoSerializable {
+
+
+  /**
+   * Identifies the record across the table.
+   */
+  protected HoodieKey key;
+

Review Comment:
   > Can we make all these members private and final?
   
   We may not be able to make all these members final, because they are reassigned in `read` during Kryo deserialization:
   
   ```java
     @Override
     public final void read(Kryo kryo, Input input) {
       this.key = kryo.readObjectOrNull(input, HoodieKey.class);
       this.currentLocation = (HoodieRecordLocation) kryo.readClassAndObject(input);
       this.newLocation = (HoodieRecordLocation) kryo.readClassAndObject(input);
     }
   ```
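   
   The same constraint shows up without Kryo. Below is a minimal sketch, assuming the standard `java.io.Externalizable` as a stand-in for `KryoSerializable`; the class `KeyHolder` and its fields are hypothetical and not part of Hudi. In both cases the deserializer first creates the object through a no-arg constructor and then calls a read method to populate it, so the fields must remain assignable after construction.
   
   ```java
   import java.io.ByteArrayInputStream;
   import java.io.ByteArrayOutputStream;
   import java.io.Externalizable;
   import java.io.IOException;
   import java.io.ObjectInput;
   import java.io.ObjectInputStream;
   import java.io.ObjectOutput;
   import java.io.ObjectOutputStream;
   
   // Hypothetical stand-in for IndexItem; not Hudi code.
   class KeyHolder implements Externalizable {
     // Not final: readExternal assigns these after the no-arg constructor has run.
     private String recordKey;
     private String partitionPath;
   
     // Externalizable requires a public no-arg constructor.
     public KeyHolder() {
     }
   
     KeyHolder(String recordKey, String partitionPath) {
       this.recordKey = recordKey;
       this.partitionPath = partitionPath;
     }
   
     String getRecordKey() {
       return recordKey;
     }
   
     String getPartitionPath() {
       return partitionPath;
     }
   
     @Override
     public void writeExternal(ObjectOutput out) throws IOException {
       out.writeUTF(recordKey);
       out.writeUTF(partitionPath);
     }
   
     @Override
     public void readExternal(ObjectInput in) throws IOException {
       // Reassignment happens here, which is why the fields cannot be final.
       this.recordKey = in.readUTF();
       this.partitionPath = in.readUTF();
     }
   
     // Serializes the object to a byte array and reads it back.
     static KeyHolder roundTrip(KeyHolder original) {
       try {
         ByteArrayOutputStream bytes = new ByteArrayOutputStream();
         try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
           out.writeObject(original);
         }
         try (ObjectInputStream in =
                  new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
           return (KeyHolder) in.readObject();
         }
       } catch (IOException | ClassNotFoundException e) {
         throw new IllegalStateException("round trip failed", e);
       }
     }
   
     public static void main(String[] args) {
       KeyHolder copy = roundTrip(new KeyHolder("key-1", "2023/04/16"));
       System.out.println(copy.getRecordKey() + " @ " + copy.getPartitionPath());
     }
   }
   ```
   
   The members can still be made `private` with getters, as in this sketch; only `final` conflicts with this style of deserialization.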





[GitHub] [hudi] clownxc commented on pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

Posted by "clownxc (via GitHub)" <gi...@apache.org>.
clownxc commented on PR #8472:
URL: https://github.com/apache/hudi/pull/8472#issuecomment-1518672224

   > It is great if we can have numbers to illustrate the gains after the patch, like the cost reduction for memory or something.
   
   I ran a test based on your suggestion, with `1000 * 100` HoodieRecords:
   ```java
   // Error passed to markFailure below.
   Throwable t = new Exception("some error in writing");
   WriteStatus status = new WriteStatus(false, 1.0);
   for (int i = 0; i < 1000 * 100; i++) {
     status.markSuccess(mock(HoodieRecord.class), Option.empty());
     status.markFailure(mock(HoodieRecord.class), t, Option.empty());
   }
   System.out.println("status memory: " + ObjectSizeCalculator.getObjectSize(status));
   ```
   Memory occupied by `WriteStatus` before the optimization: 125512336 bytes
   ```java
   private final List<HoodieRecord> writtenRecords = new ArrayList<>();
   private final List<HoodieRecord> failedRecords = new ArrayList<>();
   ```
   ```
   status memory: 125512336
   ```
   Memory occupied by `WriteStatus` after the optimization: 427408 bytes
   ```java
   private final List<IndexItem> writtenRecordIndexes = new ArrayList<>();
   private final List<IndexItem> failedRecordIndexes = new ArrayList<>();
   ```
   ```
   status memory: 427408
   ```
   
   
   




[GitHub] [hudi] vinothchandar commented on pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

Posted by "vinothchandar (via GitHub)" <gi...@apache.org>.
vinothchandar commented on PR #8472:
URL: https://github.com/apache/hudi/pull/8472#issuecomment-1534017088

   @prashantwason @nbalajee @suryaprasanna would this break you all in any way? Do we need the record data anywhere for successful writes?
   
   cc @rmahindra123 as well. Same question.




[GitHub] [hudi] clownxc commented on pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

Posted by "clownxc (via GitHub)" <gi...@apache.org>.
clownxc commented on PR #8472:
URL: https://github.com/apache/hudi/pull/8472#issuecomment-1537267670

   
   
   
   
   ```java
       WriteStatus status = new WriteStatus(true, 1.0);
       String partitionPath = HoodieTestDataGenerator.DEFAULT_PARTITION_PATHS[0];
       dataGen = new HoodieTestDataGenerator(new String[] {partitionPath});
       String newCommitTime = "001";
       List<HoodieRecord> records = dataGen.generateInserts(newCommitTime, 1000);
       Throwable t = new Exception("some error in writing");
       for (int i = 0; i < 1000; i++) {
         HoodieRecord data1 = records.get(i);
         status.markSuccess(data1, Option.empty());
         // Deflate after marking, as the write handles do.
         data1.deflate();
         // Mark the same record as failed too, so both lists are populated.
         HoodieRecord data2 = records.get(i);
         status.markFailure(data2, t, Option.empty());
         data2.deflate();
       }
       System.out.println("status memory: " + ObjectSizeCalculator.getObjectSize(status));
   ```
   
   > hoodie write handles [deflate the HoodieRecord ](https://github.com/apache/hudi/blob/cabcb2bf2cddedeb3a34047af3935b27cfdfb858/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieCreateHandle.java#L167)after writing
   
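   Following the suggestions provided, I ran the test above and found that memory usage is essentially unchanged before and after the optimization.
   The main reason is that the hoodie write handles deflate the HoodieRecord after writing.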
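   
   A minimal sketch of why deflating narrows the gap, assuming simplified, hypothetical `SimpleRecord` and `SlimEntry` classes (neither is Hudi code, and the real `HoodieRecord` carries more fields): once `deflate()` drops the payload reference, a tracked record retains little more than its key and partition path, which is what a slimmed-down holder would store anyway.
   
   ```java
   import java.util.ArrayList;
   import java.util.List;
   
   // Hypothetical, simplified record that carries a payload; not Hudi code.
   class SimpleRecord {
     private final String recordKey;
     private final String partitionPath;
     private byte[] data; // payload, released by deflate()
   
     SimpleRecord(String recordKey, String partitionPath, byte[] data) {
       this.recordKey = recordKey;
       this.partitionPath = partitionPath;
       this.data = data;
     }
   
     // Drops the payload reference so it can be garbage collected.
     void deflate() {
       this.data = null;
     }
   
     String getRecordKey() {
       return recordKey;
     }
   
     String getPartitionPath() {
       return partitionPath;
     }
   
     boolean hasData() {
       return data != null;
     }
   }
   
   // Hypothetical slim holder that keeps only the identifying fields.
   class SlimEntry {
     private final String recordKey;
     private final String partitionPath;
   
     private SlimEntry(String recordKey, String partitionPath) {
       this.recordKey = recordKey;
       this.partitionPath = partitionPath;
     }
   
     static SlimEntry from(SimpleRecord record) {
       return new SlimEntry(record.getRecordKey(), record.getPartitionPath());
     }
   
     String getRecordKey() {
       return recordKey;
     }
   
     String getPartitionPath() {
       return partitionPath;
     }
   }
   
   class DeflateDemo {
     // Tracks each record in a list, then deflates it, mimicking a handle that
     // marks a record as written and releases its payload afterwards.
     static List<SimpleRecord> trackAndDeflate(int count, int payloadBytes) {
       List<SimpleRecord> tracked = new ArrayList<>();
       for (int i = 0; i < count; i++) {
         SimpleRecord record = new SimpleRecord("key-" + i, "2023/04/16", new byte[payloadBytes]);
         tracked.add(record);
         record.deflate();
       }
       return tracked;
     }
   
     public static void main(String[] args) {
       List<SimpleRecord> tracked = trackAndDeflate(1000, 1024);
       long withPayload = tracked.stream().filter(SimpleRecord::hasData).count();
       System.out.println("tracked records still holding a payload: " + withPayload);
     }
   }
   ```
   
   Under these assumptions the tracked list and a list of `SlimEntry` objects hold the same identifying strings, so any remaining difference comes from per-object overhead rather than from the payload.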




[GitHub] [hudi] danny0405 commented on a diff in pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #8472:
URL: https://github.com/apache/hudi/pull/8472#discussion_r1184550750


##########
hudi-common/src/main/java/org/apache/hudi/common/model/IndexItem.java:
##########
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.model;
+
+import org.apache.hudi.common.util.Option;
+
+import com.esotericsoftware.kryo.Kryo;
+import com.esotericsoftware.kryo.KryoSerializable;
+import com.esotericsoftware.kryo.io.Input;
+import com.esotericsoftware.kryo.io.Output;
+
+import java.io.Serializable;
+
+public class IndexItem implements Serializable, KryoSerializable {
+
+
+  /**
+   * Identifies the record across the table.
+   */
+  protected HoodieKey key;
+

Review Comment:
   Can we make all these members private and final?



##########
hudi-common/src/main/java/org/apache/hudi/common/model/IndexItem.java:
##########
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.model;
+
+import org.apache.hudi.common.util.Option;
+
+import com.esotericsoftware.kryo.Kryo;
+import com.esotericsoftware.kryo.KryoSerializable;
+import com.esotericsoftware.kryo.io.Input;
+import com.esotericsoftware.kryo.io.Output;
+
+import java.io.Serializable;
+
+public class IndexItem implements Serializable, KryoSerializable {

Review Comment:
   Give some doc to the class.





[GitHub] [hudi] clownxc commented on a diff in pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

Posted by "clownxc (via GitHub)" <gi...@apache.org>.
clownxc commented on code in PR #8472:
URL: https://github.com/apache/hudi/pull/8472#discussion_r1186567052


##########
hudi-common/src/main/java/org/apache/hudi/common/model/IndexItem.java:
##########
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.model;
+
+import org.apache.hudi.common.util.Option;
+
+import com.esotericsoftware.kryo.Kryo;
+import com.esotericsoftware.kryo.KryoSerializable;
+import com.esotericsoftware.kryo.io.Input;
+import com.esotericsoftware.kryo.io.Output;
+
+import java.io.Serializable;
+
+public class IndexItem implements Serializable, KryoSerializable {
+
+
+  /**
+   * Identifies the record across the table.
+   */
+  protected HoodieKey key;
+

Review Comment:
   > Can we make all these members private and final?
   
   Thank you very much for the review, and sorry for the late response. I will try to modify the code according to your suggestions.





[GitHub] [hudi] hudi-bot commented on pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8472:
URL: https://github.com/apache/hudi/pull/8472#issuecomment-1537173338

   ## CI report:
   
   * ff5d944ec780dbfb0d97eea643ad12420d1cca85 UNKNOWN
   * c69a04b7c23d381b6a4fe16c1fb016f8e1363794 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16871) 
   * e79140252ba476a4fef89ba85caabc4cd98ce85b UNKNOWN
   


[GitHub] [hudi] hudi-bot commented on pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8472:
URL: https://github.com/apache/hudi/pull/8472#issuecomment-1510054442

   ## CI report:
   
   * 402e4a78e4f37f7e587a23855f9042363dd70368 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16375) 
   


[GitHub] [hudi] hudi-bot commented on pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8472:
URL: https://github.com/apache/hudi/pull/8472#issuecomment-1518670647

   ## CI report:
   
   * 9da1c0da2753e7be3b6612568cc6750ba9944403 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16402) 
   * ff5d944ec780dbfb0d97eea643ad12420d1cca85 UNKNOWN
   * 6147cc8922856a286a71fa73140c7a09634a7e9a UNKNOWN
   


[GitHub] [hudi] hudi-bot commented on pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8472:
URL: https://github.com/apache/hudi/pull/8472#issuecomment-1518683022

   ## CI report:
   
   * ff5d944ec780dbfb0d97eea643ad12420d1cca85 UNKNOWN
   * 6147cc8922856a286a71fa73140c7a09634a7e9a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16559) 
   * d463b7d69bd7cced472ec3c82f18edcce33c28a4 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8472:
URL: https://github.com/apache/hudi/pull/8472#issuecomment-1518684723

   ## CI report:
   
   * ff5d944ec780dbfb0d97eea643ad12420d1cca85 UNKNOWN
   * 6147cc8922856a286a71fa73140c7a09634a7e9a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16559) 
   * d463b7d69bd7cced472ec3c82f18edcce33c28a4 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16561) 
   * 0f0011b61776e6f9a9b08481f8ad809e67e44d41 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8472:
URL: https://github.com/apache/hudi/pull/8472#issuecomment-1536397415

   ## CI report:
   
   * ff5d944ec780dbfb0d97eea643ad12420d1cca85 UNKNOWN
   * 0f0011b61776e6f9a9b08481f8ad809e67e44d41 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16563) 
   * c69a04b7c23d381b6a4fe16c1fb016f8e1363794 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] prashantwason commented on pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

Posted by "prashantwason (via GitHub)" <gi...@apache.org>.
prashantwason commented on PR #8472:
URL: https://github.com/apache/hudi/pull/8472#issuecomment-1536572654

   > @prashantwason @nbalajee @suryaprasanna would this break you all in any way? Do we need the record data anywhere for successful writes?
   
   The record index implementation requires the record key and the location to create the mapping in the index. This is a similar requirement to other non-implicit indexes like HBaseIndex.
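   
   As a rough illustration of that requirement, a record-level index only needs the record key and the file location of each successful write; the payload is never consulted. The sketch below uses hypothetical stand-in types (`KeyLocationIndexSketch`, `Location`), not Hudi's `HoodieKey`/`HoodieRecordLocation` classes or its actual record index implementation.
   
   ```java
   import java.util.HashMap;
   import java.util.Map;
   import java.util.Optional;

   // Hypothetical stand-ins for illustration only; these are NOT Hudi classes.
   final class KeyLocationIndexSketch {

     /** Where a record landed: the commit instant and the file it was written to. */
     static final class Location {
       final String instantTime;
       final String fileId;

       Location(String instantTime, String fileId) {
         this.instantTime = instantTime;
         this.fileId = fileId;
       }
     }

     // recordKey -> location; the record payload is never stored.
     private final Map<String, Location> mapping = new HashMap<>();

     /** Records (or overwrites) the location of a successfully written key. */
     void update(String recordKey, Location newLocation) {
       mapping.put(recordKey, newLocation);
     }

     /** Returns the last known location of a key, if any. */
     Optional<Location> lookup(String recordKey) {
       return Optional.ofNullable(mapping.get(recordKey));
     }

     public static void main(String[] args) {
       KeyLocationIndexSketch index = new KeyLocationIndexSketch();
       index.update("uuid-1", new Location("20230505120000", "file-0001"));
       System.out.println(index.lookup("uuid-1").map(l -> l.fileId).orElse("<missing>"));
     }
   }
   ```
   
   Nothing in `update` or `lookup` touches record data, which is why keeping only the key and location for successful writes would be enough for this kind of index.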
   




[GitHub] [hudi] danny0405 closed pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 closed pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord
URL: https://github.com/apache/hudi/pull/8472




[GitHub] [hudi] danny0405 commented on pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on PR #8472:
URL: https://github.com/apache/hudi/pull/8472#issuecomment-1537670179

   Cool, I think we are good to close this issue.




[GitHub] [hudi] hudi-bot commented on pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8472:
URL: https://github.com/apache/hudi/pull/8472#issuecomment-1537213624

   ## CI report:
   
   * ff5d944ec780dbfb0d97eea643ad12420d1cca85 UNKNOWN
   * e79140252ba476a4fef89ba85caabc4cd98ce85b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16896) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] clownxc closed pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

Posted by "clownxc (via GitHub)" <gi...@apache.org>.
clownxc closed pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord
URL: https://github.com/apache/hudi/pull/8472




[GitHub] [hudi] hudi-bot commented on pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8472:
URL: https://github.com/apache/hudi/pull/8472#issuecomment-1518692886

   ## CI report:
   
   * ff5d944ec780dbfb0d97eea643ad12420d1cca85 UNKNOWN
   * d463b7d69bd7cced472ec3c82f18edcce33c28a4 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16561) 
   * 0f0011b61776e6f9a9b08481f8ad809e67e44d41 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16563) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8472:
URL: https://github.com/apache/hudi/pull/8472#issuecomment-1511386394

   ## CI report:
   
   * 402e4a78e4f37f7e587a23855f9042363dd70368 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16375) 
   * 9da1c0da2753e7be3b6612568cc6750ba9944403 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] danny0405 commented on a diff in pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #8472:
URL: https://github.com/apache/hudi/pull/8472#discussion_r1168133213


##########
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieRecordStatus.java:
##########
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.model;
+
+import org.apache.hudi.common.util.Option;
+
+import com.esotericsoftware.kryo.Kryo;
+import com.esotericsoftware.kryo.KryoSerializable;
+import com.esotericsoftware.kryo.io.Input;
+import com.esotericsoftware.kryo.io.Output;
+
+import java.io.Serializable;
+
+public class HoodieRecordStatus implements Serializable, KryoSerializable {
+
+

Review Comment:
   Key + location are actually an index item; how about renaming it to `HoodieIndexItem`?



##########
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieRecordStatus.java:
##########
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.model;
+
+import org.apache.hudi.common.util.Option;
+
+import com.esotericsoftware.kryo.Kryo;
+import com.esotericsoftware.kryo.KryoSerializable;
+import com.esotericsoftware.kryo.io.Input;
+import com.esotericsoftware.kryo.io.Output;
+
+import java.io.Serializable;
+
+public class HoodieRecordStatus implements Serializable, KryoSerializable {
+
+

Review Comment:
   Key + location are actually an index item; how about renaming it to `IndexItem`?





[GitHub] [hudi] clownxc commented on a diff in pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

Posted by "clownxc (via GitHub)" <gi...@apache.org>.
clownxc commented on code in PR #8472:
URL: https://github.com/apache/hudi/pull/8472#discussion_r1168657932


##########
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieRecordStatus.java:
##########
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.model;
+
+import org.apache.hudi.common.util.Option;
+
+import com.esotericsoftware.kryo.Kryo;
+import com.esotericsoftware.kryo.KryoSerializable;
+import com.esotericsoftware.kryo.io.Input;
+import com.esotericsoftware.kryo.io.Output;
+
+import java.io.Serializable;
+
+public class HoodieRecordStatus implements Serializable, KryoSerializable {
+
+

Review Comment:
   > key + location are actually an index item, just rename it to `IndexItem` ?
   
   Thank you very much for your review. I have modified the code; could you re-review it when you are free and leave some comments?





[GitHub] [hudi] prashantwason commented on pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

Posted by "prashantwason (via GitHub)" <gi...@apache.org>.
prashantwason commented on PR #8472:
URL: https://github.com/apache/hudi/pull/8472#issuecomment-1536586226

   @clownxc If I understand correctly, the memory savings are coming from dropping the "data" part of the HoodieRecord? I noticed that HoodieRecord has only 2 additional members - sealed (boolean) and data (t). Are the savings due to usage of the mock class (which may have bloating compared to the original HoodieRecord)?
   
   But hoodie write handles [deflate the HoodieRecord](https://github.com/apache/hudi/blob/cabcb2bf2cddedeb3a34047af3935b27cfdfb858/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieCreateHandle.java#L167) after writing, so the data portion should go away, reducing the amount of savings possible.
   
   Can you run the test again with these changes:
     1. WriteStatus status = new WriteStatus(true, 1.0);   // enable success record tracking as errors should be rare
     2. Create an actual HoodieRecord and use that in the for loop instead of the mock(HoodieRecord.class)
     3. Call deflate on the created HoodieRecord to remove the data, as the write handles do.
   
   I feel the above may give a more realistic view of savings. 
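   
   To make the expected effect concrete, here is a minimal sketch of why deflating should shrink the measured savings. It uses a hypothetical stand-in class (`SketchRecord`), not Hudi's `HoodieRecord` or `WriteStatus`, and only counts payload bytes that remain reachable from the tracked records; it is not a real heap measurement.
   
   ```java
   import java.util.ArrayList;
   import java.util.List;

   // Hypothetical stand-in for illustration; this is NOT Hudi's HoodieRecord.
   final class DeflateSketch {

     static final class SketchRecord {
       final String recordKey;
       final String partitionPath;
       byte[] data; // payload; dropped by deflate()

       SketchRecord(String recordKey, String partitionPath, byte[] data) {
         this.recordKey = recordKey;
         this.partitionPath = partitionPath;
         this.data = data;
       }

       /** Drops the payload reference so only key and partition path stay reachable. */
       void deflate() {
         this.data = null;
       }
     }

     /** Sums the payload bytes still reachable from the tracked records. */
     static long retainedPayloadBytes(List<SketchRecord> tracked) {
       long total = 0;
       for (SketchRecord r : tracked) {
         if (r.data != null) {
           total += r.data.length;
         }
       }
       return total;
     }

     public static void main(String[] args) {
       List<SketchRecord> tracked = new ArrayList<>();
       for (int i = 0; i < 1000; i++) {
         // Every successful record is tracked, as with a tracking fraction of 1.0.
         tracked.add(new SketchRecord("key-" + i, "2023/05/05", new byte[1024]));
       }
       System.out.println("payload bytes before deflate: " + retainedPayloadBytes(tracked));
       for (SketchRecord r : tracked) {
         r.deflate(); // what the write handles do after writing
       }
       System.out.println("payload bytes after deflate: " + retainedPayloadBytes(tracked));
     }
   }
   ```
   
   Once the payload is gone, what is left per tracked record is essentially the key, the partition path and a few small fields, so that is the baseline a slimmer class should be compared against.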
   
   Also, how did you find this interesting optimization? I am interested as there may be other avenues of such savings within HUDI, so it would be good to know how you track these.
   
   
   




[GitHub] [hudi] clownxc commented on a diff in pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

Posted by "clownxc (via GitHub)" <gi...@apache.org>.
clownxc commented on code in PR #8472:
URL: https://github.com/apache/hudi/pull/8472#discussion_r1186567773


##########
hudi-common/src/main/java/org/apache/hudi/common/model/IndexItem.java:
##########
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.common.model;
+
+import org.apache.hudi.common.util.Option;
+
+import com.esotericsoftware.kryo.Kryo;
+import com.esotericsoftware.kryo.KryoSerializable;
+import com.esotericsoftware.kryo.io.Input;
+import com.esotericsoftware.kryo.io.Output;
+
+import java.io.Serializable;
+
+public class IndexItem implements Serializable, KryoSerializable {
+
+
+  /**
+   * Identifies the record across the table.
+   */
+  protected HoodieKey key;
+

Review Comment:
   > Can we make all these members private and final?
   
   We may not be able to make all of these members `final`, because they are reassigned when the object is deserialized:
   
   ```java
     @Override
     public final void read(Kryo kryo, Input input) {
       this.key = kryo.readObjectOrNull(input, HoodieKey.class);
       this.currentLocation = (HoodieRecordLocation) kryo.readClassAndObject(input);
       this.newLocation = (HoodieRecordLocation) kryo.readClassAndObject(input);
     }
   ```
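   
   The same constraint can be reproduced without Kryo. The sketch below swaps in the JDK's `java.io.Externalizable`, whose `readExternal` callback also populates an already-constructed instance, so the fields it assigns cannot be declared `final`. The class `ExternalizableItemSketch` and its fields are hypothetical and only stand in for `IndexItem`.
   
   ```java
   import java.io.ByteArrayInputStream;
   import java.io.ByteArrayOutputStream;
   import java.io.Externalizable;
   import java.io.IOException;
   import java.io.ObjectInput;
   import java.io.ObjectInputStream;
   import java.io.ObjectOutput;
   import java.io.ObjectOutputStream;

   // Hypothetical class for illustration; uses JDK Externalizable, not Kryo.
   class ExternalizableItemSketch implements Externalizable {

     // Cannot be final: readExternal assigns them after construction.
     private String recordKey;
     private String fileId;

     /** Externalizable requires a public no-arg constructor. */
     public ExternalizableItemSketch() {
     }

     ExternalizableItemSketch(String recordKey, String fileId) {
       this.recordKey = recordKey;
       this.fileId = fileId;
     }

     String getRecordKey() {
       return recordKey;
     }

     String getFileId() {
       return fileId;
     }

     @Override
     public void writeExternal(ObjectOutput out) throws IOException {
       out.writeUTF(recordKey);
       out.writeUTF(fileId);
     }

     @Override
     public void readExternal(ObjectInput in) throws IOException {
       // Assignment outside a constructor: this would not compile if the fields were final.
       this.recordKey = in.readUTF();
       this.fileId = in.readUTF();
     }

     /** Serializes and deserializes an item through an in-memory buffer. */
     static ExternalizableItemSketch roundTrip(ExternalizableItemSketch item) {
       try {
         ByteArrayOutputStream bytes = new ByteArrayOutputStream();
         try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
           out.writeObject(item);
         }
         try (ObjectInputStream in =
             new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
           return (ExternalizableItemSketch) in.readObject();
         }
       } catch (IOException | ClassNotFoundException e) {
         throw new IllegalStateException("round trip failed", e);
       }
     }

     public static void main(String[] args) {
       ExternalizableItemSketch copy =
           roundTrip(new ExternalizableItemSketch("key-1", "file-0001"));
       System.out.println(copy.getRecordKey() + " -> " + copy.getFileId());
     }
   }
   ```
   
   The members can still be made `private` with getters, as in the sketch; it is only the `final` modifier that conflicts with this style of deserialization.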


