You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/02/20 23:44:30 UTC

[GitHub] [hudi] xushiyan commented on a change in pull request #4811: [HUDI-3213][WIP] Making commit preserve metadata to true for compaction

xushiyan commented on a change in pull request #4811:
URL: https://github.com/apache/hudi/pull/4811#discussion_r810700908



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java
##########
@@ -243,7 +243,7 @@
 
   public static final ConfigProperty<Boolean> PRESERVE_COMMIT_METADATA = ConfigProperty
       .key("hoodie.compaction.preserve.commit.metadata")
-      .defaultValue(false)
+      .defaultValue(true)
       .sinceVersion("0.11.0")
       .withDocumentation("When rewriting data, preserves existing hoodie_commit_time");

Review comment:
       ```suggestion
         .withDocumentation("When running compaction, preserves existing `_hoodie_commit_time`");
   ```
   
   a side question: this won't affect clustering, will it?

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java
##########
@@ -294,6 +294,8 @@ protected boolean writeRecord(HoodieRecord<T> hoodieRecord, Option<IndexedRecord
         // Convert GenericRecord to GenericRecord with hoodie commit metadata in schema
         IndexedRecord recordWithMetadataInSchema = rewriteRecord((GenericRecord) indexedRecord.get());
         if (preserveMetadata) {
+          // do not preserve FILENAME_METADATA_FIELD
+          recordWithMetadataInSchema.put(HoodieRecord.HOODIE_META_COLUMNS_NAME_TO_POS.get(HoodieRecord.FILENAME_METADATA_FIELD), newFilePath.getName());

Review comment:
       another matter which may fix separately:
   
   using `HOODIE_META_COLUMNS_NAME_TO_POS` to retrieve column index actually affects performance given it's doing map look up for every record, also it's pretty hard to read. Why not just use a group of int constants? like this
   
   ```
   public static final int HOODIE_COMMIT_TIME_POS = 0;
   ```
   
   i'd static import constants to avoid lengthy code most of the time.

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java
##########
@@ -294,6 +294,8 @@ protected boolean writeRecord(HoodieRecord<T> hoodieRecord, Option<IndexedRecord
         // Convert GenericRecord to GenericRecord with hoodie commit metadata in schema
         IndexedRecord recordWithMetadataInSchema = rewriteRecord((GenericRecord) indexedRecord.get());
         if (preserveMetadata) {
+          // do not preserve FILENAME_METADATA_FIELD

Review comment:
       is it always the case where we never preserve FILENAME ? since when compact we always create new files, right?  then it's some extra rule we have to remember for this config, which sounds like we preserve all meta cols.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org